Multiple vCenter Servers, SSO, and How To Design for Failure

Internally at my company, there is usually a lot of discussion about how a customer should go about managing multiple vCenter servers. The addition of Single Sign-On (SSO) in vSphere 5.1 dramatically complicates that design. This topic won't mean much for SMBs, because you should be pretty well off with a single vCenter and SSO instance. This post is primarily focused on large enterprise designs.

WARNING: these are MY design considerations and recommendations; use them at your own risk. The depicted diagrams do not represent the entirety of a solution. Many components are probably missing, so use your imagination.

In the olden days, multiple vCenter environments and geographically dispersed sites could easily be seen in a single vSphere Client view by configuring them in Linked Mode. Linked Mode is great for having fewer panes of management. There are a lot of cons with this route as well:

- A hit to your scalability (as of vCenter 5.1):
  - Maximum of 3,000 hosts
  - Maximum of 30,000 powered-on VMs
  - Maximum of 50,000 registered VMs
- Upgrading one site may bring another offline because of build or version incompatibilities
- Time synchronization is critical (but this should usually be critical in any production application)
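
If you want a quick sanity check against those Linked Mode maximums, here is a minimal Python sketch. The limits are the three numbers listed above; the `VCenterInventory` class, the site names, and the counts are all made up for illustration and are not pulled from any VMware API.

```python
from dataclasses import dataclass

# Linked Mode maximums listed above (as of vCenter 5.1).
LINKED_MODE_LIMITS = {
    "hosts": 3_000,
    "powered_on_vms": 30_000,
    "registered_vms": 50_000,
}


@dataclass
class VCenterInventory:
    """Hypothetical per-vCenter counts; the field names are invented for this sketch."""
    name: str
    hosts: int
    powered_on_vms: int
    registered_vms: int


def linked_mode_violations(inventories):
    """Return a message for every Linked Mode maximum the combined group exceeds."""
    violations = []
    for field, limit in LINKED_MODE_LIMITS.items():
        total = sum(getattr(inv, field) for inv in inventories)
        if total > limit:
            violations.append(f"{field}: {total} exceeds {limit}")
    return violations


# Example numbers only: two sites that individually look fine.
group = [
    VCenterInventory("site-a", hosts=1_800, powered_on_vms=14_000, registered_vms=20_000),
    VCenterInventory("site-b", hosts=1_500, powered_on_vms=12_000, registered_vms=18_000),
]
print(linked_mode_violations(group))
```

The point of summing across the group is that each vCenter can be comfortably under the limit on its own while the linked group as a whole is not.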

So why isn't vCenter Linked Mode a good route going forward? The addition of SSO in 5.1 brings in a critical requirement: make sure that all vCenter Servers in a Linked Mode group are registered to the same vCenter Single Sign On server. Read more at Linked Mode Prerequisites for vCenter Server.

Ouch, that stings a bit. This is going to be a problem because we now have to bring SSO design into the mix. In a single datacenter instance, we can probably get away with having a single SSO server. That's not much of an issue.

If we try to use this for a geo-dispersed design and install SSO at a single site, we are putting every single site at risk. There is also going to be a latency issue with one site trying to contact the SSO server at the primary site. I know what you are probably saying to yourself: "Well c'mon Kenny, that's why they created multisite Single Sign-On!" Well, if you take the time to read Installing vCenter Single Sign On in a multisite deployment (2034074), you will see that "Multisite Single Sign On deployment is designed only for faster local access to authentication-related services. It does not provide failover between Single Sign On servers on different sites. When the Single Sign On instance on one site fails, its role is not taken over by a peer Single Sign On instance on another site. All authentication requests on the failed site will fail, even if peer sites are fully functional." So really, there is no high availability in this scenario. Let's also be aware that there are still multiple issues around trusted Active Directory domains with multisite SSO, and VMware is continuing to evolve this piece of software in a later release.

So, if you desperately think you require Linked Mode functionality, take into consideration the issues facing vSphere SSO. If it's all happening at a single datacenter, site, or on a campus environment, you may be fine. For those looking at dispersed sites, a link failure could disable all access to a vCenter server since it cannot communicate with SSO.

And if you have a multisite Single Sign-On solution, the peer SSO servers do nothing to mitigate a failure: lose the SSO server at one site and you still lose access to the vCenter server at that site. The other piece to remember about multisite SSO is database availability and synchronization across geographies. The problem at this point in time is that database clustering is not supported, so you are still left with a single point of failure.
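
To make the failure reasoning above concrete, here is a rough Python model of it. It assumes only what was described: a vCenter is usable when its own site is up, the site hosting its SSO instance is up, and the link between the two is intact, with no failover to a peer SSO. The class, function, and site names are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VCenter:
    name: str
    site: str      # site where this vCenter runs
    sso_site: str  # site hosting the SSO instance it is registered to


def accessible_vcenters(vcenters, failed_sites=frozenset(), cut_links=frozenset()):
    """Return the names of vCenters that can still authenticate users.

    cut_links is a set of frozenset({site_x, site_y}) pairs whose WAN link is down.
    No SSO failover is modeled, matching the multisite behavior quoted above.
    """
    usable = []
    for vc in vcenters:
        if vc.site in failed_sites or vc.sso_site in failed_sites:
            continue
        if vc.site != vc.sso_site and frozenset((vc.site, vc.sso_site)) in cut_links:
            continue
        usable.append(vc.name)
    return usable


# One SSO server at site A serving both sites.
central_sso = [VCenter("vc-a", "A", "A"), VCenter("vc-b", "B", "A")]
# An SSO instance at every site (multisite SSO or fully separate instances).
local_sso = [VCenter("vc-a", "A", "A"), VCenter("vc-b", "B", "B")]

link_down = {frozenset(("A", "B"))}
print(accessible_vcenters(central_sso, cut_links=link_down))
print(accessible_vcenters(local_sso, cut_links=link_down))
print(accessible_vcenters(central_sso, failed_sites={"A"}))
print(accessible_vcenters(local_sso, failed_sites={"A"}))
```

With the central design, either a link failure or the loss of site A takes vc-b down with it. With a local SSO at each site, vc-b survives both, which is the whole argument in a few lines.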

How else can we simplify things from a management standpoint? You can register multiple vCenter servers against a single SSO server. When using the Web Client, any vCenter server that is registered to that central SSO server will be viewable and navigable. See a screenshot by William Lam below. Hmm... that brings up an interesting design consideration. If Linked Mode was only being considered to have fewer panes of management, why do I even need to configure it any longer? Good question. IMO, Linked Mode is dead. You can accomplish the same goal of having fewer vCenter panes of management by registering every vCenter with SSO. The only stipulation is that this only works with the vSphere Web Client. I know many of us today still use the C# client because of its speedy responsiveness and more granular messaging, but this is a reason to at least have both open at the same time.

That's pretty snazzy, but now we need to put our design hat back on. Do we want to sacrifice design for functionality? This might be great on a campus network, but for geographically dispersed sites, this may not be the answer we want. In addition, you have to think about how one might affect the other when it comes to upgrades. You may have multiple vCenter servers, each one running a different build or version based on the requirements of the workloads it runs. If everything is reliant on a centralized SSO server, you are going to be in a bit of trouble unless someone is doing the testing to make sure you have the correct upgrade procedures and compatibility between the different versions.

The flip side of this is: how do we go about making our infrastructure rock solid? The downside is that we must give up some ease of management. It's pretty simple. Break it down and create silos. For each site, have a separate vCenter and SSO instance. This ensures that a failure of one site or one component will not interfere with another site's availability. In all reality, it's not a huge deal, because web browsers can have multiple tabs open at the same time. So in essence, you are really only losing a single "tab" of management and a consolidated search database.

To make it even more rock solid, how do you protect vSphere services from one another? We have learned so far that SSO is a requirement and that it's very necessary to utilize it. Another piece I forgot to mention is the SSO "HA" mode, where you basically configure two SSO servers behind a load balancer utilizing a single back-end database. Running it in HA mode can be done at your own discretion if it fits the design. SSO requires a database, and we can create an additional database server for SSO alone to satisfy rock-solid requirements. This means that if you need to take a vCenter server offline, migrate its database, or do anything else that would impact its functionality, you know that SSO is not going to be interrupted.
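
Here is a tiny Python sketch of what SSO "HA" mode does and doesn't buy you, based only on the layout described above: two SSO nodes, one load balancer, one shared back-end database. Treating the load balancer as a single component is my assumption for the sketch, and the function name is made up.

```python
def sso_ha_available(sso_nodes_up, load_balancer_up, database_up):
    """Authentication works if the load balancer and the shared database are up
    and at least one SSO node behind the load balancer is still running."""
    return load_balancer_up and database_up and any(sso_nodes_up)


# Losing one of the two SSO nodes is survivable.
print(sso_ha_available([True, False], load_balancer_up=True, database_up=True))
# Losing the single shared database is not, no matter how many nodes are up.
print(sso_ha_available([True, True], load_balancer_up=True, database_up=False))
```

In other words, HA mode protects you from losing an SSO node, but the shared database is still the single point of failure mentioned earlier.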

Let's discuss a much narrower design. We have a single campus with a total of 250 hosts and 8,000 virtual machines. There are 4 pods located in the main datacenter, and there is another pod located across campus that functions as DR. How do we design this? If we wanted to make this stupid simple, we could easily place all 5 pods in a single vCenter instance. This gives us a single pane of glass for the entire campus. One vCenter server can serve all of those hosts and virtual machines, and a single SSO instance gives us that simple view. Yet if we lost the connection to the remote pod, we couldn't centrally manage it. One thing I always stress is that you want to relieve the burden of management if possible. Consolidating infrastructure into as few vCenters as possible will drastically cut the time you spend playing "which tab belongs to what".

But what happens when we really need that DR pod because of a fire in the datacenter? It's going to be a pain to get vCenter, SSO, and our databases up and functional on the remote pod. So to satisfy this design, we will have vCenter, SSO, and (possibly multiple) database instances at each site, and utilize VMware Site Recovery Manager to initiate the failover and restart of critical virtual machines. This creates two "tabs" of management, but we know that our infrastructure is secure. The database design is not depicted here, but you could essentially end up with 3 database servers, because vCenter, SSO, and SRM all need a database to store data. The database design depends on the customer's requirements.

If we wanted to take this design further, we could add VPLEX to the mix. VPLEX could abstract a pod in the main datacenter as well as the DR pod. These two pods could form a "management cluster" that hosts critical infrastructure virtual machines, and the entire loss of one site would allow an easy vSphere HA restart of those VMs at the other site. This takes us back down to a single pane of management as well.

If you want to do some more reading, there is some great stuff out there on SSO and more design considerations:
