When a customer is deploying a Vblock Infrastructure Platform with VMware View or VMware vCloud Director, one question always arises: where do they run the virtual management infrastructure stack? View and vCloud come up on a regular basis, but this article applies to any application that uses the Vblock as resources for a particular stack of virtual machines that comprise a Virtual Management Infrastructure Stack. This article is heavy on VMware design; I will discuss the options available to you for running a Virtual Management Infrastructure Stack and make you think about your overall comfort with each solution.
To set the scene, take a look at these diagrams. I have created both a View and vCloud diagram that depict what we are trying to accomplish. In every management stack, there are a multitude of different VMs that make an application function.
In a VMware vCloud Director environment for Vblock, you will usually have these VMs (not all are required):
- 1 vCenter Server for Management Components
- 1 vCenter Server for vCloud Resources (optional but best practice)
- 2 Database Servers, SQL/Oracle (1 required, second optional)
- 1 vShield Manager VM (can run in Fault Tolerant mode)
- 2-X vCloud Director Nodes. The number of nodes depends on the size of the vCloud environment and the level of redundancy
- 2-X Virtual Supervisor Modules for Nexus 1000v. The number of VSMs depends on the size of the vCloud environment
- 1 VMware vCenter Chargeback Server
- 1 vCenter Orchestrator Server
- 1 RabbitMQ Server
- 1 Load Balancer
- and more…
In a VMware View environment for Vblock, you will usually have these VMs (not all are required):
- 1 vCenter Server for Management Components
- 1 vCenter Server for View Desktop Resources with View Composer installed (can be consolidated into 1 vCenter, but 2 is best practice)
- 2 Database Servers - SQL (1 required, second optional)
- 2-X VMware View Connection Broker Servers. The number of servers depends on the size of the View environment and the level of redundancy
- 2-X VMware View Security Broker Servers. The number of servers depends on the size of the View environment and the level of redundancy
- 2-X Virtual Supervisor Modules for Nexus 1000v. The number of VSMs depends on the size of the View environment
- 1 ThinApp Guest for ThinApp'ing Applications
- 1 vCenter Orchestrator Server
- 1 Load Balancer
- and more...
As you can see, these applications can have a lot of moving pieces depending on how simple or complex the solution is.
One of the more common requests I see is that customers wish to run these components in VCE's Advanced Management Pod (AMP). The AMP is an engineered solution from VCE that is responsible for hosting the management applications in a Vblock. There are two different versions of the AMP. The HA-AMP consists of 2 Cisco C200 servers and a VNXe 3100 array, with the VMs hosted on the VNXe. The Mini-AMP is a single Cisco C200 with no array, and the VMs are hosted on storage local to the server. Each Cisco C200 server is equipped with 48GB of RAM, and the VNXe 3100 has 6 2TB NL-SAS drives in a RAID 5 configuration.
For the context of this conversation, we will use the HA-AMP for all discussions because it's a VCE best practice to use the HA-AMP to overcome any hardware failures. The Mini-AMP is an option for different scenarios.
Here is the profile for all virtual machines that reside on the AMP:
- vCenter Server - 2vCPU - 4GB RAM
- vCenter Update Manager Server - 1vCPU - 4GB RAM
- SQL Server - 1vCPU - 8GB RAM
- Array Management & Licensing Server - 1vCPU - 4GB RAM
- ESRS - 1vCPU - 3GB RAM
- UIM - 2vCPU - 16GB RAM
- Nexus 1000v VSM 1 - 1vCPU - 1.5GB RAM
- Nexus 1000v VSM 2 - 1vCPU - 1.5GB RAM
If we count correctly, that is a total of 42GB of RAM allocated to these VMs. VCE engineered the AMP to host only the virtual machines essential to managing the Vblock, and it should be considered out-of-band management. Do NOT make the mistake of thinking you can run a bunch of extra virtual machines in the AMP just because you have additional capacity. When you install vCloud Infrastructure VMs or View Infrastructure VMs on the AMP, you are going beyond what is currently supported by VCE because the AMP was never intended to accommodate these extra workloads. The AMP is, however, capable of handling a small additional workload such as a vCenter Operations server or a vCenter Chargeback server.
The AMP currently hosts 8 required VMs with a total of 42GB of RAM assigned to them. These VMs are spread out among 2 hosts, but can be consolidated onto 1 host in the Mini-AMP option. Windows 2008 R2 VMs use large memory pages for better performance and therefore do not benefit from VMware's Transparent Page Sharing (TPS) technology to reduce the amount of RAM consumed. The only time a 2008 R2 VM will be able to utilize TPS is when there has been an overcommitment of memory on the host: the 2MB large pages begin to break down into 4KB pages to relieve RAM contention, and when this happens the 2008 R2 VMs experience a degradation in performance (source).

The VMs are spread across both hosts so that if an HA event occurs, the VMs will have the resources to fail over and power on. The additional VMs needed for a vCloud environment, such as 1x vCenter, 2x vCloud cells, 1x vShield Manager using FT, Chargeback, vCO, etc., will far exceed the resources the AMP hosts contain. Using the example above, this would account for an additional 22GB of RAM for the vCloud components, and a similar set of additional VMs applies to VMware View as well. In the event of an HA failure, there would be VMs that couldn't power on. The only way to make sure the VMs you need are powered on is by setting a multitude of HA rules, and even then it's very possible that the VMs you don't want running will continue to run through an HA event. For instance, since UIM isn't critical during normal operations, it is one of the VMs you don't want powered back on in an HA event; but if this VM lives on Host A and Host B fails, UIM will not power itself off to accommodate the more important VMs that fail over to Host A. The AMP was never meant to host more VMs than what is currently associated with it, because of the resources needed to keep those VMs performing optimally in the case of a failure. Therefore, in an HA event, it's possible that your vCloud or View infrastructure VMs will not be able to power on or recover.
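To make that math concrete, here is a quick back-of-the-envelope sketch (Python, purely for illustration) using the figures above: 48GB per C200 host, the 42GB of standard AMP VMs, and the assumed ~22GB of additional vCloud components. It simply checks whether the surviving host can hold everything after an HA event.

```python
# Back-of-the-envelope check of AMP capacity during an HA event.
# Figures come from the article: two C200 hosts with 48GB each,
# ~42GB allocated to the standard AMP VMs, and an assumed ~22GB
# more for the vCloud infrastructure components.

HOSTS = 2
RAM_PER_HOST_GB = 48

standard_amp_vms_gb = {
    "vCenter Server": 4,
    "vCenter Update Manager": 4,
    "SQL Server": 8,
    "Array Mgmt & Licensing": 4,
    "ESRS": 3,
    "UIM": 16,
    "Nexus 1000v VSM 1": 1.5,
    "Nexus 1000v VSM 2": 1.5,
}
vcloud_extra_gb = 22  # vCenter, 2x vCloud cells, vShield Manager (FT), Chargeback, vCO, ...

base = sum(standard_amp_vms_gb.values())             # 42 GB
total = base + vcloud_extra_gb                       # 64 GB
surviving_capacity = (HOSTS - 1) * RAM_PER_HOST_GB   # 48 GB left after one host fails

print(f"Allocated RAM: {total} GB, capacity after a host failure: {surviving_capacity} GB")
print("Fits after an HA event" if total <= surviving_capacity
      else f"Short by {total - surviving_capacity} GB -- some VMs cannot power on")
```

Running it shows the combined allocation outstrips a single surviving host by roughly 16GB, which is exactly the scenario where HA restart ordering starts to fall apart.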
The C200s are equipped with 2 on-board 1GbE NICs that are responsible for all VMware networking. The 2 1GbE NICs are sufficient for the small number of VMs currently assigned to the AMP; these are management VMs and do not require any intense I/O workload. That being said, ESXi management, vMotion, Fault Tolerance, IP storage, and virtual machine networking all have to traverse those 2 1GbE NICs. IP storage alone can saturate a 1GbE link, and adding vMotion can saturate the second. The standard AMP build uses VMware Standard vSwitches and cannot rely on Network I/O Control (NIOC), which requires a vNetwork Distributed Switch (vDS). For vCloud, vShield Manager can be run in Fault Tolerance mode, and the FT logging traffic between the VMs can be network intensive. In a vCloud environment, you will most definitely saturate the links because the AMP is now the broker for all remote console connections, which can be very traffic intensive. In a VMware View environment, the Connection Brokers are your gateway to the hosted VMs, and having a couple hundred connections creates heavy network I/O. VMware's best practice is to have a minimum of 6 1GbE NICs to provide redundancy of physical network adapters, redundancy of traffic flows, and proper network I/O bandwidth for a Tier 1 cluster. If you add an application stack such as vCloud or View, you are going to saturate the available bandwidth and will be unable to meet SLAs. Adding extra NICs is not a simple VCE exception either, because depending on the type of Vblock there may or may not be enough ports for additional 1GbE connections in the Catalyst 3560s.
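To illustrate the contention, here is a rough sketch using assumed peak figures for each traffic class. These numbers are illustrative only, not measurements from a real AMP, but they show how quickly a shared pair of 1GbE uplinks becomes oversubscribed once an application stack is layered on.

```python
# Rough illustration of uplink contention on the AMP's 2x 1GbE NICs.
# The per-traffic-class peaks below are illustrative assumptions, not
# measurements; the point is that several classes can each fill a 1GbE link.

uplink_gbps = 2 * 1.0  # two 1GbE NICs (only 1.0 Gbps if one NIC fails)

assumed_peak_demand_gbps = {
    "IP storage (NFS/iSCSI to VNXe)": 1.0,   # can saturate a full 1GbE link
    "vMotion": 1.0,                          # likewise
    "FT logging (vShield Manager)": 0.3,
    "ESXi management": 0.1,
    "VM traffic (console proxy / brokers)": 0.5,
}

total = sum(assumed_peak_demand_gbps.values())
print(f"Worst-case concurrent demand: {total:.1f} Gbps vs {uplink_gbps:.1f} Gbps of uplink")
print(f"Oversubscription: {total / uplink_gbps:.1f}x")
```

With no NIOC available on Standard vSwitches, there is also nothing to arbitrate which of these classes wins when they collide.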
The AMP was designed to be a management domain, or out-of-band management; if the AMP is lost, the Vblock is still up and operational. If you happen to lose vCenter and the VSMs, the Vblock will still be operational. In a vCloud or View environment, however, the infrastructure VMs, such as vCenter, become Tier 1 applications, and the AMP was never intended to host a Tier 1 application. If the AMP is hosting Tier 1 applications and a network isolation or a complete hardware failure occurs, the Vblock(s) dedicated to that application as resources become unusable.
To add to the networking piece, the AMP comes equipped with a VNXe 3100. The VNXe 3100 is an IP storage device capable of iSCSI and NFS only. The VNXe 3100 in the AMP comes only with 2TB NL-SAS drives, and performance won't be equal to what you would see with 15K SAS drives. There is absolutely nothing wrong with IP storage, but do you trust your Tier 1 application to a VNXe 3100 when you have a VNX or VMAX sitting next to it? The VNXe 3100 is also limited to 1GbE IP storage connectivity. This is a choice the customer must make based on their level of comfort with the performance and redundancy of the different EMC storage platforms.
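As a rough illustration of the spindle difference, the sketch below compares 6 NL-SAS drives to 6 15K SAS drives in RAID 5 using commonly cited rule-of-thumb per-drive IOPS figures and the RAID 5 write penalty. Treat the numbers as ballpark estimates, not a VNXe or VNX specification.

```python
# Rough spindle-count comparison between the AMP's VNXe 3100 (6x 2TB NL-SAS,
# RAID 5) and the same number of 15K SAS drives. Per-drive IOPS figures are
# commonly cited rules of thumb, not vendor specifications.

def raid5_effective_iops(drives, iops_per_drive, read_ratio=0.7, write_penalty=4):
    """Host-visible IOPS for a RAID 5 set at a given read/write mix."""
    raw = drives * iops_per_drive
    return raw / (read_ratio + (1 - read_ratio) * write_penalty)

nl_sas = raid5_effective_iops(drives=6, iops_per_drive=80)    # ~7.2K RPM NL-SAS
sas_15k = raid5_effective_iops(drives=6, iops_per_drive=180)  # 15K RPM SAS

print(f"6x NL-SAS RAID 5: ~{nl_sas:.0f} host IOPS at a 70/30 read/write mix")
print(f"6x 15K SAS RAID 5: ~{sas_15k:.0f} host IOPS at the same mix")
```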
As your vCloud or View environment begins to scale, you will need to add additional vCloud nodes or Connection Brokers to accommodate the additional workload. The AMP will not be able to keep up with a customer's need to scale. Again, the AMP was designed to house the components required for Vblock management. Adding additional C200s requires an engineering exception based on available rack space and network connections.
If you still feel you are going to test the limits and run the vCloud or View components in the AMP, please note that this will NOT break or invalidate seamless support. All you are doing is running an unsupported configuration that doesn't adhere to best practice standards. At the end of the day, it's your kit and you can do whatever you want with it, but don't come running for help when something bad happens and I get to say the 4 words I hear constantly from my wife, "I told you so."
You're probably wondering: why can't I just add another server to the cluster, change the number of NICs, and run storage on the Vblock? You must remember that VCE is built on standards. VCE standards give a customer a guaranteed, predictable amount of performance while reducing the risk associated with a "build your own" infrastructure. The time to market VCE is able to achieve comes from the streamlined build process, and exceptions to the rule introduce delays. Every piece of the Vblock, including the AMP, was designed and engineered for specific application workloads. The Advanced Configuration Tool (ACT) is a Vblock bill of materials (BOM) calculator that outputs every single piece of equipment necessary to build a Vblock. There is no other tool like it in the industry. Anyone can spit out a BOM from a single manufacturer, but VCE's ACT is the only tool that builds out all 3 parent companies' products, including the cables necessary for communication, plus cable lengths to boot. Try doing that with anything else on the market. VCE's standards ensure guaranteed, predictable performance in a time to market that no one else can match.
Now that I've gotten my VCE pitch out of the way, let's focus on the options that are possible for running Tier 1 application management stacks while using Vblock(s) as a resource.
In all of these following options, it is recommended that vCenter, vCenter Update Manager, SQL, and Nexus 1000v VSMs be moved to the respective locations as depicted below. These components that normally run in the AMP have become Tier 1 applications in relation to VMware View and vCloud Director technologies. The loss of any of these VMs can be detrimental to the operations of the application.
The first option is the simplest and will work well for any VMware View or vCloud Director environment running as a very small production deployment, or perhaps a POC or pilot: run the Tier 1 stack of VMs inside the Vblock, on the same blades that are used as resources. This simple design removes the complexity of the other solutions and is also very cost effective because you are not spending capital on additional infrastructure. There are some key design decisions to take into account, though. You may want to use a resource pool for your vCloud infrastructure VMs and assign it the "High" share level. Since every virtual machine in a vCloud environment will be placed into a resource pool anyway, the mathematics of making sure your resource pools are not unbalanced is relaxed. In a View environment, you may not want to use resource pools at all. In a VMware View environment, you are cramming as many VMs as possible onto a server; if you create a resource pool for your infrastructure VMs and another resource pool for your desktop VMs, it will be very unbalanced. See the resource pool pie chart paradox, illustrated below. Perhaps you can get away with creating a resource pool for your infrastructure VMs and leaving the desktops at the root, but this is going to be a game of checks and balances to make sure your infrastructure VMs have enough resources to function. If you opt to put all infrastructure VMs and View desktops at the root level of the hosts, you may find your infrastructure VMs constrained for resources. This option takes away the math of resource pools, but you are putting your View environment at risk.
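To see why the split gets unbalanced, here is a small sketch of the share math behind the resource pool pie chart paradox. The cluster capacity, share values, and VM counts are made up purely for illustration.

```python
# Illustration of the "resource pool pie chart paradox" for a View cluster:
# two sibling resource pools with equal shares split the cluster 50/50,
# regardless of how many VMs each pool contains. The capacity, share
# values, and VM counts below are made up for illustration.

cluster_mhz = 100_000  # arbitrary cluster CPU capacity

pools = {
    # name: (shares, vm_count)
    "Infrastructure": (4000, 10),   # brokers, vCenter, SQL, VSMs, ...
    "Desktops": (4000, 500),        # hundreds of View desktops
}

total_shares = sum(shares for shares, _ in pools.values())
for name, (shares, vms) in pools.items():
    pool_slice = cluster_mhz * shares / total_shares
    print(f"{name:15s}: {pool_slice:8.0f} MHz total, ~{pool_slice / vms:6.0f} MHz per VM")
```

Under contention, each desktop ends up entitled to a tiny fraction of what each infrastructure VM gets (or vice versa if the pools are sized the other way), which is exactly the imbalance the paradox describes.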
Putting the infrastructure stack alongside your regular workloads can be OK for small production environments or pilots, but beyond that VMware recommends having a separate physical infrastructure for these workloads. By segregating these important pieces, you are not putting your VMware View or vCloud infrastructure at risk of resource constraints. Let's examine what's possible with a Vblock.
The simplest way to accomplish having a separate infrastructure is by utilizing blades within the Vblock. VMware recommends nothing less than 3 hosts in a single cluster to account for N+1 reliability. This, of course, may be a budget constraint for many, and sometimes two hosts may be a suitable option for beginning your deployment and expanding later on. When you purchase Cisco B-Series blades in a Vblock, they must be purchased in quantities of two; the rationale is that you always need to buy enough for failover capacity. The number of blades required for your management infrastructure depends on your current size and where you want to scale. It's also reasonable to start off with 2 blades and, as your environment grows, grow the management infrastructure with it by adding more blades to the management cluster. In addition, this scenario can be made highly available by placing the blades in different chassis to overcome the rare event of a chassis failure. To keep costs at a minimum, the management infrastructure blades can be Cisco B200 M2s with 48GB of RAM, while your View or vCloud resource blades could be B200s with 192GB of RAM. Running your View or vCloud infrastructure on a set of dedicated blades can be more costly than some options because blade technology isn't cheap and the required software components, PowerPath V/E and Nexus 1000v, must be added on. On the other hand, this solution is very clean, gives the infrastructure VMs the underlying performance they need, and doesn't require any engineering exceptions from VCE. This is a great way to begin your journey into vCloud Director. As your vCloud continues to grow and you want to create a dedicated management infrastructure outside the Vblock, the infrastructure VMs can be migrated off and the old blades re-purposed inside vCloud Director as a Provider vDC. To help soften the cost, the Mini-AMP can be purchased, because the VMs that normally live on the AMP, such as vCenter, SQL, VUM, and the N1KV VSMs, need to be migrated into this management cluster. When taking this approach, be sure to size your compute needs accordingly (a rough sizing sketch follows).
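If you want a starting point for how many 48GB blades a management cluster might need, here is a rough N+1 sizing sketch. The per-VM RAM figures are estimates I've assumed for illustration, so size against your actual bill of materials rather than these numbers.

```python
# Rough N+1 sizing sketch for a dedicated management cluster built from
# 48GB B200 M2 blades. The per-VM RAM figures are estimates for
# illustration only -- size against your actual build.
import math

blade_ram_gb = 48
target_host_ram_usage = 0.8  # keep some headroom per host

mgmt_vms_gb = {
    "vCenter (mgmt)": 4, "vCenter (resources)": 4, "VUM": 4,
    "SQL": 8, "Connection/vCloud servers (x2)": 16,
    "N1KV VSMs (x2)": 3, "vCO": 4, "Load balancer": 2,
}

required_gb = sum(mgmt_vms_gb.values())
usable_per_blade = blade_ram_gb * target_host_ram_usage
blades_for_load = math.ceil(required_gb / usable_per_blade)
blades_n_plus_1 = blades_for_load + 1

print(f"Estimated management RAM: {required_gb} GB")
print(f"Blades for the load: {blades_for_load}, with N+1: {blades_n_plus_1}")
```

With these assumed figures the answer lands at 3 blades with N+1, which lines up with VMware's minimum cluster recommendation mentioned above.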
The next design is more complex because there are a lot of options available. As we showed in our vCloud on Vblock white paper, you can create a separate cluster of C200 hosts to run your View or vCloud infrastructure stack. This set of hosts can be a good approach because it's cost efficient (you aren't using up costly blade slots) and it increases your HA failover domain. The design decisions continue from there. Networking decisions have to be made next: it's possible to go with a multi-NIC 1GbE approach or a 10GbE approach. These connections can be taken into the pair of Cisco Nexus 5548 switches, which are 10GbE capable; when going this route, don't forget to include the right type and number of SFPs to plug into the 5548s. You also have the option of going with PALO adapters and controlling the networking capability inside UCSM. Depending on the type of Vblock you purchase, there may or may not be available ports in the Cisco 6140 Fabric Interconnects. In either option, 6140s or 5548s/9148s, the number of 10GbE/FC port licenses needs to be accounted for. Next is storage. Since you are going this route, you will more than likely want to utilize the Vblock storage capabilities because of the underlying performance and redundancy. If you went with the 1GbE or 10GbE NIC approach, you can run NFS or iSCSI storage to your hosts. If you prefer Fibre Channel, HBAs can be installed in the hosts and connected to the Cisco MDS 9148s to get Fibre Channel access to the Vblock array. If you went with a 1GbE NIC approach, you may not have enough PCI/PCIe slots in the hosts to support a multi-NIC and Fibre Channel design, and the 1GbE approach may not be suitable anyway because there won't be enough access ports on the AMP 3560s, and using the 5548s for 1GbE connections isn't cost effective. If the PALO adapter was your choice, you can use Cisco's ability to virtualize HBA adapters to get FC access to your hosts as well as run NFS storage.
Now that you have thought through your multi-host design and networking decisions, where are we going to place all of this? A Vblock has been pre-engineered so that all the components go into the same place every single time, and there is limited space in a Vblock if you go with all the bells and whistles. These components do not need to live inside the Vblock racks and can live anywhere in your datacenter; it's up to you as the customer to know the lengths and types of cables needed to establish connectivity. If you would like to make this a seamless solution and have everything fit inside the Vblock racks, the only way to free up space is by purchasing the Mini-AMP. Since you are moving vCenter, SQL, VUM, and the VSMs out of the AMP and into your management cluster, the number of VMs needed in the AMP is drastically reduced. Going from an HA-AMP (two C200s and the VNXe 3100) down to a single C200 Mini-AMP will free up 3U of cabinet space. Depending on how much storage you purchase, if you haven't populated the second, third, or fourth cabinet with DAEs, there will be available space there as well.
This solution is very complex because there are a lot of moving parts, and it requires careful design consideration. On top of that, since this isn't a standard VCE solution, it will have to be a custom quote delivered by VCE, Cisco, or a VCE certified partner. You may also find that VCE will not take responsibility for the racking and stacking of these components; a VCE qualified partner may need to be brought in to perform that work. It goes without saying that this route will require an exception from VCE engineering. There are efforts underway to make a Tier 1 Infrastructure POD a standard SKU at VCE, but for the time being, this is the route that must be taken.
If your View or vCloud infrastructure is very large, you may consider purchasing a smaller Vblock to host your Tier 1 application stack. It may sound crazy, but there are customers doing this today because it's a simple yet effective way to take care of it.
The final way to host your Tier 1 application VMs is to utilize an existing infrastructure. If you are buying the Vblock specifically for View or vCloud resources, your existing infrastructure may be able to handle the management stack. This is a definite YMMV because you know your environment better than anyone else. This approach is not common right now because customers are looking to VCE for new deployments, rollouts, and standardization of their datacenter. It will also cause a bit of confusion during the VCE logical build process because it's not a standard way to factory build a Vblock.
I hope this article has cleared up some confusion and has set you on the right path to deciding how to host your VMware View or vCloud Director Infrastructure.
To read more about the design of multiple vCenters in any of the options above, please read the vCloud on Vblock white paper.