Here was a VMworld session that didn't get picked, so here we go...
The biggest difference between vSphere and vCloud is that whatever you are trying to accomplish in vCloud Director depends entirely on the design. With vSphere, there are fairly standard practices for laying out an environment, with options to meet specific criteria. You ultimately can't design for vCloud Director until you have a strong understanding of how vSphere design decisions affect it.
After designing a few vCloud Director environments, I wanted to create a list that anyone can reference to nail down the top design criteria. Without further ado, in no particular order...
1. Is vCloud Director actually necessary?
I'm not going to lie, there is a bunch of hype out there about vCloud Director. Since the inception of Project Redwood, it has been touted as the next generation of VMware's cloud offering. Every product from every vendor is working on vCloud integration, partners, contractors, and vendors are pushing for its adoption, and VMware itself sees vCloud as the next vSphere. But what does that mean for you? I would imagine at least 90% of IT shops today have some sort of virtualized environment, and that's the stepping stone. If you are thinking of adopting vCloud, you have to ask yourself, "what am I really trying to accomplish?"
The answer to this question is going to be unique for everyone. Are you a service provider, an enterprise customer, or an SMB user? Are you just looking for a portal with a self-service catalog? Are you trying to create multi-tenant networks?
Let's break this down into what vCloud Director "really" is, in Kendrick Coleman's definition:
vCloud Director is an abstraction and pooling of vCenter and vSphere resources that allows organizations in a multi-tenant environment to deploy vApps/VMs from a portal with a self-service catalog into completely isolated and secured layer 2 networks, accomplished through a series of automated and orchestrated tasks.
Whew! That's a long sentence. Now that you know what vCloud Director actually offers, what stuck out as something you want for your organization?
- Pooling of multiple vCenter and vSphere resources
- This statement should come with an asterisk next to it. As you can tell, this is mostly for large vSphere farms with different classes of infrastructure. Whether you have brand new servers with killer storage arrays or a mixture of low-end arrays and older servers, vCloud Director can absorb all of it and let you divide it into pools of resources. If you have a vSphere environment with 3-6 hosts, you will more than likely end up with either a single infrastructure offering or playing "resource pool control freak" down the line.
- Do you require logical multi-tenancy?
- Unless you are a Fortune 500 company or a Service Provider, I don't see this one very often. Does your enterprise require that HR, Engineering, Development, or other departments have separation of IT resources for chargeback and billing purposes? Or does IT have control over the entire infrastructure, regardless of who owns it? This is a change in thinking that will have to start eventually if you want to move to a "cloud-like" infrastructure. Just because that's the way it's always been doesn't mean that's the way it will always have to be.
- Do you need a portal where users can access or request IT services?
- Enabling end users is always a key IT requirement and it helps move the business forward. We usually try to make things simple for our end users. If you have looked at the vCloud Director UI from an end user's perspective, it's not that simple. I've done a few engagements where, after vCloud was installed, I had to demonstrate to the customer how an end user deploys a VM. Needless to say, it's not what they were expecting and by their standards it was very complex. Depending on the ability of your end users, you may have to get another product like NewScale to build a simple UI and perhaps hook back into vCloud for other things mentioned in this list, or just into vSphere without vCloud at all. Another thing to mention is that the vCloud portal can only deploy VMs into vCloud Director. What if, in addition to virtual machine provisioning, you also wanted to provide bare metal provisioning, or poke holes in a firewall somewhere, or allow a user to request a new IP phone for their desk? vCloud isn't going to be able to accomplish this. You are going to need integration with vCenter Orchestrator to do some of this.
- What kind of applications do your users need to deploy from a self-service catalog?
- This feeds into the previous question about the portal. What do you want to offer? Without vCloud Director, you can offer pretty much anything because it can be virtual or physical (heck, you can offer Snickers bars if you want). vCloud Director, on the other hand, offers up virtual machines, but in ways that are unique. First off, you can create multiple global catalogs. Perhaps one global catalog has standard images of Win2K8R2, WinXP, WinXP_x64, Win7, Win7_x64, Ubuntu, SuSE, etc. Another global catalog can offer ISOs of applications such as SQL, Office, Exchange, etc. Another global catalog can be sets of virtual machines and applications packaged as a vApp, e.g. vApp1 = DB, App, and Web Server, and vApp2 = vCenter on 2K8, SQL on 2K8, and 2 ESXi hosts for a nested deployment of vSphere. That's pretty cool for a simple mix of global catalogs instead of having to configure multiple drop-down boxes in a customized portal. The other unique feature is giving control to Organization Admins. Every Organization (tenant) can have its own self-service catalog as well. So if a user in the Development Org has new beta code and wants to give other developers access to try it out, they can upload that vApp into the Development Catalog to allow other developers to deploy it and test it. This unique feature enables end users without the constant need for IT intervention.
- Do you need isolated and secure networks?
- Of course, we probably think we do, but again it will depend on your requirements. A few IT shops I've visited and implemented this for didn't understand the implications that vShield Edge devices throw into the mix. This requirement is mainly a big pro in Service Provider environments, where we can guarantee that Pepsi and Coke won't be able to see each other's traffic. In an enterprise environment, that usually isn't the case. Does it really matter if an HR server and an Engineering server can ping each other? A lot of this is accomplished today through AD and GPO policies, or at an L3 switch or router with ACLs. You also need to think about communication between VMs that aren't in an Organization vDC, which we will get to later.
- Do we need automation and orchestration capabilities?
- Of course we do, we all do. That's a dumb question. But what about workflows with email approvals and such? Some of that isn't part of vCD, and you may need vCO with a web-facing portal instead.
2. How big is my cloud and how do I manage it?
This is always a unique situation depending on how big you plan on making your vCloud Director implementation. vCAT recommends having a "Management Infrastructure". Did you read that correctly? Not just a management cluster, but an entire infrastructure with servers, network, and storage. Let's be honest, there aren't going to be many shops ready to dedicate an entire infrastructure to just vCD management, but it could be an already-running vSphere farm where AD/DNS, SQL, and more live, and where you can start adding in vCD cells, Chargeback, and more. Just make sure you have the capacity.
The second option is to create a management cluster separate from your Provider vDC resources. This management cluster can still utilize the same networking and storage infrastructure, but you have separation at the cluster level. How many servers do you need? A VMware best practice is going to be 3 servers to satisfy HA, DRS, and N+1. If this is a POC, that can get expensive real quick, so maybe you will opt for a 2-server cluster until you are ready for production, when you can add another server or two. Heck, since it is a POC, why not just use 1 server to host all the VMs? Of course, if that server happens to go down for any reason then you are SOL, but it's a POC and not production.
The third option is a bad option and you should avoid it: skipping a management infrastructure and hosting the management VMs in the same cluster as a Provider vDC. This is NOT recommended because you are consuming resources from a class of service, which then misrepresents the resources available to vCloud Director. When you add a cluster to vCD as a Provider vDC, a resource pool is created. So if you have VMs at the cluster level outside that resource pool, while also having vApps/VMs inside it, you are facing the resource pool pie paradox that has been covered many times in Duncan and Frank's HA/DRS Deepdive books.
The number of servers is going to depend on the number of VMs you plan on hosting there, such as: 2 vCenter servers (perhaps adding in Heartbeat as well for a total of 3 or 4), vCenter Chargeback that can expand to multiple collectors, vShield Manager running in FT mode, multiple vCloud Director cells, a SQL server or two (perhaps more if you want to implement clustering services), redundant AD/DNS, a load balancer for the cells, vCenter Orchestrator, and plenty of other things. The number and type of servers is all going to depend on your design and availability requirements. David Hill has a great blog post on this called the vCloud EcoSystem so you can see how many databases you may really need.
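To put rough numbers around this, here is a minimal sizing sketch in Python. Every figure in it is an assumption for illustration only (your vCenter, cell, and SQL sizes will differ), but it shows the kind of back-of-the-envelope math worth doing before you commit to a management cluster size:

```python
# Rough management-cluster sizing sketch. All figures are assumptions for
# illustration; replace them with the actual sizes from your own design.
mgmt_vms = {
    "vCenter (mgmt)":         {"vcpu": 4, "ram_gb": 16},
    "vCenter (cloud)":        {"vcpu": 4, "ram_gb": 16},
    "vCloud Director cell 1": {"vcpu": 2, "ram_gb": 4},
    "vCloud Director cell 2": {"vcpu": 2, "ram_gb": 4},
    "vShield Manager":        {"vcpu": 2, "ram_gb": 8},
    "vCenter Chargeback":     {"vcpu": 2, "ram_gb": 4},
    "vCenter Orchestrator":   {"vcpu": 2, "ram_gb": 4},
    "SQL Server":             {"vcpu": 4, "ram_gb": 16},
    "AD/DNS x2":              {"vcpu": 2, "ram_gb": 4},
    "Load balancer":          {"vcpu": 1, "ram_gb": 2},
}

total_vcpu = sum(vm["vcpu"] for vm in mgmt_vms.values())
total_ram  = sum(vm["ram_gb"] for vm in mgmt_vms.values())

# Candidate cluster: 3 hosts, keep one host's worth of headroom for N+1.
hosts, cores_per_host, ram_per_host_gb = 3, 12, 96
usable_cores = (hosts - 1) * cores_per_host
usable_ram   = (hosts - 1) * ram_per_host_gb

print(f"Management VMs need ~{total_vcpu} vCPU / {total_ram} GB RAM")
print(f"Cluster provides {usable_cores} cores / {usable_ram} GB RAM with N+1 headroom")
```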
You must also keep in mind how important the management infrastructure is. No management infrastructure = no vCloud Director. The vCloud gurus Colotti, Epping, and Hill came up with an approach for failing over the management infrastructure in Overview of Disaster Recovery in vCloud Director. Look for better integration to come in the future, but this is what exists today.
3. How many vCenter Servers do I really need?
The number of vCenter servers will depend on the design and reliability requirements. You can of course implement vCenter Heartbeat with vCloud, and it's actually a very good idea. For the longest time, vCenter has been the center of management and function for a cluster of ESXi hosts. The loss of vCenter didn't mean downtime, though, because VMs continued to operate, and even HA relies on the hosts after initial configuration. vCloud Director provisions its resources by talking to the vCenter APIs. So what happens if you lose a vCenter server? You should be able to guess that any Provider vDCs assigned from that vCenter server will be inaccessible, and no provisioning or management tasks can happen unless you have direct IP access to your VM or vApp inside of that vSphere cluster. vCenter becomes a critical part of vCloud design.
Without diving into this too much, I wrote a blog post on this titled vCenter and vCloud Management Design - Management Separation. You should have 2 vCenters in a vCloud environment: one functions as the management vCenter while the other manages vCloud resources. This design is critical because:
- Separation of management domains. It's important to know that vSphere and vCloud are different animals. Just because you are a vSphere admin doesn't make you a vCloud admin. By separating the two environments, you are letting vSphere admins access VMs that are outside the cloud and manage the VMs that are considered vCloud infrastructure.
- vCenter becomes abstracted. ESXi abstracts the hardware layer, and vCenter is the central management point. vCloud Director abstracts the resources that belong to vCenter and presents them to vCloud as Provider Virtual Datacenters.
- Saves vSphere admins from themselves. Have you ever watched what happens when you add a vCenter server to vCloud Director? vCloud Director takes charge. It does its own thing by creating folders, resource pools, port groups, appliances, etc. Everything created by vCloud has a string of characters appended to it as a unique identifier. If a vSphere admin has access to a Distributed Virtual Switch and notices some random port group ending with HFE2342-FEF2123NJE-234, he is probably tempted to delete it. If a user goes crazy and starts deleting objects directly from vCenter without vCloud's knowledge, it's havoc.
- Relieves stress on vCenter. As Duncan pointed out below in the comments, if a tenant of the cloud is issuing a bunch of requests, it could possibly render the vCenter server unusable. By separating the workload between two vCenter servers, tenant activity will not impact the vCenter server responsible for management functions.
On the flip side, there is a cost-effective way to create a vCloud environment using a single vCenter server, so check out the blog post for that.
Another thing to keep in mind is the scale of your cloud. If you have a lot of Provider vDCs that span multiple vCenter servers, all of them can be presented to vCloud. In fact, vCloud Director 1.5 can manage up to 25 vCenter servers.
4. Is my vSphere host NIC and storage design good enough for vCloud?
It may or may not be. For the management infrastructure, a typical vSphere design will be satisfactory. When you look at your vCloud resources, on the other hand, it doesn't matter if it's a 1Gb or 10Gb infrastructure, you need to keep a few things in mind:
In a typical vSphere environment we have port groups that are assigned a VLAN. With vCloud, these are called External Networks and are backed by port groups. These port groups are for traffic entering and leaving the vCloud environment.
Org Networks will be created automatically by vCloud Director, so you need to anticipate where they are going and which NICs they will be using for traffic. This applies to Internal Networks and External Routed Networks. vShield Edge devices are deployed during the creation of External Routed Networks or Fenced Networks. Whether it's a 1Gb or 10Gb infrastructure, you need to know what happens when networks are deployed and which NICs are being utilized.
I've already written a blog post on this called vSphere and vCloud Host 10Gb NIC Design with UCS & More. This post dives into VCDNI/VXLAN, vDS Design, 10GbE options, and more.
5. How do I decide levels of Provider vDCs?
This is where a lot of decisions need to happen. How are you going to determine your levels of service for Provider vDCs? How are you going to differentiate the Platinum, Gold, and Silver tiers? How will you handle Day +1 operations when it's time to expand the Platinum tier, or when next-gen hardware becomes the new Platinum tier? There are a bunch of different ways to determine levels of Provider vDCs:
- Storage Tiers - SSD vs FC vs SATA vs EMC FastVP vs XYZ
- Storage Performance - RAID5 vs RAID6 vs RAID1+0
- SLA/Backup/DR/Replication - 1 hour vs 1 day vs 1 week RPO/RTO vs None at all
- Compute - Ultra High Performance B440 vs High Performance B230 vs Standard B200
- Pod - Old Hitachi Pod vs Vblock 1 vs Vblock 700MX
Ultimately, your decision on a Provider vDC strategy will be determined by your consumers and what they require. After you figure out how you want to charge for different Provider vDC levels, it's time to figure out sizes. Fast Provisioning (linked clone) operations require that block-storage clusters be limited to 8 hosts. NFS, on the other hand, can handle up to 32 hosts per cluster and still support Fast Provisioning. When configuring a Provider vDC within vCloud for a production environment, you should always base Provider vDCs on the root of a cluster and the datastores attached to that cluster. It is possible to base Provider vDCs on resource pools and specific datastores within a cluster, but that brings us back to the resource pool pie paradox where we have to be a "resource pool control freak" to make sure everything stays correctly balanced. It is a recommended best practice to always assign a cluster of servers as a tier of service. You can read more about this in a related article, Rethinking Your vCloud Director Provider vDC Strategy with Vblock.
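If you want a quick sanity check on the Fast Provisioning cluster limits mentioned above, here is a trivial sketch (the function name and structure are my own; the 8-host VMFS and 32-host NFS limits are the ones quoted above for version 1.5):

```python
# Sanity check for Fast Provisioning cluster limits in vCloud Director 1.5.
# Limits taken from the text above: VMFS (block) clusters are capped at 8
# hosts, NFS clusters can go to 32. Function and variable names are my own.
FAST_PROVISIONING_HOST_LIMITS = {"vmfs": 8, "nfs": 32}

def supports_fast_provisioning(storage_type: str, host_count: int) -> bool:
    """Return True if a cluster of this size/storage can use Fast Provisioning."""
    limit = FAST_PROVISIONING_HOST_LIMITS[storage_type.lower()]
    return host_count <= limit

print(supports_fast_provisioning("vmfs", 10))  # False - over the 8-host VMFS cap
print(supports_fast_provisioning("nfs", 16))   # True  - within the 32-host NFS cap
```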
6. If I'm an enterprise customer, how does the organizational networking relate to what I'm doing today?
Coming from the enterprise space myself, this is a conversation many businesses struggle with. Once you fully grasp what vCloud Director offers, you can see that the product was written to cater to Service Providers, or IT shops looking to become an SP for their business units. For many enterprise shops, IT is still in control of all the resources, therefore they ultimately have to be the ones in charge on the back end. That's not what vCloud Director was built for. vCD was built to allow business units to consume IT services, and that's where it stops. IT can't be responsible at that point for patching the server or upgrading software, because whoever deployed the VM is now in charge of it.
There are many things to take into consideration such as Active Directory integration, internet access, routing, email, etc. Many companies I visit say that if there is going to be a Windows server on their network, then it must be joined to AD and it will have to follow standard patching processes. How does that patching process happen? Either through something like WSUS or another agent program that probes for Windows VMs on the network and then gets them placed into the patching process. Have you determined how that's going to happen in a vCloud environment? There are really only three ways.
- The first is that all your VMs/vApps will have to be deployed on directly connected external networks, and you will get minimal use out of vCloud's ability to create secure and isolated L2 network segments using vShield Edge. This is basically like having an automated button for placing VMs on port groups that we create.
- The second option is to use isolated L2 network segments with vShield Edge, except you need to do additional NATing on the device and open up firewall ports. This requires a bunch of extra work after a VM/vApp/Org Network is deployed, but it can be automated with something like vCenter Orchestrator (a rough sketch of the rules involved follows below).
- The third option uses vShield Edge, but you have to put whatever server you are trying to communicate with into the Organization vDC so communication stays inside the firewall zone.
The reason it all happens this way is that the L2 isolated network segments are built just like your corporate network. You have an IP address of 192.168.3.100. You can ping anything on that 192.168.3.0/24 network. You can ping your Exchange server at 192.168.50.50, and your AD server at 192.168.50.15 can check in on your GPOs because, even though they are on different VLANs, a routing table exists. You can also access the internet; however, since there is a firewall in place, the internet cannot access anything on your network. The same goes for VMs/vApps inside of vCloud Director on an External Routed Network. You can get out beyond the firewall, go to the internet, and access other things on your network. On the flip side, anything outside on your network, such as AD, cannot initiate contact with you. As I said before, unless you are creating NATs and poking firewall holes, those VMs are behind a firewall and owned by the Organization Admin, not IT.
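To make the second option above a bit more concrete, here is a rough sketch of the kind of DNAT and firewall rules an Org admin (or a vCO workflow) would end up pushing to a vShield Edge so that corporate AD and patching servers can reach a VM behind it. This is plain illustrative data, not an actual vShield API call, and the addresses and ports (other than the AD server from the example above) are assumptions:

```python
# Hypothetical example: a Windows VM behind a vShield Edge on an External
# Routed Network. For AD and patching traffic to reach it from the corporate
# network, the Edge needs a DNAT rule plus matching firewall holes.
# All addresses and ports below are illustrative assumptions.
edge_external_ip = "192.168.3.100"   # routable address on the external network
vm_internal_ip   = "10.10.1.5"       # VM's address on the isolated Org network

dnat_rules = [
    {"original_ip": edge_external_ip, "translated_ip": vm_internal_ip,
     "protocol": "tcp", "port": 3389},   # RDP for admins (assumed requirement)
]

firewall_rules = [
    {"source": "192.168.50.15", "destination": edge_external_ip,
     "protocol": "any", "action": "allow"},   # AD/GPO traffic from the example above
    {"source": "192.168.50.60", "destination": edge_external_ip,
     "protocol": "tcp", "port": 8530, "action": "allow"},  # WSUS server (assumed IP)
]

for rule in dnat_rules + firewall_rules:
    print(rule)  # in practice a vCO workflow would push each rule to the Edge
```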
If you are designing a vCloud environment for anyone other than a Service Provider, make sure you understand their application requirements. If it's just for developers to play around and they can be isolated on their own, you can let them have at it. If it's for running production virtual machines that will be used by the enterprise, make sure everything is deployed onto the correct network with appropriate firewall rules in place.
7. What types of network pooling should I be using?
If you're here you should know the 3 different types of network pools: Port-group backed, VLAN backed, and VCDNI backed. You need to understand the pros and cons of each one to decide on what to use.
I think it's safe to say that 99% of installs won't use port group-backed network pools because they remove what is great about vCloud Director: automation! Dynamically creating networks is a big feature of vCloud and is essential for internal and external routed networks. For the longest time, the Nexus 1000v was only compatible with port group-backed pools, but with the release of 4.1(1)SV(5.1a) that constraint has been lifted. Unless you have uber-strict security policies or need very granular control, we can pretty much eliminate the use of port group-backed pools and focus on the other two.
VLAN-backed pools are good because they give a service provider a familiar feel for how it may currently operate. If a service provider typically sets aside a set of VLANs for each customer, this maps directly to a VLAN-backed pool. You can create a small pool of VLAN networks for each organization, or you can create a large pool of VLANs that any organization can consume. One advantage is that this makes adoption much easier. A second advantage is that there is minimal network configuration needed: all the network team needs to do is create more VLANs on the upstream switches, without worrying about jumbo frames or anything else. The other advantage is that this network pool can be L3 compatible. Being L3 capable gives you greater reach within a datacenter: if you have a Provider vDC that is stretched somewhere or spans multiple switches, the L3 capability enables networking between the hosts without any difficult configuration. This is a primer for VXLAN coming in the future. The disadvantage of this network pool is that you are going to keep burning up VLANs, and there is only a finite number of them.
VCD-NI, or vCloud Network Isolation, network pools use VMware's proprietary MAC-in-MAC encapsulation method. The biggest advantage of this pool type is the ability to use a single VLAN (or very few VLANs) and create over 1,000 networks inside it. So think of being able to encapsulate 1,000 separate networks all in a single VLAN. It greatly increases the network scalability that many service providers need. This network pool can also be dynamically controlled by vCloud Director with no administration overhead. The last advantage is that it prepares your network for vCloud.Next and its eventual replacement, VXLAN. The disadvantage is that your network team will need to enable jumbo frames on all networking equipment, and the cloud admin will need to set the vSphere vDS to an MTU of 1600 as well as the vCloud network pool to 1600 MTU. Your ESXi hosts will need a little extra CPU to examine and strip packet headers, but it's a very minimal amount and CPU cycles are cheap nowadays.
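A quick back-of-the-envelope comparison shows why the VLAN-consumption difference matters at scale. The 1,000-networks-per-VLAN figure and the 1600 MTU requirement come from the description above; the number of Org networks is an assumed example:

```python
# Back-of-the-envelope comparison of VLAN consumption for the two pool types
# described above. The per-VLAN network count for VCD-NI comes from the text;
# the workload size is an illustrative assumption.
org_networks_needed = 500          # hypothetical number of Org networks to deploy
usable_vlan_ids     = 4000         # rough practical ceiling of the VLAN ID space

vlan_backed_vlans_used = org_networks_needed            # one VLAN per network
vcdni_vlans_used = -(-org_networks_needed // 1000)      # ~1000 networks per VLAN

print(f"VLAN-backed pool : {vlan_backed_vlans_used} VLANs consumed "
      f"({vlan_backed_vlans_used / usable_vlan_ids:.0%} of the ID space)")
print(f"VCD-NI pool      : {vcdni_vlans_used} VLAN(s) consumed, "
      f"but requires MTU 1600 end to end (switches, vDS, network pool)")
```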
8. What resource allocation models am I going to provide?
This question will be answered and determined by what your tenants want. A service provider may have all three options available so every consumer can choose. An enterprise shop, on the other hand, may choose only one specific model and make that its baseline for ease of chargeback/showback to its business units.
The first option is the easiest to understand and relate to: Pay-As-You-Go. Billing is based on actual usage of resources and tends to be unpredictable. This is the way we typically use electricity in our homes: if a light bulb is on, we are paying for the current; if it's off, we aren't. The longer we keep that light bulb on, the more it costs us. Simple and straightforward. When configuring this option inside of vCloud Director, we need to decide if we want to reserve capacity. By default, 0% of CPU is reserved but 100% of memory is reserved. That means an organization can create a ton of VMs, and for every VM that is powered on, 100% of the memory assigned to that VM will be guaranteed. This is a good thing and a bad thing. The good thing is that your tenants will get a predictable amount of performance for every VM that is turned on. The bad thing is that if too many of these VMs are powered on, you may run out of capacity in your Provider vDC, so that needs to be accounted for. If this setting were set at 0%, it would behave like normal vSphere operations: you could power on more VMs, but you will eventually hit a threshold, and if no more RAM is available the VMs could suffer performance issues. If you plan to give your consumers a consistent experience, keeping this at a reasonable percentage or at 100% is a safe bet because there will never be contention for RAM resources. As a cloud admin, you must be aware of the consumed RAM in a Provider vDC. This is primarily how enterprises will want their allocation model, because it gives an accurate representation of what each business unit or organization consumes.
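Here is a small sketch of the capacity math behind that 100% memory guarantee. The Provider vDC size and VM size are assumptions for illustration; the point is that every powered-on VM carves a full reservation out of the pool:

```python
# Quick capacity check for a Pay-As-You-Go Org vDC with the default 100%
# memory guarantee described above. All sizes are illustrative assumptions.
provider_vdc_ram_gb = 512    # usable RAM backing this Org vDC (assumed)
memory_guarantee    = 1.0    # 100% of each powered-on VM's memory is reserved
vm_ram_gb           = 8      # typical VM size in this catalog (assumed)

reserved_per_vm = vm_ram_gb * memory_guarantee
max_powered_on  = int(provider_vdc_ram_gb // reserved_per_vm)
print(f"At a {memory_guarantee:.0%} guarantee, only {max_powered_on} of these "
      f"VMs can be powered on before the Provider vDC runs out of capacity.")

# Dropping the guarantee to 0% behaves like plain vSphere: more VMs power on,
# but they risk contention once physical RAM is exhausted.
```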
The second option is the Allocation Pool, where we guarantee a set of resources but the customer can overcommit. This helps companies bill for a subset of resources. So if an organization knows it will need to consume 24GHz and 40GB of RAM per month, we can set those resources aside and they will be billed for that amount. This model also allows the organization to burst beyond that, so they can still chew up an extra 10GHz and 15GB of RAM if they would like. It pushes the user into the overcommit range, where you can charge different amounts. Kind of like your cell phone bill when you have an overage of minutes or texts: you pay for a certain amount, but you are allowed to burst, you just have to pay more.
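And a sketch of how the cell-phone-style overage billing might be calculated for the 24GHz/40GB example above. All rates are made up for illustration:

```python
# Hypothetical Allocation Pool chargeback: the guaranteed allocation from the
# example above (24 GHz / 40 GB) plus a burst billed at a higher rate.
# All rates are made-up numbers for illustration.
allocated = {"cpu_ghz": 24, "ram_gb": 40}
consumed  = {"cpu_ghz": 34, "ram_gb": 55}          # the org burst past its allocation
base_rate    = {"cpu_ghz": 10.0, "ram_gb": 5.0}    # $ per unit per month, assumed
overage_rate = {"cpu_ghz": 15.0, "ram_gb": 8.0}    # burst costs more, assumed

bill = 0.0
for resource, alloc in allocated.items():
    used    = consumed[resource]
    overage = max(0, used - alloc)
    bill += alloc * base_rate[resource] + overage * overage_rate[resource]

print(f"Monthly charge: ${bill:.2f}")  # guaranteed allocation + burst overage
```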
The third option is the Reservation Pool, where you are guaranteed an amount of resources but you may not consume everything you paid for. It's a wasteful option, but it's for predictable billing and no surprises. There is no burstability, and you get capped once that limit has been hit. It's the same as owning a box: you fill it with contents, and once it's full, it's full.
9. Is my vSphere environment ready to use vCloud Director?
vCloud Director is backwards compatible, so you need to understand the features and functionality that come with newer versions of vSphere. For instance, the Fast Provisioning piece is only available with vSphere 5. You can always use the newest version of vCloud Director, but your infrastructure is backwards compatible down to vSphere 4.0 Update 2.
If you have vSphere 5 and want to use Fast Provisioning, there is another design criterion. In version 1.5, we can only have 8 hosts in a single cluster when VMFS datastores are in use. This is a limitation of VMFS, not of NFS: you can have more than 8 hosts in a cluster and use Fast Provisioning if NFS datastores are in use. This all comes back to your storage design and Provider vDCs.
The other thing you might want to take into consideration is that you can use AutoDeploy to inject the vCloud agent into the build image. Auto-Deploy is compatible with vCloud so it will make it easier to upgrade hosts when the time comes to move from 5.0 to 5.next.
What if you are running IPv6, is that vCloud ready? Almost. Every single piece of the stack is compatible with IPv6 except for vShield Edge. vShield Edge still requires IPv4 addressing for fenced vApps and NATing. If you feel that IPv6 is the way you want to go, you can connect vApps directly to external networks supporting IPv6 and bypass vShield Edge completely.
These design points are high-level considerations. This article didn't touch on things inside Organizations such as catalogs, LDAP, permissions, etc.