EDIT: 11/29/2011
I have now updated these to reflect vSphere 5 changes!
VMware vSphere 5 Host Network Design Layout and Configuration
Another free consulting gig from yours truly. I was asked through the VMTN forums and LinkedIn for some help planning a host NIC design. The design incorporated 6 NICs and was going to be used as a proof of concept covering all of vSphere's features. Well, I couldn't just do one design based on that alone, so I figured I would diagram out a few different solutions for the masses.
So here we go. If you want to do just a proof of concept and don't care about doing it "right", you can always design it as pictured below. This design lets you test out every vSphere feature and should have plenty of bandwidth to take care of everything. Of course, since it is a proof of concept, I didn't take NIC redundancy into account. One thing I constantly see mistaken on the VMTN forums about NIC design is the Fault Tolerance network. To my understanding, when you enable FT on a VM, a lot of traffic flows through that particular NIC and VLAN, so you want to keep it segregated from everything else. So if you are designing a vSphere environment to use Fault Tolerance, I would think about adding a few more NICs and checking out my blog post on vSphere Host NIC Design - 10 NICs. **UPDATED 5/28/2010** Check the bottom of the page for a layout that uses FT.
NOTE: The physical switch ports for vmnic0, vmnic2, and vmnic3 must all be configured as trunk ports; use VLAN tagging on your vSwitch port groups to allow traffic to flow.
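If you'd rather script the port group and VLAN tagging than click through the VI Client, here is a rough pyVmomi (Python) sketch of what that looks like. Treat it as an outline, not a copy/paste job: the host name, credentials, vSwitch name, port group name, and VLAN ID below are placeholders I made up, not values from the diagram.

```python
# Hedged sketch: create a VLAN-tagged port group on a standard vSwitch.
# All names, credentials, and IDs here are examples, not the diagram's values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; skips cert checks
si = SmartConnect(host="esx01.lab.local", user="root",
                  pwd="password", sslContext=ctx)

content = si.RetrieveContent()
# Grab the first host just for the example.
host = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True).view[0]
net_sys = host.configManager.networkSystem

# Port group "VM Network 10" on vSwitch1, tagged with VLAN 10. The physical
# switch ports behind vSwitch1's uplinks must be trunked for VLAN 10.
pg_spec = vim.host.PortGroup.Specification(
    name="VM Network 10",
    vlanId=10,
    vswitchName="vSwitch1",
    policy=vim.host.NetworkPolicy())
net_sys.AddPortGroup(portgrp=pg_spec)

Disconnect(si)
```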
Next I decided to work up a few different scenarios for an enterprise-ready solution. Physical NIC layout matters: do you have 2 onboard NICs and 4 on a single expansion card, or two 2-port expansion cards? I also wanted to give you an option for doing something a bit different if you think you might hit a VM network bottleneck.
These all have a few things in common:
- The Layer 3 Switch.
- If you are an SMB, you will most likely use some sort of stacked switch such as the Cisco 3750G or 3750E series. If you are a somewhat larger enterprise, you will likely go with a Cisco 4500, 6500, or Nexus 7000 type of core switch. Do not think that two switches connected by a 1 Gb uplink port suffice as a "stack". A stacked solution such as the 3750G has a 32 Gb/sec interconnect and the 3750E has a 64 Gb/sec interconnect.
- Trunk Ports
- I found myself configuring every switch port as a trunk port and tagging the VLAN at the vSwitch port group when designing for only 6 NICs. Teaming NICs on fewer vSwitches gives you more redundancy, while keeping the traffic types logically separated by VLAN cuts down on broadcast traffic and network noise.
- Jumbo Frames
- For jumbo frames to work, they must be configured end-to-end. The vmnic, the physical switch, and the SAN must all be set to an MTU of 9000. If one piece is missed, traffic on that path will fail. I typically deploy jumbo frames only on the storage network, because that is where you have the most to gain (see the scripted sketch after this list).
- How to configure on vDS - http://blog.scottlowe.org/2009/05/21/vmware-vsphere-vds-vmkernel-ports-and-jumbo-frames/
- How to configure on vSwitch - http://blog.scottlowe.org/2008/04/22/esx-server-ip-storage-and-jumbo-frames/
- LACP
- Link Aggregation will benefit you most on the iSCSI storage network.
- To properly configure LACP, read KB Article: 1004048
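Here is the scripted sketch I mentioned under Jumbo Frames: a rough pyVmomi example that raises the storage vSwitch and its iSCSI VMkernel port to a 9000 MTU. It assumes the session and `net_sys` object from the earlier sketch, that vSwitch2 is the storage vSwitch, and that an "iSCSI" port group already exists on it; the IP addressing is made up for illustration. The physical switch and the SAN still have to be configured for jumbo frames on their own.

```python
# Hedged sketch: jumbo frames on the vSphere side only. 'net_sys' is
# host.configManager.networkSystem from the earlier sketch; names and IPs
# are examples. The physical switch and SAN must also be set to MTU 9000.
from pyVmomi import vim

# 1. Raise the MTU on the vSwitch that carries storage traffic.
vswitch = next(v for v in net_sys.networkInfo.vswitch if v.name == "vSwitch2")
vs_spec = vswitch.spec
vs_spec.mtu = 9000
net_sys.UpdateVirtualSwitch(vswitchName="vSwitch2", spec=vs_spec)

# 2. Create the iSCSI VMkernel port with a matching 9000 MTU on the
#    existing "iSCSI" port group.
vnic_spec = vim.host.VirtualNic.Specification(
    ip=vim.host.IpConfig(dhcp=False,
                         ipAddress="10.0.50.11",
                         subnetMask="255.255.255.0"),
    mtu=9000)
net_sys.AddVirtualNic(portgroup="iSCSI", nic=vnic_spec)
```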
This is a traditional VMware 3.5 design and not based on vSphere alone. It is what I would consider the "safe route", but it doesn't take a Fault Tolerance network into account.
The differences between the diagrams come down to the type of NIC expansion cards in your physical hosts.
6 NICs = 2x2x2 or 2x4
- If you have 6 NICs, you get there by having 2 onboard NICs and either two 2-port NIC expansion cards or one 4-port NIC expansion card.
- I generally don't like to mix onboard and expansion-card NICs for iSCSI because of concerns over "flow control". Yet the only way I could get a proper HA configuration out of the 2x4 layout was to have iSCSI and the VM network each use one port on the onboard card and one on the expansion card.
NOTE: In the next 4 diagrams, your SAN must be directly connected to this layer 3 switch. That way you are not routing your iSCSI traffic, only switching it, which greatly improves performance.
I thought I would try out this next scenario in case you feel that having only 2 NICs for VM network traffic might not be enough. Eliminating one vSwitch puts Service Console/Management, vMotion, and the VM network all on 1 vSwitch (or 1 dVS if you prefer). By actively using vmnic5 for both VM network traffic and Service Console/Management, you gain more VM throughput.
If you belong to a medium or larger enterprise, I would physically separate the storage network onto a layer 2 stacked switch solution. Granted, the examples above will not have any routing going on if your SAN is also directly connected into the stacked layer 3 switch, but if you have the means to physically separate the storage network, you have more options to play with.
NOTE: Below is an example of physically separating the storage network. Any of the previous examples will work (I just didn't want to reiterate the same diagrams) and should be configured this way if you have the means.
**UPDATED 5/28/2010**
After talking to Anton Zhbankov (@antonvirtual), he's given me a solution to incorporate Fault Tolerance into a 6 NIC layout.
From Anton:
"FT should have dedicated NIC. But unless you provide VLAN-based QoS there is no need to create additional VLAN. VMotion and FT are on the same level of security. FT even starts as VMotion, then only difference is that FT does not power off source VM after VMotion process succeds.
There is no BIG traffic in FT, since it only logs non-deterministic events like user input or incoming network traffic, but FT does require a low-latency network.
You can give it a dedicated NIC but still provide redundancy via the NIC teaming policy for the port groups. Let's say the SC + vMotion VMkernel ports have nic0 as active and nic1 as standby, and the FT VMkernel port has nic0 as standby and nic1 as active. In this case you have network redundancy for all three port groups.
There can be 2 different designs:
1) VM and iSCSI are on the same vSwitch, but divided by VLAN. In this case we can put vmnic1 as standby for any port group, just for additional redundancy. It is a slightly less secure environment. But since iSCSI is basically on the same security level as vMotion - plain unencrypted traffic - we can put iSCSI on vSwitch1 with SC, vMotion, and FT.
2) We have to choose which port group gets slightly more redundancy - VM or iSCSI. I'd prefer iSCSI in this case, because if the VM network fails completely we can vMotion the VMs to a good host, but we can't do anything if iSCSI fails."
I went ahead and hashed out another diagram for this. I put FT on the same vSwitch as SC & vMotion, but in a different VLAN. It's a personal preference to separate different kinds of traffic, but you can put vMotion and FT on the same VLAN, as Anton pointed out. Again, if you can isolate the storage network on different physical switches, that is preferable.
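For those who like to see it in script form, here is a rough pyVmomi sketch of Anton's active/standby teaming idea applied to my layout (FT in its own VLAN). It reuses the `net_sys` object from the earlier sketches and assumes vSwitch0 has vmnic0 and vmnic1 as uplinks with "Management", "vMotion", and "FT" port groups already created; the names, VLAN IDs, and vmnic numbers are examples, not gospel.

```python
# Hedged sketch: per-port-group explicit failover order (active/standby)
# so each traffic type gets a dedicated active NIC plus a standby.
# 'net_sys' comes from the earlier sketch; names/VLANs/NICs are examples.
from pyVmomi import vim

def failover_policy(active, standby):
    """Build an explicit-failover teaming policy with the given NIC order."""
    return vim.host.NetworkPolicy(
        nicTeaming=vim.host.NetworkPolicy.NicTeamingPolicy(
            policy="failover_explicit",
            nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
                activeNic=active, standbyNic=standby)))

# Management and vMotion: vmnic0 active, vmnic1 standby.
# FT: vmnic1 active, vmnic0 standby (and its own VLAN, per my preference).
layout = {
    ("Management", 20): (["vmnic0"], ["vmnic1"]),
    ("vMotion",    21): (["vmnic0"], ["vmnic1"]),
    ("FT",         22): (["vmnic1"], ["vmnic0"]),
}

for (pg_name, vlan), (active, standby) in layout.items():
    pg_spec = vim.host.PortGroup.Specification(
        name=pg_name,
        vlanId=vlan,
        vswitchName="vSwitch0",
        policy=failover_policy(active, standby))
    net_sys.UpdatePortGroup(pgName=pg_name, portgrp=pg_spec)
```

Either way, the point is the same as Anton's: every port group keeps a dedicated active NIC, yet no traffic type is left without a path if one uplink dies.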
Please take a look at my other posts for more information and diagrams:
vSphere Host NIC Design - 10 NICs
vSphere Host NIC Design - 12 NICs
Question:
I see that you have the service console & vMotion on vSwitches rather than on a distributed switch, and they're shared. I know the service console has very little traffic and is probably safer on the vSwitch so the system is accessible on its own, but I'm surprised to see vMotion there. Wouldn't that need its own NIC and be better suited to the dVS? Or is it not considered to have a great deal of traffic normally either?
Also, I see you have created multiple dvSwitches instead of creating 1 with multiple port groups. Is that to better segregate the physical NICs?
From Kenny:
vMotion will ONLY have traffic on it during a vMotion. That shouldn't be happening all the time, and DRS will do it for you automatically (depending on licensing, of course).
You have the option of putting EVERYTHING on a dVS, but I prefer to have my service console and vMotion sitting on 1 vSwitch with 2 NICs, using each other's NICs as active/standby. This way you can get by using only 2 NICs for both, whereas if you want redundancy for each on its own switch, you need 2 NICs apiece, taking up 4 NICs total. vCenter is the control plane for the vDS while the ESX host is the data plane. If vCenter goes down, your dVS does not go down. Traffic will still flow, but since vCenter isn't available, you cannot make any changes.
I would always recommend using a dVS for VM network traffic.
It's your choice whether you want to create just 1 dvSwitch or multiple ones. I do it to have a cleaner look inside the VI Client, and it makes you separate the traffic a bit more at that level. Remember, a virtual switch is kind of like a physical switch: whatever is plugged into it can talk locally without the traffic ever leaving for the physical network (not like a hub, but like a switch). So having multiple vSwitches and dVSes just further segregates traffic.
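Since I keep recommending a dVS for VM network traffic, here is one last hedged pyVmomi sketch showing what adding a VLAN-tagged VM port group to an existing dVS looks like. It assumes the same `si` session as the first sketch, and the dVS name, port group name, VLAN ID, and port count are all made-up examples.

```python
# Hedged sketch: add a VLAN-tagged VM network port group to an existing dVS.
# Reuses 'si' from the first sketch; names and IDs are placeholders.
from pyVmomi import vim

content = si.RetrieveContent()
dvs = next(d for d in content.viewManager.CreateContainerView(
               content.rootFolder, [vim.DistributedVirtualSwitch], True).view
           if d.name == "dvSwitch-VM")

pg_spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
    name="VM Network 10",
    type="earlyBinding",
    numPorts=128,
    defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        vlan=vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
            vlanId=10, inherited=False)))

task = dvs.AddDVPortgroup_Task(spec=[pg_spec])  # returns a vCenter task
```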