of virtualization in the Intel lab using common network micro-
benchmarks before attempting the virtualization of the gaming
server environment. This allowed us to quantify the latency
added by virtualization and judge whether it would be significant.
Once we were confident that the added latency would not be a
concern, we proceeded to test the virtualized gaming server, first
with private testing in the ESL lab and ultimately with public
testing on the Internet with real ESL members.
Server hardware
The PoC targeted the four-socket Intel Xeon processor 7300
platform, populated with six-core Intel Xeon processor 7400
series (Dunnington) processors. These new processors became
available in September 2008 and are hardware and software
compatible with Intel Xeon processor 7300-based platforms
that have been in production for more than a year. The Intel
Xeon processor 7400 series delivers a performance boost from
six rather than four cores per socket and from the addition
of a new 16 MB L3 cache. It also delivers an energy-efficiency
boost derived from our 45nm high-k process technology. In
addition, the Intel Xeon processor 7400 series has added some
enhanced hardware-assist features for virtualization. The
platform supports 32 memory slots for up to 256 GB capacity.
In this PoC we used 32 GB.
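
The platform figures above imply a few derived numbers, worked out in the short sketch below. It assumes that all four sockets are populated with six-core processors and that memory capacity is spread evenly across the 32 slots; the constant names are illustrative only and do not come from the PoC.

```python
# Platform figures quoted in the text (names are illustrative).
SOCKETS = 4
CORES_PER_SOCKET_7400 = 6
CORES_PER_SOCKET_PREVIOUS = 4   # "six rather than four cores per socket"
MEMORY_SLOTS = 32
MAX_MEMORY_GB = 256
POC_MEMORY_GB = 32

# Derived values, assuming every socket is populated.
total_cores = SOCKETS * CORES_PER_SOCKET_7400
core_increase = CORES_PER_SOCKET_7400 / CORES_PER_SOCKET_PREVIOUS - 1

# Assuming capacity is spread evenly across the memory slots.
max_gb_per_slot = MAX_MEMORY_GB / MEMORY_SLOTS
poc_memory_share = POC_MEMORY_GB / MAX_MEMORY_GB

print(f"cores available: {total_cores}")
print(f"core count increase per socket: {core_increase:.0%}")
print(f"largest module per slot: {max_gb_per_slot:.0f} GB")
print(f"share of memory capacity used in the PoC: {poc_memory_share:.1%}")
```

Under these assumptions the PoC server offers 24 cores and uses one eighth of the platform's memory capacity.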
Network I/O
But virtualization is not just about CPU and memory resources.
It’s important to have I/O tuned for virtualization, too.
In a typical virtualization scenario (Figure 3), the network I/O for
all the VMs is delivered to the hypervisor. The hypervisor then
performs the necessary Ethernet switching functions in software
to forward each network flow to the destination VM. This
software function, called a virtual switch, is much slower than
a typical hardware-based Ethernet switch and causes CPU
loading that detracts from application VM performance. Also,
the hypervisor virtual switch has to process all the interrupts
sent by the network I/O device on a single CPU core. This can
be a bottleneck too, especially for faster networks like 10 GbE.
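
To make the virtual switch idea concrete, here is a minimal sketch of software forwarding. It assumes frames are steered to VMs by destination MAC address; the class, method names, and frame layout are hypothetical and are not taken from any hypervisor. The point is that every frame passes through hypervisor code before it reaches a VM.

```python
from collections import defaultdict, deque


class VirtualSwitch:
    """Toy software switch: sorts frames into per-VM queues by destination MAC."""

    def __init__(self):
        self.mac_to_vm = {}                   # destination MAC -> VM name
        self.vm_queues = defaultdict(deque)   # VM name -> frames awaiting delivery
        self.frames_handled = 0               # each frame costs hypervisor CPU time

    def attach(self, vm_name, mac):
        """Register a VM's virtual NIC under its MAC address."""
        self.mac_to_vm[mac] = vm_name

    def receive(self, frame):
        """Handle one frame from the physical NIC.

        Returns the name of the VM that received it, or None if the
        destination MAC is unknown (this sketch simply drops such frames).
        """
        self.frames_handled += 1
        vm = self.mac_to_vm.get(frame["dst_mac"])
        if vm is None:
            return None
        self.vm_queues[vm].append(frame)
        return vm


switch = VirtualSwitch()
switch.attach("vm1", "02:00:00:00:00:01")
switch.attach("vm2", "02:00:00:00:00:02")

frames = [
    {"dst_mac": "02:00:00:00:00:01", "payload": b"a"},
    {"dst_mac": "02:00:00:00:00:02", "payload": b"b"},
    {"dst_mac": "02:00:00:00:00:01", "payload": b"c"},
]
delivered = [switch.receive(f) for f in frames]
print(delivered)
```

In this sketch the lookup and queueing work in `receive` runs once per frame in software, which is the per-packet CPU cost the text describes.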
As shown in Figure 4, the Intel® 10 GbE NIC runs into this single-
core interrupt-processing bottleneck. In this case, the 10 GbE
NIC can receive only 4 Gbps of traffic because the single CPU
core that processes all the receive interrupts saturates well
below the 10 Gbps line rate.
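
The unused capacity that results from this bottleneck follows directly from the two figures quoted above. The sketch below assumes both are gigabits per second (4 received against the 10 GbE line rate); the function name is illustrative.

```python
def unused_fraction(achieved_gbps: float, line_rate_gbps: float) -> float:
    """Return the share of link capacity left unused (0.0 to 1.0)."""
    if line_rate_gbps <= 0:
        raise ValueError("line rate must be positive")
    return 1.0 - achieved_gbps / line_rate_gbps


# Figures from the text, assumed to be in Gbps.
unused = unused_fraction(achieved_gbps=4.0, line_rate_gbps=10.0)
print(f"unused I/O capacity: {unused:.0%}")
```

With these inputs the result is 60%, which matches the underutilization called out in Figure 4.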
[Diagram labels: VM 1, VM 2, ... VM n (each with a virtual NIC); Virtualization Hypervisor; NIC; LAN]
Figure 3. Network data flow for virtualization without the use of
VMDq and NetQueue technologies.
[Chart: throughput on a 0 to 10.0 scale. Without VMDq: 4.0; the remainder is unused I/O capacity. Result: NIC performance can be up to ~60% underutilized.]
Figure 4. Impact of virtualization on a 10 Gb Ethernet NIC
without the use of VMDq and NetQueue.
White Paper Consolidation of a Performance-Sensitive Application