Introduction
At VMworld 2008 in Las Vegas several of us in
our virtual performance team met with a variety of customers to talk
about Microsoft SQL Server. We already had a large base of customers
running very many SQL Server DBs on our products and we wanted to
collect information on the challenges posed in the process of
virtualizing this critical workload. We were pleased to see that ESX
Server handled SQL VMs with excellent performance. But, for many
customers, the first efforts at virtualizing SQL didn't yield
high-performing SQL VM. After careful investigation and many, many
discussions we've started to put together the puzzle as to where SQL
Server performance problems come from. This page will document these
common problems, borrowing slides from our presentations on the
subject.
Virtualizing SQL: The Checklist
We've talked with
dozens of customers in the past months to document the issues that
resulted in poor SQL performance. Happily, none of the issues were due
to underlying technologies. Here is a list of issues and an explanation
of the impacts. These items are roughly listed in the order of
decreasing likelihood of occurrence.
Item 1: Configure Storage Correctly
Storage
configuration problems are the number one cause of SQL performance
issues. Usually these problems arise because the DBA requests a
virtual disk of the VI admin, the VI admin places the VMDK on a LUN
that may or may not meet the DBA's performance needs. For instance:
- VMs' VMDK files placed on VMFS volumes without enough spindles.
- Many VMDK files placed on a single VMFS volume which could use more spindles.
- Database and log files placed on the same LUN which, you guessed it, could use more spindles.
This may be obvious to some, but this problem occurs again and again.
The VI administrator should be aware of a few technical items that can
help understand and avoid this problem:
- Based on the IO
demands of the DB files, a certain number of spindles should be
guaranteed to this file. This means that its VMDK must be placed on a
VMFS volume to accout for the SQL Server's demands and all of the other
demands on that volume.
- Mixing sequential activity (such as log
file update) and random activity (such as database access) results in
random behavior. This means that the LUN configuration in the
pre-virtual physical environment may not be sufficient for the
consolidated environment. This is discussed some in Storage Performance: VMFS and Protocols.
- When
storage isn't meeting the SQL Server's demands, the device latency or
kernel latency (queueing time) will increase. Read up on these
counters in Storage Performance Analysis and Monitoring.
Item 2: Use Recent Hardware
Often companies that are dipping their metaphorical toes into
virtualization want to run proof-of-concept (POC) experiments to verify
that the virtual platform can meet their performance expectations. But
its surprising how many times these experiments are run on older,
poorly-performing hardware. Presumably the shiny, new systems were in
use for production applications so only the mothballed, cobweb-covered
servers from a previous generation were available for the POC. This
causes many problems. Check out this slide from a talk on SQL Server at VMworld Europe 2009:

The
slide points out a couple of things. First, the larger caches and
shorter pipelines on newer Intel processors results in a considerable
drops in performance overheads. Second, the latency of the VMEXIT
instruction, which determines the amount of time it takes to transition
from the VM to the VMkernel, has shrunk by a large amount with
subsequent generations of hardware. And don't forget the other
additions from Intel and AMD such as hardware assisted memory
management and IO virtualization.
Item 3: Follow SQL Server Best Practices
Microsoft has kindly provided a
web page of best practices for SQL Storage configuration. These be practices should still be followed when configuring your virtual SQL deployments!
Item 4: Configure VM Identically to Native and Run The Right Test
For
many SQL Server POCs the goal is to measure the VM's ability to
perform, with respect to the virtual platform. If this comparison is to
be performed, its critical that the VM be configured identically to
the physical hardware. Obviously this means that the VM should be run
on the same hardware using identically configured LUNs. Its also
important to ensure that the VM has the same number of vCPUs and amount
of memory as the physical baseline. This means restricting the number
of pCPUs and amount of memory with NUMPROC and MAXMEM, respectively, in
boot.ini.
It also means that the test being applied should be
understood. If a benchmark is chosen that uses a very small database,
the content will be cached and the storage system won't be used. This
can skew the results and produce recommendations not consistent with
production deployments. Here is another slide from the same VMworld
Europe 2009 presentation detailing some of what we know about the SQL
Server benchmarking alternatives:

We at VMware prefer DVD Store.
Item 5: Use VMware's ESX Server
VMware's hosting products, VMware Server, VMware Workstation, and even
VMware Fusion, are all capable of running SQL Server. But if the
database is going to be run in production on enterprise-class hardware,
use VMware's enterprise-class hypervisor: ESX Server. These products
are not often confused by the initiated but rogue members of large
companies often run off-the-books proof-of-concept experiments on
VMware's hosted products. When they produce results they don't like,
the results get spread throughout the company which can slow the
virtual deployment.
Consider the following data, again from the VMworld Europe 2009 SQL Server presentation:

This
information is getting a bit dated now, as it was performed years ago
on ESX Server 3.0. But the point stands: before believing results
claiming that "VMware cannot run SQL Server" its worth investigating
the platform used to generate the results.
Item 6: Understand Memory Management and Configure Correctly
Database
performance is heavily dependent on the amount of memory available.
Almost without exception, providing more memory to SQL Server will
improve performance. However, if that memory is coming from a host that
is already over-committed or is being provided through workarounds to
32-bit limitations, performance may suffer. Here are a few keys for SQL
Server memory management:
- If more than 3 GB is desired, use 64-bit versions of the OS and application.
- If
memory is over-committed on the box, set reservations for
performance-critical SQL Server VMs to guarantee that those VMs' memory
isn't ballooned or swapped out.
- If SQL Server's "lock pages in
memory" parameter has been set, provide set the VM's reservations to
the amount of memory in the VM. This setting can adversely interfere
with ESX Server's balloon driver. Setting reservations will stop the
balloon driver from inflating into the VM's memory space.
Item 7: Align Disk Partitions
This
item is really a special but very important case of item two, follow
best practices. Partition alignment can impact storage performance
which can be critical to some SQL Server VMs' performance. See VMware's
paper on partition alignment for more information on this.