How to throttle that disk I/O hog

We are in the middle of a large server virtualization project and are utilizing two Clariion CX-400 arrays as target storage systems. The load on these arrays is increasing while we are putting more and more VMs on them. This is somewhat expected, but recently we noticed an unusual and unexpected drop of performance on one of the CX-400s. The load on its storage processors went way up and its cache was quickly and repeatedly filled up to 100% causing so-called forced flushes: That means the array needs to shortly stop any I/O coming in while it is staging the cache contents down to the hard disks in order to free the cache up again. As a result overall latency went up and throughput went down, and this affected every VM on every LUN of this array!

As the root cause of this we identified a single VM that fired up to 50.000(!) write I/Os per second. It was a MS SQL server machine that we recently virtualized. When it was on physical hardware it used locally attached hard disks that were never able to provide this amount of I/O capacity, but now - being a VM on high-performance SAN storage - it took every I/O it could have, monopolizing the storage array's cache and bringing it to its knees.

We found that we urgently needed to throttle that disk I/O hog, or it would severely impact the whole environment's performance. There are several means to prioritize disk I/O in a vSphere environment: You can use disk shares to distribute available I/Os among VMs running on the same host. This did not not help here: the host that ran the VM had no reason to throttle it, because the other VMs it was running did not require lots of I/Os at the same time. So, for the host there was no real need to fairly distribute the available resources.
Storage I/O Control (SIOC) is a rather new feature that allows for I/O prioritization at the datastore level. It utilizes the vCenter server's view on datastore performance (rather than a single host's view) and kicks in when a datastore's latency raises over a defined threshold (30ms by default). It will then adapt the I/O queue depth's of all VMs that are on this datastore according to the shares you have defined for them. Nice feature, but it did not help here either, because the I/O hog had a datastore on its own and was not competing with other VMs from a SIOC perspective ...

We needed a way to throttle the VM's I/O absolutely, not relatively to other VMs. Luckily there really is a way to do exactly this: It is documented in KB1038241 "Limiting disk I/O from a specific virtual machine". There are VM advanced-configuration parameters described here that allow to set absolute throughput caps and bandwidth caps on a VM's virtual disks. We did this and it really helped to throttle the VM and restore overall system performance!

By the way, the KB article describes how to change the VM's advanced configuration by using the vSphere client which requires that the VM is powered off. However, there is a way to do this without powering the VM off. Since this can be handy in a lot of situations I added a description of how to do this on the HowTo page.

Update (2011-08-30): In the comments of this post Didier Pironet pointed out that there are some oddities with using this feature and refers to his blog post Limiting Disk I/O From A Specific Virtual Machine. It features a nice video demonstrating the effect of disk throttling. Let me summarize his findings and add another interesting information that was clarified and confirmed by VMware Support:
  • Unlike stated in KB1038241 you can also specify IOps or Bps values (not only K, M or GIOps resp. K, M or GBps) for the caps (e.g. "500IOps"). If you do not specify a unit at all IOps resp. Bps is assumed, not KIOps/KBps like stated in the article.
  • The throughput cap can also be specified through the vSphere client (see VM properties / Resources / Disk), but not the bandwidth cap. This can even be done while the machine is powered on, and the change will become immediately effective.
  • And now the part that is the least intuitive: Although you specify the limits per virtual disk the scheduler will manage and enforce the limits on a per datastore(!) basis. That means:
    • If the VM has multiple virtual disks on the same datastore, and you want to limit one of them, then you must specify limits (of the same type, throughput or bandwidth) for all the virtual disks that are on the same datastore. If you don't do this, no limit will be enforced.
    • The scheduler will add up the limits of all virtual disks that are on the same datastore and will then limit them altogether by this sum of their limits. This explains Didier's finding that a single disk is limited to 150IOps although he defined a limit of 100IOps for this disk, but another limit of 50IOps for a second disk that was on the same datastore.
    • So, if you want to enforce a specific limit to only a single virtual disk then you need to put that disk on a datastore where no other disks of the VM are stored.

4 comments:

  1. Hi Andreas, as you found out there are situations where SIOC can't help.
    I've a blog post on that topic as well with some findings you might find useful: http://deinoscloud.wordpress.com/2011/06/23/limiting-disk-io-from-a-specific-virtual-machine/

    Rgds,
    Didier

    ReplyDelete
  2. Hi Andreas, we have similar performance problems.
    A question - how did you locate which virtual machine was causing the excessive I/O?

    Rgds,
    Henrik

    ReplyDelete
  3. Hi Henrik,

    our Storage guys were somehow able to find this out. The array (EMC Clariion) told them that the I/O that was flooding the cache came from a particular LUN, and since we had only one VM on this LUN/datastore it was easy for us.

    However, you should also be able to find this out using vCenter. With vSphere 4.1 you have performance statistics per datastore with nice "Top 10 VM"-views that will help figureing out what VMs do the most I/O.

    - Andreas

    ReplyDelete
  4. For information this doesn't work with an NFS Datastore... The limit isn't enforced.

    ReplyDelete

***** All comments will be moderated! *****
- Please post only comments or questions that are related to this post's contents!
- Advertising and link spamming will not be tolerated!