FAQ: Using SSDs with ESXi (Updated)


Most state-of-the-art enterprise storage architectures make use of SSD (Solid State Disk) storage in one way or another, and - with prices inevitably dropping - they have become affordable even for home use. What is their benefit? Since they are based on Flash memory, SSDs offer much higher throughput and much lower latency than traditional magnetic hard disks. I can well remember my delight when I equipped my home PC with an SSD for the first time and saw Windows booting ten times faster than before, in only a few seconds ... and I always wondered how VMware ESXi and the VMs it runs would benefit from SSD storage.

Well, a while ago I upgraded the two ESXi boxes that make up my Small Budget Hosted Virtual Lab (SBHVL) to Intel Haswell CPUs, and one of them is now also running with 2x Intel 240GB SSDs. It's time to write about what I have learnt about ESXi and SSDs: In this blog post I will summarize how ESXi can make use of local SSDs in general, and specifically what you need to think about when using them as regular datastores.

How VMware ESXi can make use of local SSDs

ESXi can use locally attached SSDs in multiple ways:
  • as Host swap cache (since 5.0): You can configure ESXi to use a portion of an SSD-backed datastore as swap memory shared by all VMs. This is only useful if you plan to heavily overcommit the RAM of your host (e.g. in VDI scenarios). Swapping out a VM's memory to disk is the last resort if all other memory reclamation methods (like page sharing and memory compression) have already been fully utilized, and it will usually have a significant performance impact. However, swapping to SSD is less bad than swapping to hard disks and will reduce this impact.
    For details on how to configure the Host swap cache see the vSphere Resource Management documentation.
  • as Virtual Flash (since 5.5): Since vSphere 5.5 you can format an SSD disk with the Virtual Flash File System (VFFS) and use it either as Host swap cache (see above) or as a configurable write-through read cache for selected VMs. I consider the latter to be much more useful than a swap cache, because it allows you to use SSDs as a drive cache for a VM's virtual disks that are stored on regular hard disks. However, it requires Enterprise Plus licensing. The vSphere 5.5 Storage documentation includes details on how to set this up.
  • as part of a Virtual SAN (VSAN) (coming soon): VSAN is in public beta right now and will require vSphere 5.5 once it is generally available. It allows you to combine the local storage of multiple ESXi hosts into a dynamic and resilient shared pool, and it even requires at least one SSD per host, which is used as a write buffer and read cache. For further information you can read Duncan Epping's introduction to VSAN and the white paper What's new in VMware Virtual SAN (VSAN).
  • as a regular datastore: Of course you can also just format your SSD disks with VMFS and use them as regular datastores for your VMs. This is fully supported by VMware, and the remainder of this post will focus on this usage scenario.

Checks and fakes ...

After you have built an SSD disk into your host you should first check whether it was properly detected as an SSD. In the vSphere Client you need to look at Host/Configuration/Storage/Devices. The Drive Type will be shown there as either non-SSD or SSD:
[Screenshot: SSD drive type display in the vSphere Client]
If you prefer the CLI to do this check then you can run
 esxcli storage core device list
to list all disk devices and their properties. If the output includes a line
  Is SSD: true
then the disk is properly detected as SSD.
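
If you already know the device id of your SSD, you can narrow this check down to a one-liner - a quick sketch, assuming the grep that ships with the ESXi shell (the device id shown is just a placeholder):
  esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx | grep "Is SSD"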

In the rare event that an SSD is not properly detected you can use Storage claim rules to force its type to SSD. The vSphere 5 docs include detailed instructions on how to do this. This is also useful if you want to fake a regular hard disk to be an SSD, e.g. for testing purposes!
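
For reference, the documented approach boils down to adding a SATP claim rule with the enable_ssd option and then reclaiming the device. Here is a rough sketch of what that looks like - the device id is a placeholder, and VMW_SATP_LOCAL is an assumption that only fits locally attached disks, so please follow the vSphere docs for your exact configuration:
  esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d naa.xxxxxxxxxxxxxxxx -o enable_ssd
  esxcli storage core claiming reclaim -d naa.xxxxxxxxxxxxxxxx
  esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx | grep "Is SSD"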

Once you have created a VMFS datastore on a properly detected (or faked) SSD disk and put a VM on this datastore, its virtual disks will inherit the SSD property. That means the Guest OS will be able to detect a virtual disk residing on an SSD datastore as a virtual SSD disk and treat it accordingly. For this detection to work, ESXi 5.x, VM hardware version 8+ and a VMFS5 datastore are required (see the vSphere docs)!

Again, for testing purposes you can also fake a single virtual disk to appear as an SSD (regardless of the underlying datastore's type) by setting a parameter like scsiX:Y.virtualSSD = 1 in the VM's configuration file. See William Lam's post about this for details.
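
For example, to mark the first disk on the first SCSI controller as a virtual SSD, the corresponding entry in the .vmx file would look like this (the controller and disk numbers are of course just an illustration; the value is quoted here as is usual for .vmx entries):
  scsi0:0.virtualSSD = "1"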

Finally you may want to check whether the Guest OS in the VM properly detects its virtual disk as an SSD. This is important at least for modern Windows versions, because they then put various system optimizations in place. For Windows 7 (and 2008 R2) it looks like there is no easy way to tell whether it has properly detected the SSD. You need to take an indirect route and check whether the system optimizations have been applied or not - this MSDN blog post will help. With Windows 8 (and 2012) it is much easier: Open the Control Panel applet Defragment and optimise your drives and it will clearly list your drives' media types:

[Screenshot: Defragment and optimise your drives in Windows 8]
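
By the way, if you prefer the command line over the Control Panel applet, Windows 8 and Server 2012 expose the same information through PowerShell's Storage module - a quick check could look like this:
  Get-PhysicalDisk | Select-Object FriendlyName, MediaType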

What about lifetime?

SSDs are supposed to have a limited lifetime (compared to hard disks), because their flash-based cells can only bear a certain number of (re)write cycles before they fail. Nevertheless most consumer grade SSDs are sold with a 5-year warranty - under the assumption that you write an average of at most 20 GB per day onto the disk.

That means you can estimate the (remaining) lifetime of an SSD disk by monitoring its write volume. The topic Estimate SSD Lifetime in the vSphere docs explains how to do this:
  1. Determine the device id of the SSD by listing the disks with
      esxcli storage core device list
  2. Display the statistics of the SSD by running
      esxcli storage core device stats get -d=device_id
Here is an example of the output:

[Screenshot: Displaying disk statistics with esxcli on ESXi]
Unfortunately the vSphere docs do not mention what blocks the output is referring to and - most importantly - what size these blocks have. By testing I found out that one block is 256 bytes - that means you need to divide the displayed number of blocks by 2^22 (= 4,194,304) to get the number of GB that were written to the disk. In the above example the 361,090,578 blocks translate to ~86 GB. (Note: As a reader pointed out in the comments, some environments report a block size of 512 bytes instead, so it is worth verifying this in your own environment.)

These statistics are reset to 0 when the host reboots, so just check the host's uptime with the uptime command and you will get an idea of how many GB were written per day on average.
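
If you want to automate this check, here is a minimal sketch for the ESXi shell that combines both steps - it assumes the 256-byte block size from my measurements and uses a placeholder device id:
  DEV=naa.xxxxxxxxxxxxxxxx   # replace with your SSD's device id
  esxcli storage core device stats get -d $DEV | awk '/Blocks Written:/ {printf "%.1f GB written since last reboot\n", $3/4194304}'
  uptime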

Does ESXi support TRIM?

Since the write cycles of each Flash cell are limited, the controller of an SSD will try to distribute all writes evenly over the complete disk. And it will carefully track which cells are already in use and which cells are no longer in use and can be overwritten. Now there is a problem: The controller has no awareness or understanding of the file system (e.g. Windows NTFS) that the OS uses on the disk and thus cannot easily tell on its own whether a block is in use or not. As the number of known free Flash cells decreases, the write performance of the SSD also decreases, because it heavily depends on the number of cells that can be written to simultaneously.

To address this issue the ATA TRIM command was introduced many years ago. Modern Operating Systems use the TRIM command to inform the SSD controller when they delete a block so that it can add the associated Flash cell to its free list and knows that it can be overwritten.

So, does ESXi support TRIM? I tried really hard ..., but it looks like today you cannot find an official and reliable source clearly stating whether ESXi supports TRIM or not. Most non-VMware sources state (in blog posts etc.) that ESXi does not support TRIM, but without providing a reliable source.

However, while researching I found out that the SCSI equivalent of the ATA TRIM command is the UNMAP command, and this rang a bell: In vSphere 5.0 the reclamation of deleted VMFS blocks with the help of SCSI UNMAP commands was introduced as part of the vStorage APIs for Array Integration (VAAI). When vSphere 5.0 was released this functionality was enabled by default (if the storage array supported it), but this was soon changed, because it had undesired side effects in some situations. Today VAAI space reclamation is a manual process that can be triggered by running the command
  vmkfstools -y nn

on a datastore (where nn is the percentage of the datastore's free space to reclaim). The VMware KB article 2014849 explains this in detail and also mentions how you can check whether this is supported on a disk or not: The command
 esxcli storage core device vaai status get -d device_id

will display the line
  Delete Status: Supported

if SCSI UNMAP can be used on a disk to perform space reclamation. And guess what: This is the case with the SSD disks that I have in my ESXi host! And consequently I was able to run vmkfstools -y on them!
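
Putting these pieces together, a manual space reclamation run on an SSD datastore could look roughly like the following sketch (the device id and datastore name are placeholders):
  esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx | grep "Delete Status"
  cd /vmfs/volumes/SSD-datastore1
  vmkfstools -y 60
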
But will this really fire TRIM commands to the SSD? With this question in mind I used my Google-Fu again and finally stumbled over this French blog post by Raphaël Schitz, who discovered the same and asks: SSD + VAAI = TRIM? Like me he was unable to definitively answer this question ...

If someone from VMware reads this and is able to answer this question then I would be grateful to hear from them - please end our days of uncertainty about TRIM support in ESXi!! (Note: This has been confirmed! See the Update at the end of this post.)

What if ESXi does not support TRIM?

In the early days of SSDs TRIM support was very important to keep the drive healthy and fast. But today's SSD controllers have become much more intelligent - they are able to detect unused pages on their own and free up Flash cells with a so-called background garbage collection process. So it is debatable whether TRIM is really still needed today. But hey, if Windows supports TRIM then ESXi should do so, too, right?! At least I would feel more confident then ...



Update (2014-07-23)

I brought up the question (whether a manual SCSI UNMAP triggers a TRIM on an SSD datastore) again in a forum thread discussing the current public beta of the next vSphere release (please note: you need to be registered for the beta program to access this forum and read the thread). And there two VMware employees confirmed that this is true!

By the way, in ESXi 5.5 the "vmkfstools -y" command is now deprecated, and the new and better way to run SCSI UNMAP on a datastore is

 esxcli storage vmfs unmap -n reclaim-unit-size -l volume-label

This will run a series of UNMAP commands on the datastore with the label volume-label, and with each iteration reclaim-unit-size blocks will be reclaimed (the number defaults to 200). For more information refer to KB2057513, or to KB2014849 for ESXi 5.0/5.1.
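
For example, to reclaim space on a datastore labeled SSD-datastore1 (the label is just a placeholder) in chunks of 200 blocks, you would run:
  esxcli storage vmfs unmap -l SSD-datastore1 -n 200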

I have been running these commands on my SSD datastores from time to time for about 9 months now, and they still show good performance. So at least it doesn't do any harm ;-)




25 comments:

  1. Have you looked into TRIMcheck? The software will automatically verify whether TRIM is working within a Windows VM. The question is whether we could use the manual method of determining whether TRIM worked on a VMFS volume. It may take some thinking but I think it can be done. It is just that no one seems to have done it yet.

    With that said, today's SSDs employ garbage collection routines in firmware. Some are better than others as can be seen in reviews over at AnandTech and discussion threads on Apple forums where posters are using third party SSDs to upgrade Mac hardware. This could also be an interesting post to determine how SSD performance fares (or degrades) when used as, for example, a datastore over days, weeks, and months.

    1. Thanks for your comments! In VMs TRIM is definitely not available, I already checked that. The GuestReclaim fling can help here, but its usability is limited.

      The manual check method that you are referring to sounds interesting, you might get this to work on a VMFS volume inside an ESXi shell.

      Regarding the question whether the write performance of an SSD datastore decreases or not ... I will check that from time to time on my host, especially whether "vmkfstools -y" will restore the performance or not.

    2. Hello Andreas, great article about TRIM.
      Are you sure trims are not propagated from VMs to ESX?
      Then what use is there to run TRIMs in the first place?
      Assume we have 1T of SSD, datastore1.
      We put an image on datastore1 that occupies 99% of all disk space.
      What will be the use for TRIMs in ESX?
      As far as I can see, TRIMs only come into play if the guest image could propagate free disk space up to VMware, and VMware could trigger TRIMs for it.
      But you say, it doesn't work like this?

    3. Hi Anonymous,

      yes, I'm sure that currently TRIMs are not propagated from VMs to the host. So it is actually only useful to reclaim free space when you delete a complete VM (or a VM disk) from the datastore.

      Andreas

  2. I have an HP BL460c G8 server with 2 slots for local storage - can I use two 120 GB SATA SSDs to install ESXi 5.5 locally? If so, will there be any performance benefit as opposed to using two 146 GB SAS hard disks?
    Please advise.

    1. It doesn't really matter where ESXi itself is installed. It will certainly boot faster from an SSD, but once it's booted it is loaded into a RAM disk and only accesses the boot disk for writing log files.
      However, if you create a VMFS datastore on the SSDs and store VMs on it, then these VMs will of course perform much faster than from regular SATA hard disks.

  3. Excellent review, Andreas! I was looking into the exact same questions, and I'm happy to have stumbled across your page. Unfortunately I am still running ESXi 4. Which features would I miss if I added an SSD (RAID) data store as a regular VMFS data store? It sounds like I'm okay as long as the native background garbage collection of the SSD works. Thank you!

    1. Hi Anonymous,

      the manual TRIM by VAAI space reclamation will not work in ESXi 4, but I agree that this is nothing to really worry about when the SSD has a decent background garbage collection.

      Andreas

  4. Google lists a preview of this document starting with "It's time to write about what I have learnt....". Learnt. LEARNT.

    Learned.

    1. Oh nice, a grammar nazi on my blog ...
      I'm not a native English speaker, but according to my sources both forms ("learnt" and "learned") can be used.

      I hope you found the contents of my post useful.

    2. Yes. "Learnt" is another past tense, usually in European English.

  5. Andreas,
    Thanks for posting your findings. I stumbled across this while researching SSD lifespan and found the data you provided very useful.

    -Eddie

  6. There is a great deal of misinformation around concerning garbage collection (GC) on SSDs. GC is not a replacement for TRIM. TRIM helps GC by marking blocks that don't need to be copied to a new block before erasing a block (i.e. performing a write on a block that already contains data). Without TRIM, GC has no way of knowing which data is valid or invalid other than when the same logical block is overwritten. If you think about it, this makes perfect sense. How can an SSD know about the metadata and file structures of every FS in existence? It simply can't!

    So a good GC algorithm may be able to mitigate the performance penalty incurred by having to read and rewrite blocks (valid and invalid, by the way) with good timing, fast operation, etc., but without TRIM it cannot reduce write amplification or wear-level as effectively.

  7. Andreas,
    do you know if these SCSI UNMAP commands also work with a SAS RAID controller like the LSI MegaRAIDs, instead of a disk directly connected to an HBA or the internal mobo disk controller? Often these kinds of commands do not work because of the interfering RAID management...

    -Alex

    1. Hi Alex,

      indeed this will probably not work due to the extra SCSI layer.
      But just try to run the commands mentioned. If they throw an error or complete instantly then they probably don't work, but if they run for a while (20 to 60 sec. depending on the SSD size) then they are obviously doing something.

      Andreas

    2. I would not expect anything bad to happen, but otherwise this could be a good DR test ;-) You have good backups in place anyway, right?!

    3. And yes, you can run them online with running VMs on the same host and datastore. I do this all the time ...

  8. Thanks Andreas for both this info and your ps script, great work love it!

    -Hendrix

  9. Just my 2 cents:
    I have read your article and appreciate it a lot, and also tested it immediately, but found a discrepancy.
    Reading this whitepaper from Fujitsu http://globalsp.ts.fujitsu.com/dmsp/Publications/public/wp-solution-specific-evaluation-of-ssd-write-endurance-ww-en.pdf you can see that there, at the top of page 7, they write that the block size is 512 bytes, not 256.

    So I tested your commands, copying a 2 GB file onto an idle local datastore on 2 different ESX hosts, and found that for me too, with vSphere 5.1 U1, 512 bytes is the correct block size.

    Don't know if it may vary by RAID controller (like it could be stripe size dependent), by VMware version or also by disk type, and I haven't had the time to test it all; I have only verified it in my environments.

    1. Hi there,

      yes, I also heard from one other guy that his testing resulted in a block size of 512 bytes. I repeated my tests after that and still ended up with 256 bytes. So both block sizes seem to be in use.

      I need to update my post to include this information.

      Thanks
      Andreas

  10. Andreas hi,
    have you already tested TRIM support in ESXi 6.0?
    Is it a good idea to put both the ESXi installation and 2 VMs on a Samsung 850 Pro 128GB SSD?
    Thank you in advance for reply

    1. Hi Ilya,

      I have not yet tested ESXi 6.0 with SSD datastores, but I think that it has not really changed compared to 5.5.

      Putting ESXi and VMs on the same SSD datastore will work (I do the same). It will give you nice performance, but you should keep in mind that a single non-RAIDed disk (no matter if hard disk or SSD) is always a single point of failure. So do back up!

      Andreas

    2. Hi Andreas,

      thanks a lot for your efforts.

      I have found another useful post concerning ESXi 6 and TRIM/UNMAP, which was very helpful too:
      http://www.purestorage.com/blog/direct-guest-os-unmap-in-vsphere-6-0-2/
      Especially the part about Windows, TRIM and UNMAP, and also the display text in the Windows defrag tool (SSD and thin disk).

      Best wishes

      Alex

