The status of SMP Fault Tolerance

VMware FT (Fault Tolerance) has been around for a while and enables you to protect a VM so that it can fail over seamlessly and without disruption to a copy of it running on another host. The execution of the primary and the secondary copy of an FT-protected VM is synchronized through the vLockstep protocol: all input data is sent over a dedicated FT logging link to the host that runs the secondary copy and is replayed there. If the primary host fails for whatever reason, the secondary host detects this and instantly activates its own copy of the VM to become the new primary, with no loss of data or network connections.
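To illustrate why replaying input data is enough (rather than shipping every executed instruction), here is a toy sketch in Python. It is not VMware's vLockstep implementation: the `ToyVM` class, the event format and the in-memory list standing in for the FT logging link are all hypothetical. It only shows that a deterministic machine fed the same inputs in the same order ends up in exactly the same state.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical deterministic 'VM': its state depends only on the inputs it receives."""
    state: int = 0
    executed: list = field(default_factory=list)

    def apply(self, event):
        # Deterministic state transition: the same event sequence always yields the same state.
        kind, value = event
        if kind == "net":
            self.state = (self.state * 31 + value) % 1_000_003
        elif kind == "timer":
            self.state = (self.state + value) % 1_000_003
        self.executed.append(event)


def run_primary(vm, events, log):
    # Primary: consumes each non-deterministic input and forwards it over the logging link.
    for event in events:
        vm.apply(event)
        log.append(event)


def replay_secondary(vm, log):
    # Secondary: never talks to the outside world, it only replays the logged inputs.
    for event in log:
        vm.apply(event)


rng = random.Random(42)
inputs = [(rng.choice(["net", "timer"]), rng.randint(0, 255)) for _ in range(1000)]

primary, secondary = ToyVM(), ToyVM()
logging_link = []  # stands in for the dedicated FT logging network
run_primary(primary, inputs, logging_link)
replay_secondary(secondary, logging_link)
print(primary.state == secondary.state)
```

With a single vCPU the order of events is well defined. With several vCPUs, the order in which they access shared memory is an additional source of non-determinism that a replay scheme would also have to capture, which is one plausible intuition for why multiprocessor FT is so much harder.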


This is a very cool feature, but today it has a major limitation: it only works with uniprocessor VMs (VMs with 1 vCPU). That means a lot of workloads cannot be protected by FT, because they require multiple vCPUs. Of course many people have asked (and keep asking) for this limitation to be lifted, and we know that VMware has been actively working on it for a while. But VMware has also told us that this feature (called SMP FT) poses big challenges and requires a great engineering effort (Bob Plankers wrote a nice blog post about the reasons for this).

At this year's VMworld conferences there was a session about the status of this work: VMware vSphere Fault Tolerance for Multiprocessor Virtual Machines - Technical Preview and Best Practices. I will summarize it here. Always keep in mind, though, that everything presented in this session reflects the current state of work in the VMware Labs, with no guarantee that it will ever be available in a released product!

It turned out that the vLockstep protocol is not suitable for handling multiple vCPUs, so VMware developed a new protocol from scratch: the SMP FT protocol. An FT logging link is still needed, and it now even has to be a 10 Gbit link (as opposed to the 1 Gbit requirement of the current FT implementation). However, the increased bandwidth is not only used to synchronize multiple vCPUs; it is also used to lift another limitation of the current FT implementation: the requirement for shared storage between the primary and secondary VM is gone.

Yes, with SMP FT the primary and secondary VM are completely independent of each other. Each uses its own set of virtual hard disks, and these are mirrored through the FT logging link. The two hosts that run the primary and secondary VM still need shared storage, but only as an additional heartbeat method to determine which VM should become (or stay) primary if the logging link breaks (similar to the heartbeat datastores of today's improved HA/FDM implementation).

Great news: this means that SMP FT will protect not only against host failures, but also against storage failures. You might already use other, more efficient methods to synchronously mirror VMs' storage (like array-based replication), which is why VMware is also looking into making storage mirroring an optional feature of SMP FT. The new design has one more important benefit: you can now use snapshots (e.g. for backups) on FT-enabled VMs!
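The tie-breaking role of the shared storage (deciding which copy becomes or stays primary when the logging link breaks) can be pictured as an atomic test-and-set on a location both hosts can reach: whichever host wins goes live, the other must not. The Python sketch below models this with a lock-protected object standing in for the shared datastore. The `SharedDatastore` class, its `claim_primary` method and the host names are hypothetical; this is an illustration of the split-brain problem, not how VMware implements it.

```python
import threading


class SharedDatastore:
    """Hypothetical stand-in for a file on storage that both hosts can reach."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def claim_primary(self, host):
        # Atomic test-and-set: the first host to claim wins, later claims see the existing owner.
        with self._lock:
            if self._owner is None:
                self._owner = host
            return self._owner == host


def on_logging_link_lost(host, datastore, results):
    # Both hosts run this independently once they stop hearing from each other.
    results[host] = "primary" if datastore.claim_primary(host) else "shut down copy"


datastore = SharedDatastore()
results = {}
threads = [
    threading.Thread(target=on_logging_link_lost, args=(host, datastore, results))
    for host in ("esx-a", "esx-b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Whichever thread runs first wins, but exactly one host ends up as primary, which is the property that prevents both copies from going live at the same time.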

During the session a demo was shown of a VM with 4 vCPUs and 16 GB of RAM that was protected by SMP FT. It was running a workload that a lot of people (including myself) would love to protect with FT: the VMware vCenter Server. The demo made clear why a 1 Gbit link is no longer enough for FT logging: the bandwidth used climbed to 1.3 Gbit/s when the CPU load in the protected VM went up, and it might climb even higher depending on the I/O load and the number of vCPUs (4 is not a hard maximum; in theory the number is limited only by the available bandwidth). When the host running the primary VM was forcefully rebooted, the secondary VM took over seamlessly, as expected.
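The arithmetic behind that observation is simple: 1.3 Gbit/s of FT logging traffic is 130% of a 1 Gbit link, but only 13% of a 10 Gbit link. The helper below just performs this division; its name is made up for illustration, and real links deliver somewhat less than their nominal rate, so actual headroom is smaller.

```python
def link_utilization(traffic_gbit_s, link_gbit_s):
    """Fraction of a link's nominal capacity consumed by a given traffic rate."""
    return traffic_gbit_s / link_gbit_s


observed = 1.3  # Gbit/s of FT logging traffic seen in the demo
for link in (1, 10):
    u = link_utilization(observed, link)
    status = "saturated" if u >= 1 else "headroom left"
    print(f"{link:>2} Gbit link: {u:.0%} ({status})")
```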

So why is SMP FT still not generally available? One reason might be that VMware is not yet comfortable with the performance of an FT-protected SMP VM: depending on the workload, it reaches only between 55 and 75% of the performance of the same VM without FT protection. I wonder whether the VMware engineers will be able to raise this value significantly, and whether that is even possible with the current implementation. Maybe it requires yet another major re-engineering effort? Remember: in its current state this is a technical preview, and it might never make it into a released product ...

... which brings us to the last and most important question: When will SMP FT be generally available? ;-) This question was of course not answered during the session, but given that a lot of people already expected it in vSphere 5.1, I am confident that we will see it in the next major vSphere release at the latest. Maybe in a year or so. It will be very interesting to see how this development continues and what the final result will look like.


2 comments:

  1. I thought that uniprocessor FT was implemented not by sending and replaying every CPU instruction on the secondary VM but rather by replaying a synchronized mirror of input data (without sending output anywhere) there.

    1. You are right. I corrected my post accordingly.
      Thanks
      - Andreas

