How to vMotion from Intel to AMD - and why not to do it.

Have you ever been bugged by a vMotion Compatibility check error that you couldn't really explain, e.g. while trying to put a host into maintenance mode for some urgent work?

Have you ever dreamed of live migrating VMs from a cluster of hosts with Intel CPUs to another cluster of hosts with AMD CPUs (in a hardware migration scenario or just for fun)?

About vMotion, vMotion Compatibility and EVC

For me Live Migration (aka vMotion) is still one of the most impressive features of the vSphere platform - although it has been around for quite a while now. Why? Because - like no other technology - it demonstrates the superiority of virtual machines over physical servers, their independence from the underlying physical hardware. It enables machine mobility and theoretically eternal uptime.

But the hardware independence is not complete: Unlike other devices, CPUs are not emulated by the hypervisor but passed through to the VM, utilizing hardware virtualization features that are built into the processor - in partly vendor-specific ways.
As a consequence, vMotion only works between similar CPUs of the same vendor - by default.

VMware vCenter (the component that controls vMotion operations) puts a lot of effort into ensuring vMotion compatibility between source and target hosts whenever a Live Migration task is about to be performed (be it manually or automatically through DRS). By default it will prevent a vMotion task from being started if it finds the CPUs to be incompatible.

To improve interoperability between different CPU models of the same vendor, VMware introduced EVC (Enhanced vMotion Compatibility) with Update 2 of their Virtual Infrastructure 3.5 product back in 2008. With EVC you can group hosts with different CPUs in a cluster and enable vMotion between them. It works by masking CPU features that are not available on all cluster nodes, presenting only the common set of CPU features (the so-called EVC baseline) to the VMs. For detailed information please refer to the EVC and CPU compatibility FAQ article in the VMware KB.
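
The masking idea can be illustrated with plain set arithmetic. The following is only a minimal sketch and not how vCenter implements EVC: the host names and feature lists are made up for illustration, and real EVC baselines are predefined per CPU generation rather than computed ad hoc.

```python
# Illustrative only: host names and feature sets are invented for this sketch.
host_features = {
    "esx01": {"sse2", "sse3", "ssse3", "sse4_1"},
    "esx02": {"sse2", "sse3", "ssse3"},
    "esx03": {"sse2", "sse3", "ssse3", "sse4_1", "aes"},
}


def common_baseline(features_by_host):
    """Return the CPU features that every host in the cluster provides."""
    return set.intersection(*features_by_host.values())


def masked_features(features_by_host):
    """Return, per host, the features that would be hidden from the VMs."""
    baseline = common_baseline(features_by_host)
    return {host: feats - baseline for host, feats in features_by_host.items()}


baseline = common_baseline(host_features)
hidden = masked_features(host_features)
print(sorted(baseline))
print({host: sorted(feats) for host, feats in hidden.items()})
```

In this toy cluster the oldest host (esx02) defines the baseline, so nothing is hidden on it, while the newer hosts lose the features it lacks.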

How to relax and disable vMotion Compatibility checks

Although it is not recommended, it is sometimes desirable to relax or completely disable the vMotion compatibility checks. This is possible by configuring certain advanced vCenter settings. These are set through the legacy vSphere Client (for vSphere versions prior to 5.1) in the Administration / vCenter Server Settings / Advanced Settings menu, or (since vSphere 5.1) in the new Web Client (see vCenter / Manage / Settings / Advanced Settings).

Configure vCenter Advanced Settings in the legacy vSphere Client

1. Adding the key config.migrate.test.CpuCompatibleWithHost with value false will completely disable all compatibility checks. This is the brute force method, and I would really not recommend doing this, because it will also suppress any warnings.

2. Adding the key config.migrate.test.CpuCompatibleMonitorSupport with value false will only disable checking the VMM (Virtual Machine Monitor) on the source and target hosts for supported CPU features (preventing any "product version does not support features" error messages).

3. Adding the key config.migrate.test.CpuCompatibleError with value false will downgrade all compatibility check errors to warnings that do not prevent the migration from starting (still not recommended, but at least you have been warned).

The settings take effect immediately; you do not need to restart any vCenter services! To revert to the default behavior you need to change the values of these options to true (you cannot remove them again once they have been added).
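
To compare the three options side by side, here is a small conceptual model of their effect as described above. It is a sketch of my reading of the behavior, not vCenter code: the class, the function and the message texts are hypothetical, and only the three key names come from the settings themselves.

```python
# Conceptual model only. This is NOT vCenter code; all names except the
# three config.migrate.test.* keys in the comments are hypothetical.
from dataclasses import dataclass


@dataclass
class CheckSettings:
    # Defaults mirror the default behavior: all checks active and blocking.
    cpu_compatible_with_host: bool = True        # config.migrate.test.CpuCompatibleWithHost
    cpu_compatible_monitor_support: bool = True  # config.migrate.test.CpuCompatibleMonitorSupport
    cpu_compatible_error: bool = True            # config.migrate.test.CpuCompatibleError


def evaluate(settings, cpu_mismatch, monitor_mismatch):
    """Return (allowed, messages) for a migration attempt."""
    if not settings.cpu_compatible_with_host:
        # Option 1: nothing is checked, so nothing is reported either.
        return True, []
    problems = []
    if cpu_mismatch:
        problems.append("CPU features differ between source and target")
    if monitor_mismatch and settings.cpu_compatible_monitor_support:
        # Option 2 set to false skips only this check.
        problems.append("product version does not support features")
    if not problems:
        return True, []
    if not settings.cpu_compatible_error:
        # Option 3: problems are reported as warnings and do not block.
        return True, ["warning: " + p for p in problems]
    return False, ["error: " + p for p in problems]


print(evaluate(CheckSettings(), True, False))
print(evaluate(CheckSettings(cpu_compatible_error=False), True, False))
print(evaluate(CheckSettings(cpu_compatible_with_host=False), True, True))
```

The model makes the trade-off visible: only the third option lets the migration proceed while still telling you what was wrong.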

Please note: Suppressing vMotion Compatibility checks by these means is not supported by VMware! If you ever talk to someone from VMware Support about this they will even flatly deny that these options exist.

Why not to do this in production

Of course there is a reason why the vMotion Compatibility checks exist and why VMware does not support disabling them: Different CPUs provide different sets of advanced CPU instructions and features. If an application or the OS running inside a VM is using a certain CPU feature and is then migrated to a host with a different CPU that does not provide this feature ... guess what will happen: The application or even the whole OS inside the VM will crash.

Examples are multimedia or compute-intensive applications that use SSE extensions. These extensions are even used by operating system kernel code. The software RAID code of certain Linux kernels, for example, runs a quick benchmark at boot time to determine the most effective method for computing checksums: Imagine it decides to use SSE3 extensions for calculating RAID5 checksums. If you happen to actually use software RAID5 in such a machine and vMotion it to a host that does not provide this CPU feature, then the kernel will keep using instructions that no longer exist. This will certainly result in an instant kernel panic, hopefully without data loss or corruption.
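
To make this failure mode concrete, the sketch below compares the CPU flags a Linux guest saw at boot with those of a would-be target host. It assumes the usual /proc/cpuinfo layout with a "flags" line (where SSE3 shows up as pni); the two sample texts are invented, and the helper names are mine, not from any existing tool.

```python
def parse_flags(cpuinfo_text):
    """Extract the CPU flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def lost_features(source_text, target_text):
    """Flags visible on the source that the target does not provide."""
    return parse_flags(source_text) - parse_flags(target_text)


# Invented sample data; on a real Linux guest you would read /proc/cpuinfo.
source = "model name : Example CPU A\nflags : fpu sse sse2 pni ssse3\n"
target = "model name : Example CPU B\nflags : fpu sse sse2\n"

print(sorted(lost_features(source, target)))
```

Note that this only shows which features the CPU advertises. It cannot tell you which of them the OS or an application has actually decided to use, and that is exactly what makes an unchecked migration a gamble.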

However, there are still situations where disabling vMotion Compatibility checks might come in handy, even in production! When I updated a cluster of ESXi 5.0 hosts with a minor patch a while ago, I was not able to put the last remaining host of the cluster into maintenance mode. vCenter refused to migrate a handful of VMs to any of the other (already updated) hosts in the cluster, throwing a compatibility check error that I could not really explain. Looking at these VMs' event logs I was able to verify that they had been successfully migrated by DRS between different hosts in the past, so it was obviously the patch that caused this issue.
I opened a case with VMware support hoping that this was a known issue and that they would be able to provide a quick fix. They could not, so I ended up scheduling a downtime for the remaining VMs to be able to complete the hosts' update in time.
This would not have been necessary if I had known the options mentioned above at that time.

Credits

This goes back to a VMware Communities post by Jon Munday (Many thanks, Jon! Great work!). It was the first time that I found working instructions on how to disable vMotion compatibility checks in current vCenter versions - and I had been looking for this for a very long time.

After I read this I could not wait to test it in the lab, and yes: Using the third of the above-mentioned options I was really able to vMotion VMs from a host with DualCore AMD Opteron 8220 CPUs to another host with QuadCore Intel Xeon E5440 CPUs! I tested this with a Windows Server 2003 VM and a SLES11 Linux VM, and both survived the cross-vendor live migration without a system crash or other obvious hiccups. I didn't really expect this ...

Probably I was just lucky. You have been warned!




7 comments:

  1. This is seriously some good info! Was looking for this for quite some time now.

    Just when I started toying around with it again, you happen to post this! :) I was finally able to migrate a low-priority dumb vm (live) from old AMD hardware to newer Intel cluster.

    Someone had to take it out of the lab!

  2. Great post!
    I'm curious: how do you identify, within a VM, the applications, processes, threads, etc. that are currently using or will require certain vendor features such as Intel's SSE?

  3. Hi Andreas,

    Great article, thanks for the credit! I've posted a summary on my blog and linked back to this for the full experience;

    http://www.jonmunday.net/2013/04/18/suppressing-cpu-compatibility-checks-in-vsphere-5-x/

    Cheers,
    Jon

  4. I made the changes, but I am still seeing the errors. Does EVC also need to be enabled for the settings to take effect?

    Replies
    1. Hi Eric,

      I don't think that EVC is required for this. What version of vSphere are you using? (I haven't tested with 5.5) And what error messages do you get?

      Andreas

    2. Andreas,

      I am using vSphere 5.1 with vCenter Web Client 5.5. I am attempting to vMotion between an AMD and Intel cluster. Error: The target host does not support the virtual machine's current hardware requirements.
      To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled. See KB article 1003212.

      To your point it may be related to vCenter 5.5

      Thanks for your help.


      ~Eric

  5. Yep, on 5.5 as well. These custom attributes do not work for migrating between amd/intel.

