While browsing through the VMware Community Forums I stumbled over a thread that made me curious: A customer experienced serious issues after upgrading his vSphere environment to 5.5 Update 1. The ESXi hosts became unresponsive right after the Software iSCSI adapter was added.
Later in the thread it turned out that the customer had used the ESXi 5.5 Update 1 Driver Rollup ISO to upgrade the hosts. This ISO shows up first in the list on the ESXi 5.5 Update 1 download page - it is meant to make installing new hosts easier, because it includes not only the latest Update of ESXi 5.5, but also a lot of additional and updated drivers that are not part of the vanilla ESXi 5.5 U1 ISO (see the README PDF on the same page).
Well, in this case it did not make things easier, but caused the issue described above. Many customers (including myself) can confirm that the issue does not appear if you use the regular ESXi 5.5 Update 1 ISO (or Offline Bundle) to install or update your hosts. So it was clear that a driver package that was added or updated in the Rollup ISO caused the issue.
I did not have to search very long to find the offending package: It is the Diablo MCS driver that renders the Software iSCSI adapter unusable. This is a driver for a quite exotic (but very interesting) piece of hardware: a DIMM-based Flash storage device that is named ULLtraDIMM and is "built on Diablo Technologies’ DDR3 translation protocol (known as Memory Channel Storage™, or MCS™)". This is a quote from a post by Cormac Hogan about this device.
After you uninstall the driver by running
esxcli software vib remove -n scsi-teradimm
in an ESXi shell and reboot the host, the described issue will disappear.
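Before removing anything, it can help to confirm that the VIB is actually installed. The following is only a minimal sketch: the helper name vib_in_list is made up for illustration, and it assumes that the first column of the esxcli software vib list output is the VIB name.

```shell
# Hypothetical helper (not part of ESXi): reads the output of
# "esxcli software vib list" on stdin and returns 0 if a VIB with the
# given name appears in the first column, 1 otherwise.
vib_in_list() {
    while read -r name _rest; do
        [ "$name" = "$1" ] && return 0
    done
    return 1
}

# Usage on a host (only works while hostd is responsive):
#   esxcli software vib list | vib_in_list scsi-teradimm \
#       && echo "scsi-teradimm is installed"
```

The match is exact on the name column, so a name that merely contains the search string is not reported.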
This is not the first time that an ESXi Driver Rollup ISO has caused issues (see my earlier warning about the ESXi 5.0 Rollup ISOs), and to me it is not really clear why VMware publishes these ISOs and what their intended use case is.
In my opinion you should never use a Driver Rollup ISO to install a new host (updating a host with such an ISO is not even supported!), because it adds a lot of overhead that you will probably never need. My recommendation for installing/updating hosts is:
1. If your hardware vendor supplies a customized ESXi installation ISO (or Offline Bundle), then use that to install or update your hosts (in some cases - e.g. certain HP ProLiant servers - this is even required to get all hardware devices working properly).
2. Otherwise use the VMware supplied regular installation ISOs or Offline Bundles.
3. If you need drivers for your hardware that are not included in the regular installation ISOs then add them after the base install as needed or create your own customized installation ISO with these drivers (see KB2005205 for instructions).
Tip: My ESXi-Customizer-PS script makes it easy to build ESXi images with the latest patch level and additional drivers included. Way better than using buggy Driver Rollup ISOs ;-)
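For option 3 of the list above (adding a driver after the base install), a rough sketch of the esxcli side could look like this. The function name install_driver and the file paths are purely illustrative. It assumes the driver was downloaded either as an Offline Bundle (.zip) or as a single .vib file and copied to the host.

```shell
# Hypothetical wrapper: picks the esxcli option based on the file type.
#   -d (--depot)  expects an Offline Bundle .zip
#   -v (--viburl) expects a single .vib file
install_driver() {
    case "$1" in
        *.zip) esxcli software vib install -d "$1" ;;
        *.vib) esxcli software vib install -v "$1" ;;
        *)     echo "unsupported file type: $1" >&2; return 2 ;;
    esac
}

# Usage on a host (example paths, not real files):
#   install_driver /vmfs/volumes/datastore1/my-driver-offline-bundle.zip
#   install_driver /tmp/my-driver.vib
```

A reboot is usually needed afterwards for a newly installed driver to be loaded.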
Update (2014-04-08): Today VMware has published a new KB article KB2075171 titled Hostd becomes unresponsive after enabling software iSCSI when installed from the ESXi 5.5 Update 1 Driver Rollup ISO. According to this article it is not scsi-teradimm that causes the issue, but the Emulex scsi-be2iscsi driver.
I have not yet tested it (and I wonder why removing the scsi-teradimm VIB also resolves the issue then?!), but I trust that VMware tested it more thoroughly this time ... So the real fix for this issue is to update the scsi-be2iscsi VIB to the version that is available here. Of course, removing this VIB should also resolve the issue.
Please note: The ESXi 5.5 Update 1 Driver Rollup ISO download is still the old one with the faulty driver. I wonder why VMware does not just replace it with an updated version?!
Another important note: A reader pointed out in the comments that you cannot use esxcli to uninstall (or update) a VIB once you are already affected by the issue, because esxcli uses the local hostd daemon to carry out its commands. So if hostd has crashed and is unavailable then you cannot run esxcli commands (this also applies to running esxcli commands remotely e.g. through PowerCLI).
A workaround is to use the localcli command instead of esxcli. According to the vSphere documentation, localcli should only be used if hostd is not available and when advised by VMware Technical Support. In this scenario you may have no other option than to remove the scsi-be2iscsi VIB with the command
localcli software vib remove -n scsi-be2iscsi
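If you want a single command that works in both situations, the choice between esxcli and localcli can be sketched as a small wrapper. This is only an illustration: the function name vib_cli is made up, and it assumes that a failing "esxcli system version get" is a good enough sign that hostd is not answering.

```shell
# Hypothetical wrapper: use esxcli while hostd responds, otherwise
# fall back to localcli (which bypasses hostd).
vib_cli() {
    if esxcli system version get >/dev/null 2>&1; then
        esxcli "$@"
    else
        localcli "$@"
    fi
}

# Usage on a host:
#   vib_cli software vib remove -n scsi-be2iscsi
```

Keep in mind the documentation's caveat quoted above: localcli is meant as a last resort, so the fallback branch should only ever be reached on a host that is already affected.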
This post first appeared on the VMware Front Experience Blog and was written by Andreas Peetz. Follow him on Twitter to keep up to date with what he posts.