Updated: [ALERT] Issue with the ESXi 5.5 U1 Driver Rollup ISO: Software iSCSI adapter crashes hostd


While browsing through the VMware Community Forums I stumbled over a thread that made me curious: A customer experienced serious issues after upgrading his vSphere environment to 5.5 Update 1. The ESXi hosts became unresponsive right after the Software iSCSI adapter was added.

Later in the thread it turned out that the customer had used the ESXi 5.5 Update 1 Driver Rollup ISO to upgrade the hosts. This ISO shows up first in the list on the ESXi 5.5 Update 1 download page. It is meant to make installing new hosts easier, because it includes not only the latest update of ESXi 5.5, but also many additional and updated drivers that are not part of the vanilla ESXi 5.5 U1 ISO (see the README PDF on the same page).

Well, in this case it did not make things easier, but caused the issue described above. Many customers (including me) can confirm that the issue does not appear if you use the regular ESXi 5.5 Update 1 ISO (or Offline Bundle) to install or update your hosts. So it was clear that a driver package that was added or updated in the Rollup ISO caused the issue.

I did not have to try and search very long to find the offending package: It is the Diablo MCS driver that renders the Software iSCSI adapter unusable. This is a driver for a quite exotic (but very interesting) piece of hardware: a DIMM-based Flash storage device that is named ULLtraDIMM and is "built on Diablo Technologies’ DDR3 translation protocol (known as Memory Channel Storage™, or MCS™)". This is a quote from a post by Cormac Hogan about this device.

After you uninstall the driver by running
   esxcli software vib remove -n scsi-teradimm

in an ESXi shell and reboot the host, the described issue will disappear.
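
Before removing anything, it can be useful to check whether the VIB is installed on the host at all. The following is a minimal sketch, not a tested procedure: the helper name vib_installed is made up for this example, and it assumes that the VIB name is the first column of the "esxcli software vib list" output and that awk is available in the ESXi shell.

```shell
#!/bin/sh
# vib_installed NAME
# Hypothetical helper: reads a VIB listing on stdin and succeeds if a line
# starts with the given VIB name (the name is assumed to be the first column).
vib_installed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example use on the host (left as a comment so nothing is removed by accident):
#   esxcli software vib list | vib_installed scsi-teradimm \
#       && esxcli software vib remove -n scsi-teradimm
```

Reading the listing from stdin keeps the check independent of which tool produced it.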

This is not the first time that an ESXi Driver Rollup ISO causes issues (see my earlier warning about the ESXi 5.0 Rollup ISOs), and to me it is not really clear why VMware publishes these ISOs and what their intended use case is.

In my opinion you should never use a Driver Rollup ISO to install a new host (updating a host with such an ISO is not even supported!), because it adds a lot of overhead that you will probably never need. My recommendation for installing/updating hosts is:

1. If your hardware vendor supplies a customized ESXi installation ISO (or Offline Bundle), then use that to install or update your hosts. In some cases (e.g. certain HP ProLiant servers) this is even required to get all hardware devices working properly.

2. Otherwise use the VMware supplied regular installation ISOs or Offline Bundles.

3. If you need drivers for your hardware that are not included in the regular installation ISOs then add them after the base install as needed or create your own customized installation ISO with these drivers (see KB2005205 for instructions).
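
For the "add them after the base install" part of step 3, a driver Offline Bundle can be installed from the ESXi shell with esxcli. The sketch below is only an illustration: the function name install_bundle is invented here, and the bundle path in the usage comment is a placeholder. It runs the install with --dry-run first and only proceeds if that succeeds.

```shell
#!/bin/sh
# install_bundle /absolute/path/to/offline-bundle.zip
# Hypothetical helper: dry-run first, then install the driver bundle.
install_bundle() {
    bundle=$1
    # esxcli expects the bundle location as an absolute path (or a URL);
    # this sketch only accepts absolute paths.
    case $bundle in
        /*) ;;
        *) echo "need an absolute path: $bundle" >&2; return 2 ;;
    esac
    # Stop here if the dry run reports a problem.
    esxcli software vib install -d "$bundle" --dry-run || return 1
    esxcli software vib install -d "$bundle"
}

# Example use (placeholder path):
#   install_bundle /vmfs/volumes/datastore1/my-driver-offline-bundle.zip
```

A reboot is usually still needed afterwards before a newly installed driver is loaded.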

Tip: My ESXi-Customizer-PS script makes it easy to build ESXi images with the latest patch level and additional drivers included. Way better than using buggy Driver Rollup ISOs ;-)

Update (2014-04-08): Today VMware has published a new KB article KB2075171 titled Hostd becomes unresponsive after enabling software iSCSI when installed from the ESXi 5.5 Update 1 Driver Rollup ISO. According to this article it is not scsi-teradimm that causes the issue, but the Emulex scsi-be2iscsi driver.

I have not yet tested it (and I wonder why removing the scsi-teradimm VIB also resolves the issue then?!), but I trust that VMware tested it more thoroughly this time ... So the real fix for this issue is to update the scsi-be2iscsi VIB to the version that is available here. Of course, removing this VIB should also resolve the issue.

Please note: The ESXi 5.5 Update 1 Driver Rollup ISO download is still the old one with the faulty driver. I wonder why VMware does not just replace it with an updated version?!

Another important note: A reader pointed out in the comments that you cannot use esxcli to uninstall (or update) a VIB once you are already affected by the issue, because esxcli uses the local hostd daemon to carry out its commands. So if hostd has crashed and is unavailable then you cannot run esxcli commands (this also applies to running esxcli commands remotely e.g. through PowerCLI).

A workaround is to use the localcli command instead of esxcli. According to the vSphere documentation localcli should only be used if hostd is not available and when advised by VMware Technical Support. In this scenario you might have no other choice than to remove e.g. the scsi-be2iscsi VIB with the command

   localcli software vib remove -n scsi-be2iscsi
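
The choice between the two tools can also be scripted. This is a rough sketch under assumptions: the function names pick_cli and remove_vib are made up for this example, and a successful "esxcli system version get" is used as a simple probe for whether esxcli (and therefore hostd) is usable. The caveat from the documentation about localcli still applies.

```shell
#!/bin/sh
# pick_cli: print "esxcli" if it can talk to hostd, otherwise "localcli".
pick_cli() {
    if esxcli system version get >/dev/null 2>&1; then
        echo esxcli
    else
        echo localcli
    fi
}

# remove_vib NAME: remove the VIB with whichever tool is usable.
remove_vib() {
    cli=$(pick_cli)
    echo "Using $cli to remove $1" >&2
    "$cli" software vib remove -n "$1"
}

# Example use on an affected host:
#   remove_vib scsi-be2iscsi
```

The host still has to be rebooted after the VIB has been removed.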


This post first appeared on the VMware Front Experience Blog and was written by Andreas Peetz. Follow him on Twitter to keep up to date with what he posts.



7 comments:

  1. I completely agree, these Rollup ISOs just cause confusion. You cannot use them for upgrading, but on fresh installations they frequently cause major issues. I tried the 5.5U1 Rollup for a fresh kickstart installation and the host would not even boot. Error messages indicated it could not find/read the bootbank. WTF?
    Plus: why are the VIBs inside named differently from the regular ISOs? They replace underscores with dashes, for example.

    1. The VIB package names are the same. Just the file names in the ISO sometimes use underscores and sometimes dashes. I guess that is caused by creating the ISO with different file name compatibility settings. This should not cause any issue.

      Is your kickstart installation working with the regular ISO?

  2. Probably. I just don't see why someone would accept such an inconsistency. Wondering what happened to the QA at VMware.
    And yes, the regular ISO works perfectly, no issues whatsoever.

  3. Andreas i have the same problem but with one host in production environment.
    What to do if "esxcli software vib remove -n scsi-teradimm" is not working at all? I get msg: Connect to localhost failed: Connection failure

    Regards
    Greg

    1. Hi Greg,

      thanks for your comment! I just updated my post, and it does also include an answer to your question now.

      Andreas

  4. Hi Andreas, are there possibly any other vibs that cause trouble? How do I identify them? Regards, Axel

    1. Hi Axel,

      not that I know of. Nevertheless I would not recommend using the Rollup ISO.
      Use hardware vendor specific custom ISOs (like HP's) if that is applicable to you, or the regular ISO and add just the drivers that you really need.

      Andreas

