How to do an Online Virtual Connect firmware upgrade

Okay, this is a follow-up to my previous post ... I was finally able to find out on my own how to do this. The answer is in HP's white paper "HP Virtual Connect Firmware Upgrade Steps and Procedures". This is a must-read for anyone concerned with the VC firmware upgrade process. I will try to summarize the most important points here.

You must use the Virtual Connect Support Utility (VCSU). The current version is 1.60 and is available for download here.

It helps to understand how the VCSU does the upgrade. First it uploads the new firmware to all VC modules simultaneously. This phase is not critical at all, because the VC modules continue working normally during the upload. If you use the default parameters, it will then activate the new firmware by rebooting the VC modules one after the other in a controlled manner - and this is the step that really impacts the network availability of your hosts and VMs!
Why? The controlled reboot takes 20 or more seconds, and - of course - the VC module will not properly forward and receive network traffic during that time. However, the blade server NICs that are connected to this module are not properly disconnected during that time, i.e. they do not get a link-down notification! If you use the default failover detection method for your virtual switches (Link status only), the hosts will continue using the uplinks to the module that is just rebooting, and this results in a loss of network connectivity.
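
To make the difference concrete, here is a toy model of the two failover detection methods. It is only a sketch of the reasoning above, not how ESX(i) actually implements NIC teaming: the `Uplink` class, the policy names and the `vmnic0`/`vmnic1` labels are all made up for illustration, and beacon probing is reduced to "the path actually forwards traffic".

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    name: str
    link_up: bool     # what the blade NIC reports to the host
    forwarding: bool  # whether the VC module behind it really passes traffic


def pick_uplink(uplinks, detection):
    """Return the name of the first uplink the policy considers healthy, or None."""
    for uplink in uplinks:
        if detection == "link_status":
            healthy = uplink.link_up
        elif detection == "beacon_probing":
            # Simplification: a probe only succeeds if traffic is forwarded.
            healthy = uplink.link_up and uplink.forwarding
        else:
            raise ValueError(f"unknown detection method: {detection}")
        if healthy:
            return uplink.name
    return None


def traffic_flows(uplinks, detection):
    """True if the uplink chosen by the policy actually forwards traffic."""
    chosen = pick_uplink(uplinks, detection)
    return any(u.name == chosen and u.forwarding for u in uplinks)


# Graceful reboot: the link stays up, but the module forwards nothing.
graceful_reboot = [Uplink("vmnic0", True, False), Uplink("vmnic1", True, True)]
# Hard reset: the host sees link down on the affected NIC right away.
hard_reset = [Uplink("vmnic0", False, False), Uplink("vmnic1", True, True)]

for label, uplinks in (("graceful reboot", graceful_reboot), ("hard reset", hard_reset)):
    for method in ("link_status", "beacon_probing"):
        print(label, method, pick_uplink(uplinks, method), traffic_flows(uplinks, method))
```

In this model, link status detection keeps sending traffic into the rebooting module during a graceful reboot, while a hard reset makes both methods fail over to the second uplink.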

So, how do you cope with that? One possible workaround is to use Beacon probing as the failover detection method for the virtual switches. But in my opinion this is neither the best nor the easiest choice. No, the real answer is on page 13 of the white paper:
"For the customer environments where changing Network Failover Detection options or HA settings is not possible, utilizing VCSU manual firmware activation order (-of manual) is recommended. In this case, modules will be updated but not activated and the user will need to perform manual activation by resetting (rebooting) modules via OA GUI or CLI interface. This option will eliminate potential of up to 20 sec network outage that may occur on a graceful shutdown of VC Ethernet and FlexFabric modules."
Using the manual activation order (parameters "-oe manual" and "-of manual") ensures that the VCSU will not gracefully reboot the VC modules at all. You then need to do that yourself, manually, by resetting the VC modules through the Onboard Administrator (OA). When you do a hard reset of a VC module, the connected hosts immediately get link-down notifications, just as if the module had suddenly failed or lost all its own uplinks because the external switch failed. Just wait about 5 minutes for the reset module to come fully online before you reset the second one.
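
The manual activation step boils down to "reset one module, wait, reset the next". A minimal sketch of that loop follows. Note that `reset_module` is a hypothetical callback: it stands for whatever you use to reset a module through the OA (GUI click, CLI session, ...), and the bay numbers in the example are made up. The 300 seconds are the roughly 5 minutes mentioned above.

```python
import time


def rolling_reset(bays, reset_module, settle_seconds=300, sleep=time.sleep):
    """Reset VC modules one at a time, waiting between resets.

    bays           -- interconnect bay numbers, in the order to reset them
    reset_module   -- hypothetical callback that resets one module via the OA
    settle_seconds -- time to let a module come fully online (about 5 minutes)
    sleep          -- injectable so the loop can be dry-run without waiting
    """
    done = []
    for index, bay in enumerate(bays):
        reset_module(bay)
        done.append(bay)
        # No need to wait after the last module has been reset.
        if index < len(bays) - 1:
            sleep(settle_seconds)
    return done


if __name__ == "__main__":
    # Dry run: print the plan instead of touching any hardware or waiting.
    rolling_reset(
        [1, 2],
        reset_module=lambda bay: print(f"reset VC module in bay {bay} via OA"),
        sleep=lambda seconds: print(f"wait {seconds} s for the module to come online"),
    )
```

Keeping the reset itself behind a callback means the ordering and waiting logic can be tested without an enclosure at hand.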

If your ESX(i) hosts are properly and redundantly configured you will notice only a minimal network interruption during this process. In my test it was just a single ping drop.
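
If you want to put a number on the interruption, you can record one result per ping and look for the longest run of lost replies. The helper below is a small sketch of that idea. It assumes you already have the results as a list of booleans (True = reply received) and that pings are sent at a fixed interval; how you collect them is up to you.

```python
def longest_outage(replies, interval_seconds=1.0):
    """Return the longest run of consecutive lost pings, in seconds.

    replies          -- iterable of booleans, True if the ping was answered
    interval_seconds -- assumed fixed time between two pings
    """
    longest = 0
    current = 0
    for answered in replies:
        current = 0 if answered else current + 1
        longest = max(longest, current)
    return longest * interval_seconds


# One lost ping between two good ones.
single_drop = [True, True, False, True, True]
print(longest_outage(single_drop))
```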

Yes, that's the whole secret of doing an online VC firmware upgrade! For me only one question remains: Why is HP making it so hard to find this information? If you search for instructions on how to do this you will find tons of useless and contradictory information on this topic, and even their own support engineers are not able to give a quick and correct answer to the question. At least one of them sent me a copy of the white paper (he could not just provide a link to it, because he was not able to find it on the HP pages...).

HP Virtual Connect firmware update - can you do this online?

I don't know the answer to this question, but I'm trying to find this out ...

We have two HP c7000 enclosures with Virtual Connect FlexFabric modules to connect to external Cisco Ethernet switches and Brocade FC switches. Both enclosures are fully loaded with 8x BL620c G7 blade servers running ESXi 4.1 Update 2.
Right now we are still able to completely evacuate an enclosure if we want to do maintenance (mainly firmware upgrades) on it, because we have stretched two clusters over both enclosures, and each cluster uses no more than 50% of its capacity.
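
The 50% limit is simple arithmetic: if a cluster is spread evenly over N enclosures, the load still fits after taking one enclosure down as long as it does not exceed (N - 1) / N of the total capacity. A quick sketch, assuming identical enclosures and an even spread (the function name is made up):

```python
def can_evacuate(used_fraction, enclosures=2):
    """True if a cluster's load still fits when one enclosure is taken down.

    used_fraction -- share of the cluster's total capacity in use (0.0 to 1.0)
    enclosures    -- number of identical enclosures the cluster is spread over
    """
    if enclosures < 2:
        return False  # nothing left to evacuate to
    remaining_fraction = (enclosures - 1) / enclosures
    return used_fraction <= remaining_fraction


print(can_evacuate(0.5, enclosures=2))
print(can_evacuate(0.6, enclosures=2))
print(can_evacuate(0.6, enclosures=3))
```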

However, given our current VM growth rate we will soon reach a point where this will no longer be possible (without purchasing and deploying a third enclosure). So, I'm currently testing and looking for ways to do an online Virtual Connect firmware upgrade without interrupting network and SAN connectivity. With all the redundancy that is in the enclosure this should be possible. An HP engineer I recently talked to confirmed that it can indeed be done using HP's Virtual Connect Support Utility (VCSU), and he pointed me to its manual for instructions.

I remember that I already tried this method a while ago. I no longer know which firmware and tool versions I did this test with, but it was not very successful. Although I followed the instructions given, I noticed ping timeouts of up to 15 seconds during the upgrade process (I was pinging the hosts' VMkernel addresses).

I just started a thread in the VMTN forums to get some input from others. Has anyone done this successfully? Is there anything to check and configure that is not obvious before trying this? Please share your experience by posting to the VMTN thread or leaving a comment here. Thanks!

Once I have found a working method I will of course update this post!

Update (2011-12-21): I found it ... Read about it in my next post!