How to list vSphere 5.0 HA restarts with PowerCLI

VMware HA usually does a good job of restarting VMs after an ESX(i) host failure. Imagine this happens at night: you come into the office the next morning and want to know which VMs were restarted (or possibly failed to restart) because of the HA event.

You can find that out by looking at the vCenter event log, but if the failure happened several hours (or even longer) ago, the vSphere Client will no longer display the associated events. Even if the events are still displayed, you will have a hard time finding them and compiling a list of restarted VMs.

To cope with this situation I wrote a small PowerCLI script that searches the vCenter event log, finds the relevant entries and prints the list of VMs that were restarted during the latest HA event. It works with vSphere 5.0 only. Here it is:
param(
    [string]$vcenter = "localhost",
    [int]$last = 24,
    [switch]$help = $false
)

$maxevents = 250000

"`nScript to generate list of successful and failed VM restart attempts after an HA host failure."
if ($help) {
"Optional parameters:
-help:               display this help
-vcenter servername: connect to vCenter server servername (default is localhost)
-last n:             analyze events from the last n hours (default is 24)
"
exit
} else {
"(Use -help for list of parameters)"
}

$stop = get-date
$start = $stop - (New-TimeSpan -Hours $last)

if (!(get-pssnapin -name "VMware.VimAutomation.Core" -ErrorAction SilentlyContinue )) { add-pssnapin "VMware.VimAutomation.Core" }

write-host "`nConnecting to vCenter server $vcenter ..."
Connect-VIServer $vcenter | out-null

write-host "`nGetting all events from $start to $stop (max. $maxevents) ..."
$events = Get-VIEvent -Start $start -Finish $stop -MaxSamples $maxevents

write-host Got $events.Length events ...

write-host -nonewline "`nSearching for host failure events ..."
# Collect all host failure events; $ha[0] is treated as the latest one below
$ha = @($events | where-object { $_.EventTypeID -eq "com.vmware.vc.HA.DasHostFailedEvent" })
write-host (" found " + $ha.Length + " event(s).")
if ($ha.Length -eq 0) {
    write-host "`nNo host failure events found in the last $last hours."
    write-host "Use parameter -last to specify number of hours to look back.`n"
    exit
} else {
    write-host ("`nLatest host failure event was " + $ha[0].ObjectName + " at " + $ha[0].CreatedTime + ".")
}

$events = $events | where-object { $_.CreatedTime -ge $ha[0].CreatedTime }

write-host "`nList of successful VM restarts:"
$events | where-object { $_.EventTypeID -eq "com.vmware.vc.ha.VmRestartedByHAEvent" } | foreach {
    write-host $_.CreatedTime: $_.ObjectName
}

write-host "`nList of failed VM restarts:"
$failures = @{}
$events | where-object { $_.FullFormattedMessage -like "vSphere HA stopped trying*" } | foreach {
    $vmname = $_.FullFormattedMessage.Split(" ")[6]
    if (!($failures.ContainsKey($vmname))) {
        $failures.Add($vmname,$_.CreatedTime)
        write-host $_.CreatedTime: $vmname
    }
}

Disconnect-VIServer -Force -Confirm:$false
The script takes three optional parameters:
  • -vcenter servername: Connect to the vCenter server named servername (default is localhost)
  • -last n: Analyze events of the last n hours (default is 24). If, for example, the HA event happened during a weekend and you run the script on Monday, you may need to raise this to as much as 72. Please note: the higher n is, the longer the script will run and the more memory it will consume!
  • -help: Display help on parameters
You will usually want to specify at least the vCenter server name. If you have only one vCenter server, you can of course hard-code it into the script as the default value (instead of localhost) in line 2.
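
As an illustration, a call that looks back over a whole weekend might look like the lines below. The file name Get-HARestarts.ps1 and the server name vcenter01.example.com are made up for this example; use whatever name you saved the script under and your own vCenter server.

```powershell
# Hypothetical script file name and vCenter name; adjust both to your environment.
# Look back 72 hours instead of the default 24:
.\Get-HARestarts.ps1 -vcenter vcenter01.example.com -last 72

# Show the built-in parameter help and exit:
.\Get-HARestarts.ps1 -help
```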

How does it work?

As a first step, the script reads all events of the last n (default: 24) hours and searches them for host failure events (event ID "com.vmware.vc.HA.DasHostFailedEvent", see line 36). If there were multiple host failures in this time frame, it looks only at the latest one and discards all events that occurred before it (line 46).

To find out which VMs were restarted, the script looks for events with the ID "com.vmware.vc.ha.VmRestartedByHAEvent". It prints the time stamps of these events and the names of the restarted VMs (lines 49 to 51).
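
The two steps above can be tried without a vCenter connection. The following is only a minimal sketch: the objects in $events are hand-made stand-ins for Get-VIEvent output, and the host names, VM names and time stamps are invented. Unlike the script, the sketch sorts the failure events explicitly, so it does not depend on the order in which the events arrive.

```powershell
# Mock events standing in for Get-VIEvent output (all names and times are invented).
$t0 = [datetime]"2012-01-01T00:00:00"
$events = @(
    New-Object PSObject -Property @{ EventTypeId = "com.vmware.vc.HA.DasHostFailedEvent";   ObjectName = "esx01"; CreatedTime = $t0.AddHours(1) }
    New-Object PSObject -Property @{ EventTypeId = "com.vmware.vc.ha.VmRestartedByHAEvent"; ObjectName = "vm-a";  CreatedTime = $t0.AddHours(2) }
    New-Object PSObject -Property @{ EventTypeId = "com.vmware.vc.HA.DasHostFailedEvent";   ObjectName = "esx02"; CreatedTime = $t0.AddHours(5) }
    New-Object PSObject -Property @{ EventTypeId = "com.vmware.vc.ha.VmRestartedByHAEvent"; ObjectName = "vm-b";  CreatedTime = $t0.AddHours(6) }
)

# Step 1: find all host failure events and pick the most recent one.
$ha = @($events |
    Where-Object { $_.EventTypeId -eq "com.vmware.vc.HA.DasHostFailedEvent" } |
    Sort-Object CreatedTime -Descending)
$latest = $ha[0]

# Step 2: keep only events at or after the latest failure, then list the restarts.
$relevant = @($events | Where-Object { $_.CreatedTime -ge $latest.CreatedTime })
$restarted = @($relevant | Where-Object { $_.EventTypeId -eq "com.vmware.vc.ha.VmRestartedByHAEvent" })

foreach ($e in $restarted) {
    Write-Host ("{0}: {1}" -f $e.CreatedTime, $e.ObjectName)
}
```

With this mock data only the restart that follows the second host failure is listed; the earlier failure and its restart are discarded.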

Finally, the script looks for event messages that start with the text "vSphere HA stopped trying". These events are logged when HA fails to restart a VM multiple times in a row. With vSphere 5, HA tries very hard and repeatedly to restart a VM, so you might see this event multiple times for each failing VM. The script records the failing VMs in a hash table in order to print each name only once, together with the time stamp of the latest failed restart attempt.
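
The de-duplication can be sketched in isolation as follows. This is an illustration under assumptions, not the script itself: the events are mock objects listed newest first, and the message text after the VM name is invented. Only the prefix "vSphere HA stopped trying" and the position of the VM name (the seventh word) are taken from the script.

```powershell
# Mock failure events, newest first (message tails and times are invented).
$t0 = [datetime]"2012-01-01T00:00:00"
$events = @(
    New-Object PSObject -Property @{ FullFormattedMessage = "vSphere HA stopped trying to restart vm-a (mock message text)"; CreatedTime = $t0.AddMinutes(30) }
    New-Object PSObject -Property @{ FullFormattedMessage = "vSphere HA stopped trying to restart vm-b (mock message text)"; CreatedTime = $t0.AddMinutes(20) }
    New-Object PSObject -Property @{ FullFormattedMessage = "vSphere HA stopped trying to restart vm-a (mock message text)"; CreatedTime = $t0.AddMinutes(10) }
)

# Hash table keyed by VM name; the first (newest) occurrence wins.
$failures = @{}
foreach ($e in $events) {
    if ($e.FullFormattedMessage -like "vSphere HA stopped trying*") {
        # The VM name is the seventh space-separated word of the message.
        $vmname = $e.FullFormattedMessage.Split(" ")[6]
        if (-not $failures.ContainsKey($vmname)) {
            $failures[$vmname] = $e.CreatedTime
        }
    }
}

# Print each failing VM once, with the time of its latest failed attempt.
$failures.GetEnumerator() | Sort-Object Name | ForEach-Object {
    Write-Host ("{0}: {1}" -f $_.Value, $_.Name)
}
```

Because the name is taken by splitting the message on spaces, a VM whose name contains a space would be cut off after its first word; that limitation applies to the script as well.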


2 comments:

  1. Very helpful script, thank you!

    1. Hi, excellent script most helpful - Just a quick one (hopefully) what would be the best way to export the results to a txt file?
