When configuring replication for a VM you specify a target RPO (Recovery Point Objective) that can range between 15 minutes and 4 hours. I chose the minimum for most of my VMs, which means that vCenter (or, more precisely, the vSphere Replication Appliance) should replicate changed data frequently and quickly enough that the replica never lags behind the primary VM by more than 15 minutes.
Whether this always succeeds depends on a number of factors: the data change rate, overall disk activity, the available network bandwidth, etc. In my lab setup I noticed that the RPO was violated occasionally (at least once per day), but only by 1 minute. You can track this in the Events view of the vSphere Client:
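To make the RPO semantics concrete, here is a minimal Python sketch of the check. It assumes a simplified model in which the replica's lag is just the time since the last completed replication, and that any partial minute over the target counts as a full minute of violation. The function name rpo_violation_minutes is hypothetical; vSphere Replication's own calculation may differ.

```python
from datetime import datetime, timedelta


def rpo_violation_minutes(last_sync: datetime, now: datetime, rpo: timedelta) -> int:
    """Return by how many whole minutes the replica's lag exceeds the RPO (0 if within RPO).

    Simplified model (assumption): lag = time since the last completed replication.
    """
    lag = now - last_sync
    excess = lag - rpo
    if excess <= timedelta(0):
        return 0
    # Assumption: round partial minutes up, so 15 min 10 s against a 15 min RPO counts as 1 minute.
    # timedelta // timedelta yields an int; negating twice turns floor division into ceiling division.
    return -(-excess // timedelta(minutes=1))
```

With a 15-minute RPO, a replica whose last sync finished 16 minutes ago would be reported as violating the RPO by 1 minute, which matches the kind of short violations I saw in my lab.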
|vSphere Replication RPO events in the vSphere client|
A great (and often underestimated) way to monitor your vSphere environment for any kind of event is to set up a vCenter alarm. You can do this with both the new Web Client and the legacy C# Client. Like many of us, I have a hard time getting used to the Web Client, because I am so familiar with the legacy client and know how to get things done quickly with it. I thought this was a good opportunity to familiarize myself with the new client and tried to create the alarm with it. However, I found out that this is simply not possible in this particular case, because of the advanced options that I needed.
I will come back to this and explain it in more detail as we go through the process of creating the alarm with the legacy client:
1. Open the Settings dialog of a new alarm from the context menu of the Alarms view:
|Create a vCenter alarm - General section|
2. Change to the Triggers tab of the dialog:
|Create a vCenter alarm - Define trigger events|
To use the RPO events as triggers we need to know their internal identifiers, and there are a number of ways to find them out. If you are familiar with PowerCLI, the quickest way is to look at the properties of the event objects returned by the cmdlet Get-VIEvent with appropriate filters. But I want to point you to another resource that I have not yet seen mentioned elsewhere:
The vCenter Server program directory ("%ProgramFiles%\VMware\Infrastructure\VirtualCenter Server") contains many text files with the extension .vmsg that hold the localized values of all string resources of the vCenter Server software. For example, the file locale\en\event.vmsg contains all built-in vCenter event definitions in English (additional languages are stored in their own subdirectories: de, fr, ja, ko, zh_CN).
Resources specific to vCenter extensions can be found in the subdirectories under extensions. The file that we need to look at is extensions\com.vmware.vcHms\locale\en\event.vmsg. If you search this file for the relevant event message text (e.g. the string violated), you will find these lines:
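If you would rather not open each .vmsg file by hand, a small script can do the search. This is a minimal Python sketch, assuming the directory layout described above (...\locale\<language>\*.vmsg) and that the files can be read as UTF-8; find_in_vmsg is a hypothetical helper, not part of any VMware tool.

```python
from pathlib import Path


def find_in_vmsg(base_dir, needle, lang="en"):
    """Yield (file, line_number, line) for every line containing `needle`
    in any locale/<lang>/*.vmsg file below base_dir."""
    for vmsg in sorted(Path(base_dir).rglob("*.vmsg")):
        # Only look at files of the chosen language, e.g. ...\locale\en\event.vmsg
        if vmsg.parent.name != lang or vmsg.parent.parent.name != "locale":
            continue
        # Assumption: UTF-8 (or ASCII) content; undecodable bytes are replaced, not fatal
        with vmsg.open(encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if needle in line:
                    yield vmsg, number, line.rstrip("\n")
```

On the vCenter Server you would call it with the program directory as base_dir and "violated" as needle; it walks both the built-in locale directory and the extensions subdirectories.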
com.vmware.vcHms.rpoViolatedEvent.category = "error"
com.vmware.vcHms.rpoViolatedEvent.description = "RPO violated"
com.vmware.vcHms.rpoViolatedEvent.formatOnComputeResource = ""
com.vmware.vcHms.rpoViolatedEvent.formatOnDatacenter = ""
com.vmware.vcHms.rpoViolatedEvent.formatOnHost = ""
com.vmware.vcHms.rpoViolatedEvent.formatOnVm = ""
com.vmware.vcHms.rpoViolatedEvent.fullFormat = "Virtual machine vSphere Replication RPO is violated by [data.currentRpoViolation] minute(s)"
com.vmware.vcHms.rpoRestoredEvent.category = "info"
com.vmware.vcHms.rpoRestoredEvent.description = "RPO restored"
com.vmware.vcHms.rpoRestoredEvent.formatOnComputeResource = ""
com.vmware.vcHms.rpoRestoredEvent.formatOnDatacenter = ""
com.vmware.vcHms.rpoRestoredEvent.formatOnHost = ""
com.vmware.vcHms.rpoRestoredEvent.formatOnVm = ""
com.vmware.vcHms.rpoRestoredEvent.fullFormat = "Virtual machine vSphere Replication RPO is no longer violated"

Here we learn that the RPO violation and restore events have the internal names com.vmware.vcHms.rpoViolatedEvent and com.vmware.vcHms.rpoRestoredEvent, and we can use these as custom event triggers. The violation event will set the status Alert (or Red), and the restoration event will revert the status to Normal (or Green) and clear the alarm again.
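The structure of these entries is regular: each key is an event identifier followed by a field name (category, description, fullFormat, ...). The following Python sketch groups the entries by event identifier and fills the [data.…] placeholder of fullFormat the way the event text in the vSphere Client suggests. It assumes one key = "value" entry per line and no escaped quotes inside values; parse_vmsg and render are hypothetical helpers for illustration, not how vCenter itself processes these files.

```python
import re
from collections import defaultdict

# One entry per line, e.g.: com.vmware.vcHms.rpoViolatedEvent.category = "error"
ENTRY = re.compile(r'^\s*(?P<key>[\w.]+)\s*=\s*"(?P<value>.*)"\s*$')


def parse_vmsg(text):
    """Group vmsg entries by event identifier (the key without its last component)."""
    events = defaultdict(dict)
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            event_id, _, field = m.group("key").rpartition(".")
            events[event_id][field] = m.group("value")
    return dict(events)


def render(full_format, data):
    """Replace [data.<name>] placeholders with event arguments; unknown names are left as-is."""
    return re.sub(r"\[data\.(\w+)\]",
                  lambda m: str(data.get(m.group(1), m.group(0))),
                  full_format)
```

Applied to the lines above, the resulting dictionary has exactly two keys, com.vmware.vcHms.rpoViolatedEvent and com.vmware.vcHms.rpoRestoredEvent, which are the identifiers we need for the triggers.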
Important note: When you create the alarm with the new Web Client, you will find the "RPO violated" and "RPO restored" events in the dropdown menu for the triggers, so you would not have to look up their identifiers like we did above. On the other hand, you can only choose entries from this dropdown list and cannot type a custom identifier at all. This is a clear limitation of the Web Client GUI, and we will run into it again in the next step:
3. Set an advanced condition for the Alert trigger
Remember that we only want an alert to be triggered if the RPO was violated by more than 1 minute! So we need to define an advanced trigger condition by clicking the Advanced... link in the picture above, which opens the following dialog:
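The behavior we want from the alarm can be modeled as a tiny state machine: a violation event only turns the alarm red if currentRpoViolation exceeds 1 minute, and the restore event turns it green again. This Python sketch describes the intended logic only, not how vCenter evaluates alarms internally; next_status and replay are hypothetical helpers, and I assume that the currentRpoViolation argument arrives as a string holding whole minutes.

```python
VIOLATED = "com.vmware.vcHms.rpoViolatedEvent"
RESTORED = "com.vmware.vcHms.rpoRestoredEvent"


def next_status(status, event_type_id, arguments, threshold=1):
    """Return the alarm status ("green" or "red") after processing one event."""
    if event_type_id == VIOLATED:
        # Assumption: the argument is passed as a string of whole minutes
        minutes = int(arguments.get("currentRpoViolation", 0))
        # Advanced condition: only alert if the RPO is violated by MORE than `threshold` minutes
        return "red" if minutes > threshold else status
    if event_type_id == RESTORED:
        return "green"  # the restore event clears the alarm
    return status  # unrelated events leave the alarm unchanged


def replay(events, status="green"):
    """Feed a sequence of (event_type_id, arguments) pairs through the alarm."""
    for event_type_id, arguments in events:
        status = next_status(status, event_type_id, arguments)
    return status
```

In this model a 1-minute violation, like the ones in my lab, leaves the alarm green, while a 3-minute violation turns it red until the matching restore event arrives.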
|Create a vCenter alarm - Advanced trigger conditions|
By the way: In the Web Client we have the same limitation for the Operator, and what's worse, even the Argument can only be chosen from a dropdown list that includes only a few generic arguments (like VM Name), but not currentRpoViolation. This means that we cannot define this specific advanced trigger condition in the Web Client!
4. Define the alarm action
The last step is to define the alarm action, and this needs to be done in the Actions tab of the Alarm Settings dialog:
|Create a vCenter alarm - Define actions|
This alarm works fine for me, but I'm still puzzled about the lack of a "greater than" comparison operator and other GUI restrictions. Maybe we can overcome them by creating this alarm via PowerCLI rather than using a GUI? But this will be the topic of another post ...