Troubleshooting VM network performance using vsish


Long time, no post ... I'm currently busy with several support requests that I opened with VMware. These keep me busy but they are also good opportunities to learn something new.

One of the cases involves troubleshooting the network performance of Lync 2013 servers running on top of vSphere. It turned out that the VMkernel system information shell (better known as vsish) is a great tool for this.

vsish runs in the ESXi shell and lets you look at (and partly manipulate) a lot of advanced system parameters and performance information of the ESXi host and associated objects like the VMs that it runs. William Lam has some posts about vsish - I suggest you read them if you want to learn more details.

In the following walk-through we will look at the performance counters of a VM's vmxnet3 network adapter.

As a prerequisite we need to find out the name of the port group that the VM is attached to and its port ID. An easy way to get this information is to launch the esxtop utility and look at its network display (Press n to get there):


Find the name of the VM in the USED-BY column and take a note of the port ID displayed in the PORT-ID column (in this example: 16777266) and the internal port group name in the DNAME column (DvsPortset-0).

Now we are ready to launch vsish and look at this specific port:


Using vsish is like navigating through a Unix filesystem tree that is full of text files. You use cd to change to different branches, ls to display nodes and sub-branches and cat to display the contents of nodes.
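The same nodes can also be read without an interactive session. The following is only a minimal sketch: it assumes that your ESXi build supports vsish's -e option with the ls and get sub-commands, and port_node is a hypothetical helper name introduced here for illustration. The portset name and port ID are the example values from esxtop.

```shell
#!/bin/sh
# Example values taken from the esxtop network display in this post.
PORTSET="DvsPortset-0"
PORT_ID="16777266"

# Hypothetical helper: build the vsish path of a switch port node.
port_node() {
    printf '/net/portsets/%s/ports/%s\n' "$1" "$2"
}

NODE="$(port_node "$PORTSET" "$PORT_ID")"

# vsish only exists on an ESXi host, so skip the calls anywhere else.
if command -v vsish >/dev/null 2>&1; then
    vsish -e ls  "$NODE/"         # interactive equivalent: cd $NODE, then ls
    vsish -e get "$NODE/status"   # interactive equivalent: cat status
else
    echo "vsish not found; would read $NODE/status"
fi
```

Building the path in one place keeps the portset name and port ID easy to swap when you look at a different VM.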

For example, cat status reveals some details about the port's configuration. The clientName: entry shows us that this port indeed belongs to the VM that we are interested in. And we learn that this vNIC is of type vmxnet3.


The stats node shows us that the vNIC is dropping a high number of received packets. That is bad ... can we find out more? Let's switch to the vNIC type specific sub-branch vmxnet3 and look at the rxSummary node there:


These values show us that the vmxnet3 driver is running out of buffers quite a few times, and that the 1st ring ran full 34 times already.
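If you want to track these two counters over time, a small script can pull them out of the rxSummary node. This is a sketch under assumptions: it expects vsish to print one "label:value" pair per line, the label strings passed to the helper are my reading of the output and should be copied from what your own host prints, and counter_value is a hypothetical helper name.

```shell
#!/bin/sh
# Example port from this walk-through; replace with your own values.
NODE="/net/portsets/DvsPortset-0/ports/16777266"

# Hypothetical helper: print the value of the first "label:value" line
# on stdin whose label matches literally (assumed output layout).
counter_value() {
    awk -v label="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)              # drop leading indentation
        if (index(line, label ":") == 1) {    # literal prefix match, no regex
            print substr(line, length(label) + 2)
            exit
        }
    }'
}

# vsish only exists on an ESXi host, so skip the calls anywhere else.
if command -v vsish >/dev/null 2>&1; then
    summary="$(vsish -e get "$NODE/vmxnet3/rxSummary")"
    # Label strings are assumptions; adjust them to your own output.
    echo "out of buffers: $(printf '%s\n' "$summary" | counter_value 'running out of buffers')"
    echo "1st ring full:  $(printf '%s\n' "$summary" | counter_value '# of times the 1st ring is full')"
fi
```

Running this a few minutes apart shows whether the counters are still increasing or whether the drops happened in the past.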

Let's go further down the rabbit hole and look at the receive (rx) queue(s) of the adapter:


It turns out that there is only one receive queue - that means RSS is not enabled on this adapter. RSS stands for Receive Side Scaling and allows having multiple receive queues for a network adapter which are handled by different CPUs.

We can also see that the ring sizes of the receive queue are 512 and 32 (which are the default values).
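The queue check can be scripted as well. The sketch below assumes that vsish lists each receive queue as a sub-branch ending in "/" (for example "0/") under the rxqueues branch, and that the ring sizes are shown in that queue's status node. Both node names are my assumptions from this walk-through, and count_queues is a hypothetical helper name.

```shell
#!/bin/sh
# Example port from this walk-through; replace with your own values.
NODE="/net/portsets/DvsPortset-0/ports/16777266"

# Hypothetical helper: count the sub-branches in a vsish listing on stdin.
# Assumes each queue appears as one line ending in "/", e.g. "0/".
count_queues() {
    grep -c '/$'
}

# vsish only exists on an ESXi host, so skip the calls anywhere else.
if command -v vsish >/dev/null 2>&1; then
    n="$(vsish -e ls "$NODE/vmxnet3/rxqueues/" | count_queues)"
    echo "receive queues: $n"
    if [ "$n" -gt 1 ]; then
        echo "multiple queues, RSS appears to be active"
    else
        echo "single queue, RSS appears to be inactive"
    fi
    # Ring sizes of the first queue (node name is an assumption).
    vsish -e get "$NODE/vmxnet3/rxqueues/0/status"
fi
```

After changing the RSS setting in the guest, the reported queue count should change accordingly.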

The ring sizes, the number of buffers and the RSS features of the vmxnet3 adapter are parameters that can be configured from inside the VM's guest OS (Windows in this case):

[Screenshot: vmxnet3 driver's advanced properties]

Some guidelines are in KB2039495 (Large packet loss at the guest OS level on the VMXNET3 vNIC in ESXi). KB2008925 (Poor network performance or high network latency on Windows virtual machines) talks a bit about RSS, but mainly refers to an associated Microsoft TechNet article.

Please note that there is not one right set of settings that will give you the optimal network performance for your VM! It very much depends on the configuration, workload and performance characteristics of your VM and the load on your overall virtual infrastructure. We are still struggling with this and have not yet found optimal settings in this specific case.

As always I hope that you found this information useful. This little walk-through is far from being a complete overview, so I encourage you to navigate through vsish yourself and discover the plethora of information that it reveals!



This post first appeared on the VMware Front Experience Blog and was written by Andreas Peetz. Follow him on Twitter to keep up to date with what he posts.



2 comments:

  1. Hi, we have an issue with Lync 2013 running on ESXi 6.0 experiencing a high level of UDP packet drops.
    The quality of a video call is bad during the first few minutes and then stabilizes.
    There is buffering happening somewhere when the size of the video stream exceeds a certain threshold.
    Do you know if there are any settings on the vmxnet3 adapter that could explain this buffering?

    1. Hi Anonymous,

      I don't think that this kind of behavior can be caused by vmxnet3 driver settings. The ring buffer sizes and RSS settings are the only ones that we experimented with.

      For Lync it seems to be very important to follow VMware's guidelines for latency sensitive workloads, esp. regarding host CPU power management settings. If you have not already done this then you should definitely take a look at that.

      Andreas

