Annoying IP dual stack issue with VMware ESXi


While testing direct host access to the new V-Front Online Depot (via esxcli) I stumbled over an annoying issue. I was finally able to resolve it, but the root cause was hard to find. So I want to share my findings here in the hope of making life easier for everyone else who will, very likely, stumble over the same issue.

After opening the firewall for outgoing http(s) requests using
   esxcli network firewall ruleset set -e true -r httpClient

you can try to access the Online Depot in an ESXi shell with commands like
  esxcli software sources vib list -d http://vibsdepot.v-front.de
  esxcli software vib install -d http://vibsdepot.v-front.de -n package-name

If you experience the issue then these commands will work as expected, but they will take a very long time (>10 minutes!) to return. Of course there are other, more obvious causes of slow network access: a slow or saturated uplink to the Internet, improper NIC speed negotiation settings etc.

But in my case it had something to do with IPv6 ...

I am a somewhat early adopter of IPv6 (although I do not really think that you can be too early with this nowadays), so the web site that hosts the vibsdepot is publicly reachable by both IPv4 and IPv6. You can check that by just using nslookup with the site's name vibsdepot.v-front.de on an IPv6 enabled machine - it should return both addresses.
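
The same check can be sketched in Python, the language esxcli itself is written in. This is only a minimal illustration, not what esxcli does internally: the helper names split_by_family and lookup are made up for this example, and the addresses in the usage note come from the reserved documentation ranges.

```python
import socket


def split_by_family(addrinfos):
    """Group getaddrinfo()-style tuples into IPv4 and IPv6 address lists."""
    result = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canonname, sockaddr in addrinfos:
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue  # ignore any other address family
        address = sockaddr[0]
        if address not in result[label]:
            result[label].append(address)
    return result


def lookup(hostname, port=80):
    """Resolve hostname (needs working DNS) and split the answers by family."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return split_by_family(infos)
```

Calling lookup("vibsdepot.v-front.de") on a machine with working DNS should then list addresses under both keys if the site is still published dual-stack. A host name that only has an IPv4 record leaves the "IPv6" list empty.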

A freshly installed ESXi host will have IPv6 enabled (since ESXi 5.1), regardless of whether your network is IPv6 ready or not. If it is not (and I still expect that to be the normal case) then the only IPv6 address that the host will get is a so-called link-local address starting with fe80:: (see picture above). This type of address is mandatory in IPv6, but it can only be used to communicate within the local network segment.
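
To make "link-local only" concrete, here is a small Python sketch using the standard ipaddress module. The function name and the sample addresses are hypothetical. It only classifies address strings that you pass in and does not read them from the host.

```python
import ipaddress


def only_link_local_ipv6(addresses):
    """Return True if the host has IPv6 addresses, but all are link-local (fe80::/10)."""
    v6 = []
    for text in addresses:
        # Drop a scope suffix such as "%vmk0" before parsing
        ip = ipaddress.ip_address(text.split("%")[0])
        if ip.version == 6 and not ip.is_loopback:
            v6.append(ip)
    return bool(v6) and all(ip.is_link_local for ip in v6)
```

A host for which this returns True is exactly the case described here: IPv6 is enabled, but there is no address that could reach anything beyond the local segment.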

Python to blame

The esxcli command and the modules that it uses are written in Python, and that programming language (or rather its libraries used for IP communication) has an annoying issue with IP dual stacks: It will prefer IPv6 over IPv4 (normally this is good!) for all IP communication even if the machine has only a link-local address! That means: any http(s) request to vibsdepot.v-front.de (or any other IPv6-enabled machine) will first be tried via IPv6. This will fail, because you cannot reach Internet servers outside your network using a link-local address. The second try will then use IPv4 and usually succeed. And now the part that makes it really bad:
  • The timeout to detect the IPv6 connection failure is ridiculously long (120 sec.?)
  • With each new connection the same is tried again. It will not "learn" from (or cache) negative connection results ...
... and querying an Online Depot will result not only in a single http-request, but in a lot of them ... So, the timeouts will add up!
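
A rough back-of-the-envelope model shows how quickly this adds up. It is only a sketch under stated assumptions: the 120 second timeout is the guess from above, and the number of requests per depot query is a made-up value, not something I measured.

```python
def total_delay_seconds(num_requests, ipv6_timeout_s, ipv4_time_s=0.0):
    """Estimate total time if every request first waits for IPv6 to time out."""
    total = 0.0
    for _ in range(num_requests):
        total += ipv6_timeout_s  # failed IPv6 attempt, the failure is not cached
        total += ipv4_time_s     # IPv4 attempt that then succeeds
    return total


# Hypothetical example: 6 requests, each hitting the assumed 120 s timeout
delay = total_delay_seconds(num_requests=6, ipv6_timeout_s=120)
print(delay / 60, "minutes")
```

With these assumed numbers the wait is already 12 minutes, which is in the same range as the >10 minutes seen with the esxcli commands.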

Please note: This is a problem of Python, not the ESXi IP stack! However, it looks like the busybox shell of ESXi (and its builtin commands like wget) shows the same issue, but with a much shorter timeout (15 sec.?).

How to solve?

The solution is easy: If you do not have a working IPv6 setup in your network then disable IPv6 on the ESXi host! It is a best practice anyway to disable anything that you do not use, but there is another best practice that tells you to not change any system defaults if you do not have any issues. So most ESXi beginners probably still have IPv6 enabled on their hosts although they don't use it.

But I'm not sure about that ... So, please comment and tell me if you are experiencing this issue! If there are a lot of people bothered by that then I will probably go and disable IPv6 access to vibsdepot.v-front.de. I would hate doing this, but it would at least instantly solve the vibsdepot access issue for anyone.


This post first appeared on the VMware Front Experience Blog and was written by Andreas Peetz. Follow him on Twitter to keep up to date with what he posts.



3 comments:

  1. Hi Andreas,
    I never tried an online depot on IPv6, but I always disable IPv6 on every ESXi installation, at least because there is no need to consume additional resources to manage an unused part of the kernel.
    Nice at some point this (excessive?) zeal I have seems to pay.

    Thanks for the article,
    Luca.

  2. I agree that any unused service should be disabled to reduce the surface area for attack. I use kickstarts to build out ESXi servers - for those wondering what the esxcli command is:

    esxcli network ip set --ipv6-enabled=false

  3. If you have IPv6 enabled on your host, you will see it in your View admin console. Your desktops will show agent unreachable, and your host will assign IPv6 to your VMs first and then try IPv4. This causes a problem in production, as we all know the desktop needs to come up in 3 secs or the product is considered a failure. So if you are getting agent unreachable and you can't figure it out, disable IPv6 on your host.
