VMwarewolf

Technical tidbits for VMware virtualized environments

ESX disconnects randomly or timeout after a long idle time

By: Rick Blythe | Date: January 9, 2009 | Categories: System Management, vCenter, Virtual Center, VMware, VMware ESX

A few customers have reported a new problem with the following symptoms:

  • ESX disconnects randomly from VirtualCenter.
  • ESX disconnects when performing VI Client tasks from VirtualCenter.
  • Tasks randomly time out after a long idle time.
  • The error “An error occurred communicating to the remote host” pops up.

This applies only to Update 3. The issue seems to affect customers whose firewalls use stateful inspection, such as a Juniper ISG 2000 running software revision 6.1.0R3.

The problem is caused by SOAP timeouts. This behavior did not exist in VC 2.0.x or 2.5 GA, which used a different mechanism to communicate with ESX.
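The post does not describe the mechanism in detail. One common reason stateful-inspection firewalls break long-lived connections is that they discard the state for a connection that has been idle longer than their session timeout. The sketch below is not VMware's fix and is not how VirtualCenter talks to ESX; it only illustrates, in Python, the general technique of enabling TCP keepalive so that an otherwise idle connection still carries periodic traffic. The function name and the timing values are assumptions chosen for illustration.

```python
import socket


def make_keepalive_socket(idle_seconds=600, interval_seconds=60, probes=5):
    """Create a TCP socket that sends keepalive probes while it is idle.

    All three timing values are illustrative assumptions; the idle time
    should be shorter than the firewall's idle-session timeout.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Ask the OS to probe the peer when the connection has been idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning options are platform-specific, so only set the
    # ones this platform's socket module actually exposes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

    return sock
```

A client would call `make_keepalive_socket()` and then `connect()` as usual; the probes are sent by the operating system, not by application code.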

A Knowledge Base article is in the works, but Engineering is still actively investigating this problem. One workaround that I have seen mentioned (not verified to work!) is as follows:

  • Create a dummy VM on each host (e.g. 16 MB RAM, no disk, no network).
  • Set CPU affinity to the last core to prevent VMotion.
  • Create a new Scheduled Task that performs:
    • Change power state: Power-on dummy VM. Every hour, on the hour.
  • Create another Scheduled Task that performs:
    • Change power state: Power-off dummy VM. Every hour, 30 minutes after the hour.
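To make the effect of that schedule concrete, here is a small Python sketch that models the two scheduled tasks and works out the longest stretch in which neither one fires. It does not call any VMware API; the names `next_task` and `longest_gap_minutes` are hypothetical, and the only inputs taken from the steps above are the two firing times (on the hour and 30 minutes past).

```python
from datetime import datetime, timedelta

# Minutes past the hour at which each scheduled task fires (from the steps above).
POWER_ON_MINUTE = 0
POWER_OFF_MINUTE = 30


def next_task(now):
    """Return (action, fire_time) for the first scheduled task strictly after `now`."""
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    candidates = []
    # Look at the current hour and the next one so a task is always found.
    for offset_hours in (0, 1):
        base = hour_start + timedelta(hours=offset_hours)
        candidates.append((base + timedelta(minutes=POWER_ON_MINUTE), "power-on"))
        candidates.append((base + timedelta(minutes=POWER_OFF_MINUTE), "power-off"))
    fire_time, action = min((t, a) for t, a in candidates if t > now)
    return action, fire_time


def longest_gap_minutes(minutes=(POWER_ON_MINUTE, POWER_OFF_MINUTE)):
    """Return the longest interval, in minutes, between consecutive task firings."""
    ordered = sorted(minutes)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Include the gap that wraps around from the last task to the next hour.
    gaps.append(60 - ordered[-1] + ordered[0])
    return max(gaps)
```

With the schedule as given, `longest_gap_minutes()` evaluates to 30, so VirtualCenter sends a task to each host at least every half hour.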

Like I said, Engineering is aware of this issue and actively investigating it.

18 Responses to ESX disconnects randomly or timeout after a long idle time

  1. Tom says:

    Why does Update 3 have SO MANY BUGS????????????

  2. Pingback: VMware ESX U3 - VirtualCenter Potential Disconnect Issue | TechHead.co.uk

  3. dm330 says:

    We have the same problem out here with VC2.5u3 and ESX 3.0.3

    Many disconnects throughout the clusters, and with almost any action we get
    disconnect errors…

    very strange :-(

  4. Pingback: ESX disconnects randomly or when doing VI client tasks from VC, task randomly timeout after a long idle time | vmwarenews.de

  5. Pingback: vCenter tasks time-out or ESX disconnects? » Yellow Bricks

  6. John says:

    I have had this problem for ages! This specific cause might be U3 related but I am running U2 and have opened at least five support cases about this.

    It is always some “Unknown environmental cause” and I have to close them because I just don’t have time to run network traces all over North America.

    Any other ideas on what might cause this kind of behavior?

  7. C Stewart says:

    For your workaround, does the dummy VM have to be created on every host, or just the host that you’re having problems with? Only one of our hosts is having this issue. Funnily enough, it’s the only one that has firewalls between it and the vCenter Server.

  8. Pingback: VirtualPro » vCenter tasks time-out or ESX host disconnects

  9. VMwarewolf says:

    C Stewart,
    I don’t know the answer to that for sure, but to me it only would make sense to do this on the affected hosts.

  10. C Stewart says:

    Have set this up, so will let you know how we get on. We will know quite quickly if it’s working, as we have certain tasks that happen daily that are consistently erroring.

  11. C Stewart says:

    Hi Richard

    Just to let you know that the workaround has resolved the issue with the consistent failure we were having. Will keep an eye out for the permanent fix from VMware.

    Thanks

    C Stewart

  12. David Brightman says:

    Having this problem on a ‘solo’ Update 3 box, which is outside of VirtualCenter (our test env box, without a license for VC!), i.e. a direct VI session to the host.
    It keeps dropping my VI sessions… Argh! No VC, so no scheduled task… Any ideas on a workaround? Perhaps a cron job on the host?

  13. Matt says:

    I too have been struggling with this problem since U3. I’m now trying VC U4 and I am still having it. I am using a Cisco ASA as my firewall. Per another posting, I added this line to my configuration yesterday:
    sysopt connection tcpmss 0
    Today I added the dummies, but I still get the error. I have noticed each thing I try seems to help a little, but the problem never goes away.

  14. You’d think that the QA testing would catch stuff like that. If the problem really does stem from something as mundane as SPI firewalls, that’s pretty bad since many people can be expected to be using those. Can’t wait to see what VI4 has in store bug-wise!

  15. Matt says:

    I hear there is a patch for this problem. It is caused by SOAP timeouts. You need to open a service request with VMware to get it. I can’t because I get my support through HP, and I figured there would be a new vCenter released before I could get a patch through them. Now if someone were to get the patch from VMware support and send it to me, I would gladly test it for them.

  16. ArildS says:

    We have this problem on a brand new ESX 4.0 server and VC 4.0.
    My server disconnects every few minutes. While it was in test with no load it worked flawlessly, but as soon as I VMotioned my VMs to it, it started disconnecting. Opening a support case for that SOAP patch now :(

  17. B Hanna says:

    I had the same problem with 4 ESX servers using vCenter. One server kept disconnecting.

    The way I solved it was by connecting to the problem ESX server using PuTTY and running:
    /etc/init.d/vmware-vpxa stop
    /etc/init.d/vmware-vpxa start

  18. Barry K says:

    Same issue just started happening for me with one server running vSphere, and the previous post appears to have resolved it… Thanks