ESX disconnects randomly or when doing VI Client tasks from VC; tasks randomly time out after a long idle time
A few customers have reported a new problem with the following symptoms:
- ESX disconnects randomly from VirtualCenter.
- ESX disconnects when performing VI Client tasks from VirtualCenter.
- Tasks randomly time out after a long idle time.
- “An error occurred communicating to the remote host” pops up.
This applies only to Update 3. The issue seems to affect customers whose firewalls use stateful inspection, such as a Juniper ISG 2000 running software revision 6.1.0R3.
The problem occurs because of SOAP timeouts, and this behavior did not exist in VC 2.0.x or 2.5 GA, as they used a different mechanism to communicate with ESX.
A knowledge base article is in the works, but Engineering is still actively investigating this problem. One workaround that I have seen mentioned (not verified to work!) is as follows:
- Create a dummy VM on each host (e.g. 16 MB RAM, no disk, no network).
- Set CPU affinity to the last core to prevent VMotion.
- Create a new Scheduled Task that performs:
  - Change power state: Power on the dummy VM. Every hour, on the hour.
- Create another Scheduled Task that performs:
  - Change power state: Power off the dummy VM. Every hour, 30 minutes after the hour.
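The timing of the two scheduled tasks can be sketched in Python. This is only an illustration, not a VMware tool: the function names are made up for the example, the 60-minute firewall idle timeout is a hypothetical value, and the idea that the workaround helps by keeping the VirtualCenter-to-host connection from sitting idle is an assumption rather than something confirmed above.

```python
from datetime import datetime, timedelta


def dummy_vm_schedule(start, hours):
    """Return (time, action) pairs for the two hourly scheduled tasks."""
    first_hour = start.replace(minute=0, second=0, microsecond=0)
    events = []
    for h in range(hours):
        on_time = first_hour + timedelta(hours=h)
        # Task 1: power on the dummy VM on the hour.
        events.append((on_time, "power-on"))
        # Task 2: power off the dummy VM 30 minutes after the hour.
        events.append((on_time + timedelta(minutes=30), "power-off"))
    return events


def longest_quiet_gap(events):
    """Longest stretch between two consecutive scheduled tasks."""
    times = [t for t, _ in events]
    return max(later - earlier for earlier, later in zip(times, times[1:]))


# Hypothetical value: check your own firewall's idle/session timeout.
ASSUMED_FIREWALL_IDLE_TIMEOUT = timedelta(minutes=60)

events = dummy_vm_schedule(datetime(2009, 1, 9, 11, 0), hours=3)
gap = longest_quiet_gap(events)
print("longest gap between tasks:", gap)
print("shorter than assumed timeout:", gap < ASSUMED_FIREWALL_IDLE_TIMEOUT)
```

If your firewall's idle timeout is shorter than the gap reported here, the schedule would presumably need to run more often to have any effect.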
Like I said, Engineering is aware of this issue and actively investigating it.
System Management, VMware, VMware ESX, Virtual Center, vCenter January 9th, 2009, 11:00 am
VMwarewolf
January 9th, 2009 at 11:31 am
Why does Update 3 have SO MANY BUGS????????????
January 13th, 2009 at 1:25 am
[...] Knowledge Base article regarding this which apparently should be out soon. Check out his blog posting here for more [...]
January 28th, 2009 at 8:25 am
We have the same problem out here with VC 2.5 U3 and ESX 3.0.3: many disconnects throughout the clusters, and with almost any action we get disconnect errors… very strange
February 5th, 2009 at 8:39 am
[...] no apparent reason at all. He discovered the following article by Richard Blythe aka VMware Wolf: ESX disconnects randomly or when doing VI client tasks from VC, task randomly timeout after a long i…. Richard created a list of issues/errors that might be related to this [...]
February 5th, 2009 at 10:58 am
I have had this problem for ages! This specific cause might be U3 related but I am running U2 and have opened at least five support cases about this.
It is always some “Unknown environmental cause” and I have to close them because I just don’t have time to run network traces all over North America.
Any other ideas on what might cause this kind of behavior?
February 5th, 2009 at 5:00 pm
For your workaround, does the dummy VM have to be created on every host or just the host that you’re having problems with? Only one of our hosts is having this issue. Funnily enough, it’s the only one that has firewalls between it and the vCenter Server.
February 6th, 2009 at 9:43 am
C Stewart,
I don’t know the answer to that for sure, but to me it would only make sense to do this on the affected hosts.
February 8th, 2009 at 1:05 pm
Have set this up, so will let you know how we get on. We will know quite quickly if it’s working, as we have certain tasks that happen daily that are consistently erroring.
February 9th, 2009 at 2:05 pm
Hi Richard
Just to let you know that the workaround has resolved the issue with the consistent failure we were having. Will keep an eye out for the permanent fix from VMware.
Thanks
C Stewart
February 24th, 2009 at 11:26 am
Having this problem on a ‘solo’ Update 3 box, which is outside of VirtualCenter (our test env box, without a license for VC!), i.e. a direct VI session to the host.
It keeps dropping out my VI sessions… Argh! No VC, so no scheduled task… Any ideas on a workaround? Perhaps a cron job on the host?
February 27th, 2009 at 2:10 pm
I too have been struggling with this problem since U3. I’m now trying VC U4 and I am still having it. I am using a Cisco ASA as my firewall. Per another posting, I added this line to my configuration yesterday:
sysopt connection tcpmss 0
Today I added the dummies, but I still get the error. I have noticed each thing I try seems to help a little, but the problem never goes away.
April 11th, 2009 at 7:57 pm
You’d think that the QA testing would catch stuff like that. If the problem really does stem from something as mundane as SPI firewalls, that’s pretty bad since many people can be expected to be using those. Can’t wait to see what VI4 has in store bug-wise!
June 10th, 2009 at 3:44 pm
I hear there is a patch for this problem. It is caused by SOAP timeouts. You need to open a service request with VMware to get it. I can’t because I get my support through HP, and I figured there would be a new vCenter released before I could get a patch through them. Now if someone were to get the patch from VMware support and send it to me, I would gladly test it for them.
July 30th, 2009 at 3:48 am
We have this problem on a brand new ESX 4.0 server and VC 4.0.
My server disconnects every few minutes. While it was in test with no load it worked flawlessly, but as soon as I VMotioned my VMs to it, it started disconnecting. Opening a support case for that SOAP patch now.
February 3rd, 2010 at 1:01 pm
I had the same problem with 4 ESX servers using vCenter. One server kept disconnecting.
I solved it by connecting with PuTTY to the ESX server that had the problem and running:
/etc/init.d/vmware-vpxa stop
/etc/init.d/vmware-vpxa start
February 17th, 2010 at 1:55 pm
Same issue just started happening for me with one server running vSphere, and the previous post appears to have resolved it… Thanks