VMware HA issue Again
(2011-06-16 02:11:11)
Tags: it | Category: Work
After I ran the VMware health check report, one of the hosts showed an HA issue in the report:
“ESX4.0Host1” received “HA agent on ESX4.0Host1 in cluster Production Farm 1 in Production has an error: Unknown HA error.”
As usual, I reconfigured HA on the host and disabled/re-enabled HA on the cluster, but it still failed.
I Googled the error and found KB1007234 on the VMware site, then performed the steps in the KB (http://kb.vmware.com/kb/1007234):
a. Disable VMware HA on the cluster.
Deselect Enable VMware HA in the Cluster Settings, and click OK.
b. Put the ESX host into maintenance mode.
Right-click the Managed Host icon in the Inventory panel and choose Enter Maintenance Mode.
c. Drag the host icon to a new location outside the cluster.
d. Log in as root with SSH to the ESX host.
e. Run this command:
In ESX Classic – /opt/vmware/aam/bin/VMware-aam-ha-uninstall.sh
If this fails and complains about files in use:
i. Stop the services by running the command:
service vmware-aam stop
ii. Delete all remaining files by running the command:
rm -rf /opt/vmware/aam
iii. Start the services by running the command:
service vmware-aam start
iv. Restart the vCenter Server Agent (vpxa) by running the command:
service vmware-vpxa restart
f. Drag the host icon back into the cluster.
g. Take the managed host out of maintenance mode.
To take the managed host out of maintenance mode, right-click the host and click Exit Maintenance Mode.
h. Repeat steps b-g for each of the other managed hosts in the cluster.
i. Enable HA on the cluster.
To enable HA on the cluster, select Enable VMware HA in the Cluster Settings and click OK.
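The host-side part of the procedure (steps d and e) can be sketched as a small script. This is only a sketch, not something taken from the KB: the function names and the DRY_RUN switch are my own additions, and DRY_RUN defaults to 1 so the commands are only printed unless you explicitly set DRY_RUN=0 on a host that is already in maintenance mode and outside the cluster.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical safety switch added for this sketch.
# With DRY_RUN=1 (the default) every command is printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Sub-steps i-iv from the KB, used when the uninstall script fails.
fallback_cleanup() {
    run service vmware-aam stop
    run rm -rf /opt/vmware/aam
    run service vmware-aam start
    run service vmware-vpxa restart
}

# Step e: try the uninstall script first, fall back to manual cleanup.
remove_ha_agent() {
    run /opt/vmware/aam/bin/VMware-aam-ha-uninstall.sh || fallback_cleanup
}
```

Calling remove_ha_agent in dry-run mode prints only the uninstall line, because a printed command always "succeeds"; calling fallback_cleanup prints the four manual commands in KB order.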
This did not bring HA back for that host.
I called VMware Support, and the following got it working again:
First try (failed):
rpm -e VMware-vpxa-#.#.#-#####
rpm -e VMware-aam-haa-#.#.#-#
rpm -e VMware-aam-vcint-#.#.#-#
- removed the host from VC
- added the host back into VC
- re-enabled HA on the cluster
Error:
Configuring HA
ESX4.0Host1
Cannot complete the configuration of the HA agent on the host. Other HA configuration error.
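The # characters in the rpm -e lines of the first try stand for the version and build numbers installed on the host. A minimal sketch for looking up the exact package names before removing them, assuming the standard rpm -qa listing; the function name filter_ha_packages is made up for this example:

```shell
# Keep only the vpxa and AAM (HA agent) package names from a list on stdin.
filter_ha_packages() {
    grep -E '^VMware-(vpxa|aam-haa|aam-vcint)-'
}

# On the ESX host, the full names could then be listed with:
#   rpm -qa | filter_ha_packages
```

Each name printed this way can be passed to rpm -e as is.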
Second try (worked):
- added the das.bypassNetCompatCheck advanced option to the cluster
- HA now works on the cluster
This issue occurs if all the hosts in the cluster do not share
the same service console or management network configurations. Some
hosts may have service consoles using a different name or may have
more service consoles than other hosts.
For example, this error may also occur if the VMkernel gateway
settings are not the same across all hosts in the cluster. To
reconfigure the setting, right-click on the hosts with this error
and select Reconfigure for HA.
Address the network configuration differences between the hosts if
you are going to use the Shut Down or Power Off isolation responses
because these options trigger a VMware HA isolation in the event of
Service Console or Management Network failures.
If you are using the Leave VM Powered on isolation response, the
option to ignore these messages is available in VMware
VirtualCenter 2.5 Update 3.
To configure VirtualCenter to ignore these messages, set the
advanced option das.bypassNetCompatCheck to true:
Note: When using the das.bypassNetCompatCheck option, the heartbeat
mechanism during configuration used in VirtualCenter 2.5 only pairs
symmetric IP addresses within subnets across nodes. For example, in
a two-node cluster, if host A has vswif0 “Service Console”
10.10.1.x 255.255.255.0 and vswif1 “Service Console 2” 10.10.5.x
and host B has vswif0 “Service Console” 10.10.2.x 255.255.255.0 and
vswif1 “Service Console 2” 10.10.5.x, the heartbeats only happen on
vswif1. Starting in vCenter Server 4.0, they can be paired across
subnets if pings are allowed across the subnets. However, VMware
recommends having them within subnets.
- das.bypassNetCompatCheck = True
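The two-node example in the note comes down to a subnet comparison, which can be sketched in shell. The host octets (.11 and .12) and the 255.255.255.0 mask on the second service console are assumptions made for illustration, since the note only gives 10.10.x.x prefixes; the function names are also my own.

```shell
# Print the network address for a dotted-quad IP address and netmask.
# The body runs in a subshell so the IFS change stays local.
network_of() (
    IFS=.
    # Split both values on dots: $1-$4 are IP octets, $5-$8 are mask octets.
    set -- $1 $2
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# Succeed if two addresses fall in the same subnet for the given mask.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

# Report whether an interface pair (host A address, host B address) matches.
report_pair() {
    if same_subnet "$2" "$3" "$4"; then
        echo "$1: same subnet"
    else
        echo "$1: different subnets"
    fi
}

report_pair vswif0 10.10.1.11 10.10.2.12 255.255.255.0
report_pair vswif1 10.10.5.11 10.10.5.12 255.255.255.0
```

With these assumed addresses only vswif1 is reported as being in the same subnet, which matches the note: under VirtualCenter 2.5 the heartbeats would only happen on vswif1.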