VMware vSphere High Availability (HA):-
Network Load Balancing (NLB) clustering:- The Network Load Balancing configuration
involves an aggregation of servers that balances the requests for applications
or services. In a typical NLB cluster, all nodes are active participants in the
cluster and are consistently responding to requests for services. If one of the
nodes in the NLB cluster goes down, client connections are simply redirected to
another available node in the NLB cluster. NLB clusters are most commonly
deployed as a means of providing enhanced performance and availability.
Example:- NLB on an IIS server, ISA server, VPN server, etc.
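As a rough illustration of the NLB behaviour described above, the following Python sketch spreads requests round-robin across the active nodes and skips a node once it has gone down. The class name, node names, and round-robin choice are assumptions made for the example; real NLB products use their own distribution algorithms.

```python
from itertools import count


class NlbCluster:
    """Toy model of an NLB cluster: every healthy node serves requests."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()          # nodes currently marked as failed
        self._counter = count()    # running request counter

    def mark_down(self, node):
        """Record that a node has failed and must not receive requests."""
        self.down.add(node)

    def route(self):
        """Return the node that should serve the next request."""
        healthy = [n for n in self.nodes if n not in self.down]
        if not healthy:
            raise RuntimeError("no available node in the NLB cluster")
        return healthy[next(self._counter) % len(healthy)]


# Hypothetical node names, used only for the demonstration.
cluster = NlbCluster(["web1", "web2", "web3"])
first_three = [cluster.route() for _ in range(3)]

cluster.mark_down("web2")                            # simulate a node failure
after_failure = [cluster.route() for _ in range(4)]  # web2 is skipped
```

After the simulated failure, requests are still answered, only by the two remaining nodes.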
Windows Failover Clustering (WFC):- It is used solely for the sake of availability. Server clusters
or WFC do not provide performance enhancements outside of high availability. In
a typical server cluster, multiple nodes are configured to be able to own a
service or application resource, but only one node owns the resource at a given
time. Each node requires at least two network connections: one for the
production network and one for the cluster service heartbeat network between
nodes. A common datastore is also needed that houses the information accessible
by the online active node and all the other passive nodes. When the current active
resource owner experiences a failure, causing a loss in the heartbeat between
the cluster nodes, another passive node becomes active and assumes
ownership of the resource to allow continued access with minimal data loss.
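The active/passive ownership model described above can be sketched in a few lines of Python. This is only a toy model under simple assumptions: the class, the node names, and the "first surviving node takes over" rule are hypothetical and do not reflect how the Windows cluster service actually arbitrates ownership.

```python
class FailoverCluster:
    """Toy active/passive cluster: exactly one node owns the resource."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self.nodes = list(nodes)
        self.alive = {node: True for node in self.nodes}  # heartbeat status
        self.owner = self.nodes[0]                        # initial active node

    def heartbeat_lost(self, node):
        """Record a lost heartbeat; fail over if the owner was affected."""
        self.alive[node] = False
        if node == self.owner:
            self._fail_over()

    def _fail_over(self):
        """Hand the resource to the first passive node that is still alive."""
        for candidate in self.nodes:
            if self.alive[candidate]:
                self.owner = candidate
                return
        self.owner = None  # no surviving node: the resource is offline


# Hypothetical node names, used only for the demonstration.
wfc = FailoverCluster(["node-a", "node-b", "node-c"])
wfc.heartbeat_lost("node-b")   # a passive node fails: ownership is unchanged
wfc.heartbeat_lost("node-a")   # the active node fails: node-c takes over
```

Note that losing a passive node does not move the resource; only the loss of the current owner triggers a failover.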
Raw device mapping (RDM):- An RDM is a combination of direct access to a LUN and a normal virtual hard disk file.
An RDM can be configured in either Physical Compatibility mode or Virtual Compatibility mode. The Physical Compatibility mode option allows the VM to have direct raw LUN access. The Virtual Compatibility mode, however, is the hybrid configuration that allows raw LUN access but only through a VMDK file acting as a proxy.
So, why choose one over the other? Because the
RDM in Virtual Compatibility mode uses a VMDK proxy file, it offers the
advantage of allowing snapshots to be taken. By using the Virtual Compatibility
mode, you will gain the ability to use snapshots on top of the raw LUN access
in addition to any SAN-level snapshot or mirroring software.
Cluster with Windows Server 2008 VMs:-
Cluster in a Box:- The clustering of two VMs on the same ESXi host.
Cluster across Boxes:- The clustering of two VMs that are running on different ESXi hosts.
Physical to Virtual Clustering:- The clustering of a physical server and a VM together.
FDM- vSphere HA uses a new VMware-developed tool known as Fault Domain
Manager (FDM) for supporting HA.
What is VMware HA?
As per VMware's definition:-
VMware® High Availability (HA) provides easy to use, cost effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers with spare capacity.
What are the prerequisites for HA to work?
1. Shared storage for the VMs running in the HA cluster.
2. Essentials Plus, Standard, Advanced, Enterprise, or Enterprise Plus licensing.
3. A cluster created with VMware HA enabled.
4. Management network redundancy, to avoid frequent isolation responses during temporary network issues (preferred, not a requirement).
AAM- Earlier versions of vSphere used Automated Availability Manager
(AAM), which had a number of notable limitations, like a strong dependence on name resolution and scalability limits.
What is the command to start, stop, or restart the HA agent on the ESXi host?
# /etc/init.d/vmware-fdm stop
# /etc/init.d/vmware-fdm start
# /etc/init.d/vmware-fdm restart
Where are the HA-related logs located for troubleshooting?
/var/log/fdm.log
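When the log is large, a small filter can help narrow it down. The following Python sketch, assuming you are working on a copy of the log file, returns the lines that mention an error or failure. The keyword list is an assumption chosen for illustration and is not an official list of FDM message strings.

```python
# Keywords to look for (an assumption; adjust to the messages you care about).
KEYWORDS = ("error", "fail")


def find_error_lines(log_path):
    """Return (line_number, text) pairs for lines mentioning an error."""
    matches = []
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(keyword in lowered for keyword in KEYWORDS):
                matches.append((number, line.rstrip("\n")))
    return matches
```

For example, `find_error_lines("fdm.log")` could be run against a copy of /var/log/fdm.log taken from the affected host.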
HA-MASTER- When vSphere HA is enabled,
the vSphere HA agents participate in an election to pick a vSphere HA master.
The vSphere HA master is responsible for a number of key tasks within a vSphere
HA–enabled cluster. If the existing master fails, a new vSphere HA master is
automatically elected. The new master will then take over the responsibilities
listed here, including communication with vCenter Server.
HA-Slaves- Once an ESXi host in a vSphere HA–enabled cluster
elects a vSphere HA master, all other hosts become slaves connected to that
master.
HA Master's responsibilities:-
Monitors slave hosts.
Sends heartbeat messages to the slave hosts.
Manages the addition and removal of hosts.
Monitors the power state of VMs.
Reports state information to vCenter Server.
Keeps the list of protected VMs.
Notifies slave hosts of cluster configuration changes.
HA Slave Host's responsibilities:-
Checks the health of the HA master.
Implements some vSphere HA features, such as health checks for local VMs.
Watches the runtime states of local VMs.
network partition:-
"Network partition" is the term used to describe the situation in which one or more slave hosts cannot communicate with the master even though they still have network connectivity between themselves. In this case, vSphere HA is able to use the heartbeat datastores to detect whether the partitioned hosts are still live and whether action needs to be taken to protect VMs on those hosts.
network isolation:-
Network isolation is the situation in which one or
more slave hosts have lost all management network connectivity. Isolated hosts
can neither communicate with the vSphere HA master nor communicate with other
ESXi hosts.
datastore heartbeating:-
When a slave host is network isolated, it uses the heartbeat datastores to notify the master that it is isolated. The slave host uses a special binary file, the host-X-poweron file, to notify the master. The vSphere HA master can then take the appropriate action to ensure that the VMs are protected.
What is the maximum number of hosts per HA cluster?
The maximum number of hosts in an HA cluster is 32.
How is host isolation detected?
In an HA cluster, ESXi hosts use heartbeats to communicate with the other hosts in the cluster. By default, a heartbeat is sent every 1 second.
If the master ESXi host in the HA-enabled cluster does not receive heartbeats from a slave host, the master assumes that the slave host may be isolated. The slave host is then checked to see whether it can ping its configured isolation address (the default gateway by default). If the ping fails, VMware HA executes the host isolation response.
In VMware vSphere 5.x, if the affected agent is on a master host, isolation is declared in 5 seconds. If it is on a slave host, isolation is declared in 30 seconds.
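The two conditions described above (a role-dependent waiting period and a failed ping to the isolation address) can be combined into one small decision function. This Python sketch is only a simplified model: the function and parameter names are hypothetical, and the real FDM agent goes through more intermediate steps than are shown here.

```python
# Seconds after which isolation is declared, per role (values from the text).
ISOLATION_TIMEOUT_SECONDS = {"master": 5, "slave": 30}


def should_declare_isolation(role, seconds_without_heartbeat,
                             isolation_address_reachable):
    """Return True when the host should be treated as network isolated."""
    threshold = ISOLATION_TIMEOUT_SECONDS[role]
    if seconds_without_heartbeat < threshold:
        return False  # too early: keep waiting for heartbeats
    # Isolation is declared only if the isolation address cannot be pinged.
    return not isolation_address_reachable
```

For instance, a slave that has missed heartbeats for 10 seconds is not yet declared isolated, while a master in the same situation is, provided the isolation address does not answer.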
In vSphere 5.x, the master host uses another technique to check the liveness of the slave hosts in the cluster before declaring them isolated. It is called datastore heartbeating. Datastore heartbeating is used to determine whether the slave host has failed, is in a network partition, or is network isolated. If the slave host has stopped datastore heartbeating, it is considered to have failed and its virtual machines are restarted elsewhere.
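The distinction between a failed, partitioned, and isolated host can be summarised as a small classification function. The Python sketch below models it from the master's point of view. The names and the three boolean inputs are assumptions made for illustration; `isolation_flag_set` stands for the slave having signalled isolation through its host-X-poweron file.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"
    PARTITIONED = "network partitioned"
    ISOLATED = "network isolated"


def classify_slave(network_heartbeat, datastore_heartbeat, isolation_flag_set):
    """Classify a slave host from the signals the master can observe."""
    if network_heartbeat:
        return HostState.HEALTHY      # normal case: heartbeats arrive
    if not datastore_heartbeat:
        return HostState.FAILED       # no sign of life: restart its VMs
    if isolation_flag_set:
        return HostState.ISOLATED     # host is alive and reported isolation
    return HostState.PARTITIONED      # host is alive but cannot reach master
```

Only the `FAILED` result leads directly to the VMs being restarted elsewhere; for an isolated host the outcome depends on the configured isolation response.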
vSphere HA requirements:-
Same shared storage for all hosts.
Identical virtual networking configuration.
Does HA use vMotion to transfer live VMs to other HA hosts when the source host fails?
No. HA restarts the VMs on other hosts when the source host fails. It is not a live migration and involves a few minutes of downtime.
vSphere High Availability admission control:-
It controls the behavior of the vSphere HA-enabled cluster with regard to cluster capacity or cluster tolerance. Specifically, should vSphere HA allow the user to power on more VMs than it has capacity to support in the event of a failure? Or should the cluster prevent more VMs from being powered on than it can actually protect? That is the basis for the admission control settings and, by extension, the admission control policy settings.
Admission Control has two settings:-
Enable: Disallow VM power-on operations that violate availability constraints.
Disable: Allow VM power-on operations that violate availability constraints.
Admission control policy:-
When Admission Control is enabled, the Admission Control Policy settings control its behavior by determining how many resources need to be reserved and the limit that the cluster can handle while still being able to tolerate failure.
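To make the idea of reserving capacity concrete, here is a deliberately simplified Python sketch of an admission control check. It assumes a policy that reserves a fixed percentage of cluster CPU capacity for failover. The function name, parameters, and the CPU-only accounting are assumptions for illustration and do not reproduce vSphere's actual calculation.

```python
def can_power_on(cluster_capacity_mhz, reserved_for_failover_pct,
                 used_mhz, vm_reservation_mhz, admission_control_enabled=True):
    """Decide whether a VM may be powered on under the simplified policy."""
    if not admission_control_enabled:
        # Disabled: power-on is allowed even if it violates the constraints.
        return True
    # Capacity left for VMs once the failover reserve is set aside.
    usable_mhz = cluster_capacity_mhz * (100 - reserved_for_failover_pct) / 100
    return used_mhz + vm_reservation_mhz <= usable_mhz


# Hypothetical numbers: 10,000 MHz cluster, 25% reserved, so 7,500 MHz usable.
allowed = can_power_on(10000, 25, used_mhz=6000, vm_reservation_mhz=1000)
denied = can_power_on(10000, 25, used_mhz=7000, vm_reservation_mhz=1000)
```

With admission control disabled, the second power-on would be allowed, at the cost of not being able to protect every VM after a host failure.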
VMware HA isolation response:- When an ESXi host in a vSphere HA-enabled cluster is isolated (that is, it cannot communicate with the master host, with any other ESXi hosts, or with any other network devices), the ESXi host triggers the configured isolation response. The default isolation response is "Leave Powered On". It can also be set to Shut Down or Power Off.
High Availability VM Monitoring:- vSphere HA has the ability to look for guest OS and
application failures. When a failure is detected, vSphere HA can restart the VM
or the specific application. The foundation for this functionality is built
into the VMware Tools which provide a series of heartbeats from the guest OS up
to the ESXi host on which that VM is running. By monitoring these heartbeats in
conjunction with disk and network I/O activity, vSphere HA can attempt to
determine if the guest OS has failed.
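The combination of VMware Tools heartbeats and I/O activity can be expressed as a simple check. In this Python sketch the function name, parameters, and the 30-second default interval are hypothetical, chosen only to illustrate the reasoning; the actual intervals depend on the monitoring sensitivity configured for the cluster.

```python
def guest_os_appears_failed(seconds_since_heartbeat, io_seen_in_window,
                            failure_interval_seconds=30):
    """Return True when a VM looks hung and is a candidate for restart."""
    if seconds_since_heartbeat < failure_interval_seconds:
        return False  # heartbeats are recent enough
    # Heartbeats stopped: disk or network I/O still suggests a live guest.
    return not io_seen_in_window
```

Checking I/O as a second signal avoids restarting a VM whose VMware Tools service has stopped while the guest OS itself is still working.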
vSphere Fault Tolerance:- vSphere Fault Tolerance (FT) is the evolution of
“continuous availability” that works by utilizing VMware vLockstep technology
to keep a primary machine and a secondary machine in a virtual lockstep. This
virtual lockstep is based on the record/playback technology. vSphere FT will
stream data that will be recorded, and then replayed. By doing it this way,
VMware has created a process that matches instruction for instruction and
memory for memory to get identical results on the secondary VM. So, the record
process will take the data stream from primary VM, and the playback will
perform all the keyboard actions and mouse clicks on the secondary VM.
Prerequisites or requirements of vSphere Fault Tolerance:-
Cluster level:-
Same FT version or build number on at least 2 hosts.
HA must be enabled.
VMware EVC must be enabled.
ESXi host level:-
vSphere FT compatible CPUs.
Hosts must be licensed for vSphere FT.
Hardware Virtualization (HV) must be enabled.
Access to the same datastores.
vSphere FT logging network with at least Gigabit Ethernet connectivity.
VM level:-
VMs with a single vCPU.
Supported guest OSes.
VM files on shared storage.
Thick provisioned (eagerzeroedthick) disks or a Virtual mode RDM.
No VM snapshots.
No NIC passthrough or the older vlance NIC driver.
No paravirtualized kernel.
No USB devices, sound devices, serial ports, or parallel ports.
No mapped CD-ROM or floppy devices.
No N_Port ID Virtualization.
No nested page tables/extended page tables (NPT/EPT).
Not a linked clone VM.
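Several of the VM-level requirements above can be turned into a quick checklist. The following Python sketch covers a subset of them. The `VmConfig` class and its field names are hypothetical, and the values would have to be filled in by hand or from your own inventory data.

```python
from dataclasses import dataclass


@dataclass
class VmConfig:
    """Hypothetical summary of the VM settings that matter for FT."""
    vcpus: int = 1
    on_shared_storage: bool = True
    disk_type: str = "eagerzeroedthick"   # or "virtual-mode-rdm"
    has_snapshots: bool = False
    has_nic_passthrough: bool = False
    has_usb_or_sound_device: bool = False
    has_mapped_cdrom_or_floppy: bool = False
    is_linked_clone: bool = False


def ft_violations(vm):
    """Return the VM-level FT requirements that this VM does not meet."""
    problems = []
    if vm.vcpus != 1:
        problems.append("VM must have a single vCPU")
    if not vm.on_shared_storage:
        problems.append("VM files must be on shared storage")
    if vm.disk_type not in ("eagerzeroedthick", "virtual-mode-rdm"):
        problems.append("disks must be eagerzeroedthick or a virtual mode RDM")
    if vm.has_snapshots:
        problems.append("VM must not have snapshots")
    if vm.has_nic_passthrough:
        problems.append("NIC passthrough is not allowed")
    if vm.has_usb_or_sound_device:
        problems.append("USB and sound devices are not allowed")
    if vm.has_mapped_cdrom_or_floppy:
        problems.append("mapped CD-ROM or floppy devices are not allowed")
    if vm.is_linked_clone:
        problems.append("VM must not be a linked clone")
    return problems
```

An empty list means the VM passes these particular checks; the cluster-level and host-level requirements still need to be verified separately.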
Operational changes or recommendations for FT:-
Power management must be turned off in the host BIOS.
No Storage vMotion or Storage DRS for vSphere FT.
No hot-plugging of devices.
No hardware changes, which includes no network changes.
No snapshot-based backup solutions.
What are the basic troubleshooting steps if the HA agent install fails on hosts in an HA cluster?
1. Check for network connectivity issues.
2. Check that DNS is configured properly.
3. Check that the HA-related ports are open in the firewall to allow communication.
Example: port 8182 (TCP/UDP, inbound/outbound), used for traffic between hosts for vSphere High Availability (vSphere HA).
4. Troubleshoot FDM:-
A.> Verify that all the configuration files of the FDM agent were pushed successfully from the vCenter Server to your ESXi host:
Location: /etc/opt/vmware/fdm
File names:
clusterconfig (cluster configuration),
compatlist (host compatibility list for virtual machines),
hostlist (host membership list), and
fdm.cfg.
B.> Search the log files for any error messages:
/var/log/fdm.log or /var/run/log/fdm* (one log file for FDM operations)
/var/log/fdm-installer.log (FDM agent installation log)
5. Check that the network settings, such as port groups and switch configuration, are properly configured and named exactly as on the other hosts in the cluster.
6. First try to stop, start, or restart the VMware HA agent on the affected host using the commands below. In addition, you can also try restarting vpxa and the management agent on the host.
# /etc/init.d/vmware-fdm stop
# /etc/init.d/vmware-fdm start
# /etc/init.d/vmware-fdm restart
7. Right-click the affected host and click "Reconfigure for VMware HA" to reinstall the HA agent on that particular host.
8. Remove the affected host from the cluster. Removing an ESXi host from the cluster is not allowed until that host is put into maintenance mode.
Alternative solution for step 3: go to the cluster settings, uncheck VMware HA to turn off HA in that cluster, and then re-enable VMware HA to get the agent installed again.
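Two of the checks in the list above (the FDM configuration files and the HA port) lend themselves to a small helper script. The Python sketch below uses the directory, file names, and port number given in the steps. The function names are hypothetical, and the port check only tests TCP reachability, so it says nothing about UDP traffic on the same port.

```python
import os
import socket

FDM_CONFIG_DIR = "/etc/opt/vmware/fdm"
FDM_CONFIG_FILES = ("clusterconfig", "compatlist", "hostlist", "fdm.cfg")
HA_PORT = 8182


def missing_fdm_files(config_dir=FDM_CONFIG_DIR):
    """Return the expected FDM configuration files that are not present."""
    return [name for name in FDM_CONFIG_FILES
            if not os.path.isfile(os.path.join(config_dir, name))]


def tcp_port_open(host, port=HA_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

`missing_fdm_files()` is meant to be run where the configuration directory is readable, while `tcp_port_open("esx01.example.com")` (a placeholder host name) can be run from another machine on the management network.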