Serviceguard allows you to create
high availability clusters of HP 9000 or HP Integrity servers. A high
availability computer system allows application
services to continue in spite of a hardware or software failure.
Highly available systems protect users from software failures as well
as from failure of a system processing unit (SPU), disk, or local
area network (LAN) component. In the event that one component fails,
the redundant component takes over. Serviceguard and other high availability
subsystems coordinate the transfer between components.
A Serviceguard cluster is a networked
grouping of HP 9000 or HP Integrity servers (host systems known
as nodes) having
sufficient redundancy of software and hardware that a single
point of failure will not significantly
disrupt service. Application services (individual HP-UX processes)
are grouped together in packages; in
the event of a single service, node, network, or other resource
failure, Serviceguard can automatically transfer control of the
package to another node within the cluster, allowing services to
remain available with minimal interruption.
In Figure 1-1 “Typical
Cluster Configuration”, node 1
(one of two SPUs) is running package A, and node 2 is running package
B. Each package has a separate group of disks associated with it,
containing data needed by the package's applications, and a mirror
copy of the data. Note that both nodes are physically connected
to both groups of mirrored disks. However, only one node at a time
may access the data for a given group of disks. In the figure, node
1 is shown with exclusive access to the top two disks (solid line),
and node 2 is shown as connected without access to the top disks
(dotted line). Similarly, node 2 is shown with exclusive access
to the bottom two disks (solid line), and node 1 is shown as connected
without access to the bottom disks (dotted line).
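
The exclusive-access rule described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Serviceguard's mechanism: the `DiskGroup` class, its method names, and the node and group names are all hypothetical. It shows a disk group that is cabled to both nodes but can be held by only one of them at a time.

```python
class DiskGroup:
    """Hypothetical model of a shared disk group with a single owner."""

    def __init__(self, name, connected_nodes):
        self.name = name
        self.connected_nodes = set(connected_nodes)  # physically cabled nodes
        self.owner = None                            # node with exclusive access

    def activate(self, node):
        """Give `node` exclusive access, or raise if that is not allowed."""
        if node not in self.connected_nodes:
            raise ValueError(f"{node} is not connected to {self.name}")
        if self.owner not in (None, node):
            raise RuntimeError(f"{self.name} is already held by {self.owner}")
        self.owner = node

    def release(self, node):
        """Give up access; only the current owner can release."""
        if self.owner == node:
            self.owner = None


top_disks = DiskGroup("top_disks", ["node1", "node2"])
top_disks.activate("node1")          # solid line in the figure

try:
    top_disks.activate("node2")      # dotted line: connected, but no access
except RuntimeError as err:
    print(err)

top_disks.release("node1")
top_disks.activate("node2")          # allowed once node 1 has let go
```

In this model, a second node can take over the disks only after the first node has released them, which is the precondition for moving a package and its data to another node.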
Mirror copies of
data provide redundancy in case of disk failures. In addition, a
total of four data buses are shown for the disks that are connected
to node 1 and node 2. This configuration provides the maximum redundancy
and also gives optimal I/O performance, since each package is using
different buses.
Note that the network hardware is cabled to provide redundant
LAN interfaces on each node. Serviceguard
uses TCP/IP network services for reliable communication among nodes
in the cluster, including the transmission of heartbeat
messages: signals from each functioning node
that are central to the operation of the cluster. TCP/IP services are
also used for other types of inter-node communication. (The heartbeat
is explained in more detail in the chapter “Understanding Serviceguard
Software.”)
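
As a rough illustration of the heartbeat idea in general, the Python sketch below records when each node was last heard from and reports nodes that have been silent for too long. It is not Serviceguard's implementation: the `HeartbeatMonitor` class, the 5-second timeout, and the node names are assumptions made for the example.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat time seen from each node."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout
        # Treat time 0.0 as the moment monitoring starts.
        self.last_seen = {node: 0.0 for node in nodes}

    def record(self, node, now):
        """Note that a heartbeat message from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(["node1", "node2"], timeout=5.0)
monitor.record("node1", now=10.0)
monitor.record("node2", now=10.0)
monitor.record("node1", now=14.0)    # node2 sends nothing after t=10

print(monitor.failed_nodes(now=16.0))  # ['node2']
```

Times are passed in explicitly rather than read from a clock so that the example is deterministic.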
Failover
Under normal conditions, a fully operating Serviceguard cluster
simply monitors the health of the cluster's components while the
packages are running on individual nodes. Any host system running
in the Serviceguard cluster is called an active node. When you create a package,
you specify a primary node and one or more adoptive
nodes. When a node or its
network communications fail, Serviceguard can transfer control
of the package to the next available adoptive node. This situation
is shown in Figure 1-2 “Typical
Cluster After Failover”.
After this transfer, the package typically remains on the
adoptive node as long as the adoptive node continues running. If you
wish, however, you can configure the package to return to its primary
node as soon as the primary node comes back online. Alternatively,
you may manually transfer control of the package back to the primary
node at the appropriate time.
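
The failover and failback behavior described above can be sketched as a small placement function. This is a simplified model under stated assumptions, not Serviceguard's algorithm or configuration syntax: the `Package` class, the `auto_failback` flag, and the `place` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Package:
    """Hypothetical package record: a primary node plus adoptive nodes."""
    name: str
    primary: str
    adoptive: List[str]
    auto_failback: bool = False      # return to primary when it comes back?
    current: Optional[str] = None    # node currently running the package

    def __post_init__(self):
        if self.current is None:
            self.current = self.primary


def place(package: Package, up_nodes: Set[str]) -> Optional[str]:
    """Decide where the package runs, given the nodes that are up."""
    if package.current in up_nodes:
        # Still healthy: stay put unless automatic failback is configured.
        if package.auto_failback and package.primary in up_nodes:
            package.current = package.primary
        return package.current
    # Current node is down: try the primary, then each adoptive node in order.
    for node in [package.primary] + package.adoptive:
        if node in up_nodes:
            package.current = node
            return node
    package.current = None           # no eligible node is available
    return None


pkg_a = Package("pkgA", primary="node1", adoptive=["node2"])
print(place(pkg_a, {"node1", "node2"}))  # node1
print(place(pkg_a, {"node2"}))           # node2 (failover)
print(place(pkg_a, {"node1", "node2"}))  # node2 (stays on the adoptive node)
```

With `auto_failback=True`, the last call would return `node1` instead, modeling a package configured to return to its primary node as soon as that node is back online. A manual transfer corresponds to an administrator setting the package's node explicitly.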
Figure 1-2 “Typical
Cluster After Failover” does not show the
power connections to the cluster, but these are important as well.
In order to remove all single points of failure from the cluster,
you should provide as many separate power circuits as needed to
prevent a single point of failure of your nodes, disks and disk mirrors.
Each power circuit should be protected by an uninterruptible power
source. For more details, refer to the section on “Power
Supply Planning” in Chapter 4, “Planning and Documenting
an HA Cluster.”
Serviceguard is designed to work in conjunction with other
high availability products, such as MirrorDisk/UX or VERITAS Volume Manager,
which provide disk redundancy to eliminate single points of failure
in the disk subsystem; Event Monitoring Service (EMS), which lets
you monitor and detect failures that are not directly handled by Serviceguard;
disk arrays, which use various RAID levels for data protection;
and HP-supported uninterruptible power supplies (UPS), such as HP
PowerTrust, which eliminate failures related to power outages. These
products are highly recommended along with Serviceguard to provide
the greatest degree of availability.