In order to provide a high level of availability, a typical
cluster uses redundant system components, for example two or more
SPUs and two or more independent disks. This redundancy eliminates
single points of failure. In general, the more redundancy, the greater
your access to applications, data, and supporting services in the
event of a failure. In addition to hardware redundancy, you need
software support that enables and controls the transfer
of your applications to another SPU or network after a failure.
MC/LockManager provides this support as follows:
In the case of LAN failure, MC/LockManager switches
to a standby LAN or moves affected packages to a standby node.
In the case of SPU failure, OPS instances on other
SPUs continue to function, and user applications can be transferred
from a failed SPU to a functioning SPU automatically and in a minimal
amount of time.
For failure of other monitored resources, such as
disk interfaces, a package can be moved to another node.
For software failures, an application can be restarted
on the same node or another node with minimum disruption.
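The failure responses listed above can be summarized as a lookup table. The following Python sketch is illustrative only: the dictionary keys and the `responses_for` helper are hypothetical names chosen for this example, not MC/LockManager configuration keywords or API calls.

```python
# Hypothetical summary of the failure responses described above.
# The keys and the helper name are assumptions made for illustration;
# they are not part of MC/LockManager.
FAILURE_RESPONSES = {
    "lan": [
        "switch to a standby LAN",
        "move affected packages to a standby node",
    ],
    "spu": [
        "OPS instances on other SPUs continue to function",
        "transfer user applications to a functioning SPU",
    ],
    "monitored_resource": [
        "move the package to another node",
    ],
    "software": [
        "restart the application on the same node",
        "restart the application on another node",
    ],
}


def responses_for(failure_type):
    """Return the documented responses for a failure type."""
    try:
        return FAILURE_RESPONSES[failure_type]
    except KeyError:
        raise ValueError(f"unknown failure type: {failure_type!r}") from None


if __name__ == "__main__":
    for action in responses_for("lan"):
        print(action)
```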
MC/LockManager also gives you the advantage of easily transferring
control of your application to another SPU in order to bring the
original SPU down for system administration or maintenance.
The current maximum number of nodes supported in an MC/LockManager cluster
is 8. Fast/Wide SCSI disks or disk arrays can be connected to a
maximum of 4 nodes at a time on a shared (multi-initiator) bus.
Disk arrays using Fibre Channel and those that do not use a shared
bus, such as the EMC Symmetrix, can be simultaneously
connected to all 8 nodes.
The guidelines for package failover depend on the type of
disk technology in the cluster. For example, a package that accesses
data on a Fast/Wide SCSI disk or disk array can fail over to a maximum
of 4 nodes. A package that accesses data from a disk in a cluster
using Fibre Channel or EMC Symmetrix disk technology can be configured
to fail over to up to 8 nodes.
Note that a package that does not access data from a disk on a
shared bus can be configured to fail over to as many nodes as are
configured in the cluster (regardless of disk technology). For
instance, if a package only runs local executables, it can be
configured to fail over to all nodes in the cluster that
have local copies of those executables, regardless of the type of
disk connectivity.
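The failover guidelines above reduce to a small calculation. The following Python sketch encodes only the limits stated in this section; the function name, the technology keys, and the use of `None` for a package that does not access data on a shared bus are assumptions made for illustration, not MC/LockManager parameters.

```python
# Limits taken from the text above: 8 nodes per cluster, 4 nodes on a
# shared Fast/Wide SCSI bus, 8 nodes for Fibre Channel or EMC Symmetrix.
# The key names below are hypothetical labels, not product keywords.
MAX_CLUSTER_NODES = 8
MAX_NODES_BY_DISK_TECHNOLOGY = {
    "fw_scsi": 4,
    "fibre_channel": 8,
    "emc_symmetrix": 8,
}


def max_failover_nodes(cluster_size, disk_technology=None):
    """Return how many nodes a package can be configured to fail over to.

    disk_technology=None models a package that does not access data on a
    shared bus (for example, one that only runs local executables); it is
    then limited only by the number of nodes in the cluster.
    """
    if not 1 <= cluster_size <= MAX_CLUSTER_NODES:
        raise ValueError(f"cluster size must be 1..{MAX_CLUSTER_NODES}")
    if disk_technology is None:
        return cluster_size
    if disk_technology not in MAX_NODES_BY_DISK_TECHNOLOGY:
        raise ValueError(f"unknown disk technology: {disk_technology!r}")
    return min(cluster_size, MAX_NODES_BY_DISK_TECHNOLOGY[disk_technology])


if __name__ == "__main__":
    print(max_failover_nodes(8, "fw_scsi"))
    print(max_failover_nodes(8, "fibre_channel"))
    print(max_failover_nodes(6))
```

Note that a package restricted to local executables is further limited to the nodes that actually hold copies of those executables, which this sketch does not model.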
HP 9000 Systems
The nodes in an OPS cluster are HP 9000 systems with similar
memory configuration and processor architecture. A node can be any
Series 800 model; Series 700s are not supported as OPS cluster nodes.
It is recommended that both nodes be of similar processing power
and memory capacity. If the nodes to be clustered have different
amounts of processing power and memory size, you may observe the
following behavior:
The node with less memory may become a bottleneck. The reason is
that the distributed lock manager (DLM), which provides parallel
cache management for OPS, has shared memory segments that must be
the same size on both nodes.
The node with less processing power may become a
bottleneck, since roughly half the DLM locks requested by one node
will be serviced by the other node.
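The "roughly half" figure above can be illustrated with simple arithmetic. The following Python sketch assumes that responsibility for servicing locks is spread evenly across the nodes, so that in a two-node cluster half of a node's requests go to its peer. The even-spread assumption and the function names are introduced here for illustration only; the text above states only the two-node result.

```python
def remote_lock_fraction(node_count):
    """Fraction of a node's lock requests serviced by other nodes.

    Assumes (hypothetically) that lock servicing is spread evenly
    across all nodes in the cluster.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1) / node_count


def remote_lock_requests(requests, node_count=2):
    """Estimated number of requests that another node must service."""
    return requests * remote_lock_fraction(node_count)


if __name__ == "__main__":
    # Two-node cluster: half of one node's requests go to the other node.
    print(remote_lock_fraction(2))
    print(remote_lock_requests(10_000))
```

Under this assumption, a slower node services about as many of its peer's requests as its own, which is why its processing power can limit the faster node as well.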