VERITAS Volume Manager 3.2 Administrator's Guide: for HP-UX 11i and HP-UX 11i Version 1.5 > Chapter 10 Administering Cluster Functionality

Cluster Initialization and Configuration

Before any nodes can join a new cluster for the first time, you must supply certain configuration information during cluster manager setup. This information is normally stored in some form of cluster manager configuration database. The precise content and format of this information depends on the characteristics of the cluster manager. The information required by VxVM is as follows:

  • cluster ID

  • node IDs

  • network addresses of nodes

  • port addresses

When a node joins the cluster, this information is automatically loaded into VxVM on that node at node startup time.
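As an illustration only, the four items listed above can be pictured as a small record that the cluster manager hands to VxVM at node startup. The class and field names below (`ClusterConfig`, `NodeConfig`, `add_node`) are hypothetical and do not correspond to any VxVM or MC/ServiceGuard data format; the actual content and format depend on the cluster manager.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeConfig:
    """One configured node (hypothetical layout, for illustration)."""
    node_id: int
    address: str  # network address of the node
    port: int     # port address used for cluster communication


@dataclass
class ClusterConfig:
    """Information VxVM needs before any node can join."""
    cluster_id: int
    nodes: dict = field(default_factory=dict)  # node_id -> NodeConfig

    def add_node(self, node):
        # Node IDs must be unique within a cluster.
        if node.node_id in self.nodes:
            raise ValueError(f"duplicate node ID {node.node_id}")
        self.nodes[node.node_id] = node
```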

NOTE: The cluster functionality of VxVM requires that a cluster manager (such as provided by MC/ServiceGuard) has been configured. If MC/ServiceGuard is chosen as your cluster manager, no additional configuration of VxVM is required, apart from the cluster configuration requirements of MC/ServiceGuard.

The cluster manager startup procedure performs node initialization, and brings up the various cluster components (such as VxVM with cluster support, the cluster manager, and a distributed lock manager) on the node. Once this is complete, applications may be started. The cluster manager startup procedure must be invoked on each node to be joined to the cluster.

For VxVM in a cluster environment, initialization consists of loading the cluster configuration information and joining the nodes in the cluster. The first node to join becomes the master node, and later nodes (slaves) join to the master. If two nodes join simultaneously, VxVM chooses the master. Once the join for a given node is complete, that node has access to the shared disks.
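The join order described above can be sketched as a toy model. This is not VxVM code: the `Cluster` class is hypothetical, and the tie-break used for simultaneous joins (lowest node ID wins) is an assumption made for the example, since the text states only that VxVM chooses the master.

```python
class Cluster:
    """Toy model of master/slave assignment at join time."""

    def __init__(self):
        self.master = None
        self.slaves = []

    def join(self, node_id):
        # The first node to join becomes the master; later nodes are slaves.
        if self.master is None:
            self.master = node_id
            return "master"
        self.slaves.append(node_id)
        return "slave"

    def join_simultaneously(self, node_ids):
        # ASSUMPTION: lowest node ID wins the tie. The real selection
        # rule used by VxVM is not given in this section.
        for node_id in sorted(node_ids):
            self.join(node_id)
```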

Cluster Reconfiguration

A cluster reconfiguration occurs if a node leaves or joins a cluster. Each node's cluster manager continuously monitors the other cluster nodes. When the membership of the cluster changes, the cluster manager calls the vxclustd cluster reconfiguration daemon. The vxclustd daemon coordinates cluster reconfigurations and provides communication between VxVM and the cluster manager.

During cluster reconfiguration, VxVM suspends I/O to shared disks. I/O resumes when the reconfiguration completes. Applications may appear to freeze for a short time during reconfiguration.

If other operations, such as VxVM operations or recoveries, are in progress, cluster reconfiguration can be delayed until those operations have completed. Volume reconfigurations (see “Volume Reconfiguration”) do not take place at the same time as cluster reconfigurations. Depending on the circumstances, an operation may be held up and restarted later. In most cases, cluster reconfiguration takes precedence. However, if the volume reconfiguration is in the commit stage, it completes first.

For more information on cluster reconfiguration, see “vxclustd Daemon”.

vxclustd Daemon

The vxclustd daemon is the VxVM cluster reconfiguration daemon. The vxclustd daemon provides communication between the cluster manager and VxVM, and initiates cluster reconfigurations. Every node currently in the cluster runs an instance of the vxclustd daemon. Whenever cluster membership changes, the cluster manager notifies the vxclustd daemon, which then initiates a reconfiguration within VxVM.

The vxclustd daemon is started up by the cluster manager when the node initially attempts to join the cluster. The vxclustd daemon first registers with the cluster manager and obtains the following information from the cluster manager:

  • cluster ID and cluster name

  • node IDs and hostnames of all configured nodes

  • IP addresses of the network interfaces through which the nodes communicate with each other

Registration also sets up a callback mechanism for the cluster manager to notify the vxclustd daemon when cluster membership changes. After initializing kernel cluster variables, the vxclustd daemon waits for a callback from the cluster manager. When the vxclustd daemon obtains membership information from the cluster manager, it validates the membership change, and provides the new membership to the kernel. The reconfiguration process continues within the kernel and the vxconfigd daemon. This includes selection of a new master node if necessary, initiation of communication between vxconfigd daemons on the master and slave nodes, and a join protocol at the vxconfigd and kernel levels that validates VxVM objects and distributes VxVM configuration information across the cluster.

If reconfiguration completes successfully, the vxclustd daemon does not take any further action; it waits for the next membership change from the cluster manager. If reconfiguration within the kernel or within the vxconfigd daemon fails, the node must leave the cluster. The kernel fails I/Os in progress to shared disks, and stops access to shared disks and the vxclustd daemon. The vxclustd daemon invokes the cluster manager command to halt the cluster on this node.
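The registration and callback sequence can be pictured with the following minimal sketch. All names (`ClusterManager`, `ReconfigDaemon`, `apply_in_kernel`, `halt_node`) are hypothetical stand-ins, not VxVM or MC/ServiceGuard interfaces, and the validation shown (every member must be a configured node) is an assumed example of what validating a membership change might involve.

```python
class ClusterManager:
    """Stand-in for the cluster manager's notification side."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        # Registration sets up the callback used for membership changes.
        self._callbacks.append(callback)

    def notify(self, members):
        for callback in self._callbacks:
            callback(members)


class ReconfigDaemon:
    """Toy model of the daemon's reaction to a membership change."""

    def __init__(self, configured_nodes, apply_in_kernel, halt_node):
        self.configured = frozenset(configured_nodes)
        self.apply_in_kernel = apply_in_kernel  # callable(members) -> bool
        self.halt_node = halt_node              # callable()
        self.in_cluster = True

    def on_membership_change(self, members):
        members = frozenset(members)
        # ASSUMED validation rule: reject node IDs that were never configured.
        unknown = members - self.configured
        if unknown:
            raise ValueError(f"unconfigured node IDs: {sorted(unknown)}")
        # Hand the new membership to the kernel; on failure the node
        # must leave the cluster.
        if not self.apply_in_kernel(members):
            self.in_cluster = False
            self.halt_node()
```

On success the daemon in this sketch does nothing further and simply waits for the next callback, which mirrors the behavior described above.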

When a clean node shutdown is performed, vxclustd waits until kernel cluster reconfiguration completes and then exits.

NOTE: If MC/ServiceGuard is the cluster manager, it expects the vxclustd daemon registration to complete within a given timeout period. If registration times out, MC/ServiceGuard aborts cluster initialization and fails cluster startup.

Volume Reconfiguration

Volume reconfiguration is the process of creating, changing, and removing VxVM objects such as disk groups, volumes and plexes. In a cluster, all nodes cooperate to perform such operations. The vxconfigd daemons (see “vxconfigd Daemon”) play an active role in volume reconfiguration. For reconfiguration to succeed, a vxconfigd daemon must be running on each of the nodes.

A volume reconfiguration transaction is initiated by running a VxVM utility on the master node. The utility contacts the local vxconfigd daemon on the master node, which validates the requested change. For example, vxconfigd rejects an attempt to create a new disk group with the same name as an existing disk group. The vxconfigd daemon on the master node then sends details of the changes to the vxconfigd daemons on the slave nodes. The vxconfigd daemons on the slave nodes then perform their own checking. For example, each slave node checks that it does not have a private disk group with the same name as the one being created; if the operation involves a new disk, each node checks that it can access that disk. When the vxconfigd daemons on all the nodes agree that the proposed change is reasonable, each notifies its kernel. The kernels then cooperate to either commit or to abandon the transaction. Before the transaction can be committed, all of the kernels ensure that no I/O is underway. The master node is responsible both for initiating the reconfiguration, and for coordinating the commitment of the transaction. The resulting configuration changes appear to occur simultaneously on all nodes.
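The validate-then-commit flow resembles a two-phase commit, and a simplified model may make the sequence easier to follow. The sketch below is illustrative only: `Node`, `check`, and `create_shared_group` are hypothetical names, the daemons and kernels are collapsed into one object per node, and the step in which the kernels ensure that no I/O is underway is not modeled.

```python
class Node:
    """One cluster node, with daemon and kernel collapsed together."""

    def __init__(self, name, private_groups=(), visible_disks=()):
        self.name = name
        self.private_groups = set(private_groups)
        self.visible_disks = set(visible_disks)
        self.shared_groups = set()

    def check(self, group, disks):
        """Per-node validation; returns an error string or None."""
        if group in self.private_groups:
            return f"{self.name}: private disk group {group!r} exists"
        missing = set(disks) - self.visible_disks
        if missing:
            return f"{self.name}: cannot access {sorted(missing)}"
        return None


def create_shared_group(master, slaves, group, disks):
    """Return (ok, error). The change lands on every node or on none."""
    # Phase 1a: the master validates the request first.
    if group in master.shared_groups:
        return False, f"{master.name}: disk group {group!r} already exists"
    # Phase 1b: every node performs its own checks.
    nodes = [master, *slaves]
    for node in nodes:
        error = node.check(group, disks)
        if error:
            return False, error  # abandoned; the error names the node
    # Phase 2: all nodes agreed, so the master coordinates the commit.
    for node in nodes:
        node.shared_groups.add(group)
    return True, None
```

Because nothing is applied until every check has passed, a rejection by any single node leaves all nodes unchanged, which is the property that makes the change appear simultaneous.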

If a vxconfigd daemon on any node goes away during reconfiguration, all nodes are notified and the operation fails. If any node leaves the cluster, the operation fails unless the master has already committed it. If the master node leaves the cluster, the new master node, which was previously a slave node, completes or fails the operation depending on whether or not it received notification of successful completion from the previous master node. This notification is performed in such a way that if the new master does not receive it, neither does any other slave.

If a node attempts to join a cluster while a volume reconfiguration is being performed, the result of the reconfiguration depends on how far it has progressed. If the kernel has not yet been invoked, the volume reconfiguration is suspended until the node has joined the cluster. If the kernel has been invoked, the node waits until the reconfiguration is complete before joining the cluster.
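The two cases can be summarized as an ordering rule. The function below is a hypothetical illustration of that rule, not part of any VxVM interface; the final "resume" step in the first case is implied by the text rather than stated.

```python
def join_during_volume_reconfig(kernel_invoked):
    """Return the order of events when a node asks to join mid-reconfiguration."""
    if kernel_invoked:
        # Too far along to pause: the joining node waits.
        return ["complete volume reconfiguration", "join node"]
    # Not yet in the kernel: the reconfiguration yields to the join.
    return ["suspend volume reconfiguration", "join node",
            "resume volume reconfiguration"]
```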

When an error occurs, such as when a check on a slave fails or a node leaves the cluster, the error is returned to the utility and a message is sent to the console on the master node to identify on which node the error occurred.

vxconfigd Daemon

The VxVM configuration daemon, vxconfigd, maintains the configuration of VxVM objects. It receives cluster-related instructions from the kernel. A separate copy of vxconfigd runs on each node, and these copies communicate with each other over a network. When invoked, a VxVM utility communicates with the vxconfigd daemon running on the same node; it does not attempt to connect with vxconfigd daemons on other nodes. During cluster startup, the kernel prompts vxconfigd to begin cluster operation and indicates whether it is a master node or a slave node.

When a node is initialized for cluster operation, the vxconfigd daemon is notified that the node is about to join the cluster and is provided with the following information from the cluster manager configuration database:

  • cluster ID

  • node IDs

  • master node ID

  • role of the node

  • network address of the vxconfigd daemon on each node

On the master node, the vxconfigd daemon sets up the shared configuration by importing shared disk groups, and informs the vxclustd daemon when it is ready for the slave nodes to join the cluster.

On slave nodes, the vxconfigd daemon is notified when the slave node can join the cluster. When the slave node joins the cluster, the vxconfigd daemon and the VxVM kernel communicate with their counterparts on the master node to set up the shared configuration.

When a node leaves the cluster, the vxclustd daemon notifies the kernel on all the other nodes. The master node then performs any necessary cleanup. If the master node leaves the cluster, the kernels choose a new master node and the vxconfigd daemons on all nodes are notified of the choice.

The vxconfigd daemon also participates in volume reconfiguration as described in “Volume Reconfiguration”.

vxconfigd Daemon Recovery

The vxconfigd daemon can be stopped or restarted at any time. While the vxconfigd daemon is stopped, volume reconfigurations cannot take place and other nodes cannot join the cluster until it is restarted. In the cluster, the vxconfigd daemons on the slave nodes are always connected to the vxconfigd daemon on the master node. It is therefore not advisable to stop the vxconfigd daemon on any cluster node.

Different actions are taken depending on the node on which the vxconfigd daemon is stopped:

  • If the vxconfigd daemon is stopped on a slave node, the master node takes no action. When the vxconfigd daemon is restarted on the slave, the slave vxconfigd daemon attempts to reconnect to the master daemon and to re-acquire the information about the shared configuration. (Neither the kernel view of the shared configuration nor access to shared disks is affected.) Until the vxconfigd daemon on the slave node has successfully reconnected to the vxconfigd daemon on the master node, it has very little information about the shared configuration and any attempts to display or modify the shared configuration can fail. For example, shared disk groups listed using the vxdg list command are marked as disabled; when the rejoin completes successfully, they are marked as enabled.

  • If the vxconfigd daemon is stopped on the master node, the vxconfigd daemons on the slave nodes periodically attempt to rejoin the master node. Such attempts do not succeed until the vxconfigd daemon is restarted on the master. In this case, the vxconfigd daemons on the slave nodes have not lost information about the shared configuration, so any displayed configuration information is correct.

  • If the vxconfigd daemon is stopped on both the master and slave nodes, the slave nodes do not display accurate configuration information until vxconfigd is restarted on the master and slave nodes, and the daemons have reconnected.
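The three cases above differ in whether the slave's daemon still holds its copy of the shared configuration. The following toy state model, seen from a slave node, is a sketch under that reading. The class and method names are hypothetical, and tying the enabled or disabled display state solely to whether the daemon holds the shared configuration is an assumption made for the example.

```python
class SlaveConfigDaemon:
    """Toy model of a slave's configuration daemon state."""

    def __init__(self):
        self.has_shared_config = True
        self.connected = True

    def restart(self):
        # Local daemon stopped and restarted: its copy of the shared
        # configuration is gone until it reconnects to the master.
        self.has_shared_config = False
        self.connected = False

    def master_stopped(self):
        # Master daemon stopped: the connection drops, but the slave
        # keeps what it already knows.
        self.connected = False

    def try_reconnect(self, master_running):
        # Rejoin attempts cannot succeed while the master daemon is down.
        if not master_running:
            return False
        self.connected = True
        self.has_shared_config = True
        return True

    def shared_dg_state(self):
        # ASSUMPTION: display state follows has_shared_config only.
        return "enabled" if self.has_shared_config else "disabled"
```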

If the vxclustd daemon determines that the vxconfigd daemon is not running on a node during a cluster reconfiguration, vxclustd restarts vxconfigd.

NOTE: The -r reset option to vxconfigd restarts the vxconfigd daemon and recreates all states from scratch. This option cannot be used to restart vxconfigd while a node is joined to a cluster because it causes cluster information to be discarded.

Node Shutdown

Although it is possible to shut down the cluster on a node by invoking the shutdown procedure of the node's cluster manager, this procedure is intended for terminating cluster components after stopping any applications on the node that have access to shared storage. VxVM supports clean node shutdown, which allows a node to leave the cluster gracefully when all access to shared volumes has ceased. The host is still operational, but cluster applications cannot be run on it.

The cluster functionality of VxVM maintains global state information for each volume. This enables VxVM to determine which volumes need to be recovered when a node crashes. When a node leaves the cluster due to a crash or by some other means that is not clean, VxVM determines which volumes may have writes that have not completed and the master node resynchronizes these volumes. It can use dirty region logging (DRL) or FastResync if these are active for any of the volumes.
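To illustrate how global per-volume state allows recovery to be limited to the affected volumes, assume (purely for this example) that the state records which nodes have each volume open for writing. The function name and the data layout are hypothetical; the section does not describe how VxVM actually represents this state.

```python
def volumes_needing_resync(open_for_write, departed_node, left_cleanly):
    """Pick the volumes that may hold incomplete writes.

    open_for_write maps volume name -> set of node IDs (assumed layout).
    """
    if left_cleanly:
        # A clean departure closes every volume first, so nothing is dirty.
        return []
    return sorted(
        volume
        for volume, nodes in open_for_write.items()
        if departed_node in nodes
    )
```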

Clean node shutdown must be used after, or in conjunction with, a procedure to halt all cluster applications. Depending on the characteristics of the clustered application and its shutdown procedure, a successful shutdown can require a lot of time (minutes to hours). For instance, many applications have the concept of draining, where they accept no new work, but complete any work in progress before exiting. This process can take a long time if, for example, a long-running transaction is active.

When the VxVM shutdown procedure is invoked, it checks all volumes in all shared disk groups on the node that is being shut down. The procedure then either continues with the shutdown or waits, depending on the state of those volumes:

  • If all volumes in shared disk groups are closed, VxVM makes them unavailable to applications. Because all nodes are informed that these volumes are closed on the leaving node, no resynchronization is performed.

  • If any volume in a shared disk group is open, the shutdown operation in the kernel waits until the volume is closed. There is no timeout checking in this operation.
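The blocking behavior in the two cases above can be sketched with a condition variable. This is a model of the described behavior, not VxVM code; `SharedVolumes` and its methods are hypothetical names. As in the description, the wait has no timeout, so a volume that is never closed blocks the shutdown indefinitely.

```python
import threading


class SharedVolumes:
    """Toy model of shared volumes on a node that is shutting down."""

    def __init__(self, open_counts):
        self._open = dict(open_counts)  # volume name -> open count
        self._cond = threading.Condition()

    def close(self, name):
        with self._cond:
            self._open[name] -= 1
            self._cond.notify_all()

    def shutdown(self):
        with self._cond:
            # Block until every shared volume is closed. No timeout is
            # applied, matching the behavior described above.
            self._cond.wait_for(
                lambda: all(count == 0 for count in self._open.values())
            )
            # Volumes become unavailable to applications on this node.
            self._open.clear()
        return "left cluster"
```

A caller that needs a bound on the wait would have to impose it externally, which is consistent with the MC/ServiceGuard timeout noted later in this section.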

NOTE: Once shutdown succeeds, the node has left the cluster. It is not possible to access the shared volumes until the node joins the cluster again.

Since shutdown can be a lengthy process, other reconfigurations can take place while shutdown is in progress. Normally, the shutdown attempt is suspended until the other reconfiguration completes. However, if it is already too far advanced, the shutdown may complete first.

NOTE: The MC/ServiceGuard cmhaltnode command first attempts to halt all packages that are using shared disks before attempting to shut down VxVM. If an application running outside of a defined package performs I/O to a shared volume, it can delay shutdown of VxVM, resulting in an MC/ServiceGuard timeout.

Node Abort

If a node does not leave a cluster cleanly, this is because it crashed or because some cluster component made the node leave on an emergency basis. The ensuing cluster reconfiguration calls the VxVM abort function. This procedure immediately attempts to halt all access to shared volumes, although it does wait until pending I/O from or to the disk completes.

I/O operations that have not yet been started are failed, and the shared volumes are removed. Applications that were accessing the shared volumes therefore fail with errors.

After a node abort or crash, shared volumes must be recovered, either by a surviving node or by a subsequent cluster restart, because it is very likely that there are unsynchronized mirrors.

Cluster Shutdown

If all nodes leave a cluster, shared volumes must be recovered when the cluster is next started if the last node did not leave cleanly, or if resynchronization from previous nodes leaving uncleanly is incomplete.

© 1983-2001 Hewlett-Packard Development Company, L.P.