This section shows the output from the cluster_config utility and describes how to respond to each configuration question. If you configured availability sets, you are prompted to specify how you want to configure each service for improved availability.

NOTE: Table 3-3 describes each prompt and provides information to help you with your answers.

Accept the default value when you are prompted to enter the number of NFS daemons required on the system:

Configuring system wide functions / policies / behaviors
Executing C02hptc_cluster_sfs sconfigure
Executing C02ssh_config sconfigure
Executing C03avail sconfigure
Executing C10cluster_fstab sconfigure
Executing C20sysparams sconfigure
NFS daemon tuning:
Given that there are 6 nodes in this cluster, enter the number of
NFS daemons that shall be configured to support them [8] : Enter

The default value for the number of NFS daemons scales according to the number of nodes in the system. The default value represents the number of NFS daemons required on the head node to adequately support serving the /hptc_cluster file system. Table 3-7 lists the default values.

Table 3-7 Number of NFS Daemons Based on System Size

| Number of Nodes | Number of NFS Daemons |
|---|---|
| 8 | 8 |
| 128 | 16 |
| 256 | 32 |
| 512 | 64 |
| 768 | 96 |
| 1024 or more | 128 |
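
The lookup in Table 3-7 can be sketched in a few lines of Python. This is only an illustration: the function name `default_nfs_daemons` is hypothetical, and the code assumes that each row is a threshold that applies until the next listed size is reached, with 8 as the minimum. The guide does not state how node counts between the listed sizes are handled, so the value that cluster_config actually proposes may differ.

```python
import bisect

# Rows of Table 3-7 as (number of nodes, number of NFS daemons).
# ASSUMPTION: each node count is a threshold that applies until the next
# row is reached, and clusters smaller than the first row get its value.
TABLE_3_7 = [(8, 8), (128, 16), (256, 32), (512, 64), (768, 96), (1024, 128)]


def default_nfs_daemons(node_count):
    """Return the Table 3-7 default for a cluster of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    thresholds = [nodes for nodes, _ in TABLE_3_7]
    # Index of the last row whose threshold is <= node_count.
    index = bisect.bisect_right(thresholds, node_count) - 1
    # Clusters below the first threshold use the first row.
    index = max(index, 0)
    return TABLE_3_7[index][1]


if __name__ == "__main__":
    for nodes in (6, 128, 300, 4096):
        print(nodes, default_nfs_daemons(nodes))
```

Under this assumption, the 6-node cluster in the sample output maps to 8 daemons, which matches the default shown in the prompt.
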

Specify a quorum server node name or a full path to the lock LUN (for example, /dev/sdb1, where 1 is the partition number) if you have configured availability sets to use Serviceguard as the availability tool. See “Deciding on the Method to Achieve Quorum for HP Serviceguard Clusters” or the Serviceguard documentation for more information.

Executing C25syslimits sconfigure
Configuring service specific functions
Executing C03avail gconfigure
You need to configure quorum for the availability set with
members n16 and n14.
Valid choices are quorum server [q] or lock LUN [l]: [q] Enter
Please provide the name of the quorum server? [] node_name
OR
Please provide the name of the lock LUN [ ] full_path_to_lock_LUN
Executing C05pdsh gconfigure

Specify the Network Time Protocol (NTP) server. The head node is automatically configured as the system's NTP server if another server is not specified, but you have the option to provide up to four external NTP servers instead. If the HP XC system will be integrated with HP StorageWorks Scalable File Share (HP SFS), the HP XC and HP SFS systems must be synchronized to a common time server. Therefore, do not take the default response; instead, enter the same external time server that will be used for the HP SFS system.

Executing C08ntp gconfigure
Configuring the following nodes as ntp servers for the cluster:
n16
You must now specify the clock source for the server nodes. If the
nodes have external connections, you may specify up to 4 external NTP
servers. Otherwise, you must use the node's system clock.
Enter the IP address or host name of the first external NTP server
or leave blank to use the system clock on the NTP server node: IP_address
Enter the IP address or host name of the second external NTP server or
leave blank if you have no more servers: Enter
Renaming previous /etc/ntp.conf to /etc/ntp.conf.bak
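
The clock-source choice in the preceding prompts can be illustrated with a short Python sketch. It is not the logic that cluster_config uses, and the function name `ntp_server_lines` is hypothetical. The sketch assumes that each external server becomes a `server` line in ntp.conf and that "use the system clock" corresponds to the conventional ntpd local clock driver (`127.127.1.0`); the lines actually written to /etc/ntp.conf may differ.

```python
MAX_EXTERNAL_SERVERS = 4  # the prompt accepts at most four external servers


def ntp_server_lines(external_servers):
    """Return ntp.conf-style lines for the chosen clock source.

    external_servers: the answers given at the prompts; blank answers
    are ignored, mirroring "leave blank if you have no more servers".
    """
    servers = [s.strip() for s in external_servers if s.strip()]
    if len(servers) > MAX_EXTERNAL_SERVERS:
        raise ValueError("at most 4 external NTP servers can be specified")
    if servers:
        return ["server " + host for host in servers]
    # ASSUMPTION: with no external server, fall back to the local clock
    # driver, which is the usual ntpd way to serve time from the system clock.
    return ["server 127.127.1.0", "fudge 127.127.1.0 stratum 10"]


if __name__ == "__main__":
    # One external server, then a blank answer to end the list.
    print("\n".join(ntp_server_lines(["192.0.2.10", ""])))
```

The address 192.0.2.10 is a documentation placeholder; when HP SFS is in use, substitute the time server shared with the HP SFS system.
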

Supply the network type if the system has a QsNetII interconnect; the possible choices for the system are displayed, and a default is provided:

Enter the network type of your system.
Valid choices are QMS16 or QMS32: [QMS32]: Enter

The network type reflects the maximum number of ports the fabric topology can support. See Appendix H for information about how to determine the QsNetII network type for the system.

Specify how many node-level and top-level switches are in the system configuration if the system has a QsNetII interconnect:

Enter the number of node level switches in your configuration [1-32]:
Enter the number of top level switches in your configuration [0-32]:

Specify how you want to configure improved availability of the dbserver service if you have configured it for improved availability. If you have not configured an availability set with this service, skip this step and proceed to the next step.

Executing C10hptc_cluster_fs gconfigure
Executing C12dbserver gconfigure
Availability can be configured for dbserver in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Executing C20gmmon gconfigure
Executing C20smartd gconfigure
Executing C30syslogng_forward gconfigure
Executing C35dhcp gconfigure
Executing C42mcs gconfigure
Executing C50cmf gconfigure
Executing C50lvs gconfigure

Define an LVS alias if you assigned a login role to one or more nodes. This alias is the name by which users will log in to the system. If you did not assign a login role to any node, you are not asked to supply an LVS alias. If you have configured the LVS director service for improved availability, you are prompted to specify how you want to configure improved availability. You are also prompted to decide whether you want to configure the LVS director to accept login sessions (that is, to be a real server).

Executing C50lvs gconfigure
Enter the name of the cluster alias []: penguin
Availability can be configured for lvs in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Do you want the LVS director to act as a real server? (y/n)

Enable web access to the Nagios monitoring application and create a password for the nagiosadmin user. This password does not have to match any other password on the system. In this example, the Nagios service has been configured with improved availability.

Executing C50nagios gconfigure
Availability can be configured for nagios in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Would you like to enable web based monitoring? ([y]/n) y
Enter the password for the 'nagiosadmin' web user:
New password: your_nagios_password
Re-type new password: your_nagios_password
Adding password for user nagiosadmin
Executing C50nat gconfigure

Provide an additional external IP address to use as an external alias if you configured improved availability for the NAT service:

Availability can be configured for nat in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Please provide the external IP alias for server n16 []: IP_address
Please provide the external IP alias for server n14 []: IP_address

Specify how you want to configure the snmptrapd service if the hardware configuration includes HP server blade enclosures or if you have defined an mcs.ini file to configure MCS devices. Enclosures and MCS devices are connected to the administration network, so accept the default responses of Admin and loopback. Enter the letter R to specify External and Interconnect if the system is configured with additional SNMP-based devices that are connected to either the external or interconnect networks but are not connected to the administration network.

Executing C50supermond gconfigure
Executing C51nagios_monitor gconfigure
Executing C60nis gconfigure
Executing C51nagios_monitor gconfigure
Executing C52snmp_traps gconfigure
Configuring the snmptrapd service for the cluster.
SNMP traps are configured to be received over the following interfaces:
loopback
Admin
Snmptrapd is also configured to listen to the Nagios IP alias.
Enter any additional interfaces over which SNMP traps are to be received:
1) External
2) Interconnect
Use a space-separated list to specify multiple interfaces, for example, 1 2
or enter 'd' if you want to start again with the (d)efault configuration
or leave blank if you want to use the current configuration:
Interfaces over which traps will be accepted:
loopback
Admin
[O]k, [R]especify Interfaces: O

Provide information about SVA and the remote graphics software configuration if you installed the SVA or remote graphics software:

Executing C52sva_stuff gconfigure
Does this cluster have a KVM attached to the visualization nodes (y/n)?
You must now specify the display nodes for the cluster.
Enter the host name of the first display node.
You can enter names like n[1-4] or n[1,3] too
n[1-5]
Enter the host name of the next display node
leave blank if you have no more display nodes:
You must now specify the Remote Graphics nodes for the cluster.
Each Remote Graphics node must have an external ethernet address configured.
Enter the host name of the first Remote Graphics node.
You can enter names like n[1-4] or n[1,3] too
n1
Enter the host name of the next Remote Graphics node
leave blank if you have no more Remote Graphics nodes:
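
The prompts above accept bracketed node names such as n[1-4] or n[1,3]. The following Python sketch shows one way such a name could be expanded into individual host names. The function name `expand_nodes` is hypothetical, and support for mixing ranges and lists in one bracket (for example, n[1-3,7]) is an assumption; the guide shows only the two simple forms.

```python
import re

# A prefix followed by a bracketed list of numbers and ranges, e.g. n[1-5].
_BRACKETED = re.compile(r"^([^\[\]]+)\[([0-9,\-]+)\]$")


def expand_nodes(spec):
    """Expand a name such as 'n[1-4]' or 'n[1,3]' into a list of host names.

    A plain host name such as 'n1' is returned unchanged in a one-item list.
    """
    spec = spec.strip()
    match = _BRACKETED.match(spec)
    if match is None:
        return [spec]
    prefix, body = match.groups()
    names = []
    for part in body.split(","):
        if "-" in part:
            # Inclusive range such as 1-4; malformed ranges raise ValueError.
            first, last = part.split("-")
            names.extend(prefix + str(i) for i in range(int(first), int(last) + 1))
        else:
            names.append(prefix + str(int(part)))
    return names


if __name__ == "__main__":
    print(expand_nodes("n[1-5]"))
    print(expand_nodes("n[1,3]"))
```

Expanding a name this way before you answer the prompt is a quick check that the display nodes and Remote Graphics nodes you intend are the ones the bracketed form covers.
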

The following configuration questions are specific to HP RGS:

Do you want to use the RGS sender on the head node (y/n)? [n]
You have installed RGS Sender 4.0.0. Will you be using
RGS Receiver 3.0 to connect this cluster (y/n)? [n]

Configure a self-signed certificate for the Apache server. This step is optional, but if you do not configure a self-signed certificate, the certificate that is shipped with the Linux base operating system will be used instead.

Executing C52xcgraph gconfigure
Executing C54httpd gconfigure
Would you like to create a self-signed certificate for the Apache server? [y]/n Enter
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:
State or Province Name (full name) [Berkshire]:
Locality Name (eg, city) [Newbury]:
Organization Name (eg, company) [My Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
Email Address []:
A self-signed certificate has been created for this cluster

Supply the name or IP address of the external NIS master server and the NIS domain name if you assigned the nis_server role to one or more nodes to configure them as NIS slave servers. If you did not assign a nis_server role to any node, you are not asked to supply this information.

Network Information Service (NIS) Configuration
This step sets up one or more NIS servers within the XC system
that are "slaves" to an external NIS "master". The master NIS
server provides the slaves with copies of its NIS maps.
In order to successfully complete this configuration step, the NIS
master must have been previously set to allow slaves to communicate
with it. On Linux systems, this is typically accomplished by adding
the NIS slave hostname(s) to the /var/yp/ypservers file on the NIS
master, and then running 'make'.
In addition, to complete this configuration, you will need to provide
1) the name or IP address of the NIS master, and
2) the NIS domain name hosted by the NIS master
Enter the name or IP address of the external NIS master: []
Enter the NIS domain hosted by the NIS master: [] your_NIS_domain
Executing C66ibmon gconfigure
Executing C80sfs gconfigure
Executing C90munge gconfigure
Executing C90slurm gconfigure

Decide whether you want to configure SLURM. SLURM is required if you installed SVA or if you plan to install LSF-HPC with SLURM.

Do you want to configure SLURM? (y/n) [y]:

Do one of the following:

If you intend to install LSF-HPC with SLURM or the Maui Scheduler, or if you have already installed SVA, enter y and proceed to step 15.

If you intend to install standard LSF, do not install SLURM; enter n and proceed to step 16.

Define a SLURM user name and accept all default responses. The output looks different if you assigned the resource_management role to one or more additional nodes because you will be prompted to assign the master and backup controller nodes.

This SLURM configuration needs a special SLURM user. The SLURM
controller daemons will be run by this user, and certain SLURM
runtime files will be owned by this user.
Enter the SLURM username [slurm]: Enter
User 'slurm' does not exist.
If this user account is created here, it will not have login
access. Do you want to create this user? (y/n) [y]: Enter
n16 is the only node with the Resource Management
role. Therefore the SLURM Master Controller daemon will be set up
on this node, and there will be no SLURM Backup Controller.
The current Compute Node configuration is:
NodeName=n[11-16] Procs=2
NOTE: The only Partition created by default is the lsf
partition. If you want additional partitions, configure
them manually in the /hptc_cluster/slurm/etc/slurm.conf file.
The current Node Partition configuration is:
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[11-16]
Do you want to enable SLURM-controlled user-access to the
compute nodes? (y/n) [n]: n 1
SLURM configuration complete. Press 'Enter' to continue: Enter
Executing C91swmlogger gconfigure
Executing C95lsf gconfigure

Callout 1 (the prompt marked 1 in the preceding output): By default, all compute nodes in the HP XC system are accessible by any user after their user accounts have been set up. This prompt enables you to restrict individual access to each compute node to the user who currently has the compute node reserved within SLURM. It is important that you assign a login role to each node on which you expect users to be able to log in and use the system. If you answer yes here and configure all nodes with the compute role (the default), but you do not configure any nodes with the login role, non-root users will not be allowed to log in to the system.
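
The node and partition lines shown in the sample SLURM output are space-separated Key=Value pairs, and the NOTE in that output says that additional partitions are configured manually in the same form in the slurm.conf file. As a rough aid for reading such lines, the Python sketch below splits one into a dictionary. The helper name `parse_slurm_line` is hypothetical and is not part of SLURM; it handles only the simple single-line form shown in the sample output.

```python
def parse_slurm_line(line):
    """Split a slurm.conf-style line of Key=Value pairs into a dict."""
    fields = {}
    for token in line.split():
        # Split at the first '=' only, so values may contain '=' themselves.
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("expected Key=Value, got %r" % token)
        fields[key] = value
    return fields


# The default partition line from the sample output.
partition = parse_slurm_line(
    "PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[11-16]"
)

if __name__ == "__main__":
    for key, value in partition.items():
        print(key, "=", value)
```
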

Decide whether you want to install LSF as the job management system:

Do you want to install LSF now? (y|n) [y]:

Do one of the following:

To install LSF, enter y or press the Enter key. If you elected to install SLURM in step 14, proceed to step 17 to choose the type of LSF to install. If you did not elect to install SLURM, you cannot choose the type of LSF to install, and standard LSF is installed automatically; proceed to step 19.

If you intend to install another job management system, such as PBS Professional (see Chapter 9) or the Maui Scheduler (see Chapter 10), enter n. Proceed to step 22. If you want to install LSF at a future time, rerun the cluster_config utility and answer y to this question. The remainder of this procedure does not describe how to install any job management system other than LSF-HPC with SLURM or standard LSF.

Decide what type of LSF to install:

There are two types of LSF available to install:
1. Standard LSF: the standard Load Sharing Facility product.
2. LSF-HPC integrated with SLURM: the LSF High Performance
Computing solution integrated with SLURM for XC.
Which LSF product would you like to install (1/2)? [2]:

Table 3-8 lists the features of LSF-HPC with SLURM and standard LSF to help you decide the type of LSF to install.

Table 3-8 LSF-HPC with SLURM and Standard LSF Features

| LSF-HPC with SLURM | Standard LSF |
|---|---|
| Parallel support (through SLURM) for: Designed to ensure that parallel jobs (MPI jobs) achieve the best performance by dedicating whole nodes to parallel jobs. This works well for systems with 2 processor and 4 processor nodes where jobs are expected to span across nodes. Exclusive node allocation with exclusive user access control. Presents the entire system as a single, large SMP host rather than a large system of many hosts. This simplifies system status commands because information is shown for one host, which makes it desirable for large-scale systems. | Ideal for serial jobs because it is a load-based scheduler. Finds the free resource that is the least loaded and dispatches the job to that node. Sufficient for sites that do not need the type of parallel job support provided by LSF-HPC with SLURM. |
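
The way the SLURM and LSF prompts combine can be summarized in a short Python sketch. It only restates the rules given in the steps above; the function name `lsf_outcome` and its return values are hypothetical and are not cluster_config identifiers.

```python
def lsf_outcome(configure_slurm, install_lsf, lsf_choice=2):
    """Return the LSF product that results from the prompt answers.

    configure_slurm: answer to "Do you want to configure SLURM?"
    install_lsf:     answer to "Do you want to install LSF now?"
    lsf_choice:      1 (standard LSF) or 2 (LSF-HPC with SLURM). It is only
                     asked when SLURM was configured; the default mirrors
                     the prompt default of 2.
    """
    if not install_lsf:
        # Another job management system (or none) is used instead.
        return None
    if not configure_slurm:
        # No choice is offered; standard LSF is installed automatically.
        return "standard LSF"
    if lsf_choice == 1:
        return "standard LSF"
    if lsf_choice == 2:
        return "LSF-HPC with SLURM"
    raise ValueError("lsf_choice must be 1 or 2")


if __name__ == "__main__":
    print(lsf_outcome(configure_slurm=True, install_lsf=True))
    print(lsf_outcome(configure_slurm=False, install_lsf=True))
```

Accepting every default (SLURM configured, LSF installed, choice 2) therefore leads to LSF-HPC with SLURM, which is the path that the rest of the sample output follows.
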

If you are installing LSF-HPC with SLURM, your first decision is where to assign the primary LSF node; this decision is not required for standard LSF. If more than one node has the resource_management role, you are prompted to identify the primary LSF node, as follows:

Here is the set of nodes from which to select the Primary
XC LSF-HPC node: n[15-16]
Enter the Primary XC LSF-HPC node [n16] : n16

If only one node has the resource_management role, the following appears:

n16 is the only node with the Resource Management role,
and it is the Primary LSF-HPC node.

Provide responses to install and configure LSF. This requires you to supply information about the primary LSF administrator and the administrator's password. The default user name for the primary LSF administrator is lsfadmin. If you accept the default user name and a NIS account exists with the same name, LSF is configured with the existing NIS account, and you are not prompted to supply a password. Otherwise, accept all default answers. Command output is similar to the following:

What name shall LSF use to uniquely identify this system?
No existing host names are allowed, and the name must be
less than 39 characters with no whitespace:
LSF System Name [hptclsf]: Enter
Enter the name of the Primary LSF Administrator. You can
configure additional administrators later, but this user
must exist now in order to be given ownership of the files
to be installed. If this user does not exist, it will be
created locally [lsfadmin]: Enter
The lsfadmin user does not exist. Do you want to
create this user now? (y/n) [y] Enter
Changing password for user lsfadmin.
New UNIX password: your_lsfadmin_password
Retype new UNIX password: your_lsfadmin_password
passwd: all authentication tokens updated successfully.
Executing the Platform LSF-HPC installation script (hpcinstall)...
Logging installation sequence in /opt/hptc/lsf/files/lsfslurm/
install-20060925114531/lsf6.2_lsfinstall/Install.log
1) linux2.6-glibc2.3-x86_64-slurm
Press 1 or Enter to install this host type: Enter 1

Callout 1 (the prompt marked 1 in the preceding output): The sample command output was obtained from an HP ProLiant server. The tar file name is linux2.6-glibc2.3-x86_64-slurm (the string x86_64 signifies an Opteron or Xeon chip architecture). The string ia64 is included in the file name for HP Integrity servers.

Follow along with the remainder of the system configuration process. This sample output is provided for your information only; there is nothing else you have to do in this step. Despite some of the messages shown in the command output, everything you need to install, configure, and verify LSF and SLURM is described in this document.

Pre-installation check report saved as text file:
/opt/hptc/lsf/files/lsfslurm/install-20060925114531/
lsf6.2_lsfinstall/prechk.rpt
.
... Done LSF pre-installation check.
... Done installing lsf binary files "linux2.6-glibc2.3-x86_64-slurm".
... LSF configuration is done.
lsfinstall is done.
To complete your lsf installation and get your
cluster "hptclsf" up and running, follow the steps in
"/opt/hptc/lsf/files/lsfslurm/install-20060925114531/
lsf6.2_lsfinstall/lsf_getting_started.html".
After setting up your LSF server hosts and verifying
your cluster "hptclsf" is running correctly,
see "/opt/hptc/lsf/top/6.2/lsf_quick_admin.html"
to learn more about your new LSF cluster.
***Begin LSF-HPC Post-Processing***
Created '/hptc_cluster/lsf/tmp'...
Editing /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf...
Moving /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf
to /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf.old.7858...
Editing /opt/hptc/lsf/top/conf/lsf.conf...
Moving /opt/hptc/lsf/top/conf/lsf.conf
to /opt/hptc/lsf/top/conf/lsf.conf.old.7858...
Editing /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params...
Moving /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params
to /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params.old.7858...
Replaced default lsb.queues with a preconfigured lsb.queues.
C95lsf finished
Executing C99translate gconfigure
All user specified configuration is complete.
The Golden Image will be created next.
[P]roceed, [Q]uit:

Do one of the following:

Enter the letter p to create the golden image. Proceed to step 22.

Enter the letter q to exit the cluster_config utility and change your responses. Exiting now does not create the golden image, and your previous responses are stored in the configuration management database. The next time you rerun the cluster_config utility, your previous responses are used as the default responses.

Follow along as the golden image is created and all system services are configured and started. This process can take up to one hour to complete.

Configuring the image replication environment
Initializing 172.20.0.16 as golden client
Creating the golden image (takes approximately 10 minutes)
**Do not interrupt this process or else the golden image will be incomplete**
Setting up the bootserver
Linking client nodes to their autoinstall script
Initializing service persistence
Sanitizing services in the golden image
Creating golden image 'tar' file (takes approximately 10-15 minutes)
Verifying integrity of golden image 'tar' file
Image replication environment configuration complete.
info: nconfig started
info: Executing on head node
info: Executing C02network nconfigure
info: Executing C04ip6tables nconfigure
info: Executing C04iptables nconfigure
info: Executing C06nfs_server nconfigure
info: Executing C08ntp nconfigure
info: Executing C10hptc_cluster_fs nconfigure
info: Executing C10hptc_cluster_fs_client nconfigure
info: Executing C11avail nconfigure
info: Executing C12dbserver nconfigure
info: Executing C20gmmon nconfigure
info: Executing C20smartd nconfigure
info: Executing C35dhcp nconfigure
info: Executing C40hpasm nconfigure
info: Executing C50cmf nconfigure
info: Executing C50collectl nconfigure
info: Executing C50gather_data nconfigure
info: Executing C50hptc-lm nconfigure
info: Executing C50lvs nconfigure
info: Executing C50nagios nconfigure
info: Executing C50nat nconfigure
info: Executing C51nrpe nconfigure
info: Executing C52snmp_traps nconfigure
info: Executing C66ibmon nconfigure
info: Executing C90slurm nconfigure
info: Executing C91swmlogger nconfigure
info: Executing C95lsf nconfigure
info: Executing C30syslogng_forward cconfigure
info: Executing C35dhcp cconfigure
info: Executing C50supermond cconfigure
info: Executing C90munge cconfigure
info: Executing C90slurm cconfigure
info: Executing C95lsf cconfigure
info: nconfig shut down
info: nconfig started
info: Executing on head node
info: Executing C02network nrestart
info: Executing C04ip6tables nrestart
info: Executing C04iptables nrestart
info: Executing C06nfs_server nrestart
info: Executing C08ntp nrestart
info: Executing C10hptc_cluster_fs nrestart
info: Executing C10hptc_cluster_fs_client nrestart
info: Executing C11avail nrestart
info: Executing C12dbserver nrestart
info: Executing C20gmmon nrestart
info: Executing C20smartd nrestart
info: Executing C35dhcp nrestart
info: Executing C40hpasm nrestart
info: Executing C50cmf nrestart
info: Executing C50collectl nrestart
info: Executing C50gather_data nrestart
info: Executing C50hptc-lm nrestart
info: Executing C50lvs nrestart
info: Executing C50nagios nrestart
info: Executing C50nat nrestart
info: Executing C51nrpe nrestart
info: Executing C52snmp_traps nrestart
info: Executing C66ibmon nrestart
info: Executing C90slurm nrestart
info: Executing C91swmlogger nrestart
info: Executing C95lsf nrestart
info: Executing C30syslogng_forward crestart
info: Executing C35dhcp crestart
info: Executing C50supermond crestart
info: Executing C90munge crestart
info: Executing C90slurm crestart
info: Executing C95lsf crestart
info: nconfig shut down

Proceed to “Task 12: Run the startsys Utility to Start the System and Propagate the Golden Image”.