This section shows output from the cluster_config utility.
Depending upon how you want the system configured, respond to the following
prompts as services are being configured on the head node and client nodes.
If you configured availability sets, you are prompted to specify how you want
to configure each service for improved availability.

NOTE: Table 3-3 describes each prompt and provides information to help you with your answers.

When you are prompted to enter the number of NFS daemons required on the system, accept the default value.

Configuring system wide functions / policies / behaviors
Executing C02hptc_cluster_sfs sconfigure
Executing C02ssh_config sconfigure
Executing C03avail sconfigure
Executing C10cluster_fstab sconfigure
Executing C20sysparams sconfigure
NFS daemon tuning:
Given that there are 6 nodes in this cluster, enter the number of
NFS daemons that shall be configured to support them [8] : Enter

The default value for the number of NFS daemons scales according
to the number of nodes in the system. The default value represents the number
of NFS daemons required on the head node to adequately support serving the /hptc_cluster file
system. Table 3-7 lists the default values.

Table 3-7 Number of NFS Daemons Based on System Size

| Number of Nodes | Number of NFS Daemons |
|---|---|
| 8 | 8 |
| 128 | 16 |
| 256 | 32 |
| 512 | 64 |
| 768 | 96 |
| 1024 or more | 128 |

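The lookup in Table 3-7 can be sketched in a few lines of Python. This is only an illustration, not how cluster_config computes the default: the table does not say how sizes between rows are handled, so the sketch assumes that a system uses the nearest row at or below its node count, and that a system smaller than the first row uses the first row. The function name `default_nfs_daemons` is hypothetical.

```python
# Rows of Table 3-7 as (number of nodes, number of NFS daemons).
# The final row stands for "1024 or more".
NFS_DAEMON_DEFAULTS = [
    (8, 8),
    (128, 16),
    (256, 32),
    (512, 64),
    (768, 96),
    (1024, 128),
]


def default_nfs_daemons(node_count):
    """Return the Table 3-7 default for a system of node_count nodes.

    Assumption (not stated in the table): a size that falls between two
    rows uses the nearest row at or below it, and a size below the first
    row uses the first row.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    daemons = NFS_DAEMON_DEFAULTS[0][1]
    for nodes, row_daemons in NFS_DAEMON_DEFAULTS:
        if node_count >= nodes:
            daemons = row_daemons
    return daemons


if __name__ == "__main__":
    for size in (6, 128, 300, 2048):
        print(size, default_nfs_daemons(size))
```

Under these assumptions, the 6-node system in the sample output maps to the default of 8 shown at the prompt.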
If you have configured availability sets to use Serviceguard as the availability tool, specify a
quorum server node name or a full path to the lock LUN (for example, /dev/sdb1,
where 1 is the partition number). See “Deciding on the Method to Achieve Quorum for Serviceguard Clusters” or the Serviceguard documentation for more
information.

Configuring service specific functions
Executing C03avail gconfigure
You need to configure quorum for the availability set with
members n16 and n14.
Valid choices are quorum server [q] or lock LUN [l]: [q] Enter
Please provide the name of the quorum server? [] node_name
OR
Please provide the name of the lock LUN [ ] full_path_to_lock_LUN
Executing C05pdsh gconfigure

Specify the Network Time Protocol (NTP) server. The head node is automatically configured as the system's
NTP server if another server is not specified, but you have the option to
provide up to four external NTP servers instead. If
the HP XC system will be integrated with HP StorageWorks Scalable
File Share (HP SFS), the HP XC and HP SFS systems must be synchronized
to a common time server. Therefore, do not take the default response; instead,
enter the same external time server that will be used for the HP SFS system.

Executing C08ntp gconfigure
Configuring the following nodes as ntp servers for the cluster:
n16
You must now specify the clock source for the server nodes. If the
nodes have external connections, you may specify up to 4 external NTP
servers. Otherwise, you must use the node's system clock.
Enter the IP address or host name of the first external NTP server
or leave blank to use the system clock on the NTP server node: IP_address
Enter the IP address or host name of the second external NTP server or
leave blank if you have no more servers: Enter
Renaming previous /etc/ntp.conf to /etc/ntp.conf.bak
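The two constraints in this step (at most four external servers, and a time server shared with HP SFS) can be checked with a short Python sketch. The contents that cluster_config actually writes to /etc/ntp.conf are not shown in this guide; the sketch only emits the standard ntpd `server` directive. The function names and the host name `ntp1.example.com` are hypothetical.

```python
MAX_EXTERNAL_NTP_SERVERS = 4  # limit stated in the cluster_config prompt


def ntp_server_lines(answers):
    """Turn prompt answers into ntp.conf "server" lines.

    Blank answers are skipped, mirroring a blank reply at the prompt.
    An empty result corresponds to using the node's own system clock.
    """
    servers = [a.strip() for a in answers if a and a.strip()]
    if len(servers) > MAX_EXTERNAL_NTP_SERVERS:
        raise ValueError(
            f"at most {MAX_EXTERNAL_NTP_SERVERS} external NTP servers are allowed"
        )
    return [f"server {host}" for host in servers]


def share_time_source(xc_servers, sfs_servers):
    """Return True if the two systems list at least one common time server."""
    return bool(set(xc_servers) & set(sfs_servers))


if __name__ == "__main__":
    # One external server entered, second prompt left blank.
    print("\n".join(ntp_server_lines(["ntp1.example.com", ""])))
```

Running `share_time_source` on the server lists planned for the HP XC and HP SFS systems before answering the prompt is one way to confirm that both point at a common time server.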

Supply the network type if the system has a QsNetII interconnect; the possible
choices for the system are displayed, and a default is provided:

Enter the network type of your system.
Valid choices are QMS16 or QMS32: [QMS32]: Enter

The network type reflects the maximum number of ports the
fabric topology can support. See Appendix H for
information about how to determine the QsNetII network
type for the system.

Specify how many node-level and top-level switches are in the
system configuration if the system has a QsNetII interconnect:

Enter the number of node level switches in your configuration [1-32]:
Enter the number of top level switches in your configuration [0-32]:

If you have configured the dbserver service for improved availability,
you are prompted to specify how you want to configure improved availability.
If you have not configured an availability set with this service, bypass this
step and proceed to the next step.

Executing C10hptc_cluster_fs gconfigure
Executing C12dbserver gconfigure
Availability can be configured for dbserver in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Executing C20gmmon gconfigure
Executing C20smartd gconfigure
Executing C30syslogng_forward gconfigure
Executing C35dhcp gconfigure
Executing C42mcs gconfigure
Executing C50cmf gconfigure
Executing C50lvs gconfigure

Supply the name of the LVS alias if you assigned a login role to one or more nodes.
The alias is the name by which users log in to the system. This
example specifies the alias penguin and enables the LVS
director for improved availability. If you did not assign a login role to
any node, you are not asked to supply an LVS alias. If you have
configured the LVS director service for improved availability,
you are prompted to specify how you want to configure improved availability.

Executing C50lvs gconfigure
Enter the name of the cluster alias []: penguin
Availability can be configured for lvs in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2

Enable Web access to the Nagios monitoring application and create a password for the nagiosadmin user.
This password does not have to match any other password on the system. In
this example, the Nagios service has been configured with improved availability.

Executing C50nagios gconfigure
Availability can be configured for nagios in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Would you like to enable web based monitoring? ([y]/n) y
Enter the password for the 'nagiosadmin' web user:
New password: your_nagios_password
Re-type new password: your_nagios_password
Adding password for user nagiosadmin
Executing C50nat gconfigure

If you configured improved availability
for the NAT service, you are prompted to enter an additional external IP address
to use as an external alias:

Availability can be configured for nat in one of several ways.
The choices are:
1: standard
2: serviceguard
A choice of 'standard' (1) means no improved availability.
Enter the number corresponding to the way to configure availability []: 2
Please provide the external IP alias for server n16 []: IP_address
Please provide the external IP alias for server n14 []: IP_address

Specify how you want to configure
the snmptrapd service if the hardware configuration includes
HP server blade enclosures or if you have defined an mcs.ini file
to configure MCS devices. Enclosures and
MCS devices are connected to the Administration Network, so accept the default
responses of Admin and loopback if
you want Nagios to generate alerts for critical SNMP traps sent from either
an enclosure or an MCS device. Enter the letter R to specify External and Interconnect
if the system is configured with additional SNMP-based devices that are
connected to either the external or interconnect networks but are not connected
to the Administration Network.

Executing C50supermond gconfigure
Executing C51nagios_monitor gconfigure
Executing C60nis gconfigure
Executing C51nagios_monitor gconfigure
Executing C52snmp_traps gconfigure
Configuring the snmptrapd service for the cluster.
SNMP traps are configured to be received over the following interfaces:
loopback
Admin
Snmptrapd is also configured to listen to the Nagios IP alias.
Enter any additional interfaces over which SNMP traps are to be received:
1) External
2) Interconnect
Use a space-separated list to specify multiple interfaces, for example, 1 2
or enter 'd' if you want to start again with the (d)efault configuration
or leave blank if you want to use the current configuration:
Interfaces over which traps will be accepted:
loopback
Admin
[O]k, [R]especify Interfaces: O

Optionally configure a self-signed
certificate for the Apache server. Otherwise, the certificate that is shipped
with the Linux base operating system is used.

Executing C54httpd gconfigure
Would you like to create a self-signed certificate for the Apache server? [y]/n Enter
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:
State or Province Name (full name) [Berkshire]:
Locality Name (eg, city) [Newbury]:
Organization Name (eg, company) [My Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
Email Address []:
A self-signed certificate has been created for this cluster

Supply the name or IP address
of the external NIS master server and the NIS domain name if you assigned
the nis_server role to one or more nodes to configure them
as NIS slave servers. If you did not assign a nis_server role
to any node, you are not asked to supply this information.

Network Information Service (NIS) Configuration
This step sets up one or more NIS servers within the XC system
that are "slaves" to an external NIS "master". The master NIS
server provides the slaves with copies of its NIS maps.
In order to successfully complete this configuration step, the NIS
master must have been previously set to allow slaves to communicate
with it. On Linux systems, this is typically accomplished by adding
the NIS slave hostname(s) to the /var/yp/ypservers file on the NIS
master, and then running 'make'.
In addition, to complete this configuration, you will need to provide
1) the name or IP address of the NIS master, and
2) the NIS domain name hosted by the NIS master
Enter the name or IP address of the external NIS master: [] your_NIS_IP_address
Enter the NIS domain hosted by the NIS master: [] your_NIS_domain
Executing C66ibmon gconfigure
Executing C90munge gconfigure
Executing C90slurm gconfigure

Configure SLURM:

Do you want to configure SLURM? (y/n) [y]:

Your answer depends on the type of LSF you plan to install; do one of the following:

If you intend to install LSF-HPC with SLURM or the Maui Scheduler, enter y.
If you intend to install standard LSF, do not install SLURM; enter n.

If you are installing SLURM, define a SLURM user name and
accept all default responses. The output looks different if you assigned the resource_management role
to one or more additional nodes because you will be prompted to assign the
master and backup controller nodes.

This SLURM configuration needs a special SLURM user. The SLURM
controller daemons will be run by this user, and certain SLURM
runtime files will be owned by this user.
Enter the SLURM username [slurm]: Enter
User 'slurm' does not exist.
If this user account is created here, it will not have login
access. Do you want to create this user? (y/n) [y]: Enter
n16 is the only node with the Resource Management
role. Therefore the SLURM Master Controller daemon will be set up
on this node, and there will be no SLURM Backup Controller.
The current Compute Node configuration is:
NodeName=n[11-16] Procs=2
NOTE: The only Partition created by default is the lsf
partition. If you want additional partitions, configure
them manually in the /hptc_cluster/slurm/etc/slurm.conf file.
The current Node Partition configuration is:
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[11-16]
Do you want to enable SLURM-controlled user-access to the
compute nodes? (y/n) [n]: n 1
SLURM configuration complete. Press 'Enter' to continue: Enter
Executing C91swmlogger gconfigure
Executing C95lsf gconfigure

(1) By default, all compute nodes in the HP XC system
are accessible by any user after their user accounts have been set up. This
prompt enables you to restrict individual access to each compute node to the
user who currently has the compute node reserved within SLURM.

It is important that you assign a login role
to each node on which you expect users to be able to log in and use the system.
If you answer yes here and configure all nodes with
the compute role (the default), but you do not configure any nodes with the login role,
non-root users will not be allowed to log in to the system.
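
The SLURM output in this step writes node sets in a bracketed range notation, such as n[11-16]. The Python sketch below expands that notation into individual node names, which can help when checking which nodes a NodeName or Nodes entry covers. It handles only a simplified subset of the notation (one prefix, one bracket, no zero-padded numbers); the function name `expand_node_range` is hypothetical and is not part of SLURM or HP XC.

```python
import re

# Matches a prefix followed by a bracketed list of numbers and ranges,
# for example "n[11-16]" or "n[1-4,8]".
_RANGE_RE = re.compile(r"^([A-Za-z_]+)\[([0-9,\-]+)\]$")


def expand_node_range(expr):
    """Expand "n[11-16]" into ["n11", ..., "n16"].

    A plain node name such as "n16" is returned unchanged in a list.
    Zero-padded numbers are not preserved by this simplified sketch.
    """
    expr = expr.strip()
    match = _RANGE_RE.match(expr)
    if match is None:
        return [expr]
    prefix, body = match.groups()
    names = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        start = int(low)
        end = int(high) if high else start
        if end < start:
            raise ValueError(f"descending range in {expr!r}")
        names.extend(f"{prefix}{number}" for number in range(start, end + 1))
    return names


if __name__ == "__main__":
    print(expand_node_range("n[11-16]"))
```

For the partition line in the sample output, `expand_node_range("n[11-16]")` yields the six node names n11 through n16.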
Decide whether or not you want to install LSF as the job management
system:

Do you want to install LSF now? (y|n) [y]:

Do one of the following:

To install LSF-HPC with SLURM or standard LSF, enter y or press the Enter key. Proceed to step 15.
If you intend to install another job management system, such as PBS Professional (documented in Appendix L) or the Maui Scheduler (Appendix M), enter n. Proceed to step 20. If at a future time you want to install LSF, rerun the cluster_config utility and answer y to this question.

The remainder of this procedure does not describe how
to install any job management system other than LSF-HPC with SLURM or standard LSF.

Decide what type of LSF to install, if any:

There are two types of LSF available to install:
1. Standard LSF: the standard Load Sharing Facility product.
2. LSF-HPC integrated with SLURM: the LSF High Performance
Computing solution integrated with SLURM for XC.
Which LSF product would you like to install (1/2)? [2]:

Table 3-8 describes characteristics of LSF-HPC with SLURM and standard LSF to help you decide the type
of LSF to install.

Table 3-8 Characteristics of LSF-HPC with SLURM and Standard LSF

| LSF-HPC with SLURM | Standard LSF |
|---|---|
| Parallel support (through SLURM) for: Designed to ensure that parallel jobs (MPI jobs) achieve the best performance by dedicating whole nodes to parallel jobs. This works well for systems with 2-processor and 4-processor nodes where jobs are expected to span across nodes. Exclusive node allocation with exclusive user access control. Presents the entire system as a single, large SMP host rather than a large system of many hosts. This simplifies system status commands because information is shown for one host, which makes it desirable for large-scale systems. | Ideal for serial jobs because it is a load-based scheduler. Finds the free resource that is the least loaded and dispatches the job to that node. Sufficient for sites that do not need the type of parallel job support provided by LSF-HPC with SLURM. |

If you are installing LSF-HPC with SLURM,
your first decision is where to assign the primary LSF node; this decision
is not required for standard LSF. If more than one node has the resource_management role,
you are prompted to identify the primary LSF node, as follows:

Here is the set of nodes from which to select the Primary
XC LSF-HPC node: n[15-16]
Enter the Primary XC LSF-HPC node [n16] : n16

If only one node has the resource_management role,
the following is displayed:

n16 is the only node with the Resource Management role,
and it is the Primary LSF-HPC node.

Provide responses to install and configure
LSF. This requires you to supply information about the primary LSF administrator
and the administrator's password. The default user
name for the primary LSF administrator is lsfadmin. If
you accept the default user name and a NIS account exists with the same name,
LSF is configured with the existing NIS account, and you are not prompted
to supply a password. Otherwise, accept all default answers. Command output is similar to the following:

What name shall LSF use to uniquely identify this system?
No existing host names are allowed, and the name must be
less than 39 characters with no whitespace:
LSF System Name [hptclsf]: Enter
Enter the name of the Primary LSF Administrator. You can
configure additional administrators later, but this user
must exist now in order to be given ownership of the files
to be installed. If this user does not exist, it will be
created locally [lsfadmin]: Enter
The lsfadmin user does not exist. Do you want to
create this user now? (y/n) [y] Enter
Changing password for user lsfadmin.
New UNIX password: your_lsfadmin_password
Retype new UNIX password: your_lsfadmin_password
passwd: all authentication tokens updated successfully.
Executing the Platform LSF-HPC installation script (hpcinstall)...
Logging installation sequence in /opt/hptc/lsf/files/lsfslurm/
install-20060925114531/lsf6.2_lsfinstall/Install.log
1) linux2.6-glibc2.3-x86_64-slurm
Press 1 or Enter to install this host type: Enter 1

(1) The sample command output was obtained from an HP ProLiant
server. The tar file name is linux2.6-glibc2.3-x86_64-slurm (the
string x86_64 signifies an Opteron or Xeon chip architecture).
The string ia64 is included in the file name for HP Integrity
servers.
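
The LSF System Name prompt in the output above states three rules: the name must not be an existing host name, must be less than 39 characters, and must contain no whitespace. The Python sketch below checks a candidate name against those rules before you answer the prompt. It is an illustration only; the function name `lsf_system_name_problems` is hypothetical, the list of existing host names is supplied by the caller, and the installer may apply further checks that are not shown in this guide.

```python
MAX_NAME_LENGTH = 38  # the prompt requires fewer than 39 characters


def lsf_system_name_problems(name, existing_hosts):
    """Return a list of rule violations for a candidate LSF system name.

    An empty list means the name satisfies the three rules quoted in the
    prompt. existing_hosts is any iterable of host names already in use.
    """
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name must be less than 39 characters")
    if any(ch.isspace() for ch in name):
        problems.append("name must not contain whitespace")
    if name in set(existing_hosts):
        problems.append("name must not match an existing host name")
    return problems


if __name__ == "__main__":
    hosts = ["n14", "n16"]
    print(lsf_system_name_problems("hptclsf", hosts))
    print(lsf_system_name_problems("n16", hosts))
```

The default name hptclsf passes all three checks for the node names used in this example.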
Follow along with the remainder
of the system configuration process. This sample output is provided for your
information only; there is nothing else you have to do in this step. Despite
some of the messages shown in the command output, everything you need to install,
configure, and verify LSF and SLURM is described in this document.

Pre-installation check report saved as text file:
/opt/hptc/lsf/files/lsfslurm/install-20060925114531/
lsf6.2_lsfinstall/prechk.rpt
.
... Done LSF pre-installation check.
... Done installing lsf binary files "linux2.6-glibc2.3-x86_64-slurm".
... LSF configuration is done.
lsfinstall is done.
To complete your lsf installation and get your
cluster "hptclsf" up and running, follow the steps in
"/opt/hptc/lsf/files/lsfslurm/install-20060925114531/
lsf6.2_lsfinstall/lsf_getting_started.html".
After setting up your LSF server hosts and verifying
your cluster "hptclsf" is running correctly,
see "/opt/hptc/lsf/top/6.2/lsf_quick_admin.html"
to learn more about your new LSF cluster.
***Begin LSF-HPC Post-Processing***
Created '/hptc_cluster/lsf/tmp'...
Editing /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf...
Moving /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf
to /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf.old.7858...
Editing /opt/hptc/lsf/top/conf/lsf.conf...
Moving /opt/hptc/lsf/top/conf/lsf.conf
to /opt/hptc/lsf/top/conf/lsf.conf.old.7858...
Editing /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params...
Moving /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params
to /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params.old.7858...
Replaced default lsb.queues with a preconfigured lsb.queues.
C95lsf finished
Executing C99translate gconfigure
All user specified configuration is complete.
The Golden Image will be created next.
[P]roceed, [Q]uit:

Do one of the following:

Enter the letter p to create the golden image. Proceed to step 20.
Enter the letter q to edit your responses. Choosing this option does not create the golden image, but your previous responses
are stored in the configuration management database. The next time you rerun
the cluster_config utility, your previous responses are
used as the default responses.

Follow along as the golden image is created and all system services are configured
and started. This process can take up to one hour to complete.

CAUTION: Do not press the Ctrl+c key
sequence to stop or interrupt the golden image creation. Doing so corrupts
the database.

Configuring the image replication environment
Initializing 172.20.0.16 as golden client
Creating the golden image (takes approximately 10 minutes)
**Do not interrupt this process or else the golden image will be incomplete**
Setting up the bootserver
Linking client nodes to their autoinstall script
Initializing service persistence
Sanitizing services in the golden image
Creating golden image 'tar' file (takes approximately 10-15 minutes)
Verifying integrity of golden image 'tar' file
Image replication environment configuration complete.
info: nconfig started
info: Executing on head node
info: Executing C02network nconfigure
info: Executing C04ip6tables nconfigure
info: Executing C04iptables nconfigure
info: Executing C06nfs_server nconfigure
info: Executing C08ntp nconfigure
info: Executing C10hptc_cluster_fs nconfigure
info: Executing C10hptc_cluster_fs_client nconfigure
info: Executing C11avail nconfigure
info: Executing C12dbserver nconfigure
info: Executing C20gmmon nconfigure
info: Executing C20smartd nconfigure
info: Executing C35dhcp nconfigure
info: Executing C40hpasm nconfigure
info: Executing C50cmf nconfigure
info: Executing C50collectl nconfigure
info: Executing C50gather_data nconfigure
info: Executing C50hptc-lm nconfigure
info: Executing C50lvs nconfigure
info: Executing C50nagios nconfigure
info: Executing C50nat nconfigure
info: Executing C51nrpe nconfigure
info: Executing C52snmp_traps nconfigure
info: Executing C66ibmon nconfigure
info: Executing C90slurm nconfigure
info: Executing C91swmlogger nconfigure
info: Executing C95lsf nconfigure
info: Executing C30syslogng_forward cconfigure
info: Executing C35dhcp cconfigure
info: Executing C50supermond cconfigure
info: Executing C90munge cconfigure
info: Executing C90slurm cconfigure
info: Executing C95lsf cconfigure
info: nconfig shut down
info: nconfig started
info: Executing on head node
info: Executing C02network nrestart
info: Executing C04ip6tables nrestart
info: Executing C04iptables nrestart
info: Executing C06nfs_server nrestart
info: Executing C08ntp nrestart
info: Executing C10hptc_cluster_fs nrestart
info: Executing C10hptc_cluster_fs_client nrestart
info: Executing C11avail nrestart
info: Executing C12dbserver nrestart
info: Executing C20gmmon nrestart
info: Executing C20smartd nrestart
info: Executing C35dhcp nrestart
info: Executing C40hpasm nrestart
info: Executing C50cmf nrestart
info: Executing C50collectl nrestart
info: Executing C50gather_data nrestart
info: Executing C50hptc-lm nrestart
info: Executing C50lvs nrestart
info: Executing C50nagios nrestart
info: Executing C50nat nrestart
info: Executing C51nrpe nrestart
info: Executing C52snmp_traps nrestart
info: Executing C66ibmon nrestart
info: Executing C90slurm nrestart
info: Executing C91swmlogger nrestart
info: Executing C95lsf nrestart
info: Executing C30syslogng_forward crestart
info: Executing C35dhcp crestart
info: Executing C50supermond crestart
info: Executing C90munge crestart
info: Executing C90slurm crestart
info: Executing C95lsf crestart
info: nconfig shut down
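
In the sample output above, every script that runs an nconfigure or cconfigure action later runs the matching nrestart or crestart action. If you capture the output to a file, a short Python sketch such as the following can confirm the same pairing on your system. This is a convenience check under the assumption that the pairing seen in the sample is the expected pattern; the guide does not state this as a rule, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "info: Executing C50nagios nconfigure".
# The "info: " prefix is optional because earlier phases omit it.
EXEC_RE = re.compile(r"^(?:info:\s+)?Executing (C\d{2}\S+) (\w+)$")


def scripts_by_action(lines):
    """Group script names by the action (nconfigure, nrestart, ...) run on them."""
    grouped = defaultdict(list)
    for line in lines:
        match = EXEC_RE.match(line.strip())
        if match:
            script, action = match.groups()
            grouped[action].append(script)
    return dict(grouped)


def missing_restarts(lines):
    """Return scripts that were configured but have no matching restart line."""
    grouped = scripts_by_action(lines)
    missing = []
    for configure, restart in (("nconfigure", "nrestart"), ("cconfigure", "crestart")):
        restarted = set(grouped.get(restart, []))
        missing.extend(s for s in grouped.get(configure, []) if s not in restarted)
    return missing


if __name__ == "__main__":
    sample = [
        "info: nconfig started",
        "info: Executing C08ntp nconfigure",
        "info: Executing C90munge cconfigure",
        "info: Executing C08ntp nrestart",
        "info: Executing C90munge crestart",
    ]
    print(missing_restarts(sample))
```

An empty list means that each configured script also appears with its restart action in the captured output.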

Proceed to “Task 11: Run the startsys Utility to Start the System and Propagate
the Golden Image”.