Output from the cluster_config utility is shown in this section. Depending on how you want your system configured, respond to the following questions as services are configured on the head node:
When you are prompted to enter the number of NFS daemons required on your system, accept the default value.
Configuring system wide functions / policies / behaviors
Executing C02ssh_config sconfigure
Executing C10cluster_fstab sconfigure
Executing C20sysparams sconfigure
NFS daemon tuning:
Given that there are 6 nodes in this cluster, enter the number of
NFS daemons that shall be configured to support them [8] : Enter
The default number scales according to the number of nodes in the system. The default represents the number of NFS daemons required on the head node to adequately support serving the /hptc_cluster file system. Table 4-5 lists the default values.
Table 4-5 Number of NFS Daemons Based on System Size
| Number of Nodes | Number of NFS Daemons |
|---|---|
| 8 | 8 |
| 128 | 16 |
| 256 | 32 |
| 512 | 64 |
| 768 | 96 |
| 1024 or more | 128 |
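The scaling in Table 4-5 can be sketched as a small lookup. This is only an illustration, not the logic cluster_config itself uses: it assumes each row from 128 upward gives the smallest node count at which that default applies (suggested by the "1024 or more" row) and that smaller clusters get 8 daemons, which matches the 6-node sample output. The function name `default_nfs_daemons` is hypothetical.

```python
import bisect

# Assumed reading of Table 4-5: each bound is the smallest node count at which
# the next larger default applies; clusters below 128 nodes get 8 daemons.
_NODE_LOWER_BOUNDS = [128, 256, 512, 768, 1024]
_NFS_DAEMONS = [8, 16, 32, 64, 96, 128]


def default_nfs_daemons(node_count: int) -> int:
    """Return the assumed default number of NFS daemons for a cluster size."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # bisect_right counts how many lower bounds the node count has reached.
    return _NFS_DAEMONS[bisect.bisect_right(_NODE_LOWER_BOUNDS, node_count)]
```

Under these assumptions, `default_nfs_daemons(6)` gives the same value, 8, that the sample prompt offers as its default.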
Specify the Network Time Protocol (NTP) server. The head node is automatically configured as the system's NTP server if another server is not specified, but you have the option to provide up to four external NTP servers instead. If your HP XC system will be integrated with HP StorageWorks Scalable File Share (HP SFS), the XC and HP SFS systems must be synchronized to a common time server. Therefore, do not take the default response; instead, enter the same external time server that will be used for the HP SFS system.
Executing C75mpiic sconfigure
Configuring service specific functions
Executing C05pdsh gconfigure
Executing C08ntp gconfigure
Configuring the following nodes as ntp servers for the cluster:
n16
You must now specify the clock source for the server nodes.
If the nodes have external connections, you may specify up
to 4 external NTP servers. Otherwise, you must use the node's
system clock.
Enter the IP address or host name of the first external NTP server
or leave blank to use the system clock on the NTP server node: Enter
Renaming previous /etc/ntp.conf to /etc/ntp.conf.bak
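The clock-source choice above (up to four external servers, otherwise the system clock) can be illustrated with a short sketch that renders ntp.conf-style lines. The exact file that cluster_config writes is not shown in this section, so treat the output as an assumption: the `server` lines are standard ntpd syntax, and 127.127.1.0 is the conventional ntpd local clock driver address. The function name `ntp_server_lines` is hypothetical.

```python
MAX_EXTERNAL_SERVERS = 4  # limit stated by the cluster_config prompt


def ntp_server_lines(external_servers):
    """Return ntp.conf-style lines for the chosen clock source (a sketch)."""
    # Ignore blank answers, which the prompt treats as "no server given".
    servers = [s.strip() for s in external_servers if s.strip()]
    if len(servers) > MAX_EXTERNAL_SERVERS:
        raise ValueError("at most 4 external NTP servers may be specified")
    if servers:
        return [f"server {s}" for s in servers]
    # Assumed fallback: the ntpd local clock driver stands in for the
    # "system clock on the NTP server node" option.
    return ["server 127.127.1.0", "fudge 127.127.1.0 stratum 10"]
```

For an HP SFS integration, the list passed in would contain the same external time server that the HP SFS system uses.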
Supply the network type if your system has a QsNetII interconnect; the possible choices for your system are displayed, and a default is provided:
Enter the network type of your system.
Valid choices are QMS16 or QMS32: [QMS32]: Enter
The network type reflects the maximum number of ports the fabric topology can support. See Appendix G for information about how to determine the QsNetII network type for your system.
Supply the name of the LVS alias if you assigned a login role to one or more nodes. This example uses the alias penguin. This is the name by which users will log in to the system. If you did not assign a login role to any node, you are not asked to supply an LVS alias.
Executing C10hptc_cluster_fs gconfigure
Executing C20gmmon gconfigure
Executing C30swmlogger gconfigure
Executing C30syslogng_forward gconfigure
Executing C35dhcp gconfigure
Executing C50cmf gconfigure
Executing C50lvs gconfigure
Enter the name of the cluster alias: penguin
Enable Web access to the Nagios monitoring application and create a password for the nagiosadmin user. This password does not have to match any other password on your system.
Executing C50nagios gconfigure
Would you like to enable web based monitoring? ([y]/n) y
Enter the password for the 'nagiosadmin' web user:
New password: your_nagios_password
Re-type new password: your_nagios_password
Adding password for user nagiosadmin
Executing C50nat gconfigure
Executing C50supermond gconfigure
Executing C51nagios_monitor gconfigure
Executing C60nis gconfigure
Supply the name or IP address of your external NIS master server and your NIS domain name if you assigned the nis_server role to one or more nodes; those nodes are configured as NIS slave servers. If you did not assign the nis_server role to any node, you are not asked to supply this information.
Network Information Service (NIS) Configuration
This step sets up one or more NIS servers within the XC system
that are "slaves" to an external NIS "master". The master NIS
server provides the slaves with copies of its NIS maps.
In order to successfully complete this configuration step, the NIS
master must have been previously set to allow slaves to communicate
with it. On Linux systems, this is typically accomplished by adding
the NIS slave hostname(s) to the /var/yp/ypservers file on the NIS
master, and then running 'make'.
In addition, to complete this configuration, you will need to provide
1) the name or IP address of the NIS master, and
2) the NIS domain name hosted by the NIS master
Enter the name or IP address of the external NIS master: [] NIS_IP_address
Enter the NIS domain hosted by the NIS master: [] your_NIS_domain
Executing C90munge gconfigure
Executing C90slurm gconfigure
Configure SLURM:
Do you want to configure SLURM now? (y/n) [y]:y
Do one of the following:
- If you intend to install LSF-HPC with SLURM, enter y.
- If you intend to install standard LSF, do not install SLURM; in that case, enter n.
If you are installing SLURM, define a SLURM user name and accept all default responses. The output looks different if you assigned the resource_management role to one or more additional nodes, because you are then prompted to assign the master and backup controller nodes.
This SLURM configuration needs a special SLURM user. The SLURM
controller daemons will be run by this user, and certain SLURM
runtime files will be owned by this user.
Enter the SLURM username [slurm]: Enter
User 'slurm' does not exist.
If this user account is created here, it will not have login
access. Do you want to create this user? (y/n) [y]: Enter
n16 is the only node with the Resource Management
role. Therefore the SLURM Master Controller daemon will be set up
on this node, and there will be no SLURM Backup Controller.
The current Compute Node configuration is:
NodeName=xc6n[11-16] Procs=2
NOTE: The only Partition created by default is the lsf
partition. If you want additional partitions, configure
them manually in the /hptc_cluster/slurm/etc/slurm.conf file.
The current Node Partition configuration is:
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=xc6n[11-16]
Do you want to enable SLURM-controlled user-access to the
compute nodes? (y/n) [n]: n 1
SLURM configuration complete. Press 'Enter' to continue: Enter
Executing C95lsf gconfigure
(1) By default, all compute nodes in the HP XC system are accessible by any user after their user accounts have been set up. This prompt enables you to restrict individual access to each compute node to the user who currently has the compute node reserved within SLURM. It is important that you assign a login role to each node on which you expect users to be able to log in and use the system. If you answer yes here and configure all nodes with the compute role (the default), but you do not configure any nodes with the login role, non-root users will not be allowed to log in to the system.
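If you later add partitions by hand in /hptc_cluster/slurm/etc/slurm.conf, it can help to see how a partition line such as the one in the output above breaks down. The following is a minimal sketch, not part of SLURM or HP XC: the helper names are hypothetical, and the node expansion handles only a single `prefix[lo-hi]` range, whereas real SLURM host lists also allow comma-separated ranges and zero padding.

```python
import re


def parse_slurm_line(line):
    """Split a 'Key=Value Key=Value' slurm.conf line into a dict."""
    return dict(field.split("=", 1) for field in line.split())


def expand_nodes(expr):
    """Expand a simple 'prefix[lo-hi]' expression into node names."""
    match = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]  # plain host name, nothing to expand
    prefix, lo, hi = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{prefix}{n}" for n in range(lo, hi + 1)]


# The partition line copied from the sample output.
partition = parse_slurm_line(
    "PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=xc6n[11-16]"
)
nodes = expand_nodes(partition["Nodes"])
```

Here `nodes` holds the six names xc6n11 through xc6n16, the same six nodes counted in the NFS daemon prompt.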
Decide whether or not you want to install LSF as your job management system:
Do you want to install LSF locally now? (y|n) [y]:
Do one of the following:
- To install LSF-HPC with SLURM or standard LSF, enter y or press the Enter key. Proceed to step 9.
- If you intend to install another job management system, such as the Maui Scheduler (which is documented in Appendix J), enter n. Proceed to step 12.
If at a future time you want to install LSF, rerun the cluster_config utility, and answer y to this question. The remainder of this procedure describes how to install only LSF-HPC with SLURM or standard LSF, not any other job management system.
Decide the type of LSF to install:
There are two types of LSF available to install:
1. Standard LSF: the standard Load Sharing Facility product.
2. LSF-HPC integrated with SLURM: the LSF High Performance
Computing solution integrated with SLURM for XC.
Which LSF product would you like to install (1/2)? [2]:
Table 4-6 describes characteristics of LSF-HPC with SLURM and standard LSF to help you decide which type of LSF to install.
Table 4-6 Characteristics of LSF-HPC with SLURM and Standard LSF
| LSF-HPC With SLURM | Standard LSF |
|---|---|
| Designed to ensure that parallel jobs (MPI jobs) achieve the best performance by dedicating whole nodes to parallel jobs. This works well for systems with 2-processor and 4-processor nodes where jobs are expected to span across nodes. Exclusive node allocation with exclusive user access control. Presents the entire system as a single, large SMP host rather than a large system of many hosts. This simplifies system status commands because information is shown for one host, which makes it desirable for large-scale systems. | A load-based scheduler, which is ideal for serial jobs. Finds the free resource that is the least loaded and dispatches the job to that node. Sufficient for sites that do not need the type of parallel job support provided by LSF-HPC with SLURM. |
If you are installing LSF-HPC with SLURM, your first decision is where to assign the primary LSF node; this decision is not required for standard LSF. If more than one node is assigned the resource_management role, you are prompted to identify the primary LSF node, as follows:
Here is the set of nodes from which to select the Primary
XC LSF-HPC node: n[15-16]
Enter the Primary XC LSF-HPC node [n16] : n16
If only one node is assigned the resource_management role, the following is displayed:
n16 is the only node with the Resource Management role,
and it is the Primary LSF-HPC node.
Provide responses to install and configure LSF. This requires you to supply information about the primary LSF administrator and the administrator's password. The user name lsfadmin is the default user name for the primary LSF administrator. If you accept the default user name and a NIS account exists with the same name, LSF-HPC with SLURM is configured with the existing NIS account, and you are not prompted to supply a password for the lsfadmin account. Otherwise, accept all default answers. Output is similar to the following:
What name shall LSF use to uniquely identify this system?
No existing host names are allowed, and the name must be
less than 39 characters with no whitespace:
LSF System Name [hptclsf]: Enter
Enter the name of the Primary LSF Administrator. You can
configure additional administrators later, but this user
must exist now in order to be given ownership of the files
to be installed. If this user does not exist, it will be
created locally [lsfadmin]: Enter
The lsfadmin user does not exist. Do you want to
create this user now? (y/n) [y] Enter
Changing password for user lsfadmin.
New UNIX password: your_lsfadmin_password
Retype new UNIX password: your_lsfadmin_password
passwd: all authentication tokens updated successfully.
Executing the Platform LSF-HPC installation script (hpcinstall)...
Logging installation sequence in
/opt/hptc/lsf/files/lsfhpc/install-20051216023643/hpc6.1_hpcinstall/Install.log
1) linux2.6-glibc2.3-ia32e-slurm
Press 1 or Enter to install this host type: Enter 1
(1) The sample command output was obtained from an Opteron-based system. Thus, the tar file name is linux2.6-glibc2.3-amd64-slurm (the string amd64 signifies an Opteron- or Xeon-based architecture). When an Itanium-based system is configured, the string ia64 is included in the file name.
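The LSF system name prompt earlier in this output states three rules: the name must not be an existing host name, must be less than 39 characters, and must contain no whitespace. A minimal sketch of those rules follows, useful if you want to check a candidate name before running cluster_config. The function name and the `existing_hosts` parameter are hypothetical, and the installer may apply additional checks that are not shown here.

```python
def is_valid_lsf_system_name(name, existing_hosts=()):
    """Check a candidate LSF system name against the rules in the prompt."""
    # Must be non-empty and less than 39 characters.
    if not name or len(name) >= 39:
        return False
    # No whitespace anywhere in the name.
    if any(ch.isspace() for ch in name):
        return False
    # Must not collide with an existing host name.
    return name not in existing_hosts
```

For example, the default name hptclsf passes, while a name equal to a node name such as n16 is rejected when that node is listed in `existing_hosts`.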
Follow along with the remainder of the system configuration process. This sample output is provided for your information only; there is nothing else you have to do in this step. Despite some of the messages shown in the command output, everything you need to install, configure, and verify LSF and SLURM is described in this document.
Pre-installation check report saved as text file:
/opt/hptc/lsf/files/lsfhpc/install-20051216023643/hpc6.1_hpcinstall/prechk.rpt.
... Done LSF pre-installation check.
... Done installing hpc binary files "linux2.6-glibc2.3-ia32e-slurm".
... LSF configuration is done.
hpcinstall is done.
To complete your hpc installation and get your
cluster "hptclsf" up and running, follow the steps in
"/opt/hptc/lsf/files/lsfhpc/install-20051216023643/hpc6.1_hpcinstall/ \
hpc_getting_started.html".
After setting up your LSF server hosts and verifying
your cluster "hptclsf" is running correctly,
see "/opt/hptc/lsf/top/6.1/hpc_quick_admin.html"
to learn more about your new LSF cluster.
***Begin LSF-HPC Post-Processing***
Created '/hptc_cluster/lsf/tmp'...
Editing /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf...
Moving /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf
to /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf.old.6490...
Editing /opt/hptc/lsf/top/conf/lsf.conf...
Moving /opt/hptc/lsf/top/conf/lsf.conf
to /opt/hptc/lsf/top/conf/lsf.conf.old.6490...
Editing /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params...
Moving /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params
to /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params.old.6490...
Replaced default lsb.queues with a preconfigured lsb.queues.
C95lsf finished
After LSF is installed and configured, the golden image is created, and all other system services are configured and started. Output looks similar to the following:
Configuring the image replication environment
Initializing 172.20.0.16 as golden client
Creating the golden image (takes approximately 10 minutes)
**Do not interrupt this process or else the golden image will be incomplete**
Setting up the bootserver
Linking client nodes to their autoinstall script
Initializing service persistence
Sanitizing services in the golden image
Creating golden image 'tar' file (takes approximately 10-15 minutes)
Verifying integrity of golden image 'tar' file
Image replication environment configuration complete.
info: nconfig started
info: Executing on head node
info: Executing C02network nconfigure
info: Executing C04iptables nconfigure
info: Executing C06nfs_server nconfigure
info: Executing C08ntp nconfigure
info: Executing C10hptc_cluster_fs nconfigure
info: Executing C10hptc_cluster_fs_client nconfigure
info: Executing C20gmmon nconfigure
info: Executing C30swmlogger nconfigure
info: Executing C30syslogng_forward nconfigure
info: Executing C40hpasm nconfigure
info: Executing C50cmf nconfigure
info: Executing C50collectl nconfigure
info: Executing C50gather_data nconfigure
info: Executing C50hptc-lm nconfigure
info: Executing C50nagios nconfigure
info: Executing C50nat nconfigure
info: Executing C50supermond nconfigure
info: Executing C51nagios_monitor nconfigure
info: Executing C51nrpe nconfigure
info: Executing C90munge nconfigure
info: Executing C90slurm nconfigure
info: Executing C95lsf nconfigure
info: Executing C30syslogng_forward cconfigure
info: Executing C35dhcp cconfigure
info: Executing C50supermond cconfigure
info: Executing C90munge cconfigure
info: Executing C90slurm cconfigure
info: Executing C95lsf cconfigure
info: nconfig shut down
info: nconfig started
info: Executing on head node
info: Executing C02network nrestart
info: Executing C04iptables nrestart
info: Executing C06nfs_server nrestart
info: Executing C08ntp nrestart
info: Executing C10hptc_cluster_fs nrestart
info: Executing C10hptc_cluster_fs_client nrestart
info: Executing C20gmmon nrestart
info: Executing C30swmlogger nrestart
info: Executing C30syslogng_forward nrestart
info: Executing C40hpasm nrestart
info: Executing C50cmf nrestart
info: Executing C50collectl nrestart
info: Executing C50gather_data nrestart
info: Executing C50hptc-lm nrestart
info: Executing C50nagios nrestart
info: Executing C50nat nrestart
info: Executing C50supermond nrestart
info: Executing C51nagios_monitor nrestart
info: Executing C51nrpe nrestart
info: Executing C90munge nrestart
info: Executing C90slurm nrestart
info: Executing C95lsf nrestart
info: Executing C30syslogng_forward crestart
info: Executing C35dhcp crestart
info: Executing C50supermond crestart
info: Executing C90munge crestart
info: Executing C90slurm crestart
info: Executing C95lsf crestart
info: nconfig shut down
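If you capture this output to a file, a short script can confirm that every service that was configured was also restarted. The sketch below relies only on the "Executing <script> <action>" line format visible in the sample output; the function names are hypothetical, and nothing here is an HP XC tool.

```python
import re
from collections import defaultdict

# Matches lines such as "info: Executing C50nagios nconfigure".
_EXEC = re.compile(r"Executing (C\d+\S*) (\w+)")


def services_by_action(output):
    """Group service script names by the action word that follows them."""
    actions = defaultdict(list)
    for line in output.splitlines():
        match = _EXEC.search(line)
        if match:
            actions[match.group(2)].append(match.group(1))
    return dict(actions)


def missing_restarts(output):
    """List scripts that ran nconfigure but have no matching nrestart."""
    actions = services_by_action(output)
    restarted = set(actions.get("nrestart", []))
    return [s for s in actions.get("nconfigure", []) if s not in restarted]
```

Run against the sample output above, `missing_restarts` would return an empty list, because each script listed under nconfigure also appears under nrestart.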
Proceed to “Task 8: Modify SLURM Characteristics (Optional)” to modify the SLURM configuration file. This task is optional.