scored(8) is invoked from within a scout(1) environment. scored fork()'s and exec()'s user processes, so scored must run as root so that it can set the user ID of user processes. Here is a very simple example of starting a Multi-User Environment for the cluster group 'pcc', which consists of four compute hosts: comp0, comp1, comp2 and comp3. The scoreboard(8) is executing on host 'serv1', so we need to set the SCBDSERV environment variable:

    $ /bin/su -
    # export SCBDSERV=serv1
    # export PATH=$PATH:/opt/score/bin:/opt/score/sbin:/opt/score/deploy
    # scout -g pcc
    SCOUT: Spawn done.
    SCOUT: session started
    # scored
    SYSLOG: Timeslice is set to 500[ms]
    SYSLOG: Cluster[0]: comp0.trc.rwcp.or.jp@0...comp3.trc.rwcp.or.jp@3
    SYSLOG: BIN=linux, CPUGEN=pentium-iii, SMP=1, SPEED=500
    SYSLOG: Network[0]: myrinet/myrinet
    SYSLOG: SCore-D network: myrinet/myrinet
    SYSLOG: SCore-D server: comp3.trc.rwcp.or.jp:9901

The startup of scored will take a few seconds to complete.
If you are already executing the Compute Host Lock Client, msgb(1), then you will see the node blocks in the msgb window change from blue to red.
The first line of SYSLOG output shows that the timeslice is set to 500 ms. This can be changed with the -ts time_slice option on the scored command line. Cluster information follows. In this case there is one cluster, designated Cluster[0], consisting of 4 hosts: comp0.trc.rwcp.or.jp is host 0, through comp3.trc.rwcp.or.jp as host 3. Following this line is CPU information about the hosts in the cluster. The binary type is linux, and the CPU generation name is pentium-iii. Each host in the cluster is a uni-processor (SMP is set to 1 in this case), and the processor speed is 500.
This could mean 500 MHz but it is chosen at the administrator's discretion.
There is one Myrinet network, designated Network[0]. Since it is the only network in this case, it must be the default network for SCore-D; if there were multiple networks, the default would be the first network in the list. The SCore-D server is, by default, the last host in the cluster, in this case comp3.trc.rwcp.or.jp.
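For example, assuming the same scout session as in the transcript above, scored could be started with a shorter timeslice; the value here is illustrative, and the time_slice argument is assumed to be in milliseconds, matching the SYSLOG output:

```shell
# Start SCore-D with a 200 ms timeslice instead of the 500 ms default
# (run inside the scout session, as in the transcript above):
scored -ts 200
```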
scored is now running on the cluster group, and users can submit jobs to be executed on the cluster using commands such as scrun(1) or mpirun(1).
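As a hypothetical illustration, a user on a login host might submit a four-host job as follows; the -nodes option form and the program name are assumptions, so consult scrun(1) for the exact syntax:

```shell
# SCBDSERV must point at the scoreboard host, as in the transcript above:
export SCBDSERV=serv1
# Illustrative job submission across 4 compute hosts:
scrun -nodes=4 ./a.out
```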
Users' program executable files are copied by scored to the compute hosts. The files do not need to be located in a network file system; however, the executable files must be readable for the copy to take place. Users' executable files are copied into the /var/scored directory. A checkpoint image is also stored in this directory if the user program requests to be checkpointed. The directory must be located in a file system with enough disk space to hold those files. The directory may be created when scored is first invoked on the cluster hosts. If the system administrator wants to have the directory in another file system, then the administrator must create a symbolic link before scored is run. User files are removed when a parallel process is terminated.
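As a sketch of the symbolic-link approach described above, an administrator might relocate /var/scored to a larger file system before scored is first run on a compute host; the /export/scratch/scored path is purely illustrative:

```shell
# Assumed site-specific directory on a file system with ample free space:
SPOOL=/export/scratch/scored
mkdir -p "$SPOOL"
# scored copies executables and checkpoint images into /var/scored,
# so replace it with a symbolic link before scored is started:
ln -s "$SPOOL" /var/scored
```

This must be done on each compute host, and before scored has created /var/scored itself.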
There are several options available for scored. You can read the scored(8) man page for complete details.