scrun is a front-end program for scored(8), which manages a variety of cluster resources. User programs running on a cluster must be invoked via the scrun program.
First, scrun invokes the user program, specified by file, on the host where scrun was executed. This is done in such a way as to obtain the required resource information. Then, scrun tries to log in to scored. After login, scrun becomes a front-end process that controls the job status of the user program running on the cluster. When the user program finishes, scrun also terminates.
Valid arguments to scrun may be SCore options. These options are various resource specifications for SCore-D and/or options for the language runtime systems on which user programs rely. In this manual page, only the SCore-D options are described. For the language system options, consult the documentation installed with those systems.
The first argument of scrun that does not begin with the minus (-) character, i.e., the first argument, or the argument following the SCore options when they are specified, must be the filename of the program to be executed on the cluster. The specified executable file is copied and then invoked by scored on all allocated nodes in the cluster. Arguments following the filename are passed to the invocation of the executable file on all nodes. The executable file must have read permission so that scrun can read the file and copy it to the cluster nodes. The file must also be executable on the host where scrun is invoked, so that scrun can execute it to obtain resource information.
The scrun program can also submit a parallel job to cluster hosts that have a different OS and/or CPU from the host where scrun is invoked. In this case, at least two executable files must be present: one for the local invocation by scrun, and another for cluster execution. To allow for this situation, the executable files must be compiled with the SCore smake(1) command (not make or gmake). In this case, the executable file must be a symbolic link to the .wrapper script, which is automatically created by the smake command. It is the user's responsibility to keep the heterogeneous executable files consistent by compiling them from the same source code.
GENERAL FORMAT OF SCORE OPTIONS
The first character must be a minus (-), followed by keyword and value pairs, each pair separated by a comma (,). The keyword is a predefined SCore literal and is separated from its associated value by the equal (=) character. Here is an example:
scrun -nodes=2,cpulimit=4 a.out
In this case, two SCore options are specified, one is the "nodes"
option and another the "cpulimit" option. The nodes option
has a value of "2", and cpulimit has a value of "4".
If the same keywords are listed in the SCore options, then the leftmost one is
taken. The value of the SCORE_OPTIONS environment variable is taken
as the default option setting.
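For example, the following sh(1) commands set default options for subsequent scrun invocations (the program name a.out is illustrative):
SCORE_OPTIONS=nodes=2,cpulimit=4
export SCORE_OPTIONS
scrun a.out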
SINGLE USER MODE AND MULTIPLE USER MODE
The SCore-D operating system is designed to run multiple jobs at a time in a time-sharing manner. However, it also has a single-user mode that allows users to use the cluster exclusively. This is useful when users want to evaluate programs. When scrun is invoked in a SCOUT environment (see scout(1)) and no scored option is specified, scrun first invokes scored on the cluster within the SCOUT environment, and the user program is then executed on the invoked scored. When the user program terminates, scored also terminates.
If the group option is specified, then scrun creates the SCOUT environment on the hosts specified by the value of the group option, and the user program is executed in single-user mode. When the file option is specified and a set of hostnames is listed in the file given as the option value, the user program is executed in single-user mode on that host group. The checkpoint option is enabled when the group or file option is specified.
If scored is already running on a cluster, then the user must specify, with the scored option, the SCore-D server host that is accepting user logins, or the hosts where SCore-D is running. Users can also specify a host group name as the value of the scored option to identify the set of hosts on which SCore-D is running. More precisely, the value of the scored option is in a format that scorehosts can accept.
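For example, assuming a multi-user SCore-D is accepting logins on a server host named server0 (a hypothetical hostname), a job can be submitted to it as follows:
scrun -scored=server0,nodes=4 a.out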
RESOURCE SPECIFICATION
SCore-D manages a variety of cluster resources, such as nodes, networks, and disks. This section describes the resource specification options.
- nodes=[hosts][xprocs][.[bintype][.cpugen[.speed]]]
-
hosts is the number of hosts or nodes in a cluster required to run a user program. procs is the number of processes to be invoked on each host of an SMP cluster. If procs is not present and the allocated hosts are in an SMP cluster, then the number of allocated hosts may be the requested number of hosts divided by the number of processors in each SMP node. If the procs number is specified, then that number of processes is invoked on each SMP host if possible. If the number of requested nodes is less than the total number of nodes in the partition, then SCore-D allocates nodes such that node loads are balanced. The bintype option specifies the binary type to be run on a heterogeneous cluster. The name of the binary type comes from the smake or Hmake command and the .wrapper script. For binary type names, please see the glossary.
On a heterogeneous cluster, users can specify CPU types by the
cpugen option. Possible values for cpugen
are specified in
scorehosts.db(5),
which is a database containing all cluster information. The
speed option values are also specified in
scorehosts.db.
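For example, following the nodes syntax above, the following requests four SMP hosts with two processes on each host, eight processes in total:
scrun -nodes=4x2 a.out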
- network=network_name[+network_name]...
-
Users can specify the network (PM device) with the network option to allocate the network for user program execution on a cluster. Valid network_name values are specified in scorehosts.db(5). Users can also specify multiple networks for the parallel execution of a user program.
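For example, assuming that the network names myrinet and ethernet are both defined in scorehosts.db(5) (the names here are hypothetical), both networks can be allocated for one job:
scrun -network=myrinet+ethernet a.out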
- priority=number
-
Scheduling priority can be specified with the priority option. The smaller the value, the higher the priority. A job having a higher priority will be scheduled more often.
- monitor[=monitor_type]
-
Attach a real-time user program execution monitor. Valid types for monitor_type are: load, comm, memory, disk, usr0, usr1, all and ALL.
load attaches a CPU activity monitor, and comm attaches a communication activity monitor. The memory and disk options attach memory and disk usage monitors, respectively. The usage values are scaled to the limit values, if specified; otherwise they are scaled to the amount of free space available when SCore-D is invoked. The usr0 and usr1 options attach monitors displaying the values set by the user program (see sc_set_monitor()).
If the monitor option has no value, then the load and communication monitors are attached. If the user specifies all, then the CPU, communication, memory usage and disk usage monitors are attached. If the user specifies ALL, then all six monitors are attached.
The user must have an accessible X window server, and the DISPLAY environment variable must be set correctly.
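For example, the following attaches the CPU, communication, memory usage and disk usage monitors to a two-node job (an X server reachable through DISPLAY is assumed):
scrun -nodes=2,monitor=all a.out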
- debug[=number]
-
The MPC++ or MPICH-SCore runtime system is programmed to detect exception signals such as SIGSEGV. When an exception signal is raised, the runtime system asks SCore-D to attach a GDB (GNU debugger) process to debug the user program. If the debug option is specified, and the user program is running at time-sharing priority, then SCore-D creates a GDB process. Otherwise, the user program will be killed.
The number option limits the number of debugger processes attached at the same time. The default value is 4, and the maximum is limited to 10. If the DISPLAY environment variable is set, then SCore-D creates an xterm process in which the GDB process runs. If the DISPLAY environment variable is absent or has no value, but a score.gdb file exists in the current directory and the file is readable from the cluster hosts, then the GDB process will read the file and execute the GDB commands written in it. If the score.gdb file is not accessible, then the GDB process will execute only the backtrace GDB command.
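For example, a score.gdb file may contain ordinary gdb(1) commands such as (shown as an illustration):
backtrace
info registers
Running a job as follows then allows up to two debugger processes to be attached when exception signals are raised:
scrun -debug=2 a.out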
- stat[istics][=stat_type]
-
When a user program terminates, scored outputs resource usage information to the standard error of the scrun process. The default is to output only summary information unless stat_type is specified. Valid types for stat_type are: all and detail. If either of these types is used, then individual node information will also be output.
- scored=scored-server [multi-user mode only]
-
Specify the SCore-D server hostname to log in to a SCore-D that is already running in multi-user mode. If this option is not specified, then SCore-D is invoked by scrun in single-user mode.
- group=hostgroup [single-user mode only]
-
First a SCOUT environment is created according to the specified hostgroup, then the user program is invoked in the SCOUT environment. Checkpointing is enabled with this option in single-user mode.
- file=filename [single-user mode only]
-
First a SCOUT environment is created according to the list of hostnames written in the file filename, then the user program is invoked in the SCOUT environment. Checkpointing is enabled with this option in single-user mode.
- corebinding=bits[:bits..]
-
Specifies the binding of processes to cores (processors). The value is a colon (:) separated list of bit patterns. The leftmost pattern is the binding pattern of process 0. Each bit pattern specifies the cores (processors) on which the process may be executed, with the LSB denoting core zero (0). For example, the bit pattern 3 means the process will be executed on core zero or one. A bit pattern can be specified in decimal, in octal if prefixed by a zero, or in hexadecimal if prefixed by "0x".
- restart
-
Cluster hosts sometimes crash, and running jobs are killed unexpectedly. If the restart option is set, the user's program execution will be restarted from the beginning when scored is restarted with the -restart option. Note that this restart option is valid only while the scrun process is alive. When the user kills the scrun process, the restart never happens.
- checkpoint[=interval]
-
This option is similar to the restart option, but the user's program execution contexts are saved to local disk at the specified time interval. If the interval value is immediately followed by the character 'm', 'h' or 'd', then the unit of the interval is minutes, hours or days, respectively.
When scored is restarted, program execution continues from the point where the most recent checkpoint was taken. This restart will take place only while the scrun process is alive.
To checkpoint in single-user mode, you must execute with the group or file option, outside of a SCOUT environment.
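For example, assuming a host list in a file named hosts.txt (a hypothetical filename), the following runs a job in single-user mode and takes a checkpoint every 30 minutes:
scrun -file=hosts.txt,checkpoint=30m a.out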
- cpulimit=limit
-
Specify the time limit (in seconds) for a user program to run.
- memorylimit=limit
-
This option specifies the memory limit (in MB). It is effective when SCore-D is running in multi-user mode.
- disklimit=limit
-
This option specifies the disk limit (in MB). It is effective when SCore-D is running in multi-user mode.
- wait
-
If the wait option is specified and a login to SCore-D fails because a specified resource is temporarily unavailable, then the login is postponed until the specified resource becomes available. This option is only effective for SCore-D running in multi-user mode.
- message[=mode]
-
Control the output messages produced by the SCore system at runtime. Valid modes for mode are: concise and quiet. The default is to output all messages. concise suppresses normal messages so that only warning and error messages are output. quiet suppresses all messages except error messages.
- resource
-
When the resource option is specified, scrun investigates the SCore options and resource requests of the user program(s); the SCore options, resource requests, and pathname(s) of the user program(s) are then displayed, and scrun exits. The user program(s) will not run on the cluster.
- passhup
-
Before version 5.6, when the scrun process received the SIGHUP signal, the signal was broadcast to the processes running on the cluster hosts. In SCore 5.6, however, when the scrun process receives the SIGHUP signal, the standard outputs are redirected to a file named scrun PID.output (where PID is the process ID of scrun), so that scrun can survive even when the shell through which the scrun process was invoked terminates. This option is for backward compatibility: if it is specified, the output redirection will not take place and the SIGHUP signal will be broadcast to the processes running on the cluster hosts.
- ts=timeslice [single-user mode only]
-
Specify the time interval (in milliseconds) at which user program execution is interrupted to detect a deadlock situation.
JOB CONTROL and SIGNALS
The job status of user program execution on a cluster is linked with the job status of scrun. Users can suspend, resume, or kill parallel jobs running on a cluster just like a normal UNIX command, by typing "^Z", "fg", and "^C". Further, if the output of scrun is stopped by "^S", cluster execution may eventually be suspended until scrun output is allowed again by "^Q". Typing "^\" triggers checkpointing, instead of creating a core file, and the job waits for its restart when SCore-D unexpectedly terminates (system down).
Some UNIX signals delivered to the scrun process are forwarded and broadcast to the processes running on a cluster. The forwarded signals are SIGINT, SIGABRT, SIGTERM, SIGURG, SIGWINCH, SIGUSR1, and SIGUSR2.
INPUT/OUTPUT REDIRECTION
Similar to the Unix shell, the standard input and/or output of a parallel process can be redirected to files. When a user program specified in the scrun arguments is followed by the ":=" symbol and a filename, then the standard input of each parallel process derived from the user program is taken from the file. If the symbol is "=:", then the standard outputs go to the file. If the symbol is "=::", then the outputs are appended to the file.
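For example (the pathnames here are illustrative):
scrun a.out :=/tmp/input.dat
scrun a.out =::/tmp/out.log
The first command feeds /tmp/input.dat, local to each compute host, to the standard input of the parallel process; the second appends the standard outputs to /tmp/out.log on each compute host.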
Note that the opened files are local, located on the compute hosts. Further, if the filename is a basename, i.e., there is no "/" in its name, then the files are created in an SCore-D working directory located on the compute hosts, and they are removed when the parallel job terminates. If the filename is an absolute pathname, then the files are created at the specified pathname. No relative pathname is allowed.
On an SMP cluster, when the output redirection pathname is absolute, only the first process on a compute host will be redirected to the specified file; the other processes will output to /dev/null.
PARALLEL JOB
Similar to the Unix shell, scrun supports not only simple commands, but also pipelined commands and sequenced commands. Pipelined commands are separated by the "==" symbol, and sequenced commands by the "::" symbol. Parallel processes in a parallel job are allocated in the same partition (set of hosts) of a cluster. Processes having the same node number but belonging to different pipelined parallel processes are connected with a Unix pipe, just like pipelined commands under the Unix shell, and they are scheduled at the same time. Parallel processes are executed in sequence when they are separated by the sequential symbol ("::").
Sequential programs, such as normal Unix commands or C/C++ programs, can run on a cluster via the system(6) command, in the same way as the Unix system(3) function. This system command can be used for housekeeping of a cluster.
By combining the scatter(6) command, a user parallel program, and the gather(6) command in series, users can move the necessary data files back and forth between the user's workstation and the cluster hosts.
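For example, a sequenced job might distribute an input file, run the parallel program, and then collect the results. The argument syntax of scatter(6) and gather(6) shown below is illustrative; see their manual pages for the actual usage:
scrun scatter input.dat :: a.out :: gather output.dat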