scrun

execute (submit) an SCore parallel job

SYNOPSIS

scrun [OPTIONS] JOB

DESCRIPTION

The scrun command submits an SCore job. An SCore parallel job consists of one or more parallel processes, each of which is a set of Linux processes. The processes in a parallel process are derived from one or more parallel programs.

OPTION SYNTAX

The leading character of OPTIONS must be a hyphen (-), followed by a comma (,) separated list of keyword and value pairs. A keyword and its associated value (if any) must be separated by the equal (=) symbol.

Valid Option Syntax

The following examples, in which three options (opt0, opt1 and opt2) are specified, are all valid and equivalent.
  1. scrun -opt0=val0 -opt1=val1 -opt2 a.out arg0 arg1
  2. scrun -opt0=val0,opt1=val1,opt2 a.out arg0 arg1
  3. scrun -opt0=val0,opt1=val1 -opt2 a.out arg0 arg1

NODE OPTION

The node specification can be a number (nodes=1024, for example), which is the number of processes to be involved in the parallel execution of the job, or can be in the form NxM (nodes=32x8, for example), where N is the number of compute hosts and M is the number of processes per compute host. The total number of invoked processes is N times M. Note that nodes=4x8 and nodes=8x4 result in the same number of processes, but different node allocations.
It is also possible to specify the number of hosts, instead of nodes, to be involved in the parallel computation. The hosts option can be written with the slash (/) symbol instead of the 'x' symbol. Assume a cluster having 128 hosts, each with 8 cores. The node option nodes=1024 is equivalent to the host option hosts=128, because each host has 8 cores and 128 times 8 is 1024. The number of processes per host can also be specified explicitly with the slash (/) symbol, as in hosts=1024/8, meaning that the number of hosts is obtained from the number of nodes (processes) divided by the number of processes per host. The option keywords node(s) and host(s) have the same meaning; however, appending the 'x' character after a number explicitly marks it as the number of hosts.
If you are aware of the SMP architecture and want to allocate processes on different sockets, you can specify -nodes=8x2x1 or -hosts=16/2/1; then two processes are allocated in each host, each process on a different socket, and 16 processes in total run on 8 hosts.
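
The arithmetic behind these specifications can be sketched as follows. This is a hypothetical helper, not part of SCore; it only illustrates how the NxM and N/M forms relate to each other.

```python
def total_processes(spec):
    """Interpret an SCore-style node specification (illustrative only).

    'NxM' -> N hosts times M processes per host
    'N/M' -> N processes spread over N/M hosts (M processes per host)
    'N'   -> N processes
    Returns (total_processes, hosts, processes_per_host); hosts is None
    when the specification leaves the host count to the scheduler.
    Any third (socket) component is ignored in this sketch.
    """
    if "x" in spec:
        hosts, per_host = (int(v) for v in spec.split("x")[:2])
        return hosts * per_host, hosts, per_host
    if "/" in spec:
        procs, per_host = (int(v) for v in spec.split("/")[:2])
        return procs, procs // per_host, per_host
    return int(spec), None, None

# nodes=32x8 is 32 hosts times 8 processes; hosts=1024/8 is 1024
# processes spread over 128 hosts.
print(total_processes("32x8"))    # (256, 32, 8)
print(total_processes("1024/8"))  # (1024, 128, 8)
```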

SCHEDULER OPTION

If the scrun command is invoked with the group option, then the scrun command first creates a SCOUT environment consisting of the hosts specified with the group option, and then the job is executed in that host group. If the scrun command is invoked in an already existing SCOUT environment without the group option, then the job is executed within that SCOUT environment.

PARALLEL JOB AND SPAWN

An SCore parallel job may consist of two or more parallel processes. The execution of the parallel processes can be serialized or concurrent. The first example below shows the serialized execution of two parallel processes; the second shows piped parallel processes running in parallel. See the IO REDIRECTION section for more details on piped parallel processes.
  1. scrun a.out :: b.out
  2. scrun a.out == b.out

A parallel process may consist of two or more spawns. A spawn is also a set of Linux processes derived from a parallel program. All processes in a parallel process, no matter whether it has one spawn or multiple spawns, share a network channel and communicate with the other processes. Syntactically, spawns are separated by the : symbol, and each spawn may accept options specifying the number of processes to run. To distinguish job options, which affect the entire SCore parallel job, from spawn options, the :: symbol can be used.
  1. scrun -nodes=16 :: -nodes=8 a.out : -nodes=8 b.out
  2. scrun -nodes=16 :: -nodes=8 a.out : b.out

In both cases in the above example, an SCore parallel process consists of 8 processes derived from the program a.out and 8 processes derived from the program b.out. In the latter example, the node option of the second spawn is omitted, so the remaining empty cores are allocated to it.

IO REDIRECTION AND PIPED PARALLEL PROCESS

Like Linux shell programs, SCore supports input and/or output redirection. The standard input of each process of a parallel process can be redirected from a file with the := symbol, and the output can be redirected to a file with the =: symbol. If the output should be appended to a file, then the =:: symbol is used. The target file of this redirection is assumed to be located on each host where each process of a parallel process resides. Two or more parallel processes can be connected with pipes using the == symbol. With piped parallel processes, the standard output of each process in the leading parallel process is piped to the standard input of the corresponding process in the trailing parallel process. Here are some examples.
  1. scrun a.out := file.input
  2. scrun a.out =: file.output
  3. scrun a.out =:: file.append
  4. scrun a.out == b.out

If the input or output filename does not contain any slash (/) character, the catwalk option (see below) is specified, and the file is not found on the compute node(s), then the file will be automatically copied to or from the local (server) node. If the filename does not contain any slash (/) but the catwalk option is not specified, then the file will be located somewhere in the SCore working directory, where the redirected files are accessible throughout the job.

EXIT CODE

The exit code of the scrun command is the exit code of the first process (rank 0). See also the errexit option described below.

STANDARD INPUT

(not yet implemented)

CATWALK FILE STAGING SYSTEM

When the catwalk option is specified with a colon (:) separated list of directory names as its value, the on-demand file staging system, Catwalk, is enabled. When a job tries to open a file that does not exist on the compute node but is located in one of the specified directories on the local (server) node, the file is copied into the current directory of the compute node, and the open() system call succeeds. When the job terminates, files created in the current directory of the compute nodes are copied back to the current directory of the local (server) node. When a filename includes one or more slash (/) characters, Catwalk ignores the file and nothing special happens.

UMASK

The umask setting of the scrun process is passed to the parallel job execution on compute hosts.

SIGNAL

SIGINT, SIGTSTP, SIGHUP, SIGABRT, SIGFPE, SIGUSR1, SIGUSR2, and SIGWINCH signals are broadcast to all remote processes. Repeated SIGINT (^C) or SIGTSTP (^Z) signals force the scrun process to be killed or suspended without waiting for the termination or suspension of the parallel job. The SIGQUIT signal is reserved for the checkpoint trigger, and SIGHUP is automatically delivered to the parallel job when a deadlock state is detected.

NETWORK

The network option specifies which network is to be used, and the scrdnet option specifies the network for the SCore-D parallel process which manages the user job. The value of the network option is either a name declared as a network record in the scorehosts.db(5) file, or one of the PMX device names. When the name cannot be found in the scorehosts.db file, the name is assumed to be a PMX device name. The PMX device name can be followed by a device-specific option string, separated by the colon (:) symbol, as described below.

sctp

A network interface name (e.g. eth0) can be specified after the colon (:) symbol.

ethernet

An Ethernet device name, such as eth0 or eth1, can be specified after the colon (:) symbol.

etherhxb

(none)

mx

A NIC number can be specified after the colon (:) symbol.

infiniband

A device name can be specified after the colon (:) symbol.


OTHER OPTIONS

group=HOSTGROUP

file=HOSTGROUP

hostfile=HOSTGROUP

nodefile=HOSTGROUP

Specifying the host (node) group. See scorehosts(1) for more details on how to specify HOSTGROUP.

nodes=NODESPEC

hosts=NODESPEC

Specifying the number of nodes (processes). See the NODE OPTION section above.

openmp=NUMTHREADS

numthreads=NUMTHREADS

If specified, the OMP_NUM_THREADS environment variable will be set to the value of this option on the compute nodes.

wdir=DIRPATH

Specify the current working directory of the parallel job running on compute hosts.

JOBID=JID

When the scrun command is invoked via a batch scheduler, such as PBS or SGE, the JID (job ID) given by the scheduler must be specified.

corebind[=N0[:N1[:N2:...]]]

Specifying the bit patterns representing the binding between processes and cores. Each bit pattern can be a decimal, octal (prefixed by '0'), or hexadecimal (prefixed by '0x') number. The left-most number is the bit pattern of the first process in the host group, the second number is for the second process, and so on. If the bit pattern is zero, then the corresponding process is not bound to any particular core and can run on any core. If the nodes option explicitly specifies the socket allocation, then the bit pattern is effective only within the cores of the socket. When the option has no associated value, one core will be allocated to each process exclusively.
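
The number formats accepted by corebind can be illustrated with a small parser. This is a hypothetical helper for illustration only; SCore's actual parsing code is not shown here.

```python
def parse_corebind(value):
    """Parse a corebind value like '3:0xc:010' into per-process core masks.

    Decimal, octal (leading '0') and hexadecimal (leading '0x') numbers
    are accepted, matching the formats described above.
    """
    masks = []
    for token in value.split(":"):
        if token.lower().startswith("0x"):
            masks.append(int(token, 16))   # hexadecimal
        elif token.startswith("0") and len(token) > 1:
            masks.append(int(token, 8))    # octal
        else:
            masks.append(int(token, 10))   # decimal
    return masks

# First process bound to cores 0-1 (binary 11), second to cores 2-3
# (0xc is binary 1100), third to core 3 (octal 010 is 8, i.e. bit 3).
print(parse_corebind("3:0xc:010"))  # [3, 12, 8]
```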

numa[=yes|no]

On NUMA compute nodes, the memory allocation of each process on a compute node can be controlled by the numa option. If specified, memory regions local to the core bound to each process will be allocated. If the option value is no, then no NUMA-aware memory allocation takes place. The default is no.

monitor=MONOPTS

If this monitor option is set and its associated value is not no, then the real-time activity monitor will be invoked. It is an X-Window based program that displays a parameter of each process of the job. If the option value is cpu, the CPU busyness of the processes is displayed; comm displays communication frequency, mem memory usage, and disk disk usage. If the option value is usr1 or usr2, then the value set by the user program will be displayed. Multiple option values can be specified by concatenating the values with the plus (+) symbol as a separator.

stat[istics]

When this option is specified, some statistics of each process of the last parallel process in the parallel job will be displayed.

X

When the scrun command is invoked with the group option and the parallel job requires X-Window to run, this X option enables the X-Window relay of the SCOUT environment. In this case, the DISPLAY environment variable must be set properly. The -display option of an X application will not work, since the value of the DISPLAY environment variable is changed.

catwalk[=DIR0[:DIR1[:DIR2...]]]

The Catwalk on-demand file staging system is enabled. When a user program tries to open a file but the file does not exist on the compute node, the file is copied from the specified directories on the local (server) node. When the job terminates, the newly created files on the compute nodes are copied back to the local (server) node.

errexit=N

When the job is complex, consisting of multiple programs, and one of the programs exits with a code larger than the value specified by this option, the job is terminated without invoking the trailing programs. The default value is zero.
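
The errexit rule can be modelled as follows. This is an illustrative simulation under the semantics described above, not SCore code; how the job's final exit code is reported in this situation is an assumption here.

```python
def run_serialized(exit_codes, errexit=0):
    """Simulate serialized programs under the errexit option.

    Programs run in order; if one exits with a code larger than
    errexit, the trailing programs are skipped and that code is
    returned (an assumption for this sketch). Otherwise the job
    completes with code 0.
    """
    for code in exit_codes:
        if code > errexit:
            return code  # job terminated, trailing programs skipped
    return 0

print(run_serialized([0, 2, 0]))             # 2: third program never runs
print(run_serialized([0, 2, 0], errexit=2))  # 0: 2 does not exceed the limit
```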

ENVIRONMENT

SCBDSERV

Scoreboard server. If this environment variable is not set, then the network option must be specified.

SCORE_OPTIONS

Additional scrun options

PM_DLPATH

Specifying the path for the dynamic loading of the PMX device libraries.

PM_DYN_all

When this environment variable is defined, the PMX library tries to load PMX device libraries dynamically.


SEE ALSO

scout(1), scorehosts(1), scoregroups(1), scope(1), scorehosts.db(5).
CREDIT

This document is a part of the SCore cluster system software developed at PC Cluster Consortium, Japan. Copyright (C) 2003-2008 PC Cluster Consortium.