scrun

execute (submit) an SCore parallel job

SYNOPSIS

scrun [OPTIONS] JOB

DESCRIPTION

The scrun command submits an SCore job. An SCore parallel job consists of a set of parallel processes, each of which is a set of Linux processes. The processes in a parallel process are derived from one or more parallel programs.

OPTION SYNTAX

OPTIONS must begin with a hyphen (-) followed by a comma-separated list of keyword and value pairs; several such hyphen-prefixed lists may be given. A keyword and its associated value (if any) must be separated by an equal sign (=).

Valid Option Syntax

In the following examples, three options (opt0, opt1 and opt2) are specified. All three forms are valid and equivalent.
  1. scrun -opt0=val0 -opt1=val1 -opt2 a.out arg0 arg1
  2. scrun -opt0=val0,opt1=val1,opt2 a.out arg0 arg1
  3. scrun -opt0=val0,opt1=val1 -opt2 a.out arg0 arg1
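The following is a minimal Python sketch of the option syntax described above. It shows why the three forms are equivalent: each leading hyphen-prefixed argument is split on commas, and each item is split on the first equal sign. The function name parse_options is hypothetical and is not SCore's actual parser. It also assumes that option parsing stops at the first argument that does not start with a hyphen, which is the program name.

```python
from typing import Dict, List, Optional, Tuple


def parse_options(argv: List[str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Hypothetical model: collect leading '-key[=val][,key[=val]...]' groups.

    Returns the options (None for a keyword without a value) and the
    remaining argv, starting at the program name.
    """
    opts: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(argv) and argv[i].startswith("-"):
        # Drop the leading hyphen, then split the comma-separated items
        for item in argv[i][1:].split(","):
            key, sep, val = item.partition("=")
            opts[key] = val if sep else None
        i += 1
    return opts, argv[i:]
```

All three example command lines yield the same options and the same program argv, so the choice between repeating the hyphen and using commas is purely a matter of taste.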

NODE OPTION

The node specification can be a number (nodes=1024, for example), which is the number of processes to be involved in the parallel execution of the job. It can also take the form NxM (nodes=32x8, for example), where N is the number of compute hosts and M is the number of processes per compute host. The total number of processes invoked is N times M. Note that nodes=4x8 and nodes=8x4 yield the same number of processes (32) but different node allocations.
It is also possible to specify the number of hosts, instead of the number of nodes (processes), to be involved in the parallel computation. The node specification can also use the / symbol instead of the x symbol. Assume a cluster of 128 hosts, each of which has 8 cores. The node option nodes=1024 is equivalent to the host option hosts=128, because each host has 8 cores and 128 times 8 is 1024. Alternatively, the number of processes per host can be given explicitly with the / symbol, as in hosts=1024/8. Here the number of hosts is the number of nodes (processes) divided by the number of processes per host. The option keywords node(s) and host(s) have the same meaning. However, if you want to specify the number of hosts explicitly, place the x character after the number of hosts.
If you want to take the SMP architecture into account and allocate processes on different sockets, specify -nodes=8x2x1 or -hosts=16/2/1. Either form allocates two processes per host, each on a different socket, so that 16 processes in total run on 8 hosts.
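The NODESPEC arithmetic above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not SCore's implementation. In the x form, the first number is taken as the host count and the remaining factors multiply to give processes per host. In the / form, the first number is the total process count and the second is processes per host. A third (socket-level) number is only checked for divisibility. Actual placement on sockets is not modelled. For a bare number, the host count depends on the cluster and is left unknown. The names Allocation and parse_nodespec are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional


@dataclass(frozen=True)
class Allocation:
    hosts: Optional[int]           # None when only a process count is given
    procs_per_host: Optional[int]  # None when only a process count is given
    total: int                     # total number of processes (nodes)


def parse_nodespec(spec: str) -> Allocation:
    """Hypothetical model of the forms P, NxM[xK] and P/M[/K]."""
    if "x" in spec and "/" in spec:
        raise ValueError("mixing 'x' and '/' is not modelled")
    if "x" in spec:
        # N x M [x K]: N hosts; the remaining factors give processes per host
        parts = [int(p) for p in spec.split("x")]
        hosts, per_host = parts[0], prod(parts[1:])
        return Allocation(hosts, per_host, hosts * per_host)
    if "/" in spec:
        # P / M [/ K]: P processes, M per host, hence P/M hosts
        parts = [int(p) for p in spec.split("/")]
        total, per_host = parts[0], parts[1]
        if total % per_host or any(per_host % k for k in parts[2:]):
            raise ValueError(f"{spec!r} does not divide evenly")
        return Allocation(total // per_host, per_host, total)
    # A bare number: total processes; the host count depends on the cluster
    return Allocation(None, None, int(spec))
```

With this model, 1024/8 on the 128-host example cluster gives 128 hosts. Both 8x2x1 and 16/2/1 describe 16 processes on 8 hosts, two per host.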

SCHEDULER OPTION

If the scrun command is invoked with the group option, it first creates a SCOUT environment containing the hosts specified by the group option and then executes the job on that host group. If, instead, the scrun command is invoked within an already existing SCOUT environment without the group option, the job is executed within that SCOUT environment.

PARALLEL JOB AND SPAWN

An SCore parallel job may consist of two or more parallel processes. The parallel processes can be executed sequentially or concurrently. In the first example below, two parallel processes are executed one after the other. In the second, the parallel processes are connected by a pipe and run concurrently. See the IO REDIRECTION AND PIPED PARALLEL PROCESS section for more details on piped parallel processes.
  1. scrun a.out :: b.out
  2. scrun a.out == b.out

A parallel process may consist of two or more spawns. A spawn is also a set of Linux processes derived from a parallel program. All processes in a parallel process share a network channel and communicate with one another, whether the parallel process has one spawn or several. Syntactically, spawns are separated by the : symbol, and each spawn may accept options specifying the number of processes to run. To distinguish the job options, which affect the entire SCore parallel job, from the spawn options, the :: symbol can be used.
  1. scrun -nodes=16 :: -nodes=8 a.out : -nodes=8 b.out
  2. scrun -nodes=16 :: -nodes=8 a.out : b.out

In both of the above examples, the SCore parallel process consists of 8 processes derived from the program a.out and 8 processes derived from the program b.out. In the second example, the node option of the second spawn is omitted. That spawn is therefore allocated the remaining free cores out of the 16 specified by the job option "-nodes=16".
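The Python sketch below models how a spawn with no node option receives the cores left over from the job total. It is a hypothetical model only. It handles a single parallel process, treats the text before :: as job options only when every token there is an option, recognizes only a plain -nodes=N token, and rejects more than one spawn without a node option, because this page does not say how such spawns would be divided. The function names are illustrative.

```python
from typing import List, Optional, Tuple


def split_spawns(argv: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Split one parallel process into job options and ':'-separated spawns."""
    job_opts: List[str] = []
    rest = argv
    if "::" in argv:
        cut = argv.index("::")
        head = argv[:cut]
        # Treat the part before '::' as job options only if it holds options alone
        if head and all(tok.startswith("-") for tok in head):
            job_opts, rest = head, argv[cut + 1:]
    spawns: List[List[str]] = [[]]
    for tok in rest:
        if tok == ":":
            spawns.append([])
        else:
            spawns[-1].append(tok)
    return job_opts, spawns


def requested_nodes(tokens: List[str]) -> Optional[int]:
    """Return N from a '-nodes=N' token, or None if there is none."""
    for tok in tokens:
        if tok.startswith("-nodes="):
            return int(tok.split("=", 1)[1])
    return None


def allocate(total: int, requested: List[Optional[int]]) -> List[int]:
    """Give a spawn without a node option the cores left over from the job total."""
    fixed = sum(n for n in requested if n is not None)
    missing = sum(1 for n in requested if n is None)
    if fixed > total:
        raise ValueError("spawns request more nodes than the job has")
    if missing > 1:
        raise ValueError("more than one spawn without a node option is not modelled")
    return [total - fixed if n is None else n for n in requested]
```

Both example command lines resolve to 8 processes of a.out and 8 of b.out under this model.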

IO REDIRECTION AND PIPED PARALLEL PROCESS

Like Linux shells, SCore supports input and output redirection. The standard input of each process of a parallel process can be redirected from a file with the := symbol, and the standard output can be redirected to a file with the =: symbol. To append the output to a file, use the =:: symbol instead. The target file of a redirection is assumed to be located on each host where a process of the parallel process resides. Two or more parallel processes can be connected by pipes with the == symbol. With piped parallel processes, the standard output of each process in the leading parallel process is piped to the standard input of the processes in the trailing parallel process. Here are some examples:
  1. scrun a.out := file.input
  2. scrun a.out =: file.output
  3. scrun a.out =:: file.append
  4. scrun a.out == b.out
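Here is a minimal Python sketch of the three file-redirection operators, as each process would apply them to its host-local file. The mapping to open() modes is an assumption by analogy with shell < and >: := reads, =: truncates and writes, and =:: appends. REDIRECT_MODES, parse_redirections and open_redirect are hypothetical names. Piping with == is not modelled here.

```python
from typing import Dict, IO, List, Tuple

# Assumed mapping from scrun redirection operators to Python open() modes
REDIRECT_MODES = {":=": "r", "=:": "w", "=::": "a"}


def parse_redirections(argv: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Separate 'operator path' pairs from the program argv."""
    program: List[str] = []
    redirs: Dict[str, str] = {}
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok in REDIRECT_MODES:
            if i + 1 == len(argv):
                raise ValueError(f"missing file name after {tok}")
            redirs[tok] = argv[i + 1]
            i += 2
        else:
            program.append(tok)
            i += 1
    return program, redirs


def open_redirect(op: str, path: str) -> IO[str]:
    """Open the host-local target file the way one process would see it."""
    return open(path, REDIRECT_MODES[op])
```

Because the file is opened on each host separately, a job spread over several hosts would produce one output file per host, not a single merged file.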

STANDARD INPUT

(not yet implemented)

FILE STAGING

(not yet implemented)

OTHER OPTIONS

group=HOSTGROUP

file=HOSTGROUP

hostfile=HOSTGROUP

nodefile=HOSTGROUP

Specifies the host (node) group. See scorehosts(1) for details on how to specify HOSTGROUP.

nodes=NODESPEC

hosts=NODESPEC

Specifies the number of nodes (processes). See the NODE OPTION section.

network=NETNAMES

Specifies the name of the PMX network used by the user program. The network name can be a name declared in the scorehosts.db file or the name of a PMX device. To specify multiple networks, separate the network names with the plus (+) symbol.

scrdnet=NETNAME

Specifies the name of the PMX network used by SCore-D. The network name can be a name declared in the scorehosts.db file or the name of a PMX device.

wdir=DIRPATH

Specifies the current working directory of the job.

JOBID=JID

When the scrun command is invoked via a batch scheduler, such as PBS or SGE, the JID (job ID) assigned by the scheduler must be specified.

corebind=N0:N1:N2: ...

Specifies the bit patterns that bind processes to cores. Each bit pattern can be a decimal, octal (prefixed by '0'), or hexadecimal (prefixed by '0x') number. The leftmost number is the bit pattern for the first process on the host, the second number is for the second process, and so on. If a bit pattern is zero, the corresponding process is not bound and can run on any core. If the nodes option includes a socket specification, the bit pattern applies only to the cores within the socket.
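The following Python sketch decodes a corebind value under one stated assumption: bit i of a pattern selects core i, with the least significant bit being core 0, as in a Linux CPU affinity mask. The page does not state the bit order. Python's int(text, 0) rejects a leading-zero octal such as 014, so the prefixes are dispatched by hand. parse_mask, parse_corebind and allowed_cores are hypothetical names.

```python
from typing import List


def parse_mask(text: str) -> int:
    """Decimal, octal (leading '0') or hexadecimal (leading '0x') bit pattern."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def parse_corebind(value: str) -> List[int]:
    """Split 'N0:N1:N2:...' into one bit pattern per process on a host."""
    return [parse_mask(field) for field in value.split(":") if field]


def allowed_cores(mask: int, ncores: int) -> List[int]:
    """Cores a process may use; assumes bit i means core i, and 0 means unbound."""
    if mask == 0:
        return list(range(ncores))
    return [core for core in range(ncores) if (mask >> core) & 1]
```

For example, corebind=0x3:014:0 on an 8-core host would bind the first process to cores 0 and 1 and the second to cores 2 and 3, and would leave the third unbound.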

monitor=MONOPTS

If this option is set and its value is not no, the real-time activity monitor is invoked. It is an X Window System based program that displays a parameter of each process of the job. The value cpu displays the busyness of the processes, comm the communication frequency, mem the memory usage, and disk the disk usage. If the value is usr1 or ..., the value set by the user program is displayed. Multiple values can be specified by concatenating them with the plus (+) symbol.
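The enable rule and the plus-separated value list can be sketched in Python as shown below. Because the list of user-defined values is incomplete on this page, the sketch only splits the values and does not validate them. A value of None stands for a bare -monitor with no value. monitor_enabled and monitor_values are hypothetical names.

```python
from typing import List, Optional


def monitor_enabled(value: Optional[str]) -> bool:
    """The monitor runs whenever the option is present and its value is not 'no'."""
    return value != "no"


def monitor_values(value: Optional[str]) -> List[str]:
    """Split MONOPTS on '+'; returns [] for a bare '-monitor' or for 'no'."""
    if value is None or value == "no":
        return []
    return [item for item in value.split("+") if item]
```

For instance, -monitor=cpu+comm would display both CPU busyness and communication frequency. The same plus-separated form is used by network=NETNAMES.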

monitor=MONOPTS

When this option is specified, statistics for each process of the last parallel process in the parallel job are displayed.

SIGNAL

SIGINT, SIGHUP, SIGABRT, SIGFPE, SIGTERM, SIGUSR1, SIGUSR2, and SIGWINCH signals are broadcast to all remote processes. However, SIGINT (^C) and SIGTSTP (^Z) signals sent in rapid succession may kill or suspend the scrun process itself instead of being broadcast.
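Here is a generic Python sketch of the signal relay pattern described above, written for Linux. It is not scrun's implementation. The broadcast callback is a hypothetical stand-in for delivering the signal to the remote processes. The sketch also does not model the special handling of rapidly repeated ^C and ^Z. Handlers must be installed from the main thread, and the previous handlers are returned so that the caller can restore them.

```python
import signal
from typing import Callable, Dict

# Signals this page lists as broadcast to all remote processes
FORWARDED = [signal.SIGINT, signal.SIGHUP, signal.SIGABRT, signal.SIGFPE,
             signal.SIGTERM, signal.SIGUSR1, signal.SIGUSR2, signal.SIGWINCH]


def install_forwarding(broadcast: Callable[[int], None]) -> Dict[int, object]:
    """Route each listed signal to `broadcast`; return the previous handlers."""
    def handler(signum, frame):
        broadcast(signum)  # hypothetical: send signum to every remote process

    previous = {}
    for sig in FORWARDED:
        previous[sig] = signal.signal(sig, handler)
    return previous


def restore_handlers(previous: Dict[int, object]) -> None:
    """Put back the handlers saved by install_forwarding."""
    for sig, old in previous.items():
        signal.signal(sig, old)
```

Returning the previous handlers keeps the relay reversible, which matters for SIGINT because the relay replaces Python's default KeyboardInterrupt behaviour.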

ENVIRONMENT

SCBDSERV

Scoreboard server. If this environment variable is not set, then the network option must be specified.

SCORE_OPTIONS

Additional scrun options

PM_DLPATH

Specifies the path used for dynamically loading the PMX device libraries.

PM_DYN_all

When this environment variable is defined, the PMX library tries to load the PMX device libraries dynamically.


SEE ALSO

scout(1), scorehosts(1), scoregroups(1), scope(1), scorehosts.db(5).

CREDIT

This document is a part of the SCore cluster system software developed at PC Cluster Consortium, Japan. Copyright (C) 2003-2008 PC Cluster Consortium.