scrun is a front-end program for scored(8), which manages a variety of cluster resources. User programs running on a cluster must be invoked via the scrun program.
First, scrun invokes the user program, specified by file, on the host where scrun was executed. This is done in such a way as to obtain the required resource information. Then, scrun tries to log in to scored. After login, scrun becomes a front-end process that controls the job status of the user program running on the cluster. When the user program finishes, scrun also terminates.
Valid arguments to scrun may be SCore options. These options are various resource specifications for SCore-D and/or options for the language runtime systems on which user programs rely. In this manual page, only the SCore-D options are described. For the language system options, consult the documentation installed with those systems.
The first argument of scrun that does not begin with the minus (-) character, i.e., the first argument, or the argument following the SCore options when they are specified, must be the filename of the program to be executed on the cluster. The specified executable file is copied and then invoked by scored on all allocated nodes in the cluster. Arguments following the filename are passed to the invocation of the executable file on all nodes. The executable file must have read permission so that scrun can read the file and copy it to the cluster nodes. The file must also be executable on the host where scrun is invoked, so that scrun can execute it to obtain resource information.
The scrun program can also submit a parallel job to cluster hosts that have a different OS and/or CPU from the host where scrun is invoked. In this case, at least two executable files must be present: one for the local invocation by scrun, and another for cluster execution. To allow for this situation, the executable files must be compiled with the SCore smake(1) command (not make or gmake). In this case, the executable file must be a symbolic link to the .wrapper script, which is automatically created by the smake command. It is the user's responsibility to keep the heterogeneous executable files consistent by compiling them from the same source code.
GENERAL FORMAT OF SCORE OPTIONS
The first character must be a minus (-), followed by keyword and value pairs, each pair separated by a comma (,). The keyword is a predefined SCore literal and is separated from its associated value by the equal (=) character. Here is an example:
scrun -nodes=2,cpulimit=4 a.out
In this case, two SCore options are specified, one is the "nodes"
option and another the "cpulimit" option. The nodes option
has a value of "2", and cpulimit has a value of "4".
If the same keywords are listed in the SCore options, then the leftmost one is
taken. The value of the SCORE_OPTIONS environment variable is taken
as the default option setting.
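For example, the following sh(1) commands set default options for subsequent scrun invocations (the program name a.out is illustrative):
SCORE_OPTIONS=nodes=2,cpulimit=4
export SCORE_OPTIONS
scrun a.out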
SINGLE USER MODE AND MULTIPLE USER MODE
The SCore-D operating system is designed to run multiple jobs at a time in a time-sharing manner. However, it also has a single-user mode that allows users to use the cluster exclusively. This is useful when users want to evaluate programs. When scrun is invoked in a SCOUT environment (see scout(1)) and no scored option is specified, scrun first invokes scored on the cluster within the SCOUT environment, and the user program is then executed on the invoked scored. When the user program terminates, scored also terminates.
If the group option is specified, then scrun creates the SCOUT environment on the hosts specified by the value of the group option, and the user program is executed in single-user mode. When the file option is specified and a set of hostnames is listed in the file given as the option value, the user program is executed in single-user mode on that host group. The checkpoint option is enabled when the group or file option is specified.
If scored is already running on a cluster, then the user must specify, with the scored option, the SCore-D server host that is accepting user logins, or the hosts where SCore-D is running. Users can also specify a host group name as the value of the scored option to identify the set of hosts on which SCore-D is running. More precisely, the value of the scored option is in a format that scorehosts can accept.
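For example, assuming a multi-user SCore-D is accepting logins on a server host named server0 (a hypothetical hostname), a job can be submitted to it as follows:
scrun -scored=server0,nodes=4 a.out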
RESOURCE SPECIFICATION
SCore-D manages a variety of cluster resources, such as nodes, networks, and disks. This section describes the resource specification options.
- nodes=[hosts][xprocs][.[bintype][.cpugen[.speed]]]
-
hosts is the number of hosts or nodes in a cluster required to run a user program. procs is the number of processes to be invoked on each host of an SMP cluster. If procs is not present and the allocated hosts are in an SMP cluster, then the number of allocated hosts may be the requested number of hosts divided by the number of processors in each SMP node. If the procs number is specified, then that number of processes is invoked on each SMP host if possible. If the number of requested nodes is less than the total number of nodes in the partition, then SCore-D allocates nodes such that node loads are balanced. The bintype option specifies the binary type to be run on a heterogeneous cluster. The name of the binary type comes from the smake or Hmake command and the .wrapper script. For binary type names, please see the glossary.
On a heterogeneous cluster, users can specify CPU types by the
cpugen option. Possible values for cpugen
are specified in
scorehosts.db(5),
which is a database containing all cluster information. The
speed option values are also specified in
scorehosts.db.
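For example, following the nodes syntax above, the following requests four SMP hosts with two processes on each host, eight processes in total:
scrun -nodes=4x2 a.out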
- network=network_name[+network_name]...
-
Users can specify the network (PM device) with the network option to allocate the network for user program execution on a cluster. Valid network_name values are specified in scorehosts.db(5). Users can also specify multiple networks for the parallel execution of a user program.
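For example, assuming that the network names myrinet and ethernet are both defined in scorehosts.db(5) (the names here are hypothetical), both networks can be allocated for one job:
scrun -network=myrinet+ethernet a.out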
- priority=number
-
Scheduling priority can be specified with the priority option. The smaller the value, the higher the priority. A job having a higher priority will be scheduled more often.
- monitor[=monitor_type]
-
Attach a real-time user program execution monitor. Valid types for monitor_type are: load, comm, memory, disk, usr0, usr1, all and ALL.
load attaches a CPU activity monitor, and comm attaches a communication activity monitor. The memory and disk options attach memory and disk usage monitors, respectively. The usage values are scaled to the limit values, if specified; otherwise they are scaled to the amount of free space available when SCore-D is invoked. The usr0 and usr1 options attach monitors displaying the values set by the user program (see sc_set_monitor()).
If the monitor option has no value, then the load and communication monitors are attached. If the user specifies all, then the CPU, communication, memory usage and disk usage monitors are attached. If the user specifies ALL, then all six monitors are attached.
The user must have an accessible X window server, and the DISPLAY environment variable must be set correctly.
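For example, the following attaches the CPU, communication, memory usage and disk usage monitors to a two-node job (an X server reachable through DISPLAY is assumed):
scrun -nodes=2,monitor=all a.out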
- debug[=number]
-
The MPC++ or MPICH-SCore runtime system is programmed to detect exception signals such as SIGSEGV. When an exception signal is raised, the runtime system asks SCore-D to attach a GDB (GNU debugger) process to debug the user program. If the debug option is specified, and the user program is running at time-sharing priority, then SCore-D creates a GDB process. Otherwise, the user program will be killed.
The number option limits the number of debugger processes attached at the same time. The default value is 4, and the maximum is limited to 10. If the DISPLAY environment variable is set, then SCore-D creates an xterm process in which the GDB process runs. If the DISPLAY environment variable is absent or has no value, but a score.gdb file exists in the current directory and the file is readable from the cluster hosts, then the GDB process will read the file and execute the GDB commands written in it. If the score.gdb file is not accessible, then the GDB process will execute only the backtrace GDB command.
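For example, a score.gdb file may contain ordinary gdb(1) commands such as (shown as an illustration):
backtrace
info registers
Running a job as follows then allows up to two debugger processes to be attached when exception signals are raised:
scrun -debug=2 a.out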
- stat[istics][=stat_type]
-
When a user program terminates, scored outputs resource usage information to the standard error of the scrun process. The default is to output only summary information unless stat_type is specified. Valid types for stat_type are: all and detail. If either of these types is used, then individual node information will also be output.
- scored=scored-server [multi-user mode only]
-
Specify the SCore-D server hostname to log in to a SCore-D that is already running in multi-user mode. If this option is not specified, then SCore-D is invoked by scrun in single-user mode.
- group=hostgroup [single-user mode only]
-
First a SCOUT environment is created according to the specified hostgroup, then the user program is invoked in the SCOUT environment. Checkpointing is enabled with this option in single-user mode.
- file=filename [single-user mode only]
-
First a SCOUT environment is created according to the list of hostnames written in the file filename, then the user program is invoked in the SCOUT environment. Checkpointing is enabled with this option in single-user mode.
- corebinding=bits[:bits..]
-
Specifies the binding of processes to cores (processors). The value is a colon (:) separated list of bit patterns. The leftmost pattern is the binding pattern of process 0. Each bit pattern specifies the cores (processors) on which the process may be executed, with the LSB denoting core zero (0). For example, the bit pattern 3 means the process will be executed on core zero or one. A bit pattern can be specified in decimal, in octal if prefixed by a zero, or in hexadecimal if prefixed by "0x".
- restart
-
Cluster hosts sometimes crash, and running jobs are killed unexpectedly. If the restart option is set, the user's program execution will be restarted from the beginning when scored is restarted with the -restart option. Note that this restart option is valid only while the scrun process is alive. When the user kills the scrun process, the restart never happens.
- checkpoint[=interval]
-
This option is similar to the restart option, but the user's program execution contexts are saved to local disk at the specified time interval. If the interval value is immediately followed by the character 'm', 'h' or 'd', then the unit of the interval is minutes, hours or days, respectively.
When scored is restarted, program execution continues from the point where the most recent checkpoint was taken. This restart will take place only while the scrun process is alive.
To checkpoint in single-user mode, you must execute with the group or file option, outside of a SCOUT environment.
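For example, assuming a host list in a file named hosts.txt (a hypothetical filename), the following runs a job in single-user mode and takes a checkpoint every 30 minutes:
scrun -file=hosts.txt,checkpoint=30m a.out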
- cpulimit=limit
-
Specify the time limit (in seconds) for a user program to run.
- memorylimit=limit
-
This option specifies the memory limit (in MB). It is effective when SCore-D is running in multi-user mode.
- disklimit=limit
-
This option specifies the disk limit (in MB). It is effective when SCore-D is running in multi-user mode.
- wait
-
If the wait option is specified and a login to SCore-D fails because a specified resource is temporarily unavailable, then the login is postponed until the specified resource becomes available. This option is only effective for SCore-D running in multi-user mode.
- message[=mode]
-
Control the output messages produced by the SCore system at runtime. Valid modes for mode are: concise and quiet. The default is to output all messages. concise suppresses normal messages so that only warning and error messages are output. quiet suppresses all messages except error messages.
- resource
-
When the resource option is specified, scrun investigates the SCore options and resource requests of the user program(s); the SCore options, resource requests, and pathname(s) of the user program(s) are then displayed, and scrun exits. The user program(s) will not run on the cluster.
- passhup
-
Before version 5.6, when the scrun process received the SIGHUP signal, the signal was broadcast to the processes running on the cluster hosts. In SCore 5.6, however, when the scrun process receives the SIGHUP signal, the standard outputs are redirected to a file named scrun PID.output (where PID is the process ID of scrun), so that scrun can survive even when the shell through which the scrun process was invoked terminates. This option is for backward compatibility: if it is specified, the output redirection will not take place and the SIGHUP signal will be broadcast to the processes running on the cluster hosts.
- ts=timeslice [single-user mode only]
-
Specify the time interval (in milliseconds) at which user program execution is interrupted to detect a deadlock situation.
JOB CONTROL and SIGNALS
The job status of user program execution on a cluster is linked with the job status of scrun. Users can suspend, resume, or kill parallel jobs running on a cluster just like a normal UNIX command, by typing "^Z", "fg", and "^C". Further, if the output of scrun is stopped by "^S", cluster execution may eventually be suspended until scrun output is allowed again by "^Q". Typing "^\" triggers checkpointing, instead of creating a core file, and the job waits for its restart when SCore-D unexpectedly terminates (system down).
Some UNIX signals delivered to the scrun process are forwarded and broadcast to the processes running on a cluster. The forwarded signals are SIGINT, SIGABRT, SIGTERM, SIGURG, SIGWINCH, SIGUSR1, and SIGUSR2.
INPUT/OUTPUT REDIRECTION
Similar to the Unix shell, the standard input and/or output of a parallel process can be redirected to files. When a user program specified in the scrun arguments is followed by the ":=" symbol and a filename, then the standard input of each parallel process derived from the user program is taken from the file. If the symbol is "=:", then the standard outputs go to the file. If the symbol is "=::", then the outputs are appended to the file.
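For example (the pathnames here are illustrative):
scrun a.out :=/tmp/input.dat
scrun a.out =::/tmp/out.log
The first command feeds /tmp/input.dat, local to each compute host, to the standard input of the parallel process; the second appends the standard outputs to /tmp/out.log on each compute host.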
Note that the opened files are local, located on the compute hosts. Further, if the filename is a basename, i.e., there is no "/" in its name, then the files are created in an SCore-D working directory located on the compute hosts, and they are removed when the parallel job terminates. If the filename is an absolute pathname, then the files are created at the specified pathname. No relative pathname is allowed.
On an SMP cluster, when the output redirection pathname is absolute, only the first process on a compute host will be redirected to the specified file; the other processes will output to /dev/null.
PARALLEL JOB
Similar to the Unix shell, scrun supports not only simple commands, but also pipelined commands and sequenced commands. Pipelined commands are separated by the "==" symbol, and sequenced commands by the "::" symbol. Parallel processes in a parallel job are allocated in the same partition (set of hosts) of a cluster. Processes having the same node number but belonging to different pipelined parallel processes are connected with a Unix pipe, just like pipelined commands under the Unix shell, and they are scheduled at the same time. Parallel processes are executed in sequence when they are separated by the sequential symbol ("::").
Sequential programs, such as normal Unix commands or C/C++ programs, can run on a cluster via the system(6) command, in the same way as the Unix system(3) function. This system command can be used for housekeeping of a cluster.
By combining the scatter(6) command, a user parallel program, and the gather(6) command in series, users can move the necessary data files back and forth between the user's workstation and the cluster hosts.
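For example, a sequenced job might distribute an input file, run the parallel program, and then collect the results. The argument syntax of scatter(6) and gather(6) shown below is illustrative; see their manual pages for the actual usage:
scrun scatter input.dat :: a.out :: gather output.dat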