SCRUN(1)
First, scrun invokes the user program, specified by file, on the host where scrun was executed, in order to obtain the required resource information. Then scrun logs in to scored. After login, scrun acts as a front-end process that controls the job status of the user program running on the cluster. When the user program finishes, scrun also terminates.
Valid arguments to scrun may be SCore options. These options are various resource specifications for SCore-D and/or options for the language runtime systems on which user programs rely. This manual page describes only the SCore-D options. For language system options, consult the documentation of the systems installed at your site.
If the first argument of scrun does not begin with the minus (-) character, it is taken as the filename of the program to be executed on the cluster; when SCore options are specified, the filename is the argument that follows them. The specified executable file is copied to all allocated nodes in the cluster and invoked there by scored. Arguments following the filename are passed to the invocation of the executable file on all nodes.
The executable file must have read permission so that scrun can read the file and copy it to the cluster nodes. The file must also be executable on the host where scrun is invoked, so that scrun can run it to obtain resource information.
The scrun program can also submit a parallel job to cluster hosts that have a different OS and/or CPU from the host where scrun is invoked. In this case, at least two executable files must be present: one for the local scrun invocation, and another for cluster execution. To allow for this situation, the executable files must be compiled with the SCore smake(1) or Hmake(1) commands (not make or gmake). The executable file is then a symbolic link to the .wrapper script, which is created automatically by the smake or Hmake command. It is the user's responsibility to keep the heterogeneous executables consistent by compiling them from the same source code.
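As a sketch of a heterogeneous build and run (the target name and option values here are illustrative, not prescribed by SCore):

```shell
# Build with SCore's smake so that a .wrapper symbolic link is created
# and per-architecture binaries are produced (not plain make/gmake).
smake a.out

# a.out is now a symlink to the .wrapper script; scrun runs the local
# copy to probe resources and ships the matching binary to the nodes.
scrun -nodes=2 ./a.out
```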
GENERAL FORMAT OF SCORE OPTIONS
The first character must be a minus (-), followed by keyword and value pairs; the pairs are separated by commas (,). Each keyword is a predefined SCore literal, and it is separated from its associated value by the equal (=) character. Here is an example:
    scrun -nodes=2,cpulimit=4 a.out

In this case, two SCore options are specified: one is the "nodes" option and the other is the "cpulimit" option. The nodes option has a value of "2", and cpulimit has a value of "4".
If the same keyword appears more than once in the SCore options, the leftmost occurrence is taken. The value of the SCORE_OPTIONS environment variable is used as the default option setting.
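The leftmost-wins rule and the SCORE_OPTIONS default can be sketched as follows. This is an illustration of the documented semantics only, not the actual scrun parser:

```python
import os

def parse_score_options(arg, env=os.environ):
    """Parse a '-key=val,key=val' SCore option string.

    Duplicate keywords keep the leftmost occurrence; SCORE_OPTIONS
    supplies defaults for keywords not given on the command line.
    """
    def pairs(s):
        out = {}
        for item in s.lstrip("-").split(","):
            key, _, value = item.partition("=")
            if key and key not in out:   # leftmost occurrence wins
                out[key] = value
        return out

    opts = pairs(arg)
    for key, value in pairs(env.get("SCORE_OPTIONS", "-")).items():
        opts.setdefault(key, value)      # defaults fill gaps only
    return opts
```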
SINGLE USER MODE AND MULTIPLE USER MODE
The SCore-D operating system is designed to run multiple jobs at a time in a time-sharing manner. However, it has a single user mode that allows a user to use the cluster exclusively. This is useful when users want to evaluate programs. When scrun is invoked in a SCOUT environment (see scout(1)) and no scored option is specified, scrun first invokes scored on the cluster within the SCOUT environment, and the user program is then executed under the invoked scored. When the user program terminates, scored also terminates.
If the group option is specified, then scrun creates a SCOUT environment on the hosts given by the value of the group option, and the user program is executed in single user mode. When the file option is specified and a set of hostnames is listed in the file named by the option value, the user program is likewise executed in single user mode on that host group. The checkpoint option is enabled when the group or file option is specified.
If scored is already running on a cluster, then the user must specify, with the scored option, the SCore-D server host that is accepting user logins, or the hosts where SCore-D is running. The user can also give a host group name as the scored option value to specify the set of hosts on which SCore-D is running. Precisely, the value of the scored option may be anything in the format that scorehosts accepts.
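A hedged illustration of the two modes (host and group names here are placeholders):

```shell
# Single user mode: scrun boots scored itself on the named host group,
# runs the job exclusively, and shuts scored down afterwards.
scrun -group=pcc,nodes=4 ./a.out

# Multiple user mode: attach to an already-running SCore-D by naming
# its server host with the scored option.
scrun -scored=server.example.com,nodes=4 ./a.out
```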
RESOURCE SPECIFICATION
SCore-D manages a variety of cluster resources, such as nodes, networks, and disks. This section describes the resource specification options.
On a heterogeneous cluster, users can specify CPU types with the cpugen option. Possible values for cpugen are defined in scorehosts.db(5), the database containing all cluster information. The speed option values are also defined in scorehosts.db.
    Processor Type   OS                 Binary Type
    i386             TurboLinux         i386-turbo-linux
    i386             SuSE Linux         i386-suse-linux
    alpha            SuSE Linux         alpha-suse-linux
    i386             Redhat Linux 7.x   i386-redhat7-linux2_4
    i386             Redhat Linux 8.x   i386-redhat8-linux2_4
    ia64             Redhat Linux 7.x   ia64-redhat7-linux2_4
    alpha            Redhat Linux       alpha-redhat-linux
    i386             NetBSD             i386-unknown-netbsd
    Sparc            SunOS4             sparc-sun-sunos4
    Sparc            SunOS5             sparc-sun-sunos5
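For example, a job can be pinned to one CPU generation on a heterogeneous cluster (the option value shown is illustrative; valid values come from scorehosts.db(5)):

```shell
# Request 4 nodes whose CPU type matches the cpugen value registered
# in scorehosts.db for this cluster.
scrun -nodes=4,cpugen=i386 ./a.out
```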
(See sc_set_monitor(3).) If the monitor option has no value, then the load and communication monitors are attached. If the user specifies all, then the CPU, communication, memory usage, and disk usage monitors are attached. If the user specifies ALL, then all six monitors are attached. The user must have an accessible X window server, and the DISPLAY environment variable must be set correctly.
The job status of a user program executing on a cluster is linked with the job status of scrun. Users can suspend, resume, or kill parallel jobs running on a cluster as with a normal UNIX command, by typing "^Z", "fg", and "^C". Further, if the output of scrun is stopped with "^S", cluster execution is eventually suspended until scrun output is allowed again with "^Q". Typing "^\" triggers checkpointing instead of creating a core file; the checkpointed job then waits for a restart if SCore-D terminates unexpectedly (system down).
Some UNIX signals delivered to the scrun process are forwarded and broadcast to the processes running in the cluster. The forwarded signals are SIGINT, SIGABRT, SIGTERM, SIGURG, SIGWINCH, SIGUSR1, and SIGUSR2.
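For example, a user-defined signal sent to the scrun process reaches every process of the parallel job (the use of pkill here is illustrative):

```shell
# SIGUSR1 delivered to scrun is forwarded and broadcast to all
# processes of the parallel job on the cluster nodes.
pkill -USR1 -x scrun
```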
Similar to the Unix shell, the standard input and/or output of a parallel process can be redirected to files. When the user program named in the scrun arguments is followed by the ":=" symbol and a filename, the standard input of each parallel process derived from the user program is read from that file. If the symbol is "=:", the standard output is written to the file. If the symbol is "=::", the output is appended to the file.
Note that the opened files are local, located on the compute hosts. Further, if the filename is a basename (there is no "/" in its name), the files are created in an SCore-D working directory on the compute hosts, and they are removed when the parallel job terminates. If the filename is an absolute pathname, the files are created at the specified pathname. No relative pathname is allowed.
On an SMP cluster, when the output redirection pathname is absolute, only the first process on each compute host is redirected to the specified file; the other processes send their output to /dev/null.
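The redirection symbols can be combined on one command line (the filenames are illustrative):

```shell
# Feed input.dat to each process's standard input and append all
# standard output to /tmp/run.log on each compute host.
scrun -nodes=4 ./a.out := input.dat =:: /tmp/run.log
```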
Similar to the Unix shell, scrun supports not only simple commands but also pipelined commands and sequenced commands. Pipelined commands are separated by the "==" symbol, and sequenced commands by the "::" symbol. The parallel processes of a parallel job are allocated in the same partition (set of hosts) of a cluster. Processes that have the same node number but belong to different pipelined parallel programs are connected with a Unix pipe, just like pipelined commands under the Unix shell, and they are scheduled at the same time. Parallel processes are executed in sequence when they are separated by the sequencing symbol ("::").
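For example (the program names are illustrative):

```shell
# Pipeline: on each node, the standard output of producer is connected
# to the standard input of consumer; both are scheduled together.
scrun -nodes=4 ./producer == ./consumer

# Sequence: prepare runs to completion on all nodes before solve starts.
scrun -nodes=4 ./prepare :: ./solve
```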
Sequential programs, such as normal Unix commands or C/C++ programs, can run on a cluster via the system(6) command, much like the Unix system(3) function. This system command can be used for housekeeping of a cluster.
By combining the scatter(6) command, a user parallel program, and the gather(6) command in sequence, users can move the necessary data files back and forth between the user's workstation and the cluster hosts.
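A staging sequence might look like the following sketch; the argument syntax shown for scatter(6) and gather(6), and the filenames, are assumptions for illustration:

```shell
# Stage the input out to the nodes, run the parallel program, then
# collect the per-node results back to the workstation.
scrun -nodes=4 scatter input.dat :: ./a.out :: gather result.dat
```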
SEE ALSO

sc_set_monitor(3), mpc++(1), mpirun(1), scorehosts(1)