A binary generated by the MPI compilers provided by MPICH-SCore runs only on the SCore-D operating system. Thus you must prepare either the single user environment or the multiple user environment of the SCore-D operating system.
In the single user environment, first invoke the scout shell program; scout provides a remote shell environment. Then you may run your MPI application using mpirun in that shell.
For example, to run the application on four nodes:
$ setenv SCBDSERV server.pccluster.org
$ msgb -group pcc&
$ scout -g pcc
[comp0-3]:
SCOUT(3.1.0): Ready.
$
. . .
. . .
$ mpirun -np 4 ./mpi_program args ...
. . .
. . .
$ exit
SCOUT: session done
$
MPICH-SCore supports clusters consisting of SMP nodes. The runtime code of MPICH-SCore spawns multiple MPI processes on an SMP node.
You may specify the number of MPI processes per SMP node using the -np option of mpirun. For example, to run eight MPI processes on four dual-processor machines, specify the option as follows:
$ mpirun -np 4x2 ./my_mpi_program args...
When you do not specify the number of MPI processes per SMP node, MPICH-SCore spawns one MPI process for each processor of the cluster. Thus "-np 4" is equivalent to "-np 2x2" on a cluster consisting of dual-processor nodes.
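To see how an NxM specification places ranks on nodes, a small check program can help. The following is only an illustrative sketch (it is not part of the MPICH-SCore distribution); each rank prints the node it was spawned on, so with -np 2x2 two ranks should report each node name:
/* placement.c -- print which node each MPI rank runs on (illustrative sketch).
 * Build with, for example: mpicc -o placement placement.c */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);
    printf("rank %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}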
You may use scrun instead of mpirun.
For example, to run the MPI application using 16 processors on eight dual-processor nodes:
$ scrun -nodes=8x2 ./my_mpi_program args...
When you do not specify the -nodes option of scrun, MPICH-SCore spawns one MPI process for each processor of all nodes reserved by scout.
Here is an example of running the sample application alltoall, which measures the performance of the MPI_Alltoall function.
This application requires two command line arguments: the first is the message length for the all-to-all communication, and the second is the number of iterations. The result consists of three fields: the number of MPI processes, the message length, and the elapsed time of one all-to-all communication, in microseconds:
$ setenv SCBDSERV server.pccluster.org
$ msgb -group pcc&
$ scout -g pcc
[comp0-3]:
SCOUT(3.1.0): Ready.
$
. . .
. . .
$ scrun -nodes=4x2 alltoall 3000 10000
SCORE: Connected (jid=1)
<0:0> SCORE: 8 nodes (4x2) ready.
8 3000 1052.230600
$
. . .
. . .
$ exit
SCOUT: session done
$
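For reference, the core of such a measurement can be sketched as follows. This is an illustrative reconstruction, not the alltoall source shipped with MPICH-SCore, and it assumes that the message length argument is the number of bytes sent to each process:
/* alltoall_sketch.c -- illustrative reconstruction of the alltoall benchmark,
 * not the source distributed with MPICH-SCore. Rank 0 prints the number of
 * processes, the message length, and microseconds per MPI_Alltoall call. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int nprocs, rank, i, len, iters;
    char *sendbuf, *recvbuf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (argc < 3) {
        if (rank == 0)
            fprintf(stderr, "usage: alltoall <length> <iterations>\n");
        MPI_Finalize();
        return 1;
    }
    len   = atoi(argv[1]);                 /* message length per destination */
    iters = atoi(argv[2]);                 /* number of iterations */

    sendbuf = malloc((size_t)len * nprocs);   /* contents are irrelevant */
    recvbuf = malloc((size_t)len * nprocs);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, len, MPI_BYTE, recvbuf, len, MPI_BYTE,
                     MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d %d %f\n", nprocs, len, (t1 - t0) / iters * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}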
When using the multiple user environment, you may run the application by simply issuing mpirun.
Remember that you must specify the hostname of the SCore-D server as follows:
$ mpirun -np 4x2 -score scored=comp3.pccluster.org ./mpi_program args...
You can specify the hostname with the SCORE_OPTIONS environment variable instead of the mpirun option:
$ export SCORE_OPTIONS=scored=comp3.pccluster.org
$ mpirun -np 4x2 ./mpi_program args...
You may also use scrun instead of mpirun:
$ export SCORE_OPTIONS=scored=comp3.pccluster.org
$ scrun -nodes=4x2 ./mpi_program args...
The way to specify the number of MPI processes per SMP node is the same as in the single user environment; see the previous section.
MPICH-SCore transfers MPI messages using three protocols, described below. The runtime code chooses one of them according to the size of the message being sent. You may change the thresholds used to choose the protocol, which can improve the performance of some applications.
The threshold for switching from the short protocol to the eager protocol is the minimum of the MTU values of the PM devices.
You can change the threshold between the eager protocol and the rendezvous protocol. The default is 16 kbytes. To change this value, use the mpi_eager option when running the application.
For example, to set the threshold to 300 kbytes, use the mpi_eager option as follows:
$ mpirun -np 4x2 -score mpi_eager=300000 ./mpi_program args...
Alternatively, you can use scrun:
$ scrun -nodes=4x2,mpi_eager=300000 ./mpi_program args...
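Whether a different threshold helps is application dependent. One way to judge is a simple ping-pong timing over message sizes around the threshold, run with different mpi_eager values. The program below is a hypothetical sketch for exactly two MPI processes; it is not part of MPICH-SCore:
/* pingpong_sketch.c -- hypothetical ping-pong timing around the
 * eager/rendezvous threshold; run with exactly two MPI processes and
 * compare the results for different mpi_eager settings. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* message sizes (bytes) straddling the default 16 kbyte threshold */
    const int sizes[] = { 1024, 8192, 16384, 32768, 262144 };
    const int nsizes = (int)(sizeof(sizes) / sizeof(sizes[0]));
    const int iters = 1000;
    int rank, s, i;
    char *buf;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc((size_t)sizes[nsizes - 1]);

    for (s = 0; s < nsizes; s++) {
        int n = sizes[s];
        double t0, t1;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {             /* rank 0: send, then wait for echo */
                MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {      /* rank 1: echo the message back */
                MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("%7d bytes: %f usec one-way\n",
                   n, (t1 - t0) / iters / 2.0 * 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}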
Currently, the remote memory access (RMA) facility of PM is supported on PM/Myrinet and on PM/Shmem, the PM device that provides inter-process communication within an SMP node.
PM RMA can be used only with the rendezvous protocol.
To enable PM RMA, use the mpi_zerocopy option:
$ mpirun -np 4x2 -score mpi_zerocopy=on ./mpi_program args...
Alternatively, you can use scrun as follows:
$ scrun -nodes=4x2,mpi_zerocopy=on ./mpi_program args...
Some RMA implementations, such as PM/Myrinet, transmit data using DMA only. We call message transfer using such an RMA zero-copy transfer, since no memory copy by the CPU is required during transmission. MPICH-SCore realizes zero-copy transfer when using the RMA of PM/Myrinet. Zero-copy transfer improves the maximum bandwidth of point-to-point message transfer because it reduces memory access contention. Zero-copy transfer is effective for some applications; however, it is less effective for others, since it involves the overhead of synchronizing the sender and the receiver.
Message transfer using PM/Shmem RMA is a one-copy transfer: since PM/Shmem copies data between virtual memory spaces using the PM/Shmem device driver, the RMA is implemented as a single copy.
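The effect of zero-copy transfer can also be checked empirically. The sketch below (hypothetical, for exactly two MPI processes) measures large-message bandwidth; running it once with mpi_zerocopy=on and once without shows whether an application with this communication pattern benefits:
/* bw_sketch.c -- hypothetical large-message bandwidth test for comparing runs
 * with and without mpi_zerocopy=on; use exactly two MPI processes. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define MSG_BYTES (1 << 20)   /* 1 Mbyte, well above the rendezvous threshold */
#define ITERS     200

int main(int argc, char **argv)
{
    int rank, i;
    char *buf;
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(MSG_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0)                    /* rank 0 streams large messages */
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
    }
    MPI_Barrier(MPI_COMM_WORLD);          /* ensure all receives completed */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%.2f Mbytes/s\n",
               (double)MSG_BYTES * ITERS / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}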
See also: MPICH-SCore version 1.0 (ch_score): Compilation of an MPI application; mpic++(1), mpicc(1), mpif77(1), mpirun(1), scrun(1).