Running an MPI application

Choice of the compiler environment

To run an MPI program in the SCore environment, you must use the same MPI implementation at run time that was used when the program was compiled.

You can specify the MPI implementation with the -mpi option or the SCORE_MPI environment variable. Currently, you can use mpich-1.2.5 or yampi. If you do not specify an implementation, mpich-1.2.5 is used.
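
As a minimal sketch of the environment variable route in a Bourne-style shell: this assumes SCORE_MPI takes the implementation name exactly as listed above, which is not confirmed here. The helper effective_mpi is hypothetical and only mirrors the stated default.

```shell
# Select the MPI implementation for subsequent compile and run commands.
# Assumption: SCORE_MPI accepts the implementation name "yampi" as its value.
SCORE_MPI=yampi
export SCORE_MPI

# Hypothetical helper: report which implementation would apply,
# falling back to the documented default when SCORE_MPI is unset or empty.
effective_mpi() {
    echo "${SCORE_MPI:-mpich-1.2.5}"
}

effective_mpi
```

Setting the variable once in the shell is a simple way to keep the compile step and the run step on the same implementation.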

Environment

A binary generated by an MPI compiler provided by SCore runs only on the SCore-D operating system. You therefore need to set up either the single user environment or the multiple user environment of SCore-D.

Running in the Single user environment

First, run the scout shell program, which provides a remote shell environment. You can then run your MPI application with mpirun from that shell. For example, to run the application on four nodes:

    $ setenv SCBDSERV server.pccluster.org
    $ msgb -group pcc&
    $ scout -g pcc
    [comp0-3]:
    SCOUT(6.0.0): Ready.
    $
      . . . . . .
    $ mpirun -np 4 ./mpi_program args ...
      . . . . . .
    $ exit
    SCOUT: session done
    $

To pass scrun options, set the SCORE_OPTIONS environment variable:

    $ SCORE_OPTIONS=group=pcc
    $ export SCORE_OPTIONS
    $ mpirun -np 4 ./mpi_program args ...

Running the sample application in a single user environment

This example runs the sample application alltoall, which measures the performance of the MPI_Alltoall function. The application requires two command-line arguments: the first is the message length for the all-to-all communication, and the second is the number of iterations. Each result line consists of three fields: the number of MPI processes, the message length, and the elapsed time for one all-to-all communication. The time is given in microseconds:

    $ SCBDSERV=server.pccluster.org
    $ export SCBDSERV
    $ msgb -group pcc&
    $ scout -g pcc
    [comp0-3]:
    SCOUT(6.0.0): Ready.
    $
      . . . . . .
    $ mpirun -np 8 alltoall 3000 10000
    SCORE: Connected (jid=1)
    <0:0> SCORE: 8 nodes (4x2) ready.
    8 3000 1052.230600
    $
      . . . . . .
    $ exit
    SCOUT: session done
    $
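
To make the three result fields easier to read in scripts, a result line can be split and labelled. The following is a small sketch, not part of SCore; the function name describe_result and the output labels are hypothetical.

```shell
# Hypothetical helper: label the three fields of one alltoall result line.
# Field order follows the description above:
#   number of MPI processes, message length, elapsed time in microseconds.
describe_result() {
    # Split the single argument on whitespace into $1, $2 and $3.
    set -- $1
    echo "processes=$1 length=$2 time_us=$3"
}

describe_result "8 3000 1052.230600"
```

For the sample line shown in the session, this prints processes=8 length=3000 time_us=1052.230600.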

Running in the Multiple user environment

In the multiple user environment, you can run the application simply by issuing mpirun. Remember to specify the hostname of the SCore-D server with the SCORE_OPTIONS environment variable, as follows:

    $ SCORE_OPTIONS=scored=comp3.pccluster.org
    $ export SCORE_OPTIONS
    $ mpirun -np 4x2 ./mpi_program args...

Runtime options for MPICH-SCore

Specify scrun options

mpirun on MPICH-SCore accepts scrun options through the -score option.

    $ mpirun -np 8 -score scored=comp3 ./mpi_program args...

Specify hosts and processes on an SMP cluster

On MPICH-SCore, the -np option can be specified as nxm to run on n hosts with m processes on each host.

    $ mpirun -np 4x1 ./mpi_program args...

Execute by scrun

An MPICH-SCore program can be run with the scrun command.

Sliding border lines to change the protocols

MPICH-SCore transfers MPI messages using the three protocols described below. The runtime chooses one of them according to the size of the message being sent. You can change the borderlines used to choose the protocol, which may improve the performance of some applications.

The Short protocol transfers an MPI header and message body within a single packet of PM, the low-level message passing system of SCore. The total size of the MPI header and the MPI message is limited to the MTU of PM. The sender does not wait for the receiver to be ready. Thus a message that the receiver does not yet expect is held in a temporary buffer on the receiver.

The Eager protocol transfers an MPI header and message body using multiple packets of PM. As with the short protocol, the sender does not wait for the receiver to be ready. Thus a message that the receiver does not yet expect is held in a temporary buffer on the receiver.

The Rendezvous protocol transfers an MPI message synchronously. The sender waits until the receiver is ready, and then the message body is transferred. The remote memory access (RMA) facility of PM is used if the user enables it. Otherwise, MPICH-SCore transfers the message body by dividing it into multiple packets of PM.

The threshold for switching from the short protocol to the eager protocol is the minimum of the MTU values of the PM devices. You can change the borderline between the eager protocol and the rendezvous protocol. The default is 16 kbytes.
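
The size-based choice among the three protocols can be sketched as a small shell function. This only illustrates the rule described above and is not MPICH-SCore code. The function name select_protocol, the header size, and the MTU value passed in are hypothetical. Whether each boundary is inclusive is also an assumption, as is reading 16 kbytes as 16000 bytes (by analogy with mpi_eager=300000 for 300 kbytes).

```shell
# Hypothetical sketch of the protocol choice by message size.
# Arguments:
#   $1  MPI message body size in bytes
#   $2  minimum MTU of the PM devices in bytes (placeholder value)
#   $3  MPI header size in bytes (placeholder value)
#   $4  eager/rendezvous borderline in bytes (optional, default 16000)
select_protocol() {
    msg_size=$1
    pm_mtu=$2
    header_size=$3
    eager_limit=${4:-16000}

    if [ $((msg_size + header_size)) -le "$pm_mtu" ]; then
        # Header and body fit in a single PM packet.
        echo short
    elif [ "$msg_size" -le "$eager_limit" ]; then
        # Sent in multiple packets without waiting for the receiver.
        echo eager
    else
        # Sender waits until the receiver is ready.
        echo rendezvous
    fi
}

select_protocol 100 1024 32
select_protocol 5000 1024 32
select_protocol 20000 1024 32
```

With these placeholder values the three calls print short, eager, and rendezvous. Passing 300000 as the fourth argument moves the 20000-byte message into the eager range.
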
To change the borderline between the eager and rendezvous protocols, use the mpi_eager option when running the application. For example, to specify 300 kbytes as the borderline, use the mpi_eager option as follows:

    $ mpirun -np 4x2 -score mpi_eager=300000 ./mpi_program args...

Alternatively, you can use scrun:

    $ scrun -nodes=4x2,mpi_eager=300000 ./mpi_program args...

Using PM Remote Memory Access facility (Zero-copy/One-copy transfer)

Currently, the remote memory access (RMA) facility of PM is supported on PM/Myrinet and on PM/Shmem, the PM that supports inter-process communication within an SMP node. PM RMA can be used only with the rendezvous protocol. To enable PM RMA, use the mpi_zerocopy option:

    $ mpirun -np 4x2 -score mpi_zerocopy=on mpi_program args...

Alternatively, you can use scrun as follows:

    $ scrun -nodes=4x2,mpi_zerocopy=on ./mpi_program args...

Some RMA implementations, such as PM/Myrinet, transmit data using DMA only. We call message transfer using such an RMA zero-copy transfer, since no memory copy by the CPU is required when transmitting. MPICH-SCore achieves zero-copy transfer when using the RMA of PM/Myrinet. Zero-copy transfer improves the maximum bandwidth of point-to-point message transfer because it reduces memory access congestion. Zero-copy transfer is effective for some applications. However, it is less effective for others, since it involves the overhead of synchronizing the sender and the receiver.

Message transfer using PM/Shmem RMA is one-copy transfer. Since PM/Shmem performs the copy between virtual memory spaces using the PM/Shmem device driver, the RMA is implemented as a single copy.

Runtime options for YAMPI

Specify scrun options

mpirun on YAMPI-SCore accepts scrun options through the -scrun option.

    $ mpirun -np 8 -scrun scored=comp3 ./mpi_program args...

Specify hosts and processes on an SMP cluster

On YAMPI-SCore, the -np option can be specified as nxm to run on n hosts with m processes on each host.

    $ mpirun -np 4x1 ./mpi_program args...
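
The nxm form of -np, accepted by both MPICH-SCore and YAMPI-SCore, implies a total of n times m processes. The arithmetic can be sketched in shell as below. The function total_procs is hypothetical, and treating a plain number as the total process count is an assumption based on the sample session, where -np 8 reported "8 nodes (4x2)".

```shell
# Hypothetical helper: compute the total number of MPI processes
# from an -np argument given either as "N" or as "NxM".
total_procs() {
    spec=$1
    case $spec in
        *x*)
            hosts=${spec%%x*}      # text before the first "x": number of hosts
            per_host=${spec#*x}    # text after the first "x": processes per host
            echo $((hosts * per_host))
            ;;
        *)
            # Plain number: assumed to be the total process count already.
            echo "$spec"
            ;;
    esac
}

total_procs 4x2
total_procs 4x1
```

Here 4x2 yields 8 and 4x1 yields 4, matching four hosts with two processes each and four hosts with one process each.
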

Execute by scrun

If you set the environment variable _YAMPI_ARCH to PM, you can execute YAMPI-SCore programs with the scrun command.

    $ _YAMPI_ARCH=PM
    $ export _YAMPI_ARCH
    $ scrun -nodes=4x1,group=pcc ./mpi_program args...

Sliding border lines to change the rendezvous protocols

YAMPI-SCore switches to the rendezvous protocol according to the _YAMPI_RSIZE environment variable. If you do not set _YAMPI_RSIZE, YAMPI-SCore does not use the rendezvous protocol.

    $ _YAMPI_RSIZE=1024
    $ export _YAMPI_RSIZE
    $ mpirun -np 4x2 ./mpi_program args...

See also

MPI-SCore: Compilation of an MPI application
mpic++(1), mpicc(1), mpif77(1), mpirun(1)
scrun(1)
Providing Optional Compilers