PVM-SCore

PVM Overview and Introduction

The PVM software provides a unified framework within which parallel programs can be developed in an efficient and straightforward manner using existing hardware. PVM enables a collection of heterogeneous computer systems to be viewed as a single parallel virtual machine. PVM transparently handles all message routing, data conversion, and task scheduling across a network of incompatible computer architectures.

The PVM computing model is simple yet very general, and accommodates a wide variety of application program structures. The programming interface is deliberately straightforward, thus permitting simple program structures to be implemented in an intuitive manner. The user writes his application as a collection of cooperating tasks. Tasks access PVM resources through a library of standard interface routines. These routines allow the initiation and termination of tasks across the network as well as communication and synchronization between tasks. The PVM message-passing primitives are oriented towards heterogeneous operation, involving strongly typed constructs for buffering and transmission. Communication constructs include those for sending and receiving data structures as well as high-level primitives such as broadcast, barrier synchronization, and global sum.

PVM tasks may possess arbitrary control and dependency structures. In other words, at any point in the execution of a concurrent application, any task in existence may start or stop other tasks or add or delete computers from the virtual machine. Any process may communicate and/or synchronize with any other. Any specific control and dependency structure may be implemented under the PVM system by appropriate use of PVM constructs and host language control-flow statements.

Owing to its ubiquitous nature (specifically, the virtual machine concept) and also because of its simple but complete programming interface, the PVM system has gained widespread acceptance in the high-performance scientific computing community.

The PVM System

PVM (Parallel Virtual Machine) is a byproduct of an ongoing heterogeneous network computing research project involving the authors and their institutions. The general goals of this project are to investigate issues in, and develop solutions for, heterogeneous concurrent computing. PVM is an integrated set of software tools and libraries that emulates a general-purpose, flexible, heterogeneous concurrent computing framework on interconnected computers of varied architecture. The overall objective of the PVM system is to enable such a collection of computers to be used cooperatively for concurrent or parallel computation. Detailed descriptions and discussions of the concepts, logistics, and methodologies involved in this network-based computing process are contained in the remainder of the book. Briefly, the PVM system is organized as follows.

The PVM system is composed of two parts. The first part is a daemon, called pvmd3 and sometimes abbreviated pvmd, that resides on all the computers making up the virtual machine. (An example of a daemon program is the mail program that runs in the background and handles all the incoming and outgoing electronic mail on a computer.) Pvmd3 is designed so that any user with a valid login can install this daemon on a machine. When a user wishes to run a PVM application, he first creates a virtual machine by starting up PVM. (Chapter 3 details how this is done.) The PVM application can then be started from a Unix prompt on any of the hosts. Multiple users can configure overlapping virtual machines, and each user can execute several PVM applications simultaneously.

The second part of the system is a library of PVM interface routines. It contains a functionally complete repertoire of primitives that are needed for cooperation between tasks of an application. This library contains user-callable routines for message passing, spawning processes, coordinating tasks, and modifying the virtual machine. The PVM computing model is based on the notion that an application consists of several tasks. Each task is responsible for a part of the application's computational workload. Sometimes an application is parallelized along its functions; that is, each task performs a different function, for example, input, problem setup, solution, output, and display. This process is often called functional parallelism. A more common method of parallelizing an application is called data parallelism . In this method all the tasks are the same, but each one only knows and solves a small part of the data. This is also referred to as the SPMD (single-program multiple-data) model of computing. PVM supports either or a mixture of these methods. Depending on their functions, tasks may execute in parallel and may need to synchronize or exchange data, although this is not always the case.

The PVM system currently supports C, C++, and Fortran languages. This set of language interfaces has been included based on the observation that the vast majority of target applications are written in C and Fortran, with an emerging trend toward experimenting with object-based languages and methodologies.

The C and C++ language bindings for the PVM user interface library are implemented as functions, following the general conventions used by most C systems, including Unix-like operating systems. To elaborate, function arguments are a combination of value parameters and pointers as appropriate, and function result values indicate the outcome of the call. In addition, macro definitions are used for system constants, and global variables such as errno and pvm_errno are the mechanism for discriminating between multiple possible outcomes. Application programs written in C and C++ access PVM library functions by linking against an archival library (libpvm3.a) that is part of the standard distribution.

Fortran language bindings are implemented as subroutines rather than as functions. This approach was taken because some compilers on the supported architectures would not reliably interface Fortran functions with C functions. One immediate implication of this is that an additional argument is introduced into each PVM library call for status results to be returned to the invoking program. Also, library routines for the placement and retrieval of typed data in message buffers are unified, with an additional parameter indicating the datatype. Apart from these differences (and the standard naming prefixes - pvm_ for C, and pvmf for Fortran), a one-to-one correspondence exists between the two language bindings. Fortran interfaces to PVM are implemented as library stubs that in turn invoke the corresponding C routines, after casting and/or dereferencing arguments as appropriate. Thus, Fortran applications are required to link against the stubs library (libfpvm3.a) as well as the C library.

All PVM tasks are identified by an integer task identifier (TID). Messages are sent to and received from tids. Since tids must be unique across the entire virtual machine, they are supplied by the local pvmd and are not user chosen. Although PVM encodes information into each TID (see Chapter 7 for details), the user is expected to treat the tids as opaque integer identifiers. PVM contains several routines that return TID values so that the user application can identify other tasks in the system. There are applications where it is natural to think of a group of tasks, and there are cases where a user would like to identify his tasks by the numbers 0 to p-1, where p is the number of tasks. PVM includes the concept of user-named groups. When a task joins a group, it is assigned a unique ``instance'' number in that group. Instance numbers start at 0 and count up. In keeping with the PVM philosophy, the group functions are designed to be very general and transparent to the user. For example, any PVM task can join or leave any group at any time without having to inform any other task in the affected groups. Also, groups can overlap, and tasks can broadcast messages to groups of which they are not a member. Details of the available group functions are given in Chapter 5. To use any of the group functions, a program must be linked with libgpvm3.a.

The general paradigm for application programming with PVM is as follows. A user writes one or more sequential programs in C, C++, or Fortran 77 that contain embedded calls to the PVM library. Each program corresponds to a task making up the application. These programs are compiled for each architecture in the host pool, and the resulting object files are placed at a location accessible from machines in the host pool. To execute an application, a user typically starts one copy of one task (usually the ``master'' or ``initiating'' task) by hand from a machine within the host pool. This process subsequently starts other PVM tasks, eventually resulting in a collection of active tasks that then compute locally and exchange messages with each other to solve the problem. Note that while the above is a typical scenario, as many tasks as appropriate may be started manually. As mentioned earlier, tasks interact through explicit message passing, identifying each other with a system-assigned, opaque TID.

#include <stdio.h>
#include "pvm3.h"

main()
{
    int  cc, tid, msgtag;
    char buf[100];

    printf("i'm t%x\n", pvm_mytid());       /* enroll in PVM and print our tid */

    /* start one copy of the companion program hello_other */
    cc = pvm_spawn("hello_other", (char**)0, 0, "", 1, &tid);

    if (cc == 1) {
        msgtag = 1;
        pvm_recv(tid, msgtag);               /* block until the reply arrives */
        pvm_upkstr(buf);                     /* unpack the string it sent     */
        printf("from t%x: %s\n", tid, buf);
    } else
        printf("can't start hello_other\n");

    pvm_exit();
}

The above listing, hello.c, is the body of the PVM program hello, a simple example that illustrates the basic concepts of PVM programming. This program is intended to be invoked manually; after printing its task id (obtained with pvm_mytid()), it initiates a copy of another program called hello_other using the pvm_spawn() function. A successful spawn causes the program to execute a blocking receive using pvm_recv. After receiving the message, the program prints the message sent by its counterpart, as well as its task id; the buffer is extracted from the message using pvm_upkstr. The final pvm_exit call dissociates the program from the PVM system.

#include <string.h>
#include <unistd.h>
#include "pvm3.h"

main()
{
    int  ptid, msgtag;
    char buf[100];

    ptid = pvm_parent();                     /* tid of the task that spawned us */

    strcpy(buf, "hello, world from ");
    gethostname(buf + strlen(buf), 64);      /* append our host name            */

    msgtag = 1;
    pvm_initsend(PvmDataDefault);            /* initialize the send buffer      */
    pvm_pkstr(buf);                          /* pack the string                 */
    pvm_send(ptid, msgtag);                  /* send it to the parent, tag 1    */

    pvm_exit();
}

The above figure is a listing of the ``slave'' or spawned program; its first PVM action is to obtain the task id of the ``master'' using the pvm_parent call. This program then obtains its hostname and transmits it to the master using the three-call sequence - pvm_initsend to initialize the send buffer; pvm_pkstr to place a string, in a strongly typed and architecture-independent manner, into the send buffer; and pvm_send to transmit it to the destination process specified by ptid, ``tagging'' the message with the number 1.

How to Make PVM Programs

The first step is to invoke the small script that sets the necessary environment variables and then to copy the example programs into your own area:

    % cp -r $PVM_ROOT/examples $HOME/pvm3/examples
    % cd $HOME/pvm3/examples

The examples directory contains a Makefile.aimk and Readme file that describe how to build the examples. PVM supplies an architecture-independent make, aimk, that automatically determines PVM_ARCH and links any operating-system-specific libraries to your application. aimk was automatically added to your $PATH when you placed the cshrc.stub in your .cshrc file. Using aimk allows you to leave the source code and makefile unchanged as you compile across different architectures.

The master/slave programming model is the most popular model used in distributed computing. (In the general parallel programming arena, the SPMD model is more popular.) To compile the master/slave C example, type

    % aimk master slave

If you prefer to work with Fortran, compile the Fortran version with

    % aimk fmaster fslave

Depending on the location of PVM_ROOT, the INCLUDE statement at the top of the Fortran examples may need to be changed. If PVM_ROOT is not $HOME/pvm3, then change the include to point to $PVM_ROOT/include/fpvm3.h. Note that PVM_ROOT is not expanded inside the Fortran, so you must insert the actual path.

The makefile moves the executables to $HOME/pvm3/bin/$PVM_ARCH, which is the default location where PVM looks for them on all hosts. If your file system is not common across all your PVM hosts, then you will have to build or copy (depending on the architectures) these executables on all your PVM hosts.

Running PVM Programs under SCore

First you need to allocate a partition which is large enough for your application (scout -g ). Then you can submit your jobs with scrun:

    % scrun -nodes=Nx1 pvmd -e X application argv

will allocate N processors and start X instances of your application. If N > X, then pvm_spawn will place new tasks on the remaining processors.

PVM User Interface

In this chapter we give a brief description of the routines in the PVM 3 user library. This chapter is organized by the functions of the routines. For example, in the section on Message Passing is a discussion of all the routines for sending and receiving data from one PVM task to another and a description of PVM's message passing options. The calling syntax of the C and Fortran PVM routines is shown in each section.

An alphabetical listing of all the routines is given in Appendix B. Appendix B contains a detailed description of each routine, including a description of each argument in each routine, the possible error codes a routine may return, and the possible reasons for the error. Each listing also includes examples of both C and Fortran use.

In PVM 3 all PVM tasks are identified by an integer supplied by the local pvmd. In the following descriptions this task identifier is called TID. It is similar to the process ID (PID) used in the Unix system and is assumed to be opaque to the user, in that the value of the TID has no special significance to him. In fact, PVM encodes information into the TID for its own internal use. Details of this encoding can be found in Chapter 7.

All the PVM routines are written in C. C++ applications can link to the PVM library. Fortran applications can call these routines through a Fortran 77 interface supplied with the PVM 3 source. This interface translates arguments, which are passed by reference in Fortran, to their values if needed by the underlying C routines. The interface also takes into account Fortran character string representations and the various naming conventions that different Fortran compilers use to call C functions.

The PVM communication model assumes that any task can send a message to any other PVM task and that there is no limit to the size or number of such messages. While all hosts have physical memory limitations that limit potential buffer space, the communication model does not restrict itself to a particular machine's limitations and assumes sufficient memory is available. The PVM communication model provides asynchronous blocking send, asynchronous blocking receive, and nonblocking receive functions. In our terminology, a blocking send returns as soon as the send buffer is free for reuse, and an asynchronous send does not depend on the receiver calling a matching receive before the send can return. There are options in PVM 3 that request that data be transferred directly from task to task. In this case, if the message is large, the sender may block until the receiver has called a matching receive.

A nonblocking receive immediately returns with either the data or a flag that the data has not arrived, while a blocking receive returns only when the data is in the receive buffer. In addition to these point-to-point communication functions, the model supports multicast to a set of tasks and broadcast to a user-defined group of tasks. There are also functions to perform global max, global sum, etc., across a user-defined group of tasks. Wildcards can be specified in the receive for the source and label, allowing either or both of these contexts to be ignored. A routine can be called to return information about received messages.

The PVM model guarantees that message order is preserved. If task 1 sends message A to task 2, then task 1 sends message B to task 2, message A will arrive at task 2 before message B. Moreover, if both messages arrive before task 2 does a receive, then a wildcard receive will always return message A.

Message buffers are allocated dynamically. Therefore, the maximum message size that can be sent or received is limited only by the amount of available memory on a given host. There is only limited flow control built into PVM 3.3. PVM may give the user a can't get memory error when the sum of incoming messages exceeds the available memory, but PVM does not tell other tasks to stop sending to this host.

Process Control


    int tid = pvm_mytid( void )
    call pvmfmytid( tid )

The routine pvm_mytid() returns the TID of this process and can be called multiple times. It enrolls this process into PVM if this is the first PVM call. Any PVM system call (not just pvm_mytid) will enroll a task in PVM if the task is not enrolled before the call, but it is common practice to call pvm_mytid first to perform the enrolling.


    int info = pvm_exit( void )
    call pvmfexit( info )

The routine pvm_exit() tells the local pvmd that this process is leaving PVM. This routine does not kill the process, which can continue to perform tasks just like any other UNIX process. Users typically call pvm_exit right before exiting their C programs and right before STOP in their Fortran programs.


     int numt = pvm_spawn(char *task, char **argv, int flag,
                          char *where, int ntask, int *tids )
     call pvmfspawn( task, flag, where, ntask, tids, numt )

The routine pvm_spawn() starts up ntask copies of an executable file task on the virtual machine. argv is a pointer to an array of arguments to task with the end of the array specified by NULL. If task takes no arguments, then argv is NULL. The flag argument is used to specify options, and is a sum of:

    Value  Option          Meaning
    0      PvmTaskDefault  PVM chooses where to spawn processes.
    1      PvmTaskHost     where argument is a particular host to spawn on.
    2      PvmTaskArch     where argument is a PVM_ARCH to spawn on.
    4      PvmTaskDebug    starts tasks under a debugger.
    8      PvmTaskTrace    trace data is generated.
    16     PvmMppFront     starts tasks on MPP front-end.
    32     PvmHostCompl    complements host set in where.

These names are predefined in pvm3/include/pvm3.h. In Fortran all the names are predefined in parameter statements which can be found in the include file pvm3/include/fpvm3.h.

PvmTaskTrace is a new feature in PVM 3.3. It causes spawned tasks to generate trace events. PvmTaskTrace is used by XPVM (see Chapter 8). Otherwise, the user must specify where the trace events are sent in pvm_setopt().

On return, numt is set to the number of tasks successfully spawned or an error code if no tasks could be started. If tasks were started, then pvm_spawn() returns a vector of the spawned tasks' tids; and if some tasks could not be started, the corresponding error codes are placed in the last ntask - numt positions of the vector.
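
As an illustration, the following C fragment (a minimal sketch; the executable name worker, the host name node01, and the error handling are placeholders) spawns four copies of a program on a particular host and checks how many actually started:

    int tids[4];
    int numt;

    /* start 4 copies of "worker" on the host "node01" (PvmTaskHost) */
    numt = pvm_spawn("worker", (char**)0, PvmTaskHost, "node01", 4, tids);
    if (numt < 4)
        /* error codes for the tasks that failed are in tids[numt]..tids[3] */
        fprintf(stderr, "only %d of 4 tasks started\n", numt);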

The pvm_spawn() call can also start tasks on multiprocessors. In the case of the Intel iPSC/860 the following restrictions apply. Each spawn call gets a subcube of size ntask and loads the program task on all of these nodes. The iPSC/860 OS has an allocation limit of 10 subcubes across all users, so it is better to start a block of tasks on an iPSC/860 with a single pvm_spawn() call rather than several calls. Two different blocks of tasks spawned separately on the iPSC/860 can still communicate with each other as well as any other PVM tasks even though they are in separate subcubes. The iPSC/860 OS has a restriction that messages going from the nodes to the outside world be less than 256 Kbytes.


    int info = pvm_kill( int tid )
    call pvmfkill( tid, info )

The routine pvm_kill() kills some other PVM task identified by TID. This routine is not designed to kill the calling task, which should be accomplished by calling pvm_exit() followed by exit().


    int info = pvm_catchout( FILE *ff )
    call pvmfcatchout( onoff )

The default is to have PVM write the stderr and stdout of spawned tasks to the log file /tmp/pvml.<uid>. The routine pvm_catchout causes the calling task to catch output from tasks subsequently spawned. Characters printed on stdout or stderr in child tasks are collected by the pvmds and sent in control messages to the parent task, which tags each line and appends it to the specified file (in C) or standard output (in Fortran). Each of the prints is prepended with information about which task generated the print, and the end of the print is marked to help separate outputs coming from several tasks at once.

If pvm_exit is called by the parent while output collection is in effect, it will block until all tasks sending it output have exited, in order to print all their output. To avoid this, one can turn off the output collection by calling pvm_catchout(0) before calling pvm_exit.
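
As a sketch (the executable name worker and the number of tasks are placeholders), a master might redirect its children's output to its own stdout before spawning and switch collection off again before exiting:

    int tids[4];

    pvm_catchout(stdout);            /* collect spawned tasks' output on our stdout */
    pvm_spawn("worker", (char**)0, 0, "", 4, tids);
    /* ... application work ... */
    pvm_catchout(0);                 /* stop collecting so pvm_exit will not block  */
    pvm_exit();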

New capabilities in PVM 3.3 include the ability to register special PVM tasks to handle the jobs of adding new hosts, mapping tasks to hosts, and starting new tasks. This creates an interface for advanced batch schedulers (examples include Condor [7], DQS [6], and LSF [4]) to plug into PVM and run PVM jobs in batch mode. These register routines also create an interface for debugger writers to develop sophisticated debuggers for PVM.

The routine names are pvm_reg_rm(), pvm_reg_hoster(), and pvm_reg_tasker(). These are advanced functions not meant for the average PVM user and thus are not presented in detail here. Specifics can be found in Appendix B.

Information


    int tid = pvm_parent( void )
    call pvmfparent( tid )

The routine pvm_parent() returns the TID of the process that spawned this task or the value of PvmNoParent if not created by pvm_spawn().


    int dtid = pvm_tidtohost( int tid )
    call pvmftidtohost( tid, dtid )

The routine pvm_tidtohost() returns the TID dtid of the daemon running on the same host as TID. This routine is useful for determining on which host a given task is running. More general information about the entire virtual machine, including the textual name of the configured hosts, can be obtained by using the following functions:


    int info = pvm_config(int *nhost, int *narch,
                          struct pvmhostinfo **hostp )
    call pvmfconfig( nhost, narch, dtid, name, arch, speed, info)

The routine pvm_config() returns information about the virtual machine including the number of hosts, nhost, and the number of different data formats, narch. hostp is a pointer to a user-declared array of pvmhostinfo structures. The array should be of size at least nhost. On return, each pvmhostinfo structure contains the pvmd TID, host name, name of the architecture, and relative CPU speed for that host in the configuration.

The Fortran function returns information about one host per call and cycles through all the hosts. Thus, if pvmfconfig is called nhost times, the entire virtual machine will be represented. The Fortran interface works by saving a copy of the hostp array and returning one entry per call. All the hosts must be cycled through before a new hostp array is obtained. Thus, if the virtual machine is changing during these calls, then the change will appear in the nhost and narch parameters, but not in the host information. Presently, there is no way to reset pvmfconfig() and force it to restart the cycle when it is in the middle.
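
The following C fragment is a minimal sketch of pvm_config use; the field names are those of the pvmhostinfo structure declared in pvm3.h:

    struct pvmhostinfo *hostp;
    int nhost, narch, i;

    pvm_config(&nhost, &narch, &hostp);
    for (i = 0; i < nhost; i++)
        printf("host %s  arch %s  speed %d  dtid t%x\n",
               hostp[i].hi_name, hostp[i].hi_arch,
               hostp[i].hi_speed, hostp[i].hi_tid);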


    int info = pvm_tasks(int which, int *ntask,
                        struct pvmtaskinfo **taskp )
    call pvmftasks(which, ntask, tid, ptid, dtid,
                   flag, aout, info )

The routine pvm_tasks() returns information about the PVM tasks running on the virtual machine. The integer which specifies which tasks to return information about. The present options are (0), which means all tasks, a pvmd TID (dtid), which means tasks running on that host, or a TID, which means just the given task.

The number of tasks is returned in ntask. taskp is a pointer to an array of pvmtaskinfo structures. The array is of size ntask. Each pvmtaskinfo structure contains the TID, pvmd TID, parent TID, a status flag, and the spawned file name. (PVM doesn't know the file name of manually started tasks and so leaves these blank.) The Fortran function returns information about one task per call and cycles through all the tasks. Thus, if which = 0, and pvmftasks is called ntask times, all tasks will be represented. The Fortran implementation assumes that the task pool is not changing while it cycles through the tasks. If the pool changes, these changes will not appear until the next cycle of ntask calls begins.

Examples of the use of pvm_config and pvm_tasks can be found in the source to the PVM console, which is just a PVM task itself. Examples of the use of the Fortran versions of these routines can be found in the source pvm3/examples/testall.f.

Signaling


    int info = pvm_sendsig( int tid, int signum )
    call pvmfsendsig( tid, signum, info )

    int info = pvm_notify( int what, int msgtag, int cnt, int *tids )
    call pvmfnotify( what, msgtag, cnt, tids, info )

The routine pvm_sendsig() sends a signal signum to another PVM task identified by TID. The routine pvm_notify requests PVM to notify the caller on detecting certain events. The present options are PvmTaskExit (notify if a task exits), PvmHostDelete (notify if a host is deleted or fails), and PvmHostAdd (notify when new hosts are added).

In response to a notify request, some number of messages (see Appendix B) are sent by PVM back to the calling task. The messages are tagged with the user supplied msgtag. The tids array specifies who to monitor when using TaskExit or HostDelete. The array contains nothing when using HostAdd. If required, the routines pvm_config and pvm_tasks can be used to obtain task and pvmd tids.

If the host on which task A is running fails, and task B has asked to be notified if task A exits, then task B will be notified even though the exit was caused indirectly by the host failure.
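
The fragment below is a sketch of a notify request: it asks PVM to send a message tagged 99 if either of two previously spawned worker tasks exits, and later receives that notification (the tag value and the workers array are illustrative; the exact contents of the notify message are described in Appendix B):

    int workers[2];        /* tids of two previously spawned tasks */
    int deadtid;

    pvm_notify(PvmTaskExit, 99, 2, workers);
    /* ... later: a message tagged 99 arrives when a monitored task exits ... */
    pvm_recv(-1, 99);
    pvm_upkint(&deadtid, 1, 1);      /* the notify message carries the tid that exited */
    printf("task t%x has exited\n", deadtid);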

Setting and Getting Options


    int oldval = pvm_setopt( int what, int val )
    int val = pvm_getopt( int what )

    call pvmfsetopt( what, val, oldval )
    call pvmfgetopt( what, val )

The routine pvm_setopt is a general-purpose function that allows the user to set or get options in the PVM system. In PVM 3, pvm_setopt can be used to set several options, including automatic error message printing, debugging level, and communication routing method for all subsequent PVM calls. pvm_setopt returns the previous value of the option in oldval. In PVM 3.3, what can have the following values:

    Option             Value  Meaning
    PvmRoute           1      routing policy
    PvmDebugMask       2      debugmask
    PvmAutoErr         3      auto error reporting
    PvmOutputTid       4      stdout destination for children
    PvmOutputCode      5      output msgtag
    PvmTraceTid        6      trace destination for children
    PvmTraceCode       7      trace msgtag
    PvmFragSize        8      message fragment size
    PvmResvTids        9      allow messages to reserved tags and tids
    PvmSelfOutputTid   10     stdout destination for self
    PvmSelfOutputCode  11     output msgtag
    PvmSelfTraceTid    12     trace destination for self
    PvmSelfTraceCode   13     trace msgtag

See Appendix B for allowable values for these options. Future expansions to this list are planned.

The most popular use of pvm_setopt is to enable direct route communication between PVM tasks. As a general rule of thumb, PVM communication bandwidth over a network doubles by calling

    pvm_setopt( PvmRoute, PvmRouteDirect );

The drawback is that this faster communication method is not scalable under Unix; hence, it may not work if the application involves over 60 tasks that communicate randomly with each other. If it doesn't work, PVM automatically switches back to the default communication method. It can be called multiple times during an application to selectively set up direct task-to-task communication links, but typical use is to call it once after the initial call to pvm_mytid().

Message Passing

Sending a message comprises three steps in PVM. First, a send buffer must be initialized by a call to pvm_initsend() or pvm_mkbuf(). Second, the message must be ``packed'' into this buffer using any number and combination of pvm_pk*() routines. (In Fortran all message packing is done with the pvmfpack() subroutine.) Third, the completed message is sent to another process by calling the pvm_send() routine or multicast with the pvm_mcast() routine.
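
For example, the following sketch packs an item count, an integer array, and a string into one message and sends it (dest_tid, the tag value 7, and the variable names are illustrative):

    int data[10];
    int n = 10;

    pvm_initsend(PvmDataDefault);    /* step 1: initialize the send buffer      */
    pvm_pkint(&n, 1, 1);             /* step 2: pack the item count ...         */
    pvm_pkint(data, n, 1);           /*         ... then the array itself       */
    pvm_pkstr("results");            /*         ... and a label string          */
    pvm_send(dest_tid, 7);           /* step 3: send to task dest_tid, tag 7    */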

A message is received by calling either a blocking or nonblocking receive routine and then ``unpacking'' each of the packed items from the receive buffer. The receive routines can be set to accept any message, or any message from a specified source, or any message with a specified message tag, or only messages with a given message tag from a given source. There is also a probe function that returns whether a message has arrived, but does not actually receive it.

If required, other receive contexts can be handled by PVM 3. The routine pvm_recvf() allows users to define their own receive contexts that will be used by the subsequent PVM receive routines.

Message Buffers


    int bufid = pvm_initsend( int encoding )
    call pvmfinitsend( encoding, bufid )

If the user is using only a single send buffer (and this is the typical case) then pvm_initsend() is the only required buffer routine. It is called before packing a new message into the buffer. The routine pvm_initsend clears the send buffer and creates a new one for packing a new message. The encoding scheme used for this packing is set by encoding. The new buffer identifier is returned in bufid.

The encoding options are PvmDataDefault, which uses XDR encoding and is safe for heterogeneous virtual machines; PvmDataRaw, which performs no encoding and can be used only between hosts with compatible data formats; and PvmDataInPlace, which leaves the data in place and copies it directly out of user memory when the message is sent.

The following message buffer routines are required only if the user wishes to manage multiple message buffers inside an application. Multiple message buffers are not required for most message passing between processes. In PVM 3 there is one active send buffer and one active receive buffer per process at any given moment. The developer may create any number of message buffers and switch between them for the packing and sending of data. The packing, sending, receiving, and unpacking routines affect only the active buffers.


    int bufid = pvm_mkbuf( int encoding )
    call pvmfmkbuf( encoding, bufid )

The routine pvm_mkbuf creates a new empty send buffer and specifies the encoding method used for packing messages. It returns a buffer identifier bufid.


    int info = pvm_freebuf( int bufid )
    call pvmffreebuf( bufid, info )

The routine pvm_freebuf() disposes of the buffer with identifier bufid. This should be done after a message has been sent and is no longer needed. Call pvm_mkbuf() to create a buffer for a new message if required. Neither of these calls is required when using pvm_initsend(), which performs these functions for the user.


    int bufid = pvm_getsbuf( void )
    call pvmfgetsbuf( bufid )
    int bufid = pvm_getrbuf( void )
    call pvmfgetrbuf( bufid )

pvm_getsbuf() returns the active send buffer identifier. pvm_getrbuf() returns the active receive buffer identifier.


    int oldbuf  = pvm_setsbuf( int bufid )
    call pvmfsetsbuf( bufid, oldbuf )
    int oldbuf  = pvm_setrbuf( int bufid )
    call pvmfsetrbuf( bufid, oldbuf )

These routines set the active send (or receive) buffer to bufid, save the state of the previous buffer, and return the previous active buffer identifier oldbuf.

If bufid is set to 0 in pvm_setsbuf() or pvm_setrbuf(), then the present buffer is saved and there is no active buffer. This feature can be used to save the present state of an application's messages so that a math library or graphical interface which also uses PVM messages will not interfere with the state of the application's buffers. After they complete, the application's buffers can be reset to active.

It is possible to forward messages without repacking them by using the message buffer routines. This is illustrated by the following fragment.


    bufid = pvm_recv( src, tag );
    oldid = pvm_setsbuf( bufid );
    info = pvm_send( dst, tag );
    info = pvm_freebuf( oldid );

Packing Data

Each of the following C routines packs an array of the given data type into the active send buffer. They can be called multiple times to pack data into a single message. Thus, a message can contain several arrays each with a different data type. C structures must be passed by packing their individual elements. There is no limit to the complexity of the packed messages, but an application should unpack the messages exactly as they were packed. Although this is not strictly required, it is a safe programming practice.

The arguments for each of the routines are a pointer to the first item to be packed, nitem which is the total number of items to pack from this array, and stride which is the stride to use when packing. A stride of 1 means a contiguous vector is packed, a stride of 2 means every other item is packed, and so on. An exception is pvm_pkstr() which by definition packs a NULL terminated character string and thus does not need nitem or stride arguments.


    int info = pvm_pkbyte( char *cp, int nitem, int stride )
    int info = pvm_pkcplx( float *xp, int nitem, int stride )
    int info = pvm_pkdcplx( double *zp, int nitem, int stride )
    int info = pvm_pkdouble( double *dp, int nitem, int stride )
    int info = pvm_pkfloat( float *fp, int nitem, int stride )
    int info = pvm_pkint( int *np, int nitem, int stride )
    int info = pvm_pklong( long *np, int nitem, int stride )
    int info = pvm_pkshort( short *np, int nitem, int stride )
    int info = pvm_pkstr( char *cp )
    int info = pvm_packf( const char *fmt, ... )

PVM also supplies a packing routine that uses a printf-like format expression to specify what data to pack and how to pack it into the send buffer. All variables are passed as addresses if count and stride are specified; otherwise, variables are assumed to be values. A description of the format syntax is given in Appendix B.

A single Fortran subroutine handles all the packing functions of the above C routines.

    call pvmfpack( what, xp, nitem, stride, info )

The argument xp is the first item of the array to be packed. Note that in Fortran the number of characters in a string to be packed must be specified in nitem. The integer what specifies the type of data to be packed. The supported options are as follows:

 STRING     0       REAL4      4
 BYTE1      1       COMPLEX8   5
 INTEGER2   2       REAL8      6
 INTEGER4   3       COMPLEX16  7

These names have been predefined in parameter statements in the include file pvm3/include/fpvm3.h. Some vendors may extend this list to include 64-bit architectures in their PVM implementations. We will be adding INTEGER8, REAL16, etc., as soon as XDR support for these data types is available.

Sending and Receiving Data


    int info = pvm_send( int tid, int msgtag )
    call pvmfsend( tid, msgtag, info )
    int info = pvm_mcast( int *tids, int ntask, int msgtag )
    call pvmfmcast( ntask, tids, msgtag, info )

The routine pvm_send() labels the message with an integer identifier msgtag and sends it immediately to the process TID.

The routine pvm_mcast() labels the message with an integer identifier msgtag and broadcasts the message to all tasks specified in the integer array tids (except itself). The tids array is of length ntask.


    int info = pvm_psend( int tid, int msgtag,
                          void *vp, int cnt, int type )
    call pvmfpsend( tid, msgtag, xp, cnt, type, info )

The routine pvm_psend() packs and sends an array of the specified datatype to the task identified by TID. The defined datatypes for Fortran are the same as for pvmfpack(). In C the type argument can be any of the following:

    PVM_STR     PVM_INT     PVM_FLOAT   PVM_DCPLX
    PVM_BYTE    PVM_LONG    PVM_CPLX    PVM_UINT
    PVM_SHORT   PVM_USHORT  PVM_DOUBLE  PVM_ULONG

PVM contains several methods of receiving messages at a task. There is no function matching in PVM; for example, a pvm_psend does not have to be matched with a pvm_precv. Any of the following routines can be called for any incoming message, no matter how it was sent (or multicast).


    int bufid = pvm_recv( int tid, int msgtag )
    call pvmfrecv( tid, msgtag, bufid )

This blocking receive routine will wait until a message with label msgtag has arrived from TID. A value of -1 in msgtag or TID matches anything (wildcard). It then places the message in a new active receive buffer that is created. The previous active receive buffer is cleared unless it has been saved with a pvm_setrbuf() call.


    int bufid = pvm_nrecv( int tid, int msgtag )
    call pvmfnrecv( tid, msgtag, bufid )

If the requested message has not arrived, then the nonblocking receive pvm_nrecv() returns bufid = 0. This routine can be called multiple times for the same message to check whether it has arrived, while performing useful work between calls. When no more useful work can be performed, the blocking receive pvm_recv() can be called for the same message. If a message with label msgtag has arrived from TID, pvm_nrecv() places this message in a new active receive buffer (which it creates) and returns the ID of this buffer. The previous active receive buffer is cleared unless it has been saved with a pvm_setrbuf() call. A value of -1 in msgtag or TID matches anything (wildcard).
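
A typical polling loop built on pvm_nrecv might look like the following sketch, where do_useful_work() stands in for application code executed while waiting:

    int bufid;

    while ((bufid = pvm_nrecv(src_tid, msgtag)) == 0)
        do_useful_work();            /* overlap computation with waiting */

    /* the message has arrived and is in the new active receive buffer */
    pvm_upkint(data, n, 1);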


    int bufid = pvm_probe( int tid, int msgtag )
    call pvmfprobe( tid, msgtag, bufid )

If the requested message has not arrived, then pvm_probe() returns bufid = 0. Otherwise, it returns a bufid for the message, but does not ``receive'' it. This routine can be called multiple times for the same message to check whether it has arrived, while performing useful work between calls. In addition, pvm_bufinfo() can be called with the returned bufid to determine information about the message before receiving it.


    int bufid = pvm_trecv( int tid, int msgtag, struct timeval *tmout )
    call pvmftrecv( tid, msgtag, sec, usec, bufid )

PVM also supplies a timeout version of receive. Consider the case where a message is never going to arrive (because of error or failure); the routine pvm_recv would block forever. To avoid such situations, the user may wish to give up after waiting for a fixed amount of time. The routine pvm_trecv() allows the user to specify a timeout period. If the timeout period is set very large, then pvm_trecv acts like pvm_recv. If the timeout period is set to zero, then pvm_trecv acts like pvm_nrecv. Thus, pvm_trecv fills the gap between the blocking and nonblocking receive functions.
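
A sketch of a receive that gives up after ten seconds (the timeout value, source, and tag are illustrative; struct timeval is declared in <sys/time.h>):

    struct timeval tmout;
    int bufid;

    tmout.tv_sec  = 10;              /* wait at most 10 seconds */
    tmout.tv_usec = 0;
    bufid = pvm_trecv(src_tid, msgtag, &tmout);
    if (bufid == 0) {
        /* no matching message arrived within the timeout */
    }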


    int info = pvm_bufinfo( int bufid, int *bytes, int *msgtag, int *tid )
    call pvmfbufinfo( bufid, bytes, msgtag, tid, info )

The routine pvm_bufinfo() returns msgtag, source TID, and length in bytes of the message identified by bufid. It can be used to determine the label and source of a message that was received with wildcards specified.
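
For example, after a wildcard receive the following sketch recovers the sender and tag of whatever message arrived before unpacking it:

    int bufid, bytes, tag, src;

    bufid = pvm_recv(-1, -1);               /* accept any message from any task      */
    pvm_bufinfo(bufid, &bytes, &tag, &src); /* learn its length, tag, and sender tid */
    printf("got %d bytes, tag %d, from t%x\n", bytes, tag, src);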


    int info = pvm_precv( int tid, int msgtag, void *vp, int cnt,
                          int type, int *rtid, int *rtag, int *rcnt )
    call pvmfprecv( tid, msgtag, xp, cnt, type, rtid, rtag, rcnt, info )

The routine pvm_precv() combines the functions of a blocking receive and unpacking the received buffer. It does not return a bufid. Instead, it returns the actual values of TID, msgtag, and cnt.
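
A matched pair of calls might look like the following sketch (the tag 5, the array of 100 doubles, and the variable names are illustrative):

    double array[100];
    int rtid, rtag, rcnt;

    /* sender: pack and send 100 doubles in a single call */
    pvm_psend(dest_tid, 5, array, 100, PVM_DOUBLE);

    /* receiver: receive directly into a buffer; the actual source, tag,
       and item count are returned in rtid, rtag, and rcnt */
    pvm_precv(-1, 5, array, 100, PVM_DOUBLE, &rtid, &rtag, &rcnt);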


    int (*old)() = pvm_recvf( int (*new)(int buf, int tid, int tag) )

The routine pvm_recvf() modifies the receive context used by the receive functions and can be used to extend PVM. The default receive context is to match on source and message tag. This can be modified to any user-defined comparison function. (See Appendix B for an example of creating a probe function with pvm_recvf().) There is no Fortran interface routine for pvm_recvf().

Unpacking Data

The following C routines unpack (multiple) data types from the active receive buffer. In an application they should match their corresponding pack routines in type, number of items, and stride. nitem is the number of items of the given type to unpack, and stride is the stride.


    int info = pvm_upkbyte( char *cp, int nitem, int stride )
    int info = pvm_upkcplx( float *xp, int nitem, int stride )
    int info = pvm_upkdcplx( double *zp, int nitem, int stride )
    int info = pvm_upkdouble( double *dp, int nitem, int stride )
    int info = pvm_upkfloat( float *fp, int nitem, int stride )
    int info = pvm_upkint( int *np, int nitem, int stride )
    int info = pvm_upklong( long *np, int nitem, int stride )
    int info = pvm_upkshort( short *np, int nitem, int stride )
    int info = pvm_upkstr( char *cp )
    int info = pvm_unpackf( const char *fmt, ... )

The routine pvm_unpackf() uses a printf-like format expression to specify what data to unpack and how to unpack it from the receive buffer.

A single Fortran subroutine handles all the unpacking functions of the above C routines.

    call pvmfunpack( what, xp, nitem, stride, info )

The argument xp is the array to be unpacked into. The integer argument what specifies the type of data to be unpacked. (Same what
options as for pvmfpack()).

Dynamic Process Groups

The dynamic process group functions are built on top of the core PVM routines. A separate library libgpvm3.a must be linked with user programs that make use of any of the group functions. The pvmd does not perform the group functions. This task is handled by a group server that is automatically started when the first group function is invoked. There is some debate about how groups should be handled in a message-passing interface. The issues include efficiency and reliability, and there are tradeoffs between static versus dynamic groups. Some people argue that only tasks in a group can call group functions.

In keeping with the PVM philosophy, the group functions are designed to be very general and transparent to the user, at some cost in efficiency. Any PVM task can join or leave any group at any time without having to inform any other task in the affected groups. Tasks can broadcast messages to groups of which they are not a member. In general, any PVM task may call any of the following group functions at any time. The exceptions are pvm_lvgroup(), pvm_barrier(), and pvm_reduce(), which by their nature require the calling task to be a member of the specified group.


    int inum = pvm_joingroup( char *group )
    int info = pvm_lvgroup( char *group )

    call pvmfjoingroup( group, inum )
    call pvmflvgroup( group, info )

These routines allow a task to join or leave a user-named group. The first call to pvm_joingroup() creates a group with name group and puts the calling task in this group. pvm_joingroup() returns the instance number (inum) of the process in this group. Instance numbers run from 0 to the number of group members minus 1. In PVM 3, a task can join multiple groups.

If a process leaves a group and then rejoins it, that process may receive a different instance number. Instance numbers are recycled so a task joining a group will get the lowest available instance number. But if multiple tasks are joining a group, there is no guarantee that a task will be assigned its previous instance number.

To assist the user in maintaining a continuous set of instance numbers despite joining and leaving, the pvm_lvgroup() function does not return until the task is confirmed to have left. A pvm_joingroup() called after this return will assign the vacant instance number to the new task. It is the user's responsibility to maintain a contiguous set of instance numbers if the algorithm requires it. If several tasks leave a group and no tasks join, then there will be gaps in the instance numbers.


    int tid = pvm_gettid( char *group, int inum )
    int inum = pvm_getinst( char *group, int tid )
    int size = pvm_gsize( char *group )

    call pvmfgettid( group, inum, tid )
    call pvmfgetinst( group, tid, inum )
    call pvmfgsize( group, size )

The routine pvm_gettid() returns the TID of the process with a given group name and instance number. pvm_gettid() allows two tasks with no knowledge of each other to get each other's TID simply by joining a common group. The routine pvm_getinst() returns the instance number of TID in the specified group. The routine pvm_gsize() returns the number of members in the specified group.


    int info = pvm_barrier( char *group, int count )
    call pvmfbarrier( group, count, info )

On calling pvm_barrier() the process blocks until count members of a group have called pvm_barrier. In general, count should be the total number of members of the group. A count is required because with dynamic process groups PVM cannot know how many members are in a group at a given instant. It is an error for a process to call pvm_barrier with a group of which it is not a member. It is also an error if the count arguments across a given barrier call do not match. For example, it is an error if one member of a group calls pvm_barrier() with a count of 4, and another member calls pvm_barrier() with a count of 5.


    int info = pvm_bcast( char *group, int msgtag )
    call pvmfbcast( group, msgtag, info )

pvm_bcast() labels the message with an integer identifier msgtag and broadcasts the message to all tasks in the specified group except itself (if it is a member of the group). For pvm_bcast() ``all tasks'' is defined to be those tasks the group server thinks are in the group when the routine is called. If tasks join the group during a broadcast, they may not receive the message. If tasks leave the group during a broadcast, a copy of the message will still be sent to them.
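
The following sketch shows a common pattern: each of NPROC cooperating tasks joins a group, synchronizes at a barrier, and then member 0 broadcasts a value to the rest of the group (the group name "workers", NPROC, and the tag 3 are illustrative):

    #define NPROC 4

    int inum, value;

    inum = pvm_joingroup("workers");     /* instance numbers run 0 .. NPROC-1      */
    pvm_barrier("workers", NPROC);       /* wait until all NPROC tasks have joined */
    if (inum == 0) {
        value = 42;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&value, 1, 1);
        pvm_bcast("workers", 3);         /* broadcast to the rest of the group     */
    } else {
        pvm_recv(-1, 3);
        pvm_upkint(&value, 1, 1);
    }
    pvm_lvgroup("workers");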


    int info = pvm_reduce(void (*func)(), void *data,
                          int nitem, int datatype,
                          int msgtag, char *group, int root )
    call pvmfreduce(func, data, count, datatype,
                    msgtag, group, root, info )

pvm_reduce() performs a global arithmetic operation across the group, for example, global sum or global max. The result of the reduction operation appears on root. PVM supplies four predefined functions that the user can place in func. These are

    PvmMax
    PvmMin
    PvmSum
    PvmProduct

The reduction operation is performed element-wise on the input data. For example, if the data array contains two floating-point numbers and func is PvmMax, then the result contains two numbers: the global maximum of each group member's first number and the global maximum of each member's second number.
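
As an illustration (a sketch; the group name "workers", the tag 11, and the data sizes are placeholders), each member contributes a local array of two doubles, and the element-wise global sums appear on the root, instance 0:

    double local[2];       /* this member's contribution */
    int inum;

    inum = pvm_getinst("workers", pvm_mytid());
    pvm_reduce(PvmSum, local, 2, PVM_DOUBLE, 11, "workers", 0);
    if (inum == 0)
        /* on the root, local[] now holds the element-wise global sums */
        printf("sums: %f %f\n", local[0], local[1]);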

In addition users can define their own global operation function to place in func. See Appendix B for details. An example is given in the source code for PVM. For more information see PVM_ROOT/examples/gexamples.

Note: pvm_reduce() does not block. If a task calls pvm_reduce and then leaves the group before the root has called pvm_reduce, an error may occur.