[SCore-users-jp] Re: [SCore-users] MPI and PM at the same time

Bogdan Costescu bogdan.costescu @ iwr.uni-heidelberg.de
Thu, 17 Oct 2002 23:05:06 JST


On Thu, 17 Oct 2002, Atsushi HORI wrote:

> This should not happen if the number of nodes is greater than one.

It does! Or maybe I don't know how to obtain the data. I've included at 
the end of this message a sample program along with the output I obtain 
when running it here, with SCore configured to use both Myrinet and 
shared memory. (If the text is too mangled to be useful, I can send it as 
an attachment or make it available on a web site.)

> Define the number of network sets with the RESOURCE MACRO, like this.
> 
> SCORE_RSRC_NUM_NETS(N)

I've already tried setting the score_num_pmnet variable directly, as it is
mentioned in the score_initialize() man page, but after MPI_Init() the
number of contexts is always 1. When using this macro instead, the compiler
(with -Wall) warns "unused variable `score_resource_num_netsets'" and the
result is still 1 context.

But the real problem is that I can't use this method. The ARMCI library
has to be initialized *after* MPI, so that by the time it starts all
processes are already up and running. That's why I asked how to obtain
another context starting from the one used by MPI.
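
To make the ordering constraint concrete, the sequence ARMCI imposes looks
roughly like this (just a sketch; ARMCI_Init()/ARMCI_Finalize() are the usual
ARMCI entry points and armci.h is the header name as I know it - the
interesting part is only where the extra context would have to be obtained):

#include <mpi.h>
#include <armci.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);   /* MPI (and its PM context) comes up first */

    /* the extra PM context would have to be obtained here,
       starting from whatever MPI is already using */

    ARMCI_Init();             /* ARMCI starts on top of the running MPI job */

    /* ... ARMCI one-sided and MPI communication ... */

    ARMCI_Finalize();
    MPI_Finalize();
    return 0;
}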

In order to get another context, I was trying to obtain the device used by
the MPI context so that I could call pmOpenContext and get a second context
on the same device - that's where I discovered that ->device was NULL, so of
course I couldn't use it in the pmOpenContext call. I also tried to get
->device from the "children" contexts attached to real devices, which in one
case are only shmem and in the other only myrinet (I've also tried a larger
number of nodes so that shmem and myrinet are present at the same time, but
the output becomes long - available on request).
Is there any other way of getting another context? How about using 
pmSaveContext/pmRestoreContext to get a copy of the first context (as we 
want the same connectivity)?
What is pmAttachContext used for? The documentation for pmCreateAttachFd 
says that the fd obtained there can be used in pmAttachContext. But what 
for? If I have a context, I attach an fd to it so that I can use select(2), 
but then I use this fd and a context type to create another context?
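
For what it's worth, the only use I can see for such an fd is the usual
select(2) pattern; a sketch, assuming the fd has already been obtained via
pmCreateAttachFd (wait_for_context is just my own illustrative helper, not
a PM call):

#include <sys/select.h>
#include <sys/time.h>

/* Wait until the PM context signals activity on 'fd' or the timeout
   expires; 'fd' is assumed to come from pmCreateAttachFd(). */
int wait_for_context(int fd, long usec)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec  = usec / 1000000;
    tv.tv_usec = usec % 1000000;

    /* >0: fd readable, 0: timeout, <0: error */
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}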

Another strange thing is that in the Myrinet case, the number of nodes 
returned in pmContextConfig.nodes is 1 when I run on 2 nodes as 2x1 (but 
becomes 4 when I run on 4 nodes as 4x1). However, I haven't investigated 
this further, so there might be a logical explanation for it...

---------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <sc.h>
#include <score.h>
#include <errno.h>
#include <string.h>
#include <score_resource.h>

pmContext *mpic, *pc;
pmDevice *pd;

void fatal(char *s, int err) {
    printf("#%d: %s %s\n", score_self_node, s, pmErrorString(err));
    fflush(stdout);
    exit(1);
}

int main(int argc, char **argv)
{
    int err, i;
    pmContextConfig cc;
    pmContext *allc[PM_MAX_NODE];
    int allnr[PM_MAX_NODE];

    SCORE_RSRC_NUM_NETS(2);

    MPI_Init(&argc, &argv);

    if (score_num_pmnet < 1) {
        printf("No context !!!\n");
        return 1;
    } else {
        printf("#%d: Nr. of contexts: %d\n", score_self_node, score_num_pmnet);
    }

    mpic = score_pmnet[0];
    if ((err = pmGetContextConfig(mpic, &cc)) != PM_SUCCESS)
        fatal("pmGetContextConfig", err);
    printf("#%d:C: device=%p, parent=%p, ref_count=%d, use_count=%d, size=%d\n",
        score_self_node, mpic->device, mpic->parent, mpic->ref_count, mpic->use_count, mpic->size);
    printf("#%d:CC: type=%s, nr=%d, nodes=%d, mtu=%d, size=%d, opt=%ld\n",
        score_self_node, cc.type, cc.number, cc.nodes, cc.mtu, cc.size, cc.option);

    for (i = 0; i < cc.nodes; i++) {
        /* pmExtractNode does not work for the node itself !!! */
        if (i == score_self_node) continue;
        if ((err = pmExtractNode(mpic, i, &allc[i], &allnr[i])) != PM_SUCCESS)
            fatal("pmExtractNode", err);
        if ((err = pmGetContextConfig(allc[i], &cc)) != PM_SUCCESS)
            fatal("pmGetContextConfig", err);
        printf("#%d:C: me=%d, device=%p, parent=%p, ref_count=%d, use_count=%d, size=%d\n",
            score_self_node, i, allc[i]->device, allc[i]->parent, allc[i]->ref_count, allc[i]->use_count, allc[i]->size);
        printf("#%d:CC: me=%d, type=%s, nr=%d, nodes=%d, mtu=%d, size=%d, opt=%ld\n",
            score_self_node, i, cc.type, cc.number, cc.nodes, cc.mtu, cc.size, cc.option);
        fflush(stdout);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    fflush(stdout);
    MPI_Finalize();
    return 0;
}

And the output:

[bogdan @ node203 ~/tmp]$ scrun -nodes=1x2 ./z
SCore-D 4.2.1 connected (jid=257).
<0:0> SCORE: 2 nodes (1x2) ready.
#0: Nr. of contexts: 1
#0:C: device=(nil), parent=(nil), ref_count=1, use_count=0, size=8484
#0:CC: type=composite, nr=0, nodes=2, mtu=8192, size=65952, opt=68
#0:C: me=1, device=(nil), parent=0x8530148, ref_count=2, use_count=2, size=276
#0:CC: me=1, type=shmem, nr=21, nodes=2, mtu=8192, size=65568, opt=68
#1: Nr. of contexts: 1
#1:C: device=(nil), parent=(nil), ref_count=1, use_count=0, size=8484
#1:CC: type=composite, nr=0, nodes=2, mtu=8192, size=65952, opt=68
#1:C: me=0, device=(nil), parent=0x8530148, ref_count=2, use_count=2, size=276
#1:CC: me=0, type=shmem, nr=21, nodes=2, mtu=8192, size=65568, opt=68
[bogdan @ node203 ~/tmp]$ 

[bogdan @ node203 ~/tmp]$ scrun -nodes=2x1 ./z
SCore-D 4.2.1 connected (jid=256).
<0:0> SCORE: 2 nodes (2x1) ready.
#0: Nr. of contexts: 1
#0:C: device=(nil), parent=(nil), ref_count=1, use_count=0, size=8484
#0:CC: type=composite, nr=0, nodes=1, mtu=8256, size=164240, opt=94
#1: Nr. of contexts: 1
#1:C: device=(nil), parent=(nil), ref_count=1, use_count=0, size=8484
#1:CC: type=composite, nr=0, nodes=1, mtu=8256, size=164240, opt=94
#1:C: me=0, device=(nil), parent=0x8530148, ref_count=2, use_count=2, size=272
#1:CC: me=0, type=myrinet, nr=0, nodes=2, mtu=8256, size=163856, opt=127
[bogdan @ node203 ~/tmp]$ 



-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu @ IWR.Uni-Heidelberg.De

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users


