[SCore-users] Races inside SCore(?)

Richard Guenther rguenth at tat.physik.uni-tuebingen.de
Tue Dec 3 20:23:18 JST 2002


Hi!

I am experiencing problems using SCore (version 4.2.1 with 100MBit
and 3.3.1 with Myrinet) in
conjunction with the cheetah (v1.1.4) library used by POOMA.
The problem appears when I use an nx2 processor setup and does
not appear in nx1 mode. The symptom is that all processes spin
in kernel space (>90% system time) and make no further
progress. Does this sound familiar to anyone?

Now to elaborate some more. Cheetah presents a sort of one-sided
communication interface to the user and at certain points polls
for messages with a construct like this (very simplified):

 do {
    MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status);
 } while (!flag);

Now, if I insert a sched_yield() or a usleep(100) after the
MPI_Iprobe(), the problem goes away (well, not completely, but
it is a lot harder to reproduce). SCore usually does not
detect any sort of deadlock, but occasionally it does.
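
A minimal sketch of the yielding variant of the polling loop, for
illustration only. To keep it self-contained and runnable without MPI,
the probe is passed in as a callback; the names probe_fn,
poll_with_yield and countdown_probe are hypothetical and not part of
Cheetah, POOMA or SCore. In the real code the callback would wrap
MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status) and return flag.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield() (POSIX) */
#include <stddef.h>

/* Hypothetical callback type: returns nonzero once a message is pending.
 * With MPI it would call MPI_Iprobe(...) and return the resulting flag. */
typedef int (*probe_fn)(void *ctx);

/* Poll until probe() reports a pending message, giving up the CPU after
 * every unsuccessful probe. Returns the number of yields performed. */
static unsigned long poll_with_yield(probe_fn probe, void *ctx)
{
    unsigned long yields = 0;

    assert(probe != NULL);
    while (!probe(ctx)) {
        sched_yield();   /* usleep(100) could be used here instead */
        ++yields;
    }
    return yields;
}

/* Stand-in probe for demonstration: reports a "message" once the
 * counter pointed to by ctx has reached zero. */
static int countdown_probe(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

Calling poll_with_yield(countdown_probe, &n) with n set to 3 returns
after three unsuccessful probes, each followed by a yield. Note that
this only mirrors the workaround; it makes the hang harder to
reproduce and does not explain its cause.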

Now the question: could this be a race condition somewhere in the
SCore code that handles multiple processors on one node? Where
should I start looking to fix the problem?

Thanks for any hints,
   Richard.

PS: please CC me, I'm not on the list.

--
Richard Guenther <richard.guenther at uni-tuebingen.de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/




