[SCore-users] Races inside SCore(?)
    Richard Guenther 
    rguenth at tat.physik.uni-tuebingen.de
       
    Tue Dec  3 20:23:18 JST 2002
    
    
  
Hi!
I am experiencing problems using SCore (version 4.2.1 with 100 Mbit
and 3.3.1 with Myrinet) in conjunction with the cheetah (v1.1.4)
library used by POOMA. The problem appears if I use an nx2 processor
setup and does not appear in nx1 mode. The symptom is that all
processes spin in kernel space (>90% system time) and make no
further progress. Does this sound familiar to anyone?
Now to elaborate some more. Cheetah presents a sort of one-sided
communication interface to the user and at certain points polls
for messages with a construct like this (very simplified):
 do {
    MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status);
 } while (!flag);
Now, if I insert a sched_yield() or a usleep(100) after the
MPI_Iprobe(), the problem goes away (well, not completely, but
it becomes a lot harder to reproduce). SCore usually does not
detect any sort of deadlock, but occasionally it does.
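To make the workaround concrete, here is a minimal sketch in C of the
yielding variant of the loop. The names poll_with_yield, probe_fn and
countdown_probe are made up for illustration and are not part of
Cheetah, SCore or MPI. The callback stands in for the MPI_Iprobe()
call above so that the sketch compiles without an MPI installation;
in the real code it would call MPI_Iprobe() and return the flag.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield (POSIX) */
#include <stddef.h>

/* Hypothetical callback type: returns nonzero once a message is
   pending. In the real code this would wrap MPI_Iprobe(). */
typedef int (*probe_fn)(void *ctx);

/* Poll until probe() reports a message, giving up the CPU after
   every unsuccessful probe so that a second process on the same
   node gets a chance to run. Returns the number of yields. */
static long poll_with_yield(probe_fn probe, void *ctx)
{
    long yields = 0;

    assert(probe != NULL);
    while (!probe(ctx)) {
        sched_yield();
        ++yields;
    }
    return yields;
}

/* Stand-in probe for trying the loop out: reports "no message"
   until the counter pointed to by ctx has been counted down to 0. */
static int countdown_probe(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

Calling poll_with_yield(countdown_probe, &n) with n set to 3 yields
three times before returning. Replacing sched_yield() with
usleep(100) gives the other variant mentioned above.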
Now the question: might this be a race condition somewhere in the
SCore code that handles multiple processors on one node? Where
should I start looking to fix the problem?
Thanks for any hints,
   Richard.
PS: please CC me, I'm not on the list.
--
Richard Guenther <richard.guenther at uni-tuebingen.de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
    
    