[SCore-users-jp] Re: [SCore-users] Myrinet deadlock

Bogdan Costescu bogdan.costescu @ iwr.uni-heidelberg.de
Wed Mar 5 19:30:51 JST 2003


On Wed, 5 Mar 2003, Shinji Sumimoto wrote:

> The default mpich version of mpich is changed from mpich 1.2.0 to mpich 1.2.4.

Yes, I was aware of this.

> Could you build mpich 1.2.0 from source and test it?

Since I built all the user-level software from source, I already have 
mpi-1.2.0. But now I'm wondering how to build the ch_score2 device, as it 
does not seem to be built by default and I want to test it as well.

> If once mpich 1.2.0 is installed, you can choose mpich1.2.0 and mpich1.2.4 by -mpi option.

Actually, the -mpi option doesn't seem to work, so for now I set my PATH 
so that the bin directory of mpi-1.2.0 comes first.
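
For reference, a minimal sketch of that PATH workaround. The install 
prefix below is hypothetical (it is not the actual location on our 
cluster); substitute the directory where mpi-1.2.0 was installed:

```sh
# Hypothetical install prefix, used only for illustration.
MPI_120_BIN="$HOME/mpi-1.2.0/bin"

# Put the mpi-1.2.0 bin directory first so its tools are found first.
PATH="$MPI_120_BIN:$PATH"
export PATH

# Show which mpicc the shell resolves now, if any.
command -v mpicc || echo "mpicc not found in PATH"
```

This only changes which tools the shell picks up; it says nothing 
about why the -mpi option itself fails.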

> PS: How about mpi_zerocopy=on option?

I tried it and it seemed to lower the chances of locking up, but lock-ups 
still happen. When they do, I sometimes get:

SCORE: Deadlock detected
<0:0>SCore: *** SIGNAL EXCEPTION eip=0x08299a6b, cr2=0x       0 ***
...

With mpich-1.2.0 I get the same lock-ups. Another thing worth mentioning: 
whenever a job cannot be interrupted or killed with pskill and SCoreD has 
to restart, the restart always takes down one of the nodes. It is not 
always the same node (and with the older SCore we had no such problem). 
Because of this, and because the lock-ups occur independently of the MPI 
library, I am starting to suspect the kernel side.

Next I'll try to get SCore 4.2.1 working with a newer kernel (2.4.18-19 
or so, maybe some RedHat variant) to see whether the problem comes from 
the newer kernel or from the newer SCore.

Thank you for any suggestions!

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu @ IWR.Uni-Heidelberg.De



_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users


