[SCore-users-jp] Re: [SCore-users] Myrinet deadlock
Bogdan Costescu
bogdan.costescu @ iwr.uni-heidelberg.de
Wed, 5 Mar 2003 19:30:51 JST
On Wed, 5 Mar 2003, Shinji Sumimoto wrote:
> The default version of mpich has changed from mpich 1.2.0 to mpich 1.2.4.
Yes, I was aware of this.
> Could you build mpich 1.2.0 from source and test it?
Since I built all the user-level components from source, I already have
mpi-1.2.0. But now I'm wondering how to build the ch_score2 device, as it
does not seem to be built by default and I wanted to test it as well.
> Once mpich 1.2.0 is installed, you can choose between mpich1.2.0 and mpich1.2.4 with the -mpi option.
Actually, the -mpi option doesn't seem to work, so I have now set my PATH
to put the bin directory of mpi-1.2.0 first.
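Putting the mpi-1.2.0 bin directory first on the search path is what makes its mpicc and mpirun take precedence over the 1.2.4 ones. A minimal Python sketch, assuming only the standard library, to check which mpicc would be picked up under the reordered PATH without touching the current environment; the install location below is a hypothetical placeholder, since the message does not say where mpi-1.2.0 lives.

```python
import os
import shutil


def prepend_to_path(directory: str, path: str) -> str:
    """Return a PATH string with `directory` first, dropping later duplicates of it."""
    # Skip empty entries and any existing copy of `directory`.
    parts = [p for p in path.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


# Hypothetical placeholder: replace with the real mpi-1.2.0 bin directory.
MPI_120_BIN = "/path/to/mpi-1.2.0/bin"

new_path = prepend_to_path(MPI_120_BIN, os.environ.get("PATH", ""))

# shutil.which() accepts an explicit search path, so this reports which
# mpicc would be found first (or None if none is found on that path).
print("mpicc resolves to:", shutil.which("mpicc", path=new_path))
```

If the printed location is not under the mpi-1.2.0 tree, the reordering did not take effect for that shell.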
> PS: How about the mpi_zerocopy=on option?
I tried it and it seemed to lower the chance of a lock-up, but lock-ups
still happen. When one does, I sometimes get:
SCORE: Deadlock detected
<0:0>SCore: *** SIGNAL EXCEPTION eip=0x08299a6b, cr2=0x 0 ***
...
With mpich-1.2.0 I get the same lock-ups. Another thing worth
mentioning is that whenever the jobs cannot be interrupted or killed
with pskill and SCoreD has to restart, it always takes down one of the
nodes. It's not always the same node (and with older SCore versions we
didn't have this problem), so because of this, and because the problem
does not depend on the MPI library, I'm starting to suspect the kernel side.
Next I'll try to get SCore 4.2.1 working with a newer kernel
(2.4.18-19 or so, maybe a RedHat variant) to see whether the problem comes
from the newer kernel or from the newer SCore.
Thank you for any suggestions!
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu @ IWR.Uni-Heidelberg.De
_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users