[SCore-users-jp] [SCore-users] Copper Myrinet pm problems

Nick Birkett nrcb @ streamline-computing.com
Sun Feb 2 03:17:25 JST 2003


Hi, we have just upgraded one of our older clusters from SCore 4.x to
SCore 5.0.1. (I think the old version was the first SCore release to support
Myrinet 2000, from about 18 months ago, on a RedHat 6.2 distribution.)

The cluster was working more or less OK under the old SCore system.

The entire system has been re-installed as RedHat 7.2 + SCore 5.0.1.

Hardware: copper-based Myrinet2k (May 2001) and dual Pentium III 866 MHz
SuperMicro 1U Superservers.

I have run rpmtest and scstest -network myrinet2k for many hours across all
compute nodes without problems.

I have also run GM 1.6.3 codes (e.g. PMB) and they work fine.

SCore PM codes are having problems over Myrinet, e.g. when running PMB:

<6:0> SCORE:WARNING MPICH/SCore    [buffer=0x8951498, type=1025, from=11, size=262144, offset=189520]
<6:0> SCORE:WARNING MPICH/SCore: receive-message-queue:
<6:0> SCORE:WARNING MPICH/SCore    (empty)
<6:0> SCORE:WARNING MPICH/SCore: received-fragment:
<6:0> SCORE:WARNING MPICH/SCore    [buffer=0x40066180, type=1025, from=11, size=262144, fragment_size=8240, offset=189521]
<6:0> SCORE:WARNING MPICH/SCore: queued-message:
<6:0> SCORE:WARNING MPICH/SCore    [buffer=0x8951498, type=1025, from=11, size=262144, offset=189520]
<6:0> SCORE:WARNING MPICH/SCore: received an invalid fragment (mismatched offset)
<6:0> SCORE:PANIC MPICH/SCore: critical error on message transfer
<6:0> Trying to attach GDB (DISPLAY=localhost:10.0): PANIC
SCORE: Program aborted.
SCOUT: Session done.


We see lots of these buffer mismatch errors. The same binary runs fine over
Ethernet or Gigabit Ethernet on the same hardware (i.e. if I add the
-network=ethernet option then all is OK, so it is a Myrinet problem).
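
To compare the offsets across many such dumps, a small script along these
lines may help. It is only a sketch: it assumes every dump has the same layout
as the one above (a "received-fragment:" header and a "queued-message:" header,
each followed by a bracketed record containing offset=<n>). The function name
offset_mismatch and the SAMPLE string are made up for illustration and are not
part of SCore.

```python
import re

# Excerpt of the warning dump quoted above, with wrapped lines joined.
SAMPLE = """\
<6:0> SCORE:WARNING MPICH/SCore: received-fragment:
<6:0> SCORE:WARNING MPICH/SCore    [buffer=0x40066180, type=1025, from=11, size=262144, fragment_size=8240, offset=189521]
<6:0> SCORE:WARNING MPICH/SCore: queued-message:
<6:0> SCORE:WARNING MPICH/SCore    [buffer=0x8951498, type=1025, from=11, size=262144, offset=189520]
"""


def offset_mismatch(log_text):
    """Return (queued_offset, fragment_offset) from one warning dump.

    Assumption: the first 'offset=<n>' after each section header belongs to
    that section. Returns None if either section is missing.
    """
    # re.S lets '.' span newlines, so mail-wrapped records still match.
    frag = re.search(r"received-fragment:.*?\boffset=(\d+)", log_text, re.S)
    queued = re.search(r"queued-message:.*?\boffset=(\d+)", log_text, re.S)
    if frag is None or queued is None:
        return None
    return int(queued.group(1)), int(frag.group(1))


if __name__ == "__main__":
    result = offset_mismatch(SAMPLE)
    if result is not None:
        queued_offset, fragment_offset = result
        print("queued:", queued_offset, "fragment:", fragment_offset,
              "difference:", fragment_offset - queued_offset)
```

In the dump above the two offsets are 189520 (queued message) and 189521
(received fragment), i.e. they differ by one byte. Whether the difference is
always one byte in the other failures is something a script like this could
check.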

We would like to keep SCore, as the cluster has some new Xeon Gigabit nodes,
but we will have to convert to GM if we cannot resolve this.

It looks like a hardware problem (the same code runs fine over SCore 5.0.1 and
fibre-optic Myrinet 2k on Intel Xeon systems).

Thanks,

Nick
