[SCore-users] Network freezing

Nick Birkett nick at streamline-computing.com
Fri Jul 18 16:14:32 JST 2003


Dear SCore, one of our users is trying to run a very large 64-CPU job using 
Myrinet2k (C cards) and SCore 5.4.

The job uses most of the memory on 32 compute nodes (32x2 job).

<0:0> SCORE: 64 nodes (32x2) ready.
<6> SCORE WARNING: Physical memory might be exhausted.
<13> SCORE WARNING: Physical memory might be exhausted.
<17> SCORE WARNING: Physical memory might be exhausted.
<14> SCORE WARNING: Physical memory might be exhausted.
<10> SCORE WARNING: Physical memory might be exhausted.
<12> SCORE WARNING: Physical memory might be exhausted.
<11> SCORE WARNING: Physical memory might be exhausted.
<22> SCORE WARNING: Physical memory might be exhausted.
<15> SCORE WARNING: Physical memory might be exhausted.
<3> SCORE WARNING: Physical memory might be exhausted.
<20> SCORE WARNING: Physical memory might be exhausted.
<28> SCORE WARNING: Physical memory might be exhausted.
<31> SCORE WARNING: Physical memory might be exhausted.
<25> SCORE WARNING: Physical memory might be exhausted.
<29> SCORE WARNING: Physical memory might be exhausted.
<24> SCORE WARNING: Physical memory might be exhausted.
<30> SCORE WARNING: Physical memory might be exhausted.
<27> SCORE WARNING: Physical memory might be exhausted.
<1> SCORE WARNING: Physical memory might be exhausted.
<16> SCORE WARNING: Physical memory might be exhausted.
<21> SCORE WARNING: Physical memory might be exhausted.
<7> SCORE WARNING: Physical memory might be exhausted.


The job starts to run but then we get an error:

<13> SCore-D:PANIC Network freezing timed out !!
<15> SCore-D:PANIC Network freezing timed out !!
<12> SCore-D:PANIC Network freezing timed out !!
<2> SCore-D:PANIC Network freezing timed out !!
<4> SCore-D:PANIC Network freezing timed out !!
<26> SCore-D:PANIC Network freezing timed out !!
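
To see whether the panics line up with the memory warnings, the two sets of
messages can be cross-referenced. The following is only a rough Python sketch:
the function name ranks_by_message and the shortened sample log are
illustrative and not part of SCore, and the number in angle brackets is simply
treated as an identifier, without assuming exactly what SCore means by it.

```python
import re
from collections import defaultdict

# Matches lines such as "<13> SCORE WARNING: ..." or "<0:0> SCORE: ...".
# Only the first number inside the angle brackets is kept as the identifier.
LINE_RE = re.compile(r"^<(\d+)(?::\d+)?>\s+(.*)$")

def ranks_by_message(log_text):
    """Map each distinct message to the sorted list of identifiers that printed it."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            seen[m.group(2).strip()].add(int(m.group(1)))
    return {msg: sorted(ids) for msg, ids in seen.items()}

# Shortened, illustrative sample in the same format as the output above.
sample = """\
<6> SCORE WARNING: Physical memory might be exhausted.
<13> SCORE WARNING: Physical memory might be exhausted.
<13> SCore-D:PANIC Network freezing timed out !!
<2> SCore-D:PANIC Network freezing timed out !!
"""

summary = ranks_by_message(sample)
warned = set(summary.get("SCORE WARNING: Physical memory might be exhausted.", []))
panicked = set(summary.get("SCore-D:PANIC Network freezing timed out !!", []))

print("panicked and warned:", sorted(warned & panicked))
print("panicked without warning:", sorted(panicked - warned))
```

Feeding the full job output to the same function would show which of the
panicking identifiers also reported a memory warning and which did not.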


The system has been well tested using the Pallas benchmarks (full suite of tests) 
and has also run for 2 days with the Top500 HPL benchmark (180 Gflops using
all CPUs).

Any suggestions as to why we have this problem? Is it hardware or software?

Thanks,

Nick


