[SCore-users-jp] Re: [SCore-users] ULT:PANIC Try to break thread
Jure Jerman
jure.jerman @ rzs-hm.si
Mon Dec 20 18:02:12 JST 2004
Hi,
Atsushi HORI wrote:
>
> I think I fixed the bug.
>
It is very nice to read this.
> What did you do when the error happens ? Or, how can I reproduce the
> error ?
This needs a bit of explanation:
- in our operational suite we run a few large jobs (on 20+ processors).
The binary is produced by Lahey (Fujitsu) Fortran (ver 6.1) and it was NOT
relinked after the SCore upgrade, so we suspect that the problem could be
connected to this fact.
- another fact is that we were not able to reproduce the same error
with another binary. Once we ran thousands of mpitest jobs and after
some thousands of successful runs scored hung, but not with the ULT_PANIC
error.
- the error occurs quite randomly and is not reproducible on demand. One has
to run a larger number (5-10) of SCore jobs to get this error.
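The repeated-run approach described above could be automated roughly as follows. This is only a minimal sketch, not the procedure actually used: the function name run_until_failure, the timeout used to detect a hang, and the idea of scanning the job output for the "ULT:PANIC" text from the subject line are all assumptions, and the job command itself is left to the caller.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs, timeout_s, marker="ULT:PANIC"):
    """Run cmd up to max_runs times.

    Returns (run_number, reason) for the first failing run, or None if
    every run succeeded. All names here are hypothetical; marker is the
    panic text assumed to show up in the job output.
    """
    for run in range(1, max_runs + 1):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # The job did not finish in time; treat it as a hang.
            return run, "timeout"
        if marker in result.stdout or marker in result.stderr:
            return run, "panic"
        if result.returncode != 0:
            return run, f"exit {result.returncode}"
    return None


if __name__ == "__main__":
    # The job launch command is taken from the command line (assumption).
    job_cmd = sys.argv[1:]
    if job_cmd:
        print(run_until_failure(job_cmd, max_runs=1000, timeout_s=3600.0))
```

Separating "timeout" from "panic" matters here because the mpitest runs ended in a hang rather than in the ULT_PANIC error, so the two cases should not be counted together.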
>
> By the way, some line numbers of the debugger output are different from
> what I am working on.
> Are you sure you upgraded all the nodes ?
We did a clean install of ALL compute nodes and an upgrade of the master node
from Red Hat 7.3 to Fedora Core 1. The SCore software was completely removed
from the master node and a bininstall -server was performed.
Thank you, Jure Jerman
_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users