[SCore-users] 128 limit on score 5.0.1
Shinji Sumimoto
s-sumi at bd6.so-net.ne.jp
Sun Jul 6 22:00:49 JST 2003
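Hi, Nick.
Could you try the following modification?
In a heterogeneous cluster like this one, the first SCore network listed must cover all of the cluster's nodes.
In the new file below, ethernet is listed first on every host, and comp128 is also added to the _scoreall_ group.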
============================================
Original scorehosts.db
comp000.leeds.ac.uk HOST_0 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,ETHER,MYRI,SHMEM smp=2 MSGBSERV
... /* other */
...
comp128.leeds.ac.uk HOST_128 network=myrinet2kII,ethernet,shmem0,shmem1 group=MYRI2 smp=2 MSGBSERV
... /* other */
...
============================================
New scorehosts.db
comp000.leeds.ac.uk HOST_0 network=ethernet,myrinet2k,shmem0,shmem1 group=_scoreall_,ETHER,MYRI,SHMEM smp=2 MSGBSERV
... /* other */
...
comp128.leeds.ac.uk HOST_128 network=ethernet,myrinet2kII,shmem0,shmem1 group=_scoreall_,MYRI2 smp=2 MSGBSERV
... /* other */
...
============================================
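The rule above can be checked mechanically before restarting anything. The following is only a rough sketch in Python, not part of SCore: it assumes that "the first network" means the first entry in the network= attribute of the first host line, that "cover" means every host lists that network somewhere in its own network= attribute, and that commented-out host lines start with "#" or "/*". The function names parse_hosts and uncovered_hosts are made up for illustration.

```python
# Excerpts of the two files shown above (only the two host lines that are given).
ORIGINAL = """\
comp000.leeds.ac.uk HOST_0 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,ETHER,MYRI,SHMEM smp=2 MSGBSERV
comp128.leeds.ac.uk HOST_128 network=myrinet2kII,ethernet,shmem0,shmem1 group=MYRI2 smp=2 MSGBSERV
"""

NEW = """\
comp000.leeds.ac.uk HOST_0 network=ethernet,myrinet2k,shmem0,shmem1 group=_scoreall_,ETHER,MYRI,SHMEM smp=2 MSGBSERV
comp128.leeds.ac.uk HOST_128 network=ethernet,myrinet2kII,shmem0,shmem1 group=_scoreall_,MYRI2 smp=2 MSGBSERV
"""


def parse_hosts(text):
    """Map each host name to its ordered list of networks."""
    hosts = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: commented-out host lines begin with '#' or '/*'.
        if not fields or fields[0].startswith(("#", "/*")):
            continue
        # Lines without a network= attribute (separators, "...") add nothing.
        for field in fields[1:]:
            if field.startswith("network="):
                hosts[fields[0]] = field[len("network="):].split(",")
    return hosts


def uncovered_hosts(hosts):
    """Return the hosts that do not list the first host's first network."""
    if not hosts:
        return []
    first_network = next(iter(hosts.values()))[0]
    return [name for name, nets in hosts.items() if first_network not in nets]


if __name__ == "__main__":
    print("original:", uncovered_hosts(parse_hosts(ORIGINAL)))
    print("new:     ", uncovered_hosts(parse_hosts(NEW)))
```

On the two excerpts, the original file reports comp128.leeds.ac.uk as uncovered (its network list has myrinet2kII but not myrinet2k), while the new file reports no uncovered hosts because ethernet appears on both lines.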
If the situation does not change, could you send more information (the output of the rpmtest test run with -debug 3)?
Shinji.
From: Nick Birkett <nick at streamline-computing.com>
Subject: [SCore-users] 128 limit on score 5.0.1
Date: Sat, 5 Jul 2003 14:29:35 +0100
Message-ID: <200307051429.35075.nick at streamline-computing.com>
nick> Just wondering if we have hit the 128 compute node limit on Score 5.0.1 ?
nick>
nick> We plan to upgrade to 5.4 in the next 4 weeks.
nick>
nick> We have 136 compute nodes configured with 2 Myrinet 2k fibre switches
nick>
nick> 128 hosts on a 128 port switch running batch
nick> 8 hosts on an 8 port switch running multi-user
nick>
nick> There are 2 pm-myrinet.conf files pm-myrinet.conf (128 hosts) and
nick> pm-myrinet2.conf (8 hosts), 2 myrinet groups MYRI and MYRI2 and
nick> 2 Myrinet networks - myrinet2k (128) and myrinet2kII (8 hosts) in
nick> scorehosts.db.
nick>
nick> (The opt/score/etc directory is attached as compressed tar).
nick>
nick> Both systems work ok as long as there are no more than 4 compute
nick> nodes (128,129,130,131) listed in scorehosts.db.
nick>
nick> The compute nodes 132,133,134,135 are commented out in scorehosts.db.
nick> Score multi-user is working fine on comp128-131 (4 hosts) on second switch.
nick>
nick> However the multi-user system always fails the basic ping tests with
nick> [root@snowdon sbin]# ./rpminit comp128 myrinet2kII
nick> [root@snowdon sbin]# ./rpmtest comp128 myrinet2kII -dest 128 -ping
nick> pmGetNodeList: No route to host(113)
nick> [root@snowdon sbin]#
nick>
nick>
nick> Regards,
nick>
nick> Nick
nick>
------
Shinji Sumimoto, Fujitsu Labs