[SCore-users] 128 limit on score 5.0.1

Shinji Sumimoto s-sumi at bd6.so-net.ne.jp
Sun Jul 6 22:00:49 JST 2003


Hi Nick,

Could you try the following modification?

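In a heterogeneous cluster such as this one, the first SCore network must cover all of the cluster nodes. In the new scorehosts.db below, ethernet is therefore listed first for every host, and HOST_128 is also added to the _scoreall_ group.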

============================================
Original scorehosts.db

comp000.leeds.ac.uk	HOST_0 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,ETHER,MYRI,SHMEM smp=2 MSGBSERV
... /* other */
...
comp128.leeds.ac.uk     HOST_128 network=myrinet2kII,ethernet,shmem0,shmem1 group=MYRI2 smp=2 MSGBSERV
... /* other */
...
============================================
New scorehosts.db
comp000.leeds.ac.uk	HOST_0 network=ethernet,myrinet2k,shmem0,shmem1 group=_scoreall_,ETHER,MYRI,SHMEM smp=2 MSGBSERV
... /* other */
...
comp128.leeds.ac.uk     HOST_128 network=ethernet,myrinet2kII,shmem0,shmem1 group=_scoreall_,MYRI2 smp=2 MSGBSERV
... /* other */
...
============================================
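The rule above can be checked mechanically before restarting anything. The following Python sketch is a hypothetical helper, not part of SCore. It assumes one host per line in the `hostname alias key=value ...` form shown above, treats lines starting with `#` (assumed comment syntax) and the `...` placeholders as non-host lines, and flags any host whose first `network=` entry differs from the first host's, or that is missing from the `_scoreall_` group.

```python
def parse_hosts(lines):
    """Map host name -> (networks, groups) for each scorehosts.db-style host line."""
    hosts = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, '#' comments (assumed syntax) and '...' placeholders.
        if not line or line.startswith("#") or line.startswith("..."):
            continue
        fields = line.split()  # splits on tabs and runs of spaces alike
        networks, groups = [], []
        for field in fields[1:]:
            if field.startswith("network="):
                networks = field[len("network="):].split(",")
            elif field.startswith("group="):
                groups = field[len("group="):].split(",")
        hosts[fields[0]] = (networks, groups)
    return hosts


def uncovered_hosts(lines):
    """Return hosts whose first network differs from the first host's,
    or which are not members of the _scoreall_ group."""
    hosts = parse_hosts(lines)
    if not hosts:
        return []
    # Reference: first network of the first host listed (dicts keep insertion order).
    reference = next(iter(hosts.values()))[0][:1]
    return [
        name
        for name, (networks, groups) in hosts.items()
        if networks[:1] != reference or "_scoreall_" not in groups
    ]
```

Run against the two host lines of the original excerpt, it flags comp128.leeds.ac.uk (first network myrinet2kII instead of myrinet2k, and no _scoreall_); against the new excerpt it reports nothing.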

If the problem persists, could you send more information (the output of rpmtest with -debug 3)?

Shinji.

From: Nick Birkett <nick at streamline-computing.com>
Subject: [SCore-users] 128 limit on score 5.0.1
Date: Sat, 5 Jul 2003 14:29:35 +0100
Message-ID: <200307051429.35075.nick at streamline-computing.com>

nick> Just wondering if we have hit the 128 compute node limit on SCore 5.0.1?
nick> 
nick> We plan to upgrade to 5.4 in the next 4 weeks.
nick> 
nick> We have 136 compute nodes configured with 2 Myrinet 2k fibre switches
nick> 
nick> 128 hosts on a 128-port switch running batch
nick> 8 hosts on an 8-port switch running multi-user
nick> 
nick> There are 2 pm-myrinet.conf files, pm-myrinet.conf (128 hosts) and
nick> pm-myrinet2.conf (8 hosts),  2 myrinet groups MYRI and MYRI2 and
nick> 2 Myrinet networks - myrinet2k (128) and myrinet2kII (8 hosts) in 
nick> scorehosts.db.
nick> 
nick> (The opt/score/etc directory is attached as compressed tar).
nick> 
nick> Both systems work ok as long as there are no more than 4 compute
nick> nodes (128,129,130,131) listed in scorehosts.db. 
nick> 
nick> The compute nodes 132,133,134,135 are commented out in scorehosts.db.
nick> SCore multi-user is working fine on comp128-131 (4 hosts) on the second switch.
nick> 
nick> However, the multi-user system always fails the basic ping test:
nick> [root at snowdon sbin]# ./rpminit comp128 myrinet2kII
nick> [root at snowdon sbin]# ./rpmtest comp128 myrinet2kII -dest 128 -ping
nick> pmGetNodeList: No route to host(113)
nick> [root at snowdon sbin]#
nick> 
nick> 
nick> Regards,
nick> 
nick> Nick
nick> 
------
Shinji Sumimoto, Fujitsu Labs
