[SCore-users-jp] Re: [SCore-users] Help with an error message

James O'Dell jodell @ ad.brown.edu
Wed Mar 19 05:29:22 JST 2003


I found part of my problem. The "Link has been severed" message came
about because my gig interface was not marked UP by ifconfig. I am not
using the gigabit ethernet for anything else but SCore, so it was not UP.
I modified the pm_ethernet scripts to run "/sbin/ifconfig eth1 up"
before PM/Ethernet starts and "/sbin/ifconfig eth1 down" after it stops.
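
For anyone hitting the same error, the UP state can be checked before
starting PM/Ethernet. Below is a minimal Python sketch, assuming a Linux
kernel that exposes /sys/class/net/<iface>/flags through sysfs. It is not
part of SCore, and the function names are made up for illustration. It
tests the IFF_UP bit (0x1), which is what ifconfig reports as UP.

```python
from pathlib import Path

# IFF_UP from <linux/if.h>: the interface is administratively up.
IFF_UP = 0x1


def flags_indicate_up(flags_text: str) -> bool:
    """Return True if a sysfs flags value such as '0x1003' has IFF_UP set."""
    return bool(int(flags_text.strip(), 16) & IFF_UP)


def interface_is_up(name: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Read <sysfs_root>/<name>/flags and test the IFF_UP bit.

    Assumes sysfs is mounted; raises FileNotFoundError if the
    interface does not exist.
    """
    flags_file = Path(sysfs_root) / name / "flags"
    return flags_indicate_up(flags_file.read_text())
```

Calling interface_is_up("eth1") on a node before "pm_ethernet start"
would show whether the interface still needs an "ifconfig eth1 up".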

rpmtest indicates that both interfaces are now working.

I cannot test with scstest because I somehow broke my msgbserv.
That problem is in another message.

Jim

On Tue, 2003-03-18 at 14:00, James O'Dell wrote:
> Here is my scorehosts.db
> 
> 
> /*
>  *       SCore 5.0 scorehosts.db
>  *		generated by PCCC EIT 5.2
>  */
> 
> /* PM/Myrinet */
> myrinet		type=myrinet \
> 		-firmware:file=/opt/score/share/lanai/lanai.mcp \
> 		-config:file=/opt/score/etc/pm-myrinet.conf
> 
> /* PM/Myrinet */
> myrinet2k	type=myrinet2k \
> 		-firmware:file=/opt/score/share/lanai/lanaiM2k.mcp \
> 		-config:file=/opt/score/etc/pm-myrinet.conf
> 
> /* PM/Ethernet */
> ethernet	type=ethernet \
> 		-config:file=/opt/score/etc/pm-ethernet.conf
> gigaethernet	type=ethernet \
> 		-config:file=/opt/score/etc/pm-gig.conf
> /* PM/Agent */
> udp		type=agent -agent=pmaudp \
> 		-config:file=/opt/score/etc/pm-udp.conf
> 
> /* RHiNET */
> rhinet		type=rhinet \
> 		-firmware:file=/opt/score/share/rhinet/phu_top_0207a.hex \
> 		-config:file=/opt/score/etc/pm-rhinet.conf
> ##
> ##
> #include "/opt/score//etc/ndconf/0"
> #include "/opt/score//etc/ndconf/1"
> #include "/opt/score//etc/ndconf/2"
> #include "/opt/score//etc/ndconf/3"
> #include "/opt/score//etc/ndconf/4"
> #include "/opt/score//etc/ndconf/5"
> #include "/opt/score//etc/ndconf/6"
> #include "/opt/score//etc/ndconf/7"
> #include "/opt/score//etc/ndconf/8"
> #include "/opt/score//etc/ndconf/9"
> #include "/opt/score//etc/ndconf/10"
> #include "/opt/score//etc/ndconf/11"
> ##
> #define MSGBSERV	msgbserv=(kansas-fe.cascv.brown.edu:8764)
> 
> bio-1.cascv.brown.edu	HOST_0 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-2.cascv.brown.edu	HOST_1 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-3.cascv.brown.edu	HOST_2 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-4.cascv.brown.edu	HOST_3 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-5.cascv.brown.edu	HOST_4 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-6.cascv.brown.edu	HOST_5 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-7.cascv.brown.edu	HOST_6 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-8.cascv.brown.edu	HOST_7 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-9.cascv.brown.edu	HOST_8 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-10.cascv.brown.edu	HOST_9 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-11.cascv.brown.edu	HOST_10 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-12.cascv.brown.edu	HOST_11 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> 
> 
> Here is my pm-gig.conf file:
> unit 1
> maxnsend 8
> # Not connected yet
> #0 00:30:48:23:70:CF bio-1.cascv.brown.edu
> #1 00:30:48:23:70:B1 bio-2.cascv.brown.edu
> #2 00:30:48:23:70:D9 bio-3.cascv.brown.edu
> #3 00:30:48:23:70:E3 bio-4.cascv.brown.edu
> 4 00:30:48:23:6E:2B bio-5.cascv.brown.edu
> 5 00:30:48:23:3F:05 bio-6.cascv.brown.edu
> 6 00:30:48:23:3E:51 bio-7.cascv.brown.edu
> 7 00:30:48:23:3E:3D bio-8.cascv.brown.edu
> 8 00:30:48:23:70:EB bio-9.cascv.brown.edu
> 9 00:30:48:23:6F:05 bio-10.cascv.brown.edu
> 10 00:30:48:23:6E:55 bio-11.cascv.brown.edu
> 11 00:30:48:23:70:E1 bio-12.cascv.brown.edu
> 
> I have disabled the first four hosts as we don't have enough room in our
> switch for them.
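
A pm-gig.conf like the one quoted above can be sanity-checked with a few
lines of Python. This is only a sketch: the grammar is inferred from the
quoted file (option lines such as "unit 1", "#" comments, and
"<node number> <MAC> <hostname>" entries), not taken from the PM
documentation, and parse_pm_conf is a hypothetical helper name.

```python
import re

# Six colon-separated hex octets, e.g. 00:30:48:23:6E:2B
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def parse_pm_conf(text):
    """Split a PM/Ethernet-style config into (options, nodes).

    options maps a keyword to its list of arguments;
    nodes maps a node number to a (MAC, hostname) tuple.
    Blank lines and lines starting with '#' are skipped.
    """
    options = {}
    nodes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        is_node = (
            len(fields) == 3
            and fields[0].isdigit()
            and MAC_RE.match(fields[1]) is not None
        )
        if is_node:
            nodes[int(fields[0])] = (fields[1].upper(), fields[2])
        else:
            options[fields[0]] = fields[1:]
    return options, nodes
```

Comparing the node numbers returned here with the HOST_n entries that
list gigaethernet in scorehosts.db is a quick way to spot a host that is
declared on the network but commented out of the MAC table.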
> 
> I have also edited the pm_ethernet file to start and stop eth1. When I
> run "pm_ethernet stop" and then run "pm_ethernet start" I get the
> messages below.
> 
> [root @ bio-12 init.d]# ./pm_ethernet stop
> Stopping PM/Ethernet: device: eth0
> device: eth1
> 
> [root @ bio-12 init.d]# ./pm_ethernet start
> n Starting PM/Ethernet: 
> device: eth0
> device: eth1
> etherpmctl: ERROR on unit 1: "Link has been severed(67)" Check dmesg log!!
> 
> Many thanks for your help!
> 
> Jim
> 
> On Mon, 2003-03-17 at 21:30, Atsushi HORI wrote:
> > Hi,
> > 
> > >1) Edit the pm_ethernet file on the nodes to start the gig interface.
> > >2) Add a file pm-gig.conf to the /opt/score/etc directory. This file has
> > >the MAC addresses of the gig cards.
> > >3) Edit the scorehosts.db file to define gigaethernet, include the
> > >pm-gig.conf file and define the nodes to have gigabit ethernet.
> > >4) Reboot the server and the compute hosts.
> > 
> > And you must do the following on all cluster hosts:
> > 
> > 5) /etc/rc.d/init.d/pm_ethernet stop
> >    Edit /etc/rc.d/init.d/pm_ethernet
> >    /etc/rc.d/init.d/pm_ethernet start
> > 
> > The pm_ethernet script binds each PM unit number to a Linux ethernet
> > device (eth0, eth1, ...).
> > 
> > >Does anyone know what the following messages mean?
> > >I got them while running:
> > >
> > >scstest -network gigaethernet
> > >
> > >
> > >bio-11(-1) pmAssociateNodes: Invalid argument(22)
> > >bio-12(-1) pmAssociateNodes: Invalid argument(22)
> > 
> > Send me the files /opt/score/etc/scorehosts.db and 
> > /opt/score/etc/pm-gig.conf.
> > 
> > ----
> > Atsushi HORI
> > Swimmy Software, Inc.
> > 
> _______________________________________________
> SCore-users mailing list
> SCore-users @ pccluster.org
> http://www.pccluster.org/mailman/listinfo/score-users