[SCore-users-jp] Re: [SCore-users] Help with an error message
James O'Dell
jodell @ ad.brown.edu
Wed Mar 19 05:29:22 JST 2003
I found part of my problem. The "Link has been severed" message came
about because my gigabit interface was not marked UP by ifconfig. I am not
using the gigabit Ethernet for anything else but SCore, so it was not UP.
I modified the pm_ethernet scripts to do a "/sbin/ifconfig eth1 up"
and an "/sbin/ifconfig eth1 down" before and after, respectively.
rpmtest indicates that both interfaces are now working.
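
The change described above could be sketched as a pair of small shell helpers. This is only a sketch under assumptions: the function names output_shows_up and ensure_iface_up are hypothetical and are not part of the SCore pm_ethernet script, and the check assumes that ifconfig prints an UP flag for an active interface, as the net-tools version does.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the SCore pm_ethernet script.

# Succeed if the given text (the output of "/sbin/ifconfig <dev>")
# contains the UP flag as a separate word, e.g. "UP BROADCAST RUNNING"
# or "flags=4163<UP,BROADCAST,RUNNING>".
output_shows_up() {
    printf '%s\n' "$1" | grep -Eq '(^|[[:space:]<,])UP([[:space:]>,]|$)'
}

# Bring the device up only if ifconfig does not already report it as UP.
# Needs root, like the plain "/sbin/ifconfig eth1 up" call.
ensure_iface_up() {
    dev=$1
    if output_shows_up "$(/sbin/ifconfig "$dev" 2>/dev/null)"; then
        echo "$dev is already UP"
    else
        /sbin/ifconfig "$dev" up
    fi
}
```

Calling "ensure_iface_up eth1" has the same effect as the unconditional "/sbin/ifconfig eth1 up", but leaves an interface that is already UP untouched.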
I cannot test with scstest because I somehow broke my msgbserv.
That problem is in another message.
Jim
On Tue, 2003-03-18 at 14:00, James O'Dell wrote:
> Here is my scorehosts.db
>
>
> /*
> * SCore 5.0 scorehosts.db
> * generated by PCCC EIT 5.2
> */
>
> /* PM/Myrinet */
> myrinet type=myrinet \
> -firmware:file=/opt/score/share/lanai/lanai.mcp \
> -config:file=/opt/score/etc/pm-myrinet.conf
>
> /* PM/Myrinet */
> myrinet2k type=myrinet2k \
> -firmware:file=/opt/score/share/lanai/lanaiM2k.mcp \
> -config:file=/opt/score/etc/pm-myrinet.conf
>
> /* PM/Ethernet */
> ethernet type=ethernet \
> -config:file=/opt/score/etc/pm-ethernet.conf
> gigaethernet type=ethernet \
> -config:file=/opt/score/etc/pm-gig.conf
> /* PM/Agent */
> udp type=agent -agent=pmaudp \
> -config:file=/opt/score/etc/pm-udp.conf
>
> /* RHiNET */
> rhinet type=rhinet \
> -firmware:file=/opt/score/share/rhinet/phu_top_0207a.hex \
> -config:file=/opt/score/etc/pm-rhinet.conf
> ##
> ##
> #include "/opt/score//etc/ndconf/0"
> #include "/opt/score//etc/ndconf/1"
> #include "/opt/score//etc/ndconf/2"
> #include "/opt/score//etc/ndconf/3"
> #include "/opt/score//etc/ndconf/4"
> #include "/opt/score//etc/ndconf/5"
> #include "/opt/score//etc/ndconf/6"
> #include "/opt/score//etc/ndconf/7"
> #include "/opt/score//etc/ndconf/8"
> #include "/opt/score//etc/ndconf/9"
> #include "/opt/score//etc/ndconf/10"
> #include "/opt/score//etc/ndconf/11"
> ##
> #define MSGBSERV msgbserv=(kansas-fe.cascv.brown.edu:8764)
>
> bio-1.cascv.brown.edu HOST_0 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-2.cascv.brown.edu HOST_1 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-3.cascv.brown.edu HOST_2 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-4.cascv.brown.edu HOST_3 network=ethernet group=_scoreall_,100Mb smp=2 MSGBSERV
> bio-5.cascv.brown.edu HOST_4 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-6.cascv.brown.edu HOST_5 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-7.cascv.brown.edu HOST_6 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-8.cascv.brown.edu HOST_7 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-9.cascv.brown.edu HOST_8 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-10.cascv.brown.edu HOST_9 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-11.cascv.brown.edu HOST_10 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
> bio-12.cascv.brown.edu HOST_11 network=ethernet,gigaethernet group=_scoreall_,100Mb,gige smp=2 MSGBSERV
>
>
> Here is my pm-gig.conf file:
> unit 1
> maxnsend 8
> # Not connected yet
> #0 00:30:48:23:70:CF bio-1.cascv.brown.edu
> #1 00:30:48:23:70:B1 bio-2.cascv.brown.edu
> #2 00:30:48:23:70:D9 bio-3.cascv.brown.edu
> #3 00:30:48:23:70:E3 bio-4.cascv.brown.edu
> 4 00:30:48:23:6E:2B bio-5.cascv.brown.edu
> 5 00:30:48:23:3F:05 bio-6.cascv.brown.edu
> 6 00:30:48:23:3E:51 bio-7.cascv.brown.edu
> 7 00:30:48:23:3E:3D bio-8.cascv.brown.edu
> 8 00:30:48:23:70:EB bio-9.cascv.brown.edu
> 9 00:30:48:23:6F:05 bio-10.cascv.brown.edu
> 10 00:30:48:23:6E:55 bio-11.cascv.brown.edu
> 11 00:30:48:23:70:E1 bio-12.cascv.brown.edu
>
> I have disabled the first four hosts as we don't have enough room in our
> switch for them.
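
A file in this layout can be sanity-checked before restarting PM/Ethernet. The following is a minimal sketch, assuming only the layout visible in the listing above ("unit" and "maxnsend" settings, "#" comments, and host entries of the form node number, MAC address, host name); check_pm_conf is a hypothetical helper and not part of SCore, and the real configuration grammar may allow more than this.

```shell
#!/bin/sh
# Hypothetical checker, not part of SCore. It reads a config in the
# layout shown above on stdin and flags malformed or duplicate entries.
check_pm_conf() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }               # blank lines, comments
        $1 == "unit" || $1 == "maxnsend" { next }   # settings, not hosts
        {
            # A host entry needs 3 fields: a numeric node number, a MAC
            # made of six colon-separated hex pairs, and a host name.
            ok = (NF == 3 && $1 ~ /^[0-9]+$/ && split($2, octet, ":") == 6)
            for (i = 1; ok && i <= 6; i++)
                if (octet[i] !~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) ok = 0
            if (!ok) {
                printf "line %d: expected <node> <MAC> <host>\n", NR
                bad = 1
                next
            }
            mac = toupper($2)
            if ((mac in seen_mac) || ($1 in seen_node)) {
                printf "line %d: duplicate node number or MAC\n", NR
                bad = 1
            }
            seen_mac[mac] = 1
            seen_node[$1] = 1
        }
        END { exit bad + 0 }    # non-zero exit status if anything was flagged
    '
}
```

It would be run as "check_pm_conf < /opt/score/etc/pm-gig.conf"; no output and a zero exit status mean that every active entry has the expected shape and that no node number or MAC address appears twice.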
>
> I have also edited the pm_ethernet file to start and stop eth1. When I
> run "pm_ethernet stop" and then run "pm_ethernet start" I get the
> messages below.
>
> [root@bio-12 init.d]# ./pm_ethernet stop
> Stopping PM/Ethernet: device: eth0
> device: eth1
>
> [root@bio-12 init.d]# ./pm_ethernet start
> n Starting PM/Ethernet:
> device: eth0
> device: eth1
> etherpmctl: ERROR on unit 1: "Link has been severed(67)" Check dmesg log!!
>
> Many thanks for your help!
>
> Jim
>
> On Mon, 2003-03-17 at 21:30, Atsushi HORI wrote:
> > Hi,
> >
> > >1) Edit the pm_ethernet file on the nodes to start the gig interface.
> > >2) Add a file pm-gig.conf to the /opt/score/etc directory. This file has
> > >the MAC addresses of the gig cards.
> > >3) Edit the scorehosts.db file to define gigaethernet, include the
> > >pm-gig.conf file, and define the nodes to have gigabit Ethernet.
> > >4) Reboot the server and the compute hosts.
> >
> > And you must do the following on all cluster hosts:
> >
> > 5) /etc/rc.d/init.d/pm_ethernet stop
> > Edit /etc/rc.d/init.d/pm_ethernet
> > /etc/rc.d/init.d/pm_ethernet start
> >
> > The pm_ethernet script binds a PM unit number to a Linux Ethernet device
> > (eth0, eth1, ...).
> >
> > >Does anyone know what the following messages mean?
> > >I got them while running:
> > >
> > >scstest -network gigaethernet
> > >
> > >
> > >bio-11(-1) pmAssociateNodes: Invalid argument(22)
> > >bio-12(-1) pmAssociateNodes: Invalid argument(22)
> >
> > Send me the files /opt/score/etc/scorehosts.db and
> > /opt/score/etc/pm-gig.conf.
> >
> > ----
> > Atsushi HORI
> > Swimmy Software, Inc.
> >
> _______________________________________________
> SCore-users mailing list
> SCore-users @ pccluster.org
> http://www.pccluster.org/mailman/listinfo/score-users