[SCore-users-jp] Fwd: Network trunking in SCore7 (resend)
Shinji Sumimoto
s-sumi @ labs.fujitsu.com
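Wed Oct 6 21:57:30 JST 2010
Sakata-san,
This is Sumimoto of Fujitsu Laboratories.
Thank you very much for trying out PM/Etherhxb.
We set up an environment here as well and were able to reproduce the same behavior.
The node going down was caused by a pointer being NULL inside the Ethernet
driver function that frees transmitted packets. It can be avoided with a
patch (the latest driver already contains the fix).
The very slow response has several contributing factors (for example, the
e1000 hardware hanging on transmit), and there also appear to be problems in
the Ethernet driver. Please allow us some time to investigate and improve this.
We apologize for the inconvenience and thank you for your patience.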
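P.S. A note on reading the dump in the quoted output below. The line "dump tx descriptors prod=e8cc(cc) cons=e4dc(dc) sent=e8cc mask=3ff" can be turned into a ring occupancy figure with a few lines of Python. This is only a sketch under stated assumptions: that prod and cons are free-running 16-bit counters, that mask=3ff means a 1024-slot ring, and that the gap between seq and seq_acked in a "send kackp" line reflects packets not yet acknowledged. None of this is confirmed against the PM source, and the function names are made up for illustration.

```python
import re

COUNTER_MASK = 0xFFFF  # assumption: counters are 16 bits wide and wrap around

# Lines copied from the quoted pmxtest output (wrapped lines rejoined).
HEADER = "dump tx descriptors prod=e8cc(cc) cons=e4dc(dc) sent=e8cc mask=3ff"
PEER1 = "[0]<[1] send kackp seq=e4dd, seq_sent=e0ec seq_acked=e0ec, nsend=8, seq_sendack=0 stat=200"
PEER4 = "[0]<[4] send kackp seq=3f1, seq_sent=3f0 seq_acked=3f0, nsend=8, seq_sendack=0 stat=0"


def hex_fields(line, names):
    """Return {name: int} for each 'name=<hex>' pair in line whose name is listed."""
    pattern = r"\b(" + "|".join(names) + r")=([0-9a-f]+)"
    return {name: int(value, 16) for name, value in re.findall(pattern, line)}


def ring_occupancy(header_line):
    """Return (slots in use, ring size) from a 'dump tx descriptors' header line."""
    f = hex_fields(header_line, ["prod", "cons", "mask"])
    in_use = (f["prod"] - f["cons"]) & COUNTER_MASK  # wrap-safe difference
    return in_use, f["mask"] + 1


def seq_gap(kackp_line):
    """Return seq minus seq_acked (wrap-safe) from a 'send kackp' line."""
    f = hex_fields(kackp_line, ["seq", "seq_acked"])
    return (f["seq"] - f["seq_acked"]) & COUNTER_MASK


if __name__ == "__main__":
    in_use, size = ring_occupancy(HEADER)
    print(f"tx ring: {in_use} of {size} slots in use")
    print("peer 1 gap:", seq_gap(PEER1))
    print("peer 4 gap:", seq_gap(PEER4))
```

With the values from the log, 1008 of the 1024 slots are in use, and only peer 1 shows a large gap (1009, against 1 for peer 4 and the idle peers). That would fit a stall on the transmit side, though this reading rests on the assumptions above.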
From: "Taro Sakata" <ks2718281828 @ mopera.net>
Subject: Re: [SCore-users-jp] Fwd: Network trunking in SCore7 (resend)
Date: Mon, 4 Oct 2010 22:12:50 +0900
Message-ID: <BD9DBC3904D044479BF88004DCAAB420 @ TVPC>
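ks2718281828> Kameyama-san,
ks2718281828>
ks2718281828> I forgot to CC pccluster.org, so I am resending this.
ks2718281828>
ks2718281828> > $ scout -g machinefile -e pmxtest -iter 1 -network etherhxb
ks2718281828> >
ks2718281828> > How about this?
ks2718281828>
ks2718281828> The results were as follows. The machine configuration is five hosts: server, which also acts as a compute host, and comp1 through comp4.
ks2718281828>
ks2718281828>
ks2718281828> (1) It does run, but the response was very slow. It finally printed the output below and stopped responding, so I killed it with Ctrl-C.
ks2718281828> (2) During the run, only the switch on the eth0 side showed activity. The eth1 side showed none.
ks2718281828> (3) comp1 went down partway through. The other hosts stayed up.
ks2718281828>
ks2718281828> Thank you in advance.
ks2718281828> Sakata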
ks2718281828> ---------------------------------------------------------------------
ks2718281828> [taro @ server test]$ scout -g machinefile -e pmxtest -iter 1 -network etherhxb
ks2718281828>
ks2718281828> 28/Sep/10 12:27:45 #### PMX Test for [etherhxb,smp=1,key=48] ####
ks2718281828> 28/Sep/10 12:27:45 #### 500 [msec] per step ####
ks2718281828> Testing Two-Sided Communication (MTU is 1372 Bytes)
ks2718281828>
ks2718281828> Receive Polling (ENOBUFS)
ks2718281828> 0.0784 us for 6400000 times iteration
ks2718281828>
ks2718281828> Send Polling (ENOBUFS)
ks2718281828> 32B: 0.0679 us for 7400000 times iteration
ks2718281828> 64B: 0.0679 us for 7400000 times iteration
ks2718281828> 128B: 0.0672 us for 7500000 times iteration
ks2718281828> 256B: 0.0674 us for 7500000 times iteration
ks2718281828> 512B: 0.0674 us for 7500000 times iteration
ks2718281828> 1KB: 0.0674 us for 7500000 times iteration
ks2718281828> 1.3KB: 0.0673 us for 7500000 times iteration
ks2718281828>
ks2718281828> One-Way, Peer-to-Peer, Burst Communication
ks2718281828> [0->1] 32B ..==================================
ks2718281828> ==================================
ks2718281828> ==================================
ks2718281828> Ethernet PM context #3 information sizeof sc=5008
ks2718281828> tx_p=0000e8cc, tx_c=0000e4dc,tx_s=0000e8cc, tx_bp=0074da80, tx_bc=0072e000
ks2718281828>
ks2718281828> dump tx descriptors prod=e8cc(cc) cons=e4dc(dc) sent=e8cc mask=3ff
ks2718281828> [0]= 0: [0001-e411- 20- 5] [0001-e412- 20- 5] [0001-e413- 20- 5]
ks2718281828> [0001-e414- 20- 5]
ks2718281828> [0]= 4: [0001-e415- 20- 5] [0001-e416- 20- 5] [0001-e417- 20- 5]
ks2718281828> [0001-e418- 20- 5]
ks2718281828> [0]= 8: [0001-e419- 20- 5] [0001-e41a- 20- 5] [0001-e41b- 20- 5]
ks2718281828> [0001-e41c- 20- 5]
ks2718281828> [0]= c: [0001-e41d- 20- 5] [0001-e41e- 20- 5] [0001-e41f- 20- 5]
ks2718281828> [0001-e420- 20- 5]
ks2718281828> [0]= 10: [0001-e421- 20- 5] [0001-e422- 20- 5] [0001-e423- 20- 5]
ks2718281828> [0001-e424- 20- 5]
ks2718281828> [0]= 14: [0001-e425- 20- 5] [0001-e426- 20- 5] [0001-e427- 20- 5]
ks2718281828> [0001-e428- 20- 5]
ks2718281828> [0]= 18: [0001-e429- 20- 5] [0001-e42a- 20- 5] [0001-e42b- 20- 5]
ks2718281828> [0001-e42c- 20- 5]
ks2718281828> [0]= 1c: [0001-e42d- 20- 5] [0001-e42e- 20- 5] [0001-e42f- 20- 5]
ks2718281828> [0001-e430- 20- 5]
ks2718281828> [0]= 20: [0001-e431- 20- 5] [0001-e432- 20- 5] [0001-e433- 20- 5]
ks2718281828> [0001-e434- 20- 5]
ks2718281828> [0]= 24: [0001-e435- 20- 5] [0001-e436- 20- 5] [0001-e437- 20- 5]
ks2718281828> [0001-e438- 20- 5]
ks2718281828> [0]= 28: [0001-e439- 20- 5] [0001-e43a- 20- 5] [0001-e43b- 20- 5]
ks2718281828> [0001-e43c- 20- 5]
ks2718281828> [0]= 2c: [0001-e43d- 20- 5] [0001-e43e- 20- 5] [0001-e43f- 20- 5]
ks2718281828> [0001-e440- 20- 5]
ks2718281828>
ks2718281828> (snip)
ks2718281828>
ks2718281828> [0]=3d0: [0001-e3e1- 20- 5] [0001-e3e2- 20- 5] [0001-e3e3- 20- 5]
ks2718281828> [0001-e3e4- 20- 5]
ks2718281828> [0]=3d4: [0001-e3e5- 20- 5] [0001-e3e6- 20- 5] [0001-e3e7- 20- 5]
ks2718281828> [0001-e3e8- 20- 5]
ks2718281828> [0]=3d8: [0001-e3e9- 20- 5] [0001-e3ea- 20- 5] [0001-e3eb- 20- 5]
ks2718281828> [0001-e3ec- 20- 5]
ks2718281828> [0]=3dc: [0001-e3ed- 20- 5] [0001-e3ee- 20- 5] [0001-e3ef- 20- 5]
ks2718281828> [0001-e3f0- 20- 5]
ks2718281828> [0]=3e0: [0001-e3f1- 20- 5] [0001-e3f2- 20- 5] [0001-e3f3- 20- 5]
ks2718281828> [0001-e3f4- 20- 5]
ks2718281828> [0]=3e4: [0001-e3f5- 20- 5] [0001-e3f6- 20- 5] [0001-e3f7- 20- 5]
ks2718281828> [0001-e3f8- 20- 5]
ks2718281828> [0]=3e8: [0001-e3f9- 20- 5] [0001-e3fa- 20- 5] [0001-e3fb- 20- 5]
ks2718281828> [0001-e3fc- 20- 5]
ks2718281828> [0]=3ec: [0001-e3fd- 20- 5] [0001-e3fe- 20- 5] [0001-e3ff- 20- 5]
ks2718281828> [0001-e400- 20- 5]
ks2718281828> [0]=3f0: [0001-e401- 20- 5] [0001-e402- 20- 5] [0001-e403- 20- 5]
ks2718281828> [0001-e404- 20- 5]
ks2718281828> [0]=3f4: [0001-e405- 20- 5] [0001-e406- 20- 5] [0001-e407- 20- 5]
ks2718281828> [0001-e408- 20- 5]
ks2718281828> [0]=3f8: [0001-e409- 20- 5] [0001-e40a- 20- 5] [0001-e40b- 20- 5]
ks2718281828> [0001-e40c- 20- 5]
ks2718281828> [0]=3fc: [0001-e40d- 20- 5] [0001-e40e- 20- 5] [0001-e40f- 20- 5]
ks2718281828> [0001-e410- 20- 5]
ks2718281828> [0]<[0] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[0] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[1] send kackp seq=e4dd, seq_sent=e0ec seq_acked=e0ec, nsend=8,
ks2718281828> seq_sendack=0 stat=200
ks2718281828> [0]>[1] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[2] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[2] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[3] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[3] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[4] send kackp seq=3f1, seq_sent=3f0 seq_acked=3f0, nsend=8,
ks2718281828> seq_sendack=0 stat=0
ks2718281828> [0]>[4] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> ==================================
ks2718281828> ==================================
ks2718281828> ==================================
ks2718281828> Ethernet PM context #3 information sizeof sc=5008
ks2718281828> tx_p=0000e8cc, tx_c=0000e4dc,tx_s=0000e8cc, tx_bp=0074da80, tx_bc=0072e000
ks2718281828>
ks2718281828> dump tx descriptors prod=e8cc(cc) cons=e4dc(dc) sent=e8cc mask=3ff
ks2718281828> [0]= 0: [0001-e411- 20- 5] [0001-e412- 20- 5] [0001-e413- 20- 5]
ks2718281828> [0001-e414- 20- 5]
ks2718281828> [0]= 4: [0001-e415- 20- 5] [0001-e416- 20- 5] [0001-e417- 20- 5]
ks2718281828> [0001-e418- 20- 5]
ks2718281828> [0]= 8: [0001-e419- 20- 5] [0001-e41a- 20- 5] [0001-e41b- 20- 5]
ks2718281828> [0001-e41c- 20- 5]
ks2718281828> [0]= c: [0001-e41d- 20- 5] [0001-e41e- 20- 5] [0001-e41f- 20- 5]
ks2718281828> [0001-e420- 20- 5]
ks2718281828>
ks2718281828> (snip)
ks2718281828>
ks2718281828> [0]=3e8: [0001-e3f9- 20- 5] [0001-e3fa- 20- 5] [0001-e3fb- 20- 5]
ks2718281828> [0001-e3fc- 20- 5]
ks2718281828> [0]=3ec: [0001-e3fd- 20- 5] [0001-e3fe- 20- 5] [0001-e3ff- 20- 5]
ks2718281828> [0001-e400- 20- 5]
ks2718281828> [0]=3f0: [0001-e401- 20- 5] [0001-e402- 20- 5] [0001-e403- 20- 5]
ks2718281828> [0001-e404- 20- 5]
ks2718281828> [0]=3f4: [0001-e405- 20- 5] [0001-e406- 20- 5] [0001-e407- 20- 5]
ks2718281828> [0001-e408- 20- 5]
ks2718281828> [0]=3f8: [0001-e409- 20- 5] [0001-e40a- 20- 5] [0001-e40b- 20- 5]
ks2718281828> [0001-e40c- 20- 5]
ks2718281828> [0]=3fc: [0001-e40d- 20- 5] [0001-e40e- 20- 5] [0001-e40f- 20- 5]
ks2718281828> [0001-e410- 20- 5]
ks2718281828> [0]<[0] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[0] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[1] send kackp seq=e4dd, seq_sent=e0ec seq_acked=e0ec, nsend=8,
ks2718281828> seq_sendack=0 stat=200
ks2718281828> [0]>[1] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[2] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[2] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[3] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[3] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[4] send kackp seq=3f1, seq_sent=3f0 seq_acked=3f0, nsend=8,
ks2718281828> seq_sendack=0 stat=0
ks2718281828> [0]>[4] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> ==================================
ks2718281828> ==================================
ks2718281828> ==================================
ks2718281828> Ethernet PM context #3 information sizeof sc=5008
ks2718281828> tx_p=0000e8cc, tx_c=0000e4dc,tx_s=0000e8cc, tx_bp=0074da80, tx_bc=0072e000
ks2718281828>
ks2718281828> dump tx descriptors prod=e8cc(cc) cons=e4dc(dc) sent=e8cc mask=3ff
ks2718281828> [0]= 0: [0001-e411- 20- 5] [0001-e412- 20- 5] [0001-e413- 20- 5]
ks2718281828> [0001-e414- 20- 5]
ks2718281828> [0]= 4: [0001-e415- 20- 5] [0001-e416- 20- 5] [0001-e417- 20- 5]
ks2718281828> [0001-e418- 20- 5]
ks2718281828> [0]= 8: [0001-e419- 20- 5] [0001-e41a- 20- 5] [0001-e41b- 20- 5]
ks2718281828> [0001-e41c- 20- 5]
ks2718281828> [0]= c: [0001-e41d- 20- 5] [0001-e41e- 20- 5] [0001-e41f- 20- 5]
ks2718281828> [0001-e420- 20- 5]
ks2718281828> [0]= 10: [0001-e421- 20- 5] [0001-e422- 20- 5] [0001-e423- 20- 5]
ks2718281828> [0001-e424- 20- 5]
ks2718281828> [0]= 14: [0001-e425- 20- 5] [0001-e426- 20- 5] [0001-e427- 20- 5]
ks2718281828> [0001-e428- 20- 5]
ks2718281828>
ks2718281828> (snip)
ks2718281828>
ks2718281828> [0]=3e4: [0001-e3f5- 20- 5] [0001-e3f6- 20- 5] [0001-e3f7- 20- 5]
ks2718281828> [0001-e3f8- 20- 5]
ks2718281828> [0]=3e8: [0001-e3f9- 20- 5] [0001-e3fa- 20- 5] [0001-e3fb- 20- 5]
ks2718281828> [0001-e3fc- 20- 5]
ks2718281828> [0]=3ec: [0001-e3fd- 20- 5] [0001-e3fe- 20- 5] [0001-e3ff- 20- 5]
ks2718281828> [0001-e400- 20- 5]
ks2718281828> [0]=3f0: [0001-e401- 20- 5] [0001-e402- 20- 5] [0001-e403- 20- 5]
ks2718281828> [0001-e404- 20- 5]
ks2718281828> [0]=3f4: [0001-e405- 20- 5] [0001-e406- 20- 5] [0001-e407- 20- 5]
ks2718281828> [0001-e408- 20- 5]
ks2718281828> [0]=3f8: [0001-e409- 20- 5] [0001-e40a- 20- 5] [0001-e40b- 20- 5]
ks2718281828> [0001-e40c- 20- 5]
ks2718281828> [0]=3fc: [0001-e40d- 20- 5] [0001-e40e- 20- 5] [0001-e40f- 20- 5]
ks2718281828> [0001-e410- 20- 5]
ks2718281828> [0]<[0] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[0] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[1] send kackp seq=e4dd, seq_sent=e0ec seq_acked=e0ec, nsend=8,
ks2718281828> seq_sendack=0 stat=200
ks2718281828> [0]>[1] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[2] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[2] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[3] send kackp seq=1, seq_sent=0 seq_acked=0, nsend=8, seq_sendack=0
ks2718281828> stat=0
ks2718281828> [0]>[3] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [0]<[4] send kackp seq=3f1, seq_sent=3f0 seq_acked=3f0, nsend=8,
ks2718281828> seq_sendack=0 stat=0
ks2718281828> [0]>[4] recv que prod=1, cons=1[10] idx=1 offset=0 flags=0
ks2718281828> [taro @ server test]$
ks2718281828>
ks2718281828>
ks2718281828> ----------------------------------------------------------------------------------
ks2718281828>
ks2718281828> ----- Original Message -----
ks2718281828> From: "Kameyama Toyohisa" <kameyama @ pccluster.org>
ks2718281828> To: "Taro Sakata" <ks2718281828 @ mopera.net>
ks2718281828> Cc: <score-users-jp @ pccluster.org>
ks2718281828> Sent: Tuesday, September 28, 2010 11:56 AM
ks2718281828> Subject: Re: [SCore-users-jp] Fwd: Network trunking in SCore7
ks2718281828>
ks2718281828>
ks2718281828>
ks2718281828>
ks2718281828> > This is Kameyama.
ks2718281828> >
ks2718281828> > (09/28/10 11:53), Taro Sakata Wrote:
ks2718281828> >>> Please try running pmxtest.
ks2718281828> >>> $ pmxtest -iter 1 -network etherxhb
ks2718281828> >>
ks2718281828> >> I ran it, and the result was as follows.
ks2718281828> >
ks2718281828> > Ah, so you were not running scrun under scout.
ks2718281828> >
ks2718281828> > $ scout -g machinefile -e pmxtest -iter 1 -network etherhxb
ks2718281828> >
ks2718281828> > How about this?
ks2718281828> >
ks2718281828> > Kameyama Toyohisa
ks2718281828> >
ks2718281828>
------
Shinji Sumimoto, Fujitsu