From chisaki @ cs.kumamoto-u.ac.jp Sat Jun 1 12:29:11 2002
From: chisaki @ cs.kumamoto-u.ac.jp (Yoshifumi CHISAKI)
Date: Sat, 1 Jun 2002 12:29:11 +0900
Subject: [SCore-users-jp] Permission denied
Message-ID: <20020601032912.14647@vivaldi.cs.kumamoto-u.ac.jp>

This is Chisaki.

As root, the following works:

  [root @ parallel-a021 root]# scout -g seg20
  SCOUT: Spawning done.
  SCOUT: session started.
  [root @ parallel-a021 root]#

but as an ordinary user it stops with:

  [testuser @ parallel-a021 testuser]$ scout -g seg20
  Permission denied.

Where should I look first? As a first attempt I ran

  chmod -R go+rw /opt/score
  chmod -R go+rw /home/testuser

but that did not help. Any advice would be appreciated.

From kameyama @ pccluster.org Mon Jun 3 10:29:07 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Mon, 03 Jun 2002 10:29:07 +0900
Subject: [SCore-users-jp] Permission denied
In-Reply-To: Your message of "Sat, 01 Jun 2002 12:29:11 JST." <20020601032912.14647@vivaldi.cs.kumamoto-u.ac.jp>
Message-ID: <200206030129.g531T8v10673@yl-dhcp18.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <20020601032912.14647 @ vivaldi.cs.kumamoto-u.ac.jp> Yoshifumi CHISAKI writes:
> As an ordinary user,
> [testuser @ parallel-a021 testuser]$ scout -g seg20
> Permission denied.
> stops with this message. Where should I look first?

Most likely that user is not known on the compute hosts.
If you are using NIS, the user may not be registered in NIS; if you are not
using NIS, the user may be missing from /etc/passwd and /etc/shadow on each
host.

> As a first attempt I ran
>   chmod -R go+rw /opt/score
>   chmod -R go+rw /home/testuser
> but that did not help.

This is probably unrelated (in fact, it only makes things less secure).

To find out whether SCore itself is the cause, run

  % rsh-all -g seg20 date

to check whether rsh works, or try rlogin to each compute host as that user.

from Kameyama Toyohisa

From ishikawa @ is.s.u-tokyo.ac.jp Tue Jun 4 21:22:35 2002
From: ishikawa @ is.s.u-tokyo.ac.jp (Yutaka Ishikawa)
Date: Tue, 04 Jun 2002 21:22:35 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020604.211708.125107394.nishida@is.s.u-tokyo.ac.jp>
References: <20020604.211708.125107394.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020604.212235.607961858.ishikawa@is.s.u-tokyo.ac.jp>

If you already have the cards, it would be great if you could measure their
performance first and post the results to the mailing list.

Ishikawa, from the room next door :-)

From: NISHIDA Akira
> The NICs installed in our existing cluster are of two kinds, 3C920 and
> Broadcom GbE. To get adequate communication performance over Gigabit
> Ethernet, do we need to procure separate cards such as SysKonnect ones,
> or are ordinary GbE cards good enough?
>
> I would appreciate your advice.
>
> --
> NISHIDA Akira
> Department of Computer Science, Graduate School of Information Science and
> Technology, The University of Tokyo
> E-mail : nishida @ is.s.u-tokyo.ac.jp
>
> _______________________________________________
> score-info-jp mailing list
> score-info-jp @ pccluster.org
> http://www.pccluster.org/mailman/listinfo/score-info-jp

From hori @ swimmy-soft.com Tue Jun 4 21:36:52 2002
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Tue, 4 Jun 2002 21:36:52 +0900
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020604.212235.607961858.ishikawa@is.s.u-tokyo.ac.jp>
References: <20020604.211708.125107394.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <3106071412.hori0000@mail.bestsystems.co.jp>

This is Hori from Swimmy Software.

Does SCASH (Omni) run on PM/Ethernet?

> If you already have the cards, it would be great if you could measure their
> performance first and post the results to the mailing list.
>
> Ishikawa, from the room next door :-)
>
> From: NISHIDA Akira
>> The NICs installed in our existing cluster are of two kinds, 3C920 and
>> Broadcom GbE. To get adequate communication performance over Gigabit
>> Ethernet, do we need to procure separate cards such as SysKonnect ones,
>> or are ordinary GbE cards good enough?
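A quick note on the Permission denied exchange at the top of this digest: Kameyama's advice boils down to confirming that the account is known on every compute host (via NIS or the local /etc/passwd) and that rsh works for it. A rough sketch of that check follows; the group name seg20 comes from the thread, while the user and host names are placeholders for the local setup.

#!/bin/sh
# Check that an account is known on every compute host (NIS or local
# /etc/passwd) and that rsh to the host works at all.
# User and host names below are placeholders.
user=testuser
for host in comp0 comp1 comp2 comp3; do
    echo "=== $host ==="
    # "id: <user>: No such user" here means the host does not know the account
    rsh $host id $user
done
# The SCore-level equivalent of the rsh check, as suggested above:
#   rsh-all -g seg20 date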
From kameyama @ pccluster.org Tue Jun 4 21:44:08 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Tue, 04 Jun 2002 21:44:08 +0900
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: Your message of "Tue, 04 Jun 2002 21:36:52 JST." <3106071412.hori0000@mail.bestsystems.co.jp>
Message-ID: <200206041244.g54Ci8v19568@yl-dhcp18.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <3106071412.hori0000 @ mail.bestsystems.co.jp> Atsushi HORI writes:
> Does SCASH (Omni) run on PM/Ethernet?

Performance aside, it does work.

from Kameyama Toyohisa

From nishida @ is.s.u-tokyo.ac.jp Tue Jun 4 21:45:03 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Tue, 04 Jun 2002 21:45:03 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020604.212235.607961858.ishikawa@is.s.u-tokyo.ac.jp>
References: <20020604.211708.125107394.nishida@is.s.u-tokyo.ac.jp> <20020604.212235.607961858.ishikawa@is.s.u-tokyo.ac.jp>
Message-ID: <20020604.214503.46616373.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

> If you already have the cards, it would be great if you could measure their
> performance first and post the results to the mailing list.

So SCore does not depend on any particular card architecture. Understood;
I will evaluate what we have.

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp

From nishida @ is.s.u-tokyo.ac.jp Tue Jun 4 21:56:40 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Tue, 04 Jun 2002 21:56:40 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <200206041244.g54Ci8v19568@yl-dhcp18.is.s.u-tokyo.ac.jp>
References: <3106071412.hori0000@mail.bestsystems.co.jp> <200206041244.g54Ci8v19568@yl-dhcp18.is.s.u-tokyo.ac.jp>
Message-ID: <20020604.215640.74187598.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

> This is Kameyama.
>
> In article <3106071412.hori0000 @ mail.bestsystems.co.jp> Atsushi HORI writes:
> > Does SCASH (Omni) run on PM/Ethernet?
>
> Performance aside, it does work.

I understand that zero-copy communication is not available on PM/Ethernet;
does that mean we should not expect much performance?

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp

From msato @ is.tsukuba.ac.jp Wed Jun 5 01:02:13 2002
From: msato @ is.tsukuba.ac.jp (Mitsuhisa Sato)
Date: Wed, 05 Jun 2002 01:02:13 +0900
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020604.215640.74187598.nishida@is.s.u-tokyo.ac.jp>
References: <3106071412.hori0000@mail.bestsystems.co.jp> <200206041244.g54Ci8v19568@yl-dhcp18.is.s.u-tokyo.ac.jp> <20020604.215640.74187598.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020605010213P.msato@is.tsukuba.ac.jp>

Nishida-san,

It depends on what kind of computation you intend to run, but I suspect you
will not get good performance unless you make the data threadprivate or
specify a distribution (mapping). We are also currently investigating in
detail how much performance can be obtained over Ethernet.

Sato.

From: NISHIDA Akira
Subject: Re: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
Date: Tue, 04 Jun 2002 21:56:40 +0900 (JST)

> This is Nishida.
>
> > Performance aside, it does work.
>
> I understand that zero-copy communication is not available on PM/Ethernet;
> does that mean we should not expect much performance?
From nishida @ is.s.u-tokyo.ac.jp Wed Jun 5 01:34:17 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Wed, 05 Jun 2002 01:34:17 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020605010213P.msato@is.tsukuba.ac.jp>
References: <200206041244.g54Ci8v19568@yl-dhcp18.is.s.u-tokyo.ac.jp> <20020604.215640.74187598.nishida@is.s.u-tokyo.ac.jp> <20020605010213P.msato@is.tsukuba.ac.jp>
Message-ID: <20020605.013417.125107960.nishida@is.s.u-tokyo.ac.jp>

This is Nishida. (Also CC'ing Mr. Horai.)

> It depends on what kind of computation you intend to run, but I suspect you
> will not get good performance unless you make the data threadprivate or
> specify a distribution (mapping). We are also currently investigating in
> detail how much performance can be obtained over Ethernet.
>
> Sato.

At the moment we are planning to evaluate things like a parallelized BLAS,
so it looks as if some tuning will be needed.

For now, I will check whether we can get the NIC's raw performance.

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp

From s-sumi @ flab.fujitsu.co.jp Wed Jun 5 14:31:10 2002
From: s-sumi @ flab.fujitsu.co.jp (Shinji Sumimoto)
Date: Wed, 05 Jun 2002 14:31:10 +0900 (JST)
Subject: [SCore-users-jp] NIC
In-Reply-To: <20020529190117.7eff0b15.yamanaka@exassia.tmit.ac.jp>
References: <20020523160546.4febe64a.yamanaka@exassia.tmit.ac.jp> <3CEC9EB0.2080408@bd6.so-net.ne.jp> <20020529190117.7eff0b15.yamanaka@exassia.tmit.ac.jp>
Message-ID: <20020605.143110.719911037.s-sumi@flab.fujitsu.co.jp>

This is Sumimoto. Sorry for the slow reply.

From: Yamanaka Kenshi
Subject: Re: [SCore-users-jp] NIC
Date: Wed, 29 May 2002 19:01:17 +0900
Message-ID: <20020529190117.7eff0b15.yamanaka @ exassia.tmit.ac.jp>

yamanaka> This is Yamanaka. Sumimoto-san, sorry for my late reply.
yamanaka> As you suggested, I set RX_RING_SIZE to 128 and recompiled the module.
yamanaka> However, the message
yamanaka>   eth0: Inconsistent Rx descriptor chain.
yamanaka> still appears. The program itself does run, but since this is a driver
yamanaka> error, does it mean that data transfers over the network are no longer
yamanaka> reliable?
yamanaka>
yamanaka> Also, regarding the free memory dropping to a very low level: is it
yamanaka> enough to look at the free memory reported by
yamanaka>   cat /proc/meminfo
yamanaka> or
yamanaka>   free
yamanaka> If so, there is no striking difference from the nodes with Intel NICs.

I see -- so there does not seem to be a memory leak either.
The next step is probably for us to try it here. I believe there is a VIA NIC
at the University of Tokyo, so I will give it a try.
------
Shinji Sumimoto, Fujitsu Labs
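For the "Inconsistent Rx descriptor chain" question above, one low-effort way to watch for both the driver message and a shrinking free-memory pool while a job runs is to poll dmesg and /proc/meminfo, as Yamanaka is already doing by hand. A small sketch; the log path and interval are arbitrary choices.

#!/bin/sh
# Periodically record free memory and count occurrences of the driver
# error while a PM/Ethernet job is running.  Purely illustrative values.
LOG=/tmp/nic-watch.log
while true; do
    date                                                >> $LOG
    grep MemFree /proc/meminfo                          >> $LOG
    dmesg | grep -c "Inconsistent Rx descriptor chain"  >> $LOG
    sleep 60
done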
From s-sumi @ flab.fujitsu.co.jp Wed Jun 5 14:47:46 2002
From: s-sumi @ flab.fujitsu.co.jp (Shinji Sumimoto)
Date: Wed, 05 Jun 2002 14:47:46 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020604.214503.46616373.nishida@is.s.u-tokyo.ac.jp>
References: <20020604.211708.125107394.nishida@is.s.u-tokyo.ac.jp> <20020604.212235.607961858.ishikawa@is.s.u-tokyo.ac.jp> <20020604.214503.46616373.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020605.144746.576041248.s-sumi@flab.fujitsu.co.jp>

This is Sumimoto.

The Broadcom 5700 (3Com 966 etc.) has been confirmed to work.

However, to get good performance you need a newer device driver (the one
included in SCore 5.0.1 is a little old) and you need to tune the driver
parameters. Properly tuned, it actually performs quite well.

Today I committed the new driver, with changed default parameters, to the CVS
on the consortium machine, so if you need it please obtain it through the
Ishikawa laboratory.

As for SCASH on PM/Ethernet: the pmRead implementation that SCASH uses in
PM/Ethernet still has room for optimization, and we are working on it.
So please try the current version for now and, by all means, measure the
results and let us know.

From: NISHIDA Akira
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
Date: Tue, 04 Jun 2002 21:45:03 +0900 (JST)
Message-ID: <20020604.214503.46616373.nishida @ is.s.u-tokyo.ac.jp>

nishida> This is Nishida.
nishida>
nishida> > If you already have the cards, it would be great if you could measure
nishida> > their performance first and post the results to the mailing list.
nishida>
nishida> So SCore does not depend on any particular card architecture. Understood;
nishida> I will evaluate what we have.
------
Shinji Sumimoto, Fujitsu Labs

From nishida @ is.s.u-tokyo.ac.jp Wed Jun 5 15:24:25 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Wed, 05 Jun 2002 15:24:25 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020605.144746.576041248.s-sumi@flab.fujitsu.co.jp>
References: <20020604.212235.607961858.ishikawa@is.s.u-tokyo.ac.jp> <20020604.214503.46616373.nishida@is.s.u-tokyo.ac.jp> <20020605.144746.576041248.s-sumi@flab.fujitsu.co.jp>
Message-ID: <20020605.152425.68541537.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

> This is Sumimoto.
>
> The Broadcom 5700 (3Com 966 etc.) has been confirmed to work.
>
> However, to get good performance you need a newer device driver (the one
> included in SCore 5.0.1 is a little old) and you need to tune the driver
> parameters. Today I committed the new driver, with changed default
> parameters, to the CVS on the consortium machine. The pmRead implementation
> used by SCASH on PM/Ethernet is also being optimized. Please try the current
> version for now and report the results.

You have just sent me the new files. Thank you very much; I will evaluate
them right away.

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp
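A small aside on the driver update mentioned above: after rebuilding and rebooting, it is easy to confirm which Broadcom driver the kernel actually loaded before rerunning any benchmarks. A rough sketch only; the module name bcm5700 and the interface name eth1 are assumptions based on the setup described in this thread.

#!/bin/sh
# Confirm which Broadcom GbE driver is actually in use.
# Module name (bcm5700) and interface (eth1) are assumptions.
/sbin/lsmod | grep bcm5700      # is the module loaded at all?
dmesg | grep -i bcm5700         # the driver banner usually includes its version
dmesg | grep eth1               # which driver claimed eth1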
From chisaki @ cs.kumamoto-u.ac.jp Wed Jun 5 16:21:38 2002
From: chisaki @ cs.kumamoto-u.ac.jp (Yoshifumi CHISAKI)
Date: Wed, 5 Jun 2002 16:21:38 +0900
Subject: [SCore-users-jp] Permission denied
In-Reply-To: <200206030129.g531T8v10673@yl-dhcp18.is.s.u-tokyo.ac.jp>
References: <200206030129.g531T8v10673@yl-dhcp18.is.s.u-tokyo.ac.jp>
Message-ID: <20020605072139.27037@vivaldi.cs.kumamoto-u.ac.jp>

This is Chisaki.

kameyama @ pccluster.org wrote on 02.6.3 10:29:
> Most likely that user is not known on the compute hosts.
> If you are using NIS, the user may not be registered in NIS; if you are not
> using NIS, the user may be missing from /etc/passwd and /etc/shadow on each
> host.

That was exactly it.

I had installed the machines one by one, confirmed that SCore worked
correctly with local users, and was then migrating to NFS and NIS. In the
end, the problem was that I had not properly deleted the local UIDs:
seen from the server, the UIDs were inconsistent.

One additional note: I had assumed that listing host names in .rhosts would
allow password-less login, and had set up /home/chisaki/.rhosts, but I was
still being asked for a password and could not work out why. .rhosts worked
for the local UIDs but not for the NIS users. In the end I wrote the hosts
into /etc/hosts.equiv. I was stumbling over quite basic things.

> > As a first attempt I ran
> >   chmod -R go+rw /opt/score
> >   chmod -R go+rw /home/testuser
> > but that did not help.
>
> This is probably unrelated (in fact, it only makes things less secure).

Indeed. Since the cluster is isolated from the outside network I tried it
anyway, but I have now reverted it.

Thank you for the comments, and best regards.
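As a follow-up to the .rhosts/hosts.equiv point above: with NIS accounts, listing the cluster hosts in /etc/hosts.equiv (typically on every node) is the approach that worked here. A minimal example of what such a file can look like; the host names are placeholders, not the ones from this thread.

# /etc/hosts.equiv -- hosts trusted for rsh/rlogin without a password.
# Placeholder names; list the server and every compute host.
server.cluster.example
comp0.cluster.example
comp1.cluster.example
comp2.cluster.example
comp3.cluster.example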
From nishida @ is.s.u-tokyo.ac.jp Thu Jun 6 17:02:06 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Thu, 06 Jun 2002 17:02:06 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020605.152425.68541537.nishida@is.s.u-tokyo.ac.jp>
References: <20020604.214503.46616373.nishida@is.s.u-tokyo.ac.jp> <20020605.144746.576041248.s-sumi@flab.fujitsu.co.jp> <20020605.152425.68541537.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020606.170206.45273046.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

I ran the tests using the files you sent. The results are attached below.
There are several points that look abnormal (marked with ★); are these
known problems?

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp

* Test environment

SCore 5.0.1 + the files you sent

Node configuration: dual Xeon 2GHz x 4
  out0.is.s.u-tokyo.ac.jp   server and compute host
  out1-3.is.s.u-tokyo.ac.jp compute hosts
NIC: 3C920, Broadcom GbE (the GbE (eth1) addresses are 192.168.0.1-4)
OS: Red Hat Linux 7.2, installed manually
kernel: linux-2.4.18 + linux2.4.18.score.patch + linux2.4.18.score2.patch

* dmesg output

PM memory support
Register pm_memory as major(123)
PM/Ethernet: "$Id: pm_ethernet_dev.c,v 1.1.2.1 2002/03/28 03:05:14 kameyama Exp $"
PM/Ethernet: register etherpm device as major(122)
pmshmem: version = $Id: pm_shmem.c,v 1.1 2002/02/18 11:40:10 kameyama Exp $
pmshmem_init: register pm_shmem as major(124)
etherpm0: 16 contexts using 4096KB MEM, maxunit=4, maxnodes=512, mtu=1468, eth1.
etherpm0: Interrupt Reaping on eth1, irq 20

* Server host settings

% cat /opt/score/etc/scorehosts.db
/*
 * SCore 3.0 scorehosts.db
 * This is a sample of scorehosts.db.
 */
/* PM/Myrinet */
myrinet type=myrinet \
    -firmware:file=/opt/score/share/lanai/lanai.mcp \
    -config:file=/opt/score/etc/pm-myrinet.conf
/* PM/Ethernet */
ethernet type=ethernet \
    -config:file=/opt/score/etc/pm-ethernet.conf
/* PM/Agent/UDP */
udp type=agent -agent=pmaudp \
    -config:file=/opt/score/etc/pm-udp.conf
/* PM/SHMEM */

% cat /opt/score/etc/pm-udp.conf
# Host Number  Host Name                [IP Address]
0 out0.is.s.u-tokyo.ac.jp 192.168.0.1
1 out1.is.s.u-tokyo.ac.jp 192.168.0.2
2 out2.is.s.u-tokyo.ac.jp 192.168.0.3
3 out3.is.s.u-tokyo.ac.jp 192.168.0.4

% cat /opt/score/etc/pm-ethernet.conf
unit 0
# maxnsend 0 - 32
# maxnsend 0
# backoff 1000 - 20000 (usec)
backoff 4800
# checksum (0 if off, 1 is on)
checksum 0
# PE  MAC address        base hostname            # comment
0 00:10:18:01:7E:3E out0.is.s.u-tokyo.ac.jp # on eth1
1 00:10:18:01:7E:60 out1.is.s.u-tokyo.ac.jp # on eth1
2 00:10:18:00:07:B6 out2.is.s.u-tokyo.ac.jp # on eth1
3 00:10:18:00:07:D0 out3.is.s.u-tokyo.ac.jp # on eth1

* PM test results

% hostname
out0.is.s.u-tokyo.ac.jp
% scorehosts -l -g pcc
out0.is.s.u-tokyo.ac.jp
out1.is.s.u-tokyo.ac.jp
out2.is.s.u-tokyo.ac.jp
out3.is.s.u-tokyo.ac.jp
4 hosts found.
% sceptic -v -g pcc          ★ abnormal
out0.is.s.u-tokyo.ac.jp: scping FAILED
out1.is.s.u-tokyo.ac.jp: scping FAILED
out2.is.s.u-tokyo.ac.jp: OK
out3.is.s.u-tokyo.ac.jp: scping FAILED
out1.is.s.u-tokyo.ac.jp: OK
out0.is.s.u-tokyo.ac.jp: OK
out3.is.s.u-tokyo.ac.jp: OK
All host responding.
% msgb -group pcc &
[1] 29579
% scout -g pcc
SCOUT: Spawning done.
SCOUT: session started.
scout [out0-3]: SCOUT(5.0.1): Ready.
% date
Thu Jun 6 14:45:37 JST 2002
% scout date
[out0-3]: Thu Jun 6 14:45:38 JST 2002

* PM/Ethernet test results

% cd /opt/score/sbin
(% ./rpmtest out1.is.s.u-tokyo.ac.jp ethernet -reply)
% ./rpmtest out0.is.s.u-tokyo.ac.jp ethernet -dest 1 -ping
8 0.00096654
% ./scstest -network ethernet          ★ abnormal
SCSTEST: BURST on ethernet(chan=0,ctx=0,len=16)
out0( 0) burst: pmGetSendBuffer: Connection timed out(110)
out2( 2) burst: pmGetSendBuffer: Connection timed out(110)
out3( 3) burst: pmGetSendBuffer: Connection timed out(110)
out1( 1) burst: pmGetSendBuffer: Connection timed out(110)

* PM/Agent/UDP test results

% cd /opt/score/sbin
% ./rpmtest out0.is.s.u-tokyo.ac.jp udp -iter 10000 -dest 0 -ping
8 4.47301e-05
(./rpmtest out1.is.s.u-tokyo.ac.jp udp -reply)
% ./rpmtest out0.is.s.u-tokyo.ac.jp udp -iter 10000 -dest 1 -ping
8 0.000181472

* PM/Shmem test results

% cd /opt/score/sbin
% ./rpminit out0.is.s.u-tokyo.ac.jp shmem0
% ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -ping
8 1.21174e-06
(%./rpmtest out0.is.s.u-tokyo.ac.jp shmem1 -reply)
% ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -vread   ★ abnormal
8 3.0676e+06

* SCore-D test results

Single-user environment

% printenv |grep SCBD
SCBDSERV=out0.is.s.u-tokyo.ac.jp
% msgb -group pcc &
% cp /opt/score/example/mttl/hello.cc /tmp
% cd /tmp
% mpc++ -o hello hello.cc
% scrun -nodes=1 ./hello               ★ abnormal
FEP:ERROR Command not found (./hello)
% cp /opt/score/example/mpi/cpi.c /tmp
% mpicc -o cpi cpi.c -lm
% scrun ./cpi                          ★ abnormal
FEP:ERROR Command not found (./cpi)
% exit
exit
SCOUT: Session done.

Multi-user environment

(% /bin/su -
Password:
[root @ out0 root]# scout -g pcc
SCOUT: Spawning done.
SCOUT: session started.
[root @ out0 root]# scored
SYSLOG: /opt/score/deploy/scored
SYSLOG: SCore-D 5.0.1 $Id: init.cc,v 1.66 2002/02/13 04:18:40 hori Exp $
SYSLOG: Compile option(s):
SYSLOG: SCore-D network: ethernet/ethernet
SYSLOG: Cluster[0]: (0..3)x2.i386-redhat7-linux2_4.xeon.2000
SYSLOG: Memory: 1004[MB], Swap: 1993[MB], Disk: 15080[MB]
SYSLOG: Network[0]: ethernet/ethernet
SYSLOG: Network[1]: udp/agent
SYSLOG: Scheduler initiated: Timeslice = 500 [msec]
SYSLOG: Queue[0] activated, exclusive scheduling
SYSLOG: Queue[1] activated, time-sharing scheduling
SYSLOG: Queue[2] activated, time-sharing scheduling
SYSLOG: Session ID: 0
SYSLOG: Server Host: out3.is.s.u-tokyo.ac.jp
SYSLOG: Backup Host: out1.is.s.u-tokyo.ac.jp
SYSLOG: Operated by: root
SYSLOG: ========= SCore-D (5.0.1) bootup in SECURE MODE ========)
% scrun -scored=out0.is.s.u-tokyo.ac.jp ./hello   ★ abnormal
FEP:ERROR Command not found (./hello)
% setenv SCORE_OPTIONS scored=out0.is.s.u-tokyo.ac.jp
% scrun ./cpi                          ★ abnormal
FEP:ERROR Command not found (./cpi)
% mpirun ./cpi                         ★ abnormal
FEP:ERROR Command not found (/tmp/./cpi)
% su -
Password:
# sc_console out0.is.s.u-tokyo.ac.jp -c shutdown   ★ abnormal
Unable to connect with out0.is.s.u-tokyo.ac.jp:9991.
(Ctrl-C in the window where scored was started.)
# exit
exit
SCOUT: Session done.

* Demo run results

% scout -g pcc
SCOUT: Spawning done.
SCOUT: session started.
% cd /opt/score/demo/mandel
% scrun -nodes=4 /opt/score/demo/bin/mandel       ★ abnormal
FEP: Unable to connect with SCore-D (out0.is.s.u-tokyo.ac.jp)
SCore-D 5.0.1 connected.
<0> SCORE-D:ERROR pmGetSendBuffer(dest=2,size=1388) timed out
<0> ULT: Exception Signal (11)
<0> SCORE: Program signaled (SIGTERM).
% scrun /opt/score/demo/bin/mandel                ★ abnormal
FEP: Unable to connect with SCore-D (out0.is.s.u-tokyo.ac.jp)
SCore-D 5.0.1 connected.
<0> SCORE-D:ERROR pmGetSendBuffer(dest=2,size=1388) timed out
<0> ULT: Exception Signal (11)
(stopped with Ctrl-C)
% scrun -nodes=4,scored=out0.is.s.u-tokyo.ac.jp /opt/score/demo/bin/mandel   ★ abnormal
SCore-D 5.0.1 connected (jid=1).
<0> SCORE-D:ERROR pmGetSendBuffer(dest=2,size=1388) timed out
<0> ULT: Exception Signal (11)
<0> SCORE: Program signaled (SIGTERM).
(scored stopped)
% mpirun -np 4 /opt/score/demo/bin/pmandel        ★ abnormal
FEP: Unable to connect with SCore-D (out0.is.s.u-tokyo.ac.jp)
SCore-D 5.0.1 connected.
<0> SCORE-D:ERROR pmGetSendBuffer(dest=2,size=1388) timed out
<0> ULT: Exception Signal (11)
<0> SCORE: Program signaled (SIGTERM).
(scored started)
% scrun -nodes=4,scored=out0.is.s.u-tokyo.ac.jp /opt/score/demo/bin/pmandel  ★ abnormal
SCore-D 5.0.1 connected (jid=1).
<0> SCORE-D:ERROR pmGetSendBuffer(dest=2,size=1388) timed out
<0> ULT: Exception Signal (11)
<0> SCORE: Program signaled (SIGTERM).

From: NISHIDA Akira
Subject: Re: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
Date: Wed, 05 Jun 2002 15:24:25 +0900 (JST)

> This is Nishida.
>
> > This is Sumimoto.
> > The Broadcom 5700 (3Com 966 etc.) has been confirmed to work.
> > However, to get good performance you need a newer device driver and tuned
> > driver parameters. Today I committed the new driver, with changed default
> > parameters, to the CVS on the consortium machine. Please try the current
> > version for now and report the results.
>
> You have just sent me the new files. Thank you very much; I will evaluate
> them right away.
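Before rerunning scstest after a configuration change, the failing links in a report like the one above can be narrowed down pairwise with the same rpmtest commands already shown. A rough sketch follows; the host names and node numbers match the pm-ethernet.conf given earlier, rsh access and the tools in /opt/score/sbin are assumed, and whether the responder has to be stopped by hand may depend on the rpmtest version, so it is killed explicitly here.

#!/bin/sh
# Pairwise PM/Ethernet link check with rpmtest (sketch only).
RPMTEST=/opt/score/sbin/rpmtest
DOM=is.s.u-tokyo.ac.jp
dest=1
for peer in out1 out2 out3; do
    echo "=== out0 <-> $peer (dest=$dest) ==="
    # start a responder on the peer host
    rsh $peer.$DOM $RPMTEST $peer.$DOM ethernet -reply &
    sleep 2
    # 8-byte ping from out0 to node $dest: prints message size and time
    $RPMTEST out0.$DOM ethernet -dest $dest -ping
    # stop the responder again
    rsh $peer.$DOM killall rpmtest
    dest=`expr $dest + 1`
done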
From nishida @ is.s.u-tokyo.ac.jp Thu Jun 6 19:08:58 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Thu, 06 Jun 2002 19:08:58 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020606.170206.45273046.nishida@is.s.u-tokyo.ac.jp>
References: <20020605.144746.576041248.s-sumi@flab.fujitsu.co.jp> <20020605.152425.68541537.nishida@is.s.u-tokyo.ac.jp> <20020606.170206.45273046.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020606.190858.125106802.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

I rebuilt the kernel using the driver with its settings reverted to the
pre-tuning values that you sent.

> * PM test results
> % sceptic -v -g pcc          ★ abnormal
> out0.is.s.u-tokyo.ac.jp: scping FAILED
> out1.is.s.u-tokyo.ac.jp: scping FAILED
> out2.is.s.u-tokyo.ac.jp: OK
> out3.is.s.u-tokyo.ac.jp: scping FAILED
> out1.is.s.u-tokyo.ac.jp: OK
> out0.is.s.u-tokyo.ac.jp: OK
> out3.is.s.u-tokyo.ac.jp: OK
> All host responding.

% sceptic -v -g pcc
out2.is.s.u-tokyo.ac.jp: OK
out1.is.s.u-tokyo.ac.jp: OK
out0.is.s.u-tokyo.ac.jp: OK
out3.is.s.u-tokyo.ac.jp: OK
All host responding.

However, the tests below are still not resolved.
Could there be a problem with the shmem settings?
> * PM/Ethernet test results
>
> % ./scstest -network ethernet          ★ abnormal
> SCSTEST: BURST on ethernet(chan=0,ctx=0,len=16)
> out0( 0) burst: pmGetSendBuffer: Connection timed out(110)
> out2( 2) burst: pmGetSendBuffer: Connection timed out(110)
> out3( 3) burst: pmGetSendBuffer: Connection timed out(110)
> out1( 1) burst: pmGetSendBuffer: Connection timed out(110)

% cd /opt/score/deploy
% ./scstest -network ethernet
SCSTEST: BURST on ethernet(chan=0,ctx=0,len=16)
out0( 0) burst: pmGetSendBuffer: Connection timed out(110)
out2( 2) burst: pmGetSendBuffer: Connection timed out(110)
out3( 3) burst: pmGetSendBuffer: Connection timed out(110)
out1( 1) burst: pmGetSendBuffer: Connection timed out(110)

> * PM/Shmem test results
>
> % ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -ping
> 8 1.21174e-06

% ./rpmtest out0 shmem0 -dest 1 -ping
[0] : chan=0, nprocs=2, sendbuf=(nil)
[0]0: disable=0, kickflag=0, rbuf->sendbuf=(nil), lock_recv=unlocked
[0]0: off_w=0, noff_w=0, off_c=0, off_r=0, noff_r=0
[0]1: disable=0, kickflag=0, rbuf->sendbuf=(nil), lock_recv=locked
[0]1: off_w=32, noff_w=32, off_c=57145, off_r=0, noff_r=0
pmReceive: Connection timed out(110)

> (%./rpmtest out0.is.s.u-tokyo.ac.jp shmem1 -reply)
> % ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -vread   ★ abnormal
> 8 3.0676e+06

% ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -vread
[0] : chan=0, nprocs=2, sendbuf=(nil)
[0]0: disable=0, kickflag=0, rbuf->sendbuf=(nil), lock_recv=unlocked
[0]0: off_w=32, noff_w=32, off_c=57145, off_r=32, noff_r=32
[0]1: disable=0, kickflag=0, rbuf->sendbuf=(nil), lock_recv=unlocked
[0]1: off_w=32, noff_w=32, off_c=57145, off_r=32, noff_r=32
pmRead: Input/output error(5)

From kameyama @ pccluster.org Thu Jun 6 19:19:41 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Thu, 06 Jun 2002 19:19:41 +0900
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: Your message of "Thu, 06 Jun 2002 19:08:58 JST." <20020606.190858.125106802.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <200206061019.g56AJfv31361@yl-dhcp18.is.s.u-tokyo.ac.jp>

This is Kameyama.

Leaving scstest aside for the moment...

In article <20020606.190858.125106802.nishida @ is.s.u-tokyo.ac.jp> NISHIDA Akira writes:
> > (%./rpmtest out0.is.s.u-tokyo.ac.jp shmem1 -reply)

  (%./rpmtest out0.is.s.u-tokyo.ac.jp shmem1 -vreply)

Please use -vreply, not -reply.

from Kameyama Toyohisa

From nishida @ is.s.u-tokyo.ac.jp Thu Jun 6 20:10:38 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Thu, 06 Jun 2002 20:10:38 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <200206061019.g56AJfv31361@yl-dhcp18.is.s.u-tokyo.ac.jp>
References: <20020606.190858.125106802.nishida@is.s.u-tokyo.ac.jp> <200206061019.g56AJfv31361@yl-dhcp18.is.s.u-tokyo.ac.jp>
Message-ID: <20020606.201038.68541076.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

> This is Kameyama.
> Leaving scstest aside for the moment...
>   (%./rpmtest out0.is.s.u-tokyo.ac.jp shmem1 -vreply)
> Please use -vreply, not -reply.

My mistake. The correct results are attached:

% ./rpmtest out0 shmem1 -vreply &
[1] 8728
nishida @ out0.is.s.u-tokyo.ac.jp{!72 /opt/score/sbin}% ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -vread
8 3.03308e+06

Sumimoto-san has also looked at the machines directly; in the end we decided
to isolate the problem by changing the hardware configuration. I will report
again as soon as we know the details.
--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp

From s-sumi @ bd6.so-net.ne.jp Sun Jun 9 23:34:35 2002
From: s-sumi @ bd6.so-net.ne.jp (Shinji Sumimoto)
Date: Sun, 09 Jun 2002 23:34:35 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020606.201038.68541076.nishida@is.s.u-tokyo.ac.jp>
References: <20020606.190858.125106802.nishida@is.s.u-tokyo.ac.jp> <200206061019.g56AJfv31361@yl-dhcp18.is.s.u-tokyo.ac.jp> <20020606.201038.68541076.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020609.233435.730584917.s-sumi@bd6.so-net.ne.jp>

Nishida-san,

This is Sumimoto.

Since the machines you are using are dual-processor Xeons, could you check
whether things work with a single processor?

From: NISHIDA Akira
Subject: Re: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
Date: Thu, 06 Jun 2002 20:10:38 +0900 (JST)
Message-ID: <20020606.201038.68541076.nishida @ is.s.u-tokyo.ac.jp>

nishida> This is Nishida.
nishida>
nishida> My mistake. The correct results are attached:
nishida>
nishida> % ./rpmtest out0 shmem1 -vreply &
nishida> [1] 8728
nishida> % ./rpmtest out0.is.s.u-tokyo.ac.jp shmem0 -dest 1 -vread
nishida> 8 3.03308e+06
nishida>
nishida> Sumimoto-san has also looked at the machines directly; in the end we
nishida> decided to isolate the problem by changing the hardware configuration.
nishida> I will report again as soon as we know the details.
-----
Shinji Sumimoto  E-Mail: s-sumi @ bd6.so-net.ne.jp
From nishida @ is.s.u-tokyo.ac.jp Mon Jun 10 10:32:50 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Mon, 10 Jun 2002 10:32:50 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020609.233435.730584917.s-sumi@bd6.so-net.ne.jp>
References: <200206061019.g56AJfv31361@yl-dhcp18.is.s.u-tokyo.ac.jp> <20020606.201038.68541076.nishida@is.s.u-tokyo.ac.jp> <20020609.233435.730584917.s-sumi@bd6.so-net.ne.jp>
Message-ID: <20020610.103250.68538017.nishida@is.s.u-tokyo.ac.jp>

Sumimoto-san,

> This is Sumimoto.
>
> Since the machines you are using are dual-processor Xeons, could you check
> whether things work with a single processor?

Understood; I will try it.

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp
From nishida @ is.s.u-tokyo.ac.jp Mon Jun 10 16:14:42 2002
From: nishida @ is.s.u-tokyo.ac.jp (NISHIDA Akira)
Date: Mon, 10 Jun 2002 16:14:42 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020609.233435.730584917.s-sumi@bd6.so-net.ne.jp>
References: <200206061019.g56AJfv31361@yl-dhcp18.is.s.u-tokyo.ac.jp> <20020606.201038.68541076.nishida@is.s.u-tokyo.ac.jp> <20020609.233435.730584917.s-sumi@bd6.so-net.ne.jp>
Message-ID: <20020610.161442.63240949.nishida@is.s.u-tokyo.ac.jp>

This is Nishida.

On the subject above, this is what I have been able to confirm so far:

1. SMP kernel -> single processor : no improvement.
2. GbE switch -> two nodes connected by a crossover cable : no improvement.
3. GbE -> 100BASE-T : everything works, up to and including the demos.

It looks as if the problem is in the Broadcom driver.

--
NISHIDA Akira
Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
E-mail : nishida @ is.s.u-tokyo.ac.jp

From: Shinji Sumimoto
Subject: Re: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
Date: Sun, 09 Jun 2002 23:34:35 +0900 (JST)

> Nishida-san,
>
> This is Sumimoto.
>
> Since the machines you are using are dual-processor Xeons, could you check
> whether things work with a single processor?

From s-sumi @ flab.fujitsu.co.jp Mon Jun 10 16:53:17 2002
From: s-sumi @ flab.fujitsu.co.jp (Shinji Sumimoto)
Date: Mon, 10 Jun 2002 16:53:17 +0900 (JST)
Subject: [SCore-users-jp] Re: [score-info-jp] SCore support for GbE cards
In-Reply-To: <20020610.161442.63240949.nishida@is.s.u-tokyo.ac.jp>
References: <20020606.201038.68541076.nishida@is.s.u-tokyo.ac.jp> <20020609.233435.730584917.s-sumi@bd6.so-net.ne.jp> <20020610.161442.63240949.nishida@is.s.u-tokyo.ac.jp>
Message-ID: <20020610.165317.730583203.s-sumi@flab.fujitsu.co.jp>

Nishida-san,

This is Sumimoto.

From: NISHIDA Akira
Date: Mon, 10 Jun 2002 16:14:42 +0900 (JST)

nishida> This is Nishida.
nishida>
nishida> On the subject above, this is what I have been able to confirm so far:
nishida>
nishida> 1. SMP kernel -> single processor : no improvement.
nishida> 2. GbE switch -> two nodes connected by a crossover cable : no improvement.
nishida> 3. GbE -> 100BASE-T : everything works, up to and including the demos.
nishida>
nishida> It looks as if the problem is in the Broadcom driver.

Thank you for the information. It does look like a problem with the Broadcom
hardware or its device driver. I will be at the University of Tokyo this week
anyway, so please let me try it again then.

------
Shinji Sumimoto, Fujitsu Labs
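One practical detail of the GbE vs. 100BASE-T comparison above: PM/Ethernet identifies nodes by the MAC addresses listed in pm-ethernet.conf (see the configuration earlier in this thread), so when the test moves to the other NIC those entries have to match that interface's hardware addresses (and the etherpm unit has to be bound to that interface). A quick, rough way to collect the addresses; the interface name eth0 and working rsh access are assumptions.

#!/bin/sh
# Print the MAC address of a given interface on every node, in a form
# that can be checked against (or pasted into) pm-ethernet.conf.
IF=eth0
for host in out0 out1 out2 out3; do
    mac=`rsh $host /sbin/ifconfig $IF | grep HWaddr | awk '{print $NF}'`
    echo "$host $IF: $mac"
done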
From Yamamoto.Takaya @ wrc.melco.co.jp Tue Jun 11 18:50:24 2002
From: Yamamoto.Takaya @ wrc.melco.co.jp (Takaya Yamamoto)
Date: Tue, 11 Jun 2002 18:50:24 +0900
Subject: [SCore-users-jp] About backup PCs in a PC cluster
Message-ID: <5.0.2.5.2.20020611182142.00bbd8f0@133.141.16.40>

This is Yamamoto from Mitsubishi Electric.

As I understand it, SCore has a feature whereby, if one of the running PCs
stops because of a failure, a registered spare PC takes over in its place.

My questions are:

- If one PC stops because of a failure, does the running program keep running?
- How long does it take from the failure until the spare PC starts operating?
- Is it possible, without stopping the running program, to disconnect the
  failed PC and connect a different spare PC in its place?

Thank you in advance.

From kameyama @ pccluster.org Tue Jun 11 19:30:40 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Tue, 11 Jun 2002 19:30:40 +0900
Subject: [SCore-users-jp] About backup PCs in a PC cluster
In-Reply-To: Your message of "Tue, 11 Jun 2002 18:50:24 JST." <5.0.2.5.2.20020611182142.00bbd8f0@133.141.16.40>
Message-ID: <200206111030.g5BAUev06863@yl-dhcp18.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <5.0.2.5.2.20020611182142.00bbd8f0 @ 133.141.16.40> Takaya Yamamoto writes:
> - If one PC stops because of a failure, does the running program keep
>   running?

This is possible only in SCore-D multi-user mode (and the spare PC must
already be up and running). scored is restarted and the job is restarted from
a checkpoint. For details, please see

  http://www.pccluster.org/score/dist/score/html/ja/reference/scored/auto.html

Where the running program resumes from depends on the checkpoint interval.

> - How long does it take from the failure until the spare PC starts
>   operating?

This depends on the interval of the sc_watch monitoring timer.

> - Is it possible, without stopping the running program, to disconnect the
>   failed PC and connect a different spare PC in its place?

Is this related to the questions above, or is it independent (no spare
machine registered, and a new PC is connected and the job resumed on it)?

If the former, the failed PC is no longer in use, so connecting a replacement
should not be a problem; actually putting that PC to use happens at a later
point. If the latter, it should also be manageable by doing manually what
sc_watch does. In either case, if you use PM/Ethernet you will probably have
to modify the ethernet config file.

from Kameyama Toyohisa

From liu @ mpcnet.co.jp Tue Jun 11 20:07:04 2002
From: liu @ mpcnet.co.jp (LIU XUEZHEN)
Date: Tue, 11 Jun 2002 20:07:04 +0900
Subject: [SCore-users-jp] Re: [SCore-users-jp] About backup PCs in a PC cluster
References: <200206111030.g5BAUev06863@yl-dhcp18.is.s.u-tokyo.ac.jp>
Message-ID: <200206111508.g5BF8cS04254@pccluster.org>

This is Liu from Mitsubishi Precision.

Allow me to ask a question about installing an older version of SCore.

For various reasons we are currently trying to install SCore 3.3.2, using
Red Hat Linux 6.2, on a dual-Athlon cluster. The CPUs and motherboards are
as follows:
  CPU: AMD Athlon MP 2000+ x 2
  M/B: Tyan Tiger MPX (AMD 760MPX chipset)

The Linux installation finishes normally, but on reboot the machine hangs at:

  PCI_IDE: unknown IDE controller on PCI bus 00 device 39, VID=1022, DID=7441
  PCI_IDE: not 100% native mode: will probe irqs later
  ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
  ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio

However, booting from the boot floppy created during installation works fine.

This is more a Linux installation problem than an SCore one, but I would be
grateful for any advice on how to solve it.

From kameyama @ pccluster.org Wed Jun 12 10:02:08 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Wed, 12 Jun 2002 10:02:08 +0900
Subject: [SCore-users-jp] Re: [SCore-users-jp] About backup PCs in a PC cluster
In-Reply-To: Your message of "Tue, 11 Jun 2002 20:07:04 JST." <200206111508.g5BF8cS04254@pccluster.org>
Message-ID: <200206120102.g5C128v10619@yl-dhcp18.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <200206111508.g5BF8cS04254 @ pccluster.org> LIU XUEZHEN writes:
> For various reasons we are currently trying to install SCore 3.3.2, using
> Red Hat Linux 6.2, on a dual-Athlon cluster.

That means the kernel is based on 2.2.16, so it would not be surprising if it
did not work on a recent Athlon (and a dual one at that). The Red Hat 6.2
hardware compatibility list

  http://www.redhat.com/support/hardware/intel/62/rh6.2-hcl-i.ld-2.html

also says that the latest AMD motherboards may not work.

> However, booting from the boot floppy created during installation works fine.

In that case, the following are about the only things that might work:

1. Use a uniprocessor kernel, or boot the SMP kernel with the nosmp option
   (which means only one CPU can be used).
2. Boot the SMP kernel with the noapic option.

from Kameyama Toyohisa

From nrcb @ streamline-computing.com Thu Jun 13 16:22:22 2002
From: nrcb @ streamline-computing.com (Nick Birkett)
Date: Thu, 13 Jun 2002 08:22:22 +0100
Subject: [SCore-users-jp] [SCore-users] GLOBAL ARRAYS
Message-ID: <200206130722.g5D7MMp02150@zeralda.streamline.com>

Is there anyone using the Global Arrays toolkit under SCore?

http://www.emsl.pnl.gov:2080/docs/global/distribution.html

It compiles fine using export TARGET=LINUX; make CC='mpicc' FC='mpif77'

Trying to run the armci test.x code:

chestnut:~/GLOBAL-ARRAYS/g/armci/src$ scrun -nodes=1x2 ./test.x
SCore-D 5.0.1 connected (jid=5).
<0:0> SCORE: 2 nodes (1x2) ready.
ARMCI test program (2 processes)
0:Child process terminated prematurely, status=: 0
Last System Error Message from Task 0:: No child processes
0:Child process terminated prematurely, status=: 0
SCORE: Program killed by operator.

I presume GA uses its own communication and not PM.

If anyone has got this working with PM, please let me know.

Thanks,

Nick

--------------------------------------------------
Dr Nick Birkett
Technical Director
Streamline Computing Ltd
Unit 19, Barclays Venture Centre,
Sir William Lyons Road,
Coventry CV4 7EZ
Fax : +44 (0)2476 323378
Mobile: +44 (0)7890 246662
Email : nrcb @ streamline-computing.com
Web : http://www.streamline-computing.com

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
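Returning to the Red Hat 6.2 / dual Athlon boot problem a few messages up: the options Kameyama suggests (nosmp or noapic) are ordinary kernel boot parameters, so with LILO they can be tried from a separate boot entry. This is only a minimal sketch; the kernel image name and root device are placeholders for whatever the installation actually uses.

# /etc/lilo.conf fragment -- an extra boot entry that passes "noapic"
# (or "nosmp") to the SMP kernel.  Image name and root device are placeholders.
image=/boot/vmlinuz-2.2.16-3smp
        label=linux-noapic
        root=/dev/hda1
        read-only
        append="noapic"

# After editing lilo.conf, reinstall the boot loader and reboot:
#   /sbin/lilo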
From kameyama @ pccluster.org Thu Jun 13 18:47:45 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Thu, 13 Jun 2002 18:47:45 +0900
Subject: [SCore-users-jp] [SCore-users] GLOBAL ARRAYS
In-Reply-To: Your message of "Thu, 13 Jun 2002 08:22:22 JST." <200206130722.g5D7MMp02150@zeralda.streamline.com>
Message-ID: <200206130947.g5D9ljv18938@yl-dhcp18.is.s.u-tokyo.ac.jp>

In article <200206130722.g5D7MMp02150 @ zeralda.streamline.com> Nick Birkett writes:
> It compiles fine using export TARGET=LINUX; make CC='mpicc' FC='mpif77'
>
> Trying to run the armci test.x code:
>
> chestnut:~/GLOBAL-ARRAYS/g/armci/src$ scrun -nodes=1x2 ./test.x
> SCore-D 5.0.1 connected (jid=5).
> <0:0> SCORE: 2 nodes (1x2) ready.
> ARMCI test program (2 processes)
> 0:Child process terminated prematurely, status=: 0
> Last System Error Message from Task 0:: No child processes
> 0:Child process terminated prematurely, status=: 0
> SCORE: Program killed by operator.
>
> I presume GA uses its own communication and not PM.
>
> If anyone has got this working with PM, please let me know.

I downloaded version 3.2 beta, fixed it a little, and test.x runs
successfully with nodes=1x2 and nodes=2x1.

I made the following modifications:

1. Change CC, FC, _CC and _FC to mpicc and mpif77. In config/makefile.h, _CC
   is set to gcc, but some files are compiled with _CC.
2. In config/makefile.h, remove the _CC/_FC detection and the Intel compiler
   option settings. With _CC set to mpicc, config/makefile.h otherwise selects
   the compiler options meant for the Intel compiler.
3. In config/makefile.h, enable F2C_TWO_UNDERSCORES, and in armci/src/acc.h
   change the symbol names to the double-underscore form. The default MPI
   compiler on SCore uses two trailing underscores.
4. In config/makefile.h and the GNUmakefile in the top directory, set USE_MPI
   to 1. If USE_MPI is not set, test.x runs only on localhost.
5. In config/makefile.h, remove the LIBMPI setting.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Thu Jun 13 19:13:46 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Thu, 13 Jun 2002 19:13:46 +0900
Subject: [SCore-users-jp] [SCore-users] GLOBAL ARRAYS
In-Reply-To: Your message of "Thu, 13 Jun 2002 18:47:45 JST." <200206130947.g5D9ljv18938@yl-dhcp18.is.s.u-tokyo.ac.jp>
Message-ID: <200206131013.g5DADkv19066@yl-dhcp18.is.s.u-tokyo.ac.jp>

In article <200206130947.g5D9ljv18938 @ yl-dhcp18.is.s.u-tokyo.ac.jp> kameyama @ pccluster.org writes:
> In article <200206130722.g5D7MMp02150 @ zeralda.streamline.com> Nick Birkett
> I made the following modifications:
> 1. Change CC, FC, _CC and _FC to mpicc and mpif77.
...

Here is a patch against GLOBAL ARRAYS 3.2 beta. This is a quick hack and I
have tested only test.x, with nodes set to 1x2 and 2x1.
Note that if you want to compile, please execute the following command:

export TARGET=LINUX; make CC='/opt/score/bin/mpicc' FC='/opt/score/bin/mpif77' _CC=/opt/score/bin/mpicc _FC=/opt/score/bin/mpif77

from Kameyama Toyohisa

---------------------------------------cut here---------------------------------
diff -ur g/GNUmakefile g.success/GNUmakefile
--- g/GNUmakefile	Thu Mar 21 07:46:49 2002
+++ g.success/GNUmakefile	Thu Jun 13 17:45:19 2002
@@ -15,6 +15,7 @@
 MAKESUBDIRS = for dir in $(SUBDIRS); do $(MAKE) -C $$dir $@ || exit 1 ; done
 SUBDIRS = ma global tcgmsg-mpi LinAlg/lapack+blas server tcgmsg armci/src pario
 USE_ARMCI = yes
+ USE_MPI = yes
 ifdef USE_MPI
 MP_VER = MPI
Only in g.success: LINUX-MPI.stamp
Only in g.success/LinAlg/lapack+blas: LINUX.stamp
Only in g.success/armci: lib
Only in g.success/armci/src: LINUX.MPI.stamp
diff -ur g/armci/src/acc.h g.success/armci/src/acc.h
--- g/armci/src/acc.h	Fri Mar 8 02:34:29 2002
+++ g.success/armci/src/acc.h	Thu Jun 13 18:09:58 2002
@@ -20,6 +20,12 @@
 # define C_ACCUMULATE_2D c_accumulate_2d_u_
 # define Z_ACCUMULATE_2D z_accumulate_2d_u_
 # define F_ACCUMULATE_2D f_accumulate_2d_u_
+#elif defined(LINUX)
+# define I_ACCUMULATE_2D i_accumulate_2d__
+# define D_ACCUMULATE_2D d_accumulate_2d__
+# define C_ACCUMULATE_2D c_accumulate_2d__
+# define Z_ACCUMULATE_2D z_accumulate_2d__
+# define F_ACCUMULATE_2D f_accumulate_2d__
 #elif !defined(CRAY) && !defined(WIN32) && !defined(HITACHI)
 # define I_ACCUMULATE_2D i_accumulate_2d_
 # define D_ACCUMULATE_2D d_accumulate_2d_
diff -ur g/config/makefile.h g.success/config/makefile.h
--- g/config/makefile.h	Wed Feb 27 03:48:38 2002
+++ g.success/config/makefile.h	Thu Jun 13 18:15:51 2002
@@ -54,7 +54,8 @@
 # to enable two underscores in fortran names, please define environment variable
 # F2C_TWO_UNDERSCORES or uncomment the following line
-#F2C_TWO_UNDERSCORES=1
+F2C_TWO_UNDERSCORES=1
+USE_MPI = 1
 #
 #........................ SUN and Fujitsu Sparc/solaris ........................
 #
@@ -170,19 +171,15 @@
 # IBM PC running Linux
 #
 ifeq ($(TARGET),LINUX)
-       CC = gcc
-       FC = g77
+       CC = /opt/score/bin/mpicc
+       FC = /opt/score/bin/mpif77
        CPP = gcc -E -nostdinc -undef -P
        RANLIB = ranlib
     _CPU = $(shell uname -m |\
                 awk ' /sparc/ { print "sparc" }; /i*86/ { print "x86" } ' )
-ifneq (,$(findstring mpif,$(_FC)))
-   _FC = $(shell $(FC) -v 2>&1 | awk ' /g77 version/ { print "g77"; exit }; /pgf/ { print "pgf77" ; exit } ' )
-endif
-ifneq (,$(findstring mpicc,$(_CC)))
-   _CC = $(shell $(CC) -v 2>&1 | awk ' /gcc version/ { print "gcc" ; exit } ' )
-endif
+   _FC = /opt/score/bin/mpif77
+   _CC = /opt/score/bin/mpicc
 #
 # GNU compilers
 ifeq ($(_CPU),x86)
@@ -199,28 +196,8 @@
 endif
 #
 # g77
-ifeq ($(_FC),g77)
-  ifeq ($(FOPT),-O)
     FOPT = -O2
     FOPT_REN += -funroll-loops -fomit-frame-pointer $(OPT_ALIGN)
-  endif
-else
-#
-# PGI fortran compiler on intel
-  ifneq (,$(findstring pgf,$(_FC)))
-    CMAIN = -Dmain=MAIN_
-    FOPT_REN = -Mdalign -Minform,warn -Mnolist -Minfo=loop -Munixlogical
-    GLOB_DEFINES += -DPGLINUX
-  endif
-  ifneq (,$(findstring ifc,$(_FC)))
-    FOPT_REN = -O3 -prefetch
-    GLOB_DEFINES += -DIFCLINUX
-  endif
-  ifneq (,$(findstring icc,$(_CC)))
-    FOPT_REN = -O3 -prefetch
-    GLOB_DEFINES += -DIFCLINUX
-  endif
-endif
 endif
 #
@@ -688,7 +665,7 @@
 ifdef USE_MPI
   ifndef LIBMPI
-     LIBMPI = -lmpi
+     LIBMPI =
   endif
   ifdef MPI_LIB
        LIBS += -L$(MPI_LIB)
---------------------------------------cut here---------------------------------

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Fri Jun 14 09:07:40 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Fri, 14 Jun 2002 09:07:40 +0900
Subject: [SCore-users-jp] [SCore-users] GLOBAL ARRAYS
In-Reply-To: Your message of "Thu, 13 Jun 2002 17:31:13 JST." <200206131631.g5DGVDe23428@zeralda.streamline.com>
Message-ID: <200206140007.g5E07ev22589@yl-dhcp18.is.s.u-tokyo.ac.jp>

In article <200206131631.g5DGVDe23428 @ zeralda.streamline.com> Nick Birkett writes:
> I applied your patch:
>
> tar zxvf global3-2B.tgz
> patch -p0 < score.patch
> cd g
> export TARGET=LINUX
> make CC='/opt/score/bin/mpicc' FC='/opt/score/bin/mpif77' _CC=/opt/score/bin/mpicc _FC=/opt/score/bin/mpif77
>
> But I still have an underscore error:
>
> test.o: In function `MAIN__':
> test.o(.text+0x6): undefined reference to `pbeginf_'
> test.o(.text+0x3d7): undefined reference to `pend_'
> collect2: ld returned 1 exit status
> make[2]: *** [test.x] Error 1
> rm util.o ffflush.o test.o
> make[2]: Leaving directory `/users/nrcb/GLOBAL-ARRAYS/g/global/testing'
> make[1]: *** [test.x] Error 1
> make[1]: Leaving directory `/users/nrcb/GLOBAL-ARRAYS/g/global'
> make: *** [test] Error 2
> zeralda:~/GLOBAL-ARRAYS/g$
>
> I think pbeginf is in the tcgmsg-mpi directory.
>
> The test program wants to find pbeginf_ but in the library it is pbeginf__
>
> I will try to find out why this is.

Sorry, please apply the patch in this mail as well. test.F calls pbeginf and
pend if MPI is not defined, but it should call mpi_initialize and mpi_finalize
instead.
from Kameyama Toyohisa

---------------------------------------cut here--------------------------------
diff -ru g/global/src/global.fh g.success/global/src/global.fh
--- g/global/src/global.fh	Fri Nov 30 07:43:20 2001
+++ g.success/global/src/global.fh	Fri Jun 14 08:53:03 2002
@@ -52,3 +52,4 @@
       external nga_create_ghosts_irreg,nga_create_ghosts
       external nga_ddot_patch, nga_zdot_patch, nga_idot_patch
       external ga_sdot, ga_sdot_patch, nga_sdot_patch
+#define MPI 1
diff -ru g/global/src/global.h g.success/global/src/global.h
--- g/global/src/global.h	Wed Jan 30 07:27:14 2002
+++ g.success/global/src/global.h	Fri Jun 14 09:02:35 2002
@@ -4,6 +4,7 @@
 #define GLOBAL_H

 #include
+#define MPI 1
 #include "typesf2c.h"
---------------------------------------cut here--------------------------------

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From master.of.brainless.things @ gmx.net Fri Jun 14 21:21:26 2002
From: master.of.brainless.things @ gmx.net (master.of.brainless.things @ gmx.net)
Date: Fri, 14 Jun 2002 14:21:26 +0200
Subject: [SCore-users-jp] [SCore-users] SCore-installation
Message-ID: <001e01c2139e$03a1efa0$6400a8c0@leqoq>

We have just installed SCore 5.0.1 on our cluster: 8 dual Pentium 4 nodes,
linked by Ethernet (100 Mbit) and Myrinet2k, and one frontend machine
(RedHat 7.2), linked to the nodes only by Ethernet via a switch.

We installed SCore on the frontend and installed 4 nodes via the boot disk.
That works fine, although the SCore manual did not mention that it would
partition the disk and install RedHat itself. (A hint for the SCore team:
EIT does not know hard disks larger than 20 GB, so we had to adjust the
corresponding script.)

1st problem:
At first we got some problems with Myrinet, but editing pm-myrinet.conf
solved them. Then we ran the Myrinet test routines up to the stress test,
and the troubleshooting guide did not solve the problem. We have no idea
why!?

2nd problem:
We tried to run the ./hello example program (I think from MPI) and got the
following error message:
"...SCore-D: Error no full coverage network"
4 times, because it was running on the 4 (previously installed) nodes.
Ethernet works, and Myrinet2k should work.

Has anyone an idea?

Thanks,

Alex Golks
student, "cluster assistant"
HS Niederrhein
Germany

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Fri Jun 14 22:04:59 2002
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Fri, 14 Jun 2002 22:04:59 +0900
Subject: [SCore-users-jp] [SCore-users] SCore-installation
In-Reply-To: Your message of "Fri, 14 Jun 2002 14:21:26 JST." <001e01c2139e$03a1efa0$6400a8c0@leqoq>
Message-ID: <200206141304.g5ED4xv25831@yl-dhcp18.is.s.u-tokyo.ac.jp>

In article <001e01c2139e$03a1efa0$6400a8c0 @ leqoq> it was written:
> 2nd problem:
> We tried to run the ./hello example program (I think from MPI) and got the
> following error message:
> "...SCore-D: Error no full coverage network"
> 4 times, because it was running on the 4 (previously installed) nodes.
> Ethernet works, and Myrinet2k should work.

Please send us your scorehosts.db and a command line log.

I think the problem is in the scoreboard settings. SCore-D needs at least one
network that is the same on all hosts.
# But rpmtest and scstest do not check the network attribute.
# Those commands look only at the PM configuration files.
# Those commands search only the PM configuration file.

For example, suppose scorehosts.db has the following setting:

comp0.pccluster.org group=pcc network=ethernet,myrinet2k
comp1.pccluster.org group=pcc network=ethernet
comp2.pccluster.org group=pcc network=myrinet2k
comp3.pccluster.org group=pcc network=myrinet2k

SCore-D cannot run on group pcc, because there is no network common to all
of the hosts. But if pm-myrinet.conf describes all the hosts and is correct,
scstest and pmtest will still work.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From nrcb @ streamline-computing.com Sat Jun 15 01:34:52 2002
From: nrcb @ streamline-computing.com (Nick Birkett)
Date: Fri, 14 Jun 2002 17:34:52 +0100
Subject: [SCore-users-jp] [SCore-users] suspending jobs
Message-ID: <200206141634.g5EGYqN09259@zeralda.streamline.com>

Hi - I would like to know if it is possible to suspend and resume parallel
jobs which are NOT running in multi-user mode (e.g. via a batch scheduler).

E.g. when in multi-user mode I can do:

Z
cpus= 4: Iteration = 220 8204282063356.164
cpus= 4: Iteration = 230 8190994724901.580
[1]+  Stopped   scrun -nodes=1x2+2x1,monitor ./jacobi_mpi
chestnut:~/benchmarks/mpi$ %1
scrun -nodes=1x2+2x1,monitor ./jacobi_mpi
cpus= 4: Iteration = 240 8178002633925.273
cpus= 4: Iteration = 250 8165287360671.719
cpus= 4: Iteration = 260 8152832555266.307
cpus= 4: Iteration = 270 8140623336444.589

Can I send SIGSTOP/SIGCONT to a process to suspend / resume say a
SCore PBS job or is this only possible in multi-user mode ?

Thanks,

Nick

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From hori @ swimmy-soft.com Sat Jun 15 10:24:05 2002
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Sat, 15 Jun 2002 10:24:05 +0900
Subject: [SCore-users-jp] Re: [SCore-users] suspending jobs
In-Reply-To: <200206141634.g5EGYqN09259@zeralda.streamline.com>
References: <200206141634.g5EGYqN09259@zeralda.streamline.com>
Message-ID: <3106981445.hori0000@mail.bestsystems.co.jp>

Hi, Nick,

How is everything going ?

>Can I send SIGSTOP/SIGCONT to a process to suspend / resume say a
>SCore PBS job or is this only possible in multi-user mode ?

If you send a SIGTSTP (not SIGSTOP) or SIGCONT signal to the scrun process,
then the parallel job will be suspended or resumed, regardless of the
scheduling mode.

# This is what I tried to program, and I believe it works :-)

If you send SIGSTOP, however, then only the scrun process is suspended,
because in Linux (and Unix) there is no way to catch the SIGSTOP signal.

So the problem is how you can send SIGTSTP/SIGCONT from a batch job
scheduler.

----
Atsushi HORI
Swimmy Software, Inc.

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Mon Jun 17 16:38:50 2002
From: kameyama @ pccluster.org (=?iso-2022-jp?b?a2FtZXlhbWEgGyRCIXcbKEIgcGNjbHVzdGVyLm9yZw==?=)
Date: Mon, 17 Jun 2002 16:38:50 +0900
Subject: [SCore-users-jp] [SCore-users] Re: Global arrays
In-Reply-To: Your message of "Fri, 14 Jun 2002 15:30:19 JST."
<200206141430.g5EEUJP09044@zeralda.streamline.com> Message-ID: <200206170738.g5H7cov19311@yl-dhcp18.is.s.u-tokyo.ac.jp> In article <200206141430.g5EEUJP09044 @ zeralda.streamline.com> Nick Birkett wrotes: > I am now trying to get SCore Global arrays working with the Portland Compile > r > and the Intel Compiler. This is because our academic customers who use SCore > have these compilers. > > For PGI compiler I did this (without applying your patches): > > export TARGET=LINUX > make CC=mpicc _CC=mpicc FC='mpif77 -compiler pgi -fast' _FC='mpif77 > -compiler pgi -fast' USE_MPI=yes LIBMPI='-lm' > > (the -lm is just to reset the default from -lmpi and is not really needed). > > I am using SCore 5.0.0 and 5.0.1 and Portland 3.3-2. > > The above seems to work and I can run these GA version 3-2B tests (in > g/global/testing) > > test.x > testsolve.x > testeig.x > > These all work. > > However when I try the perf.x (performance test) it hangs at 128 byte: > > > SCore-D 5.0.0 connected. > <0:0> SCORE: 4 nodes (4x1) ready. > ARMCI configured for 4 cluster nodes > > Performance of GA get, put & acc for square sections of array[1024,1024] > > > Local 2-D Array Section > section get put accumulate > bytes dim sec MB/s sec MB/s sec MB/s > 8 1 .112D-05 .712D+01 .104D-05 .772D+01 .178D-05 .450D+01 > 72 3 .160D-05 .451D+02 .124D-05 .579D+02 .195D-05 .368D+02 > 128 4 .226D-05 .566D+02 .163D-05 .784D+02 .223D-05 .574D+02 > > Eventually I have to press C . It seems to deadlock between normal process and data server. On compute host, there are 3 processes for perf.x parent process is called wait4() in SIGCHILD signal handler. 1 chaild process is shadow process, tis is pause() loop. This process create by SCore, it is OK. 1 process called select(). I rewrite signal handler to remove call wait(), following message is apears: 0:Child process terminated prematurely, status=: 136085728 Probably, dataserver process (armci/src/dataserv.c) is dead, but I don't know that. from Kameyama Toyohisa _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From Yamamoto.Takaya @ wrc.melco.co.jp Tue Jun 18 11:18:33 2002 From: Yamamoto.Takaya @ wrc.melco.co.jp (Takaya Yamamoto) Date: Tue, 18 Jun 2002 11:18:33 +0900 Subject: [SCore-users-jp] PC =?ISO-2022-JP?B?GyRCJS8laSU5JT8kTiVQJUMlLyUiJUMlVxsoQg==?=PC =?ISO-2022-JP?B?GyRCJEskRCQkJEYbKEI=?= In-Reply-To: <200206111030.g5BAUev06863@yl-dhcp18.is.s.u-tokyo.ac.jp> References: <"Your message of Tue, 11 Jun 2002 18:50:24 JST."<5.0.2.5.2.20020611182142.00bbd8f0@133.141.16.40> Message-ID: <5.0.2.5.2.20020618111710.00c19ac8@133.141.16.40> 亀山殿 山本です。 ご回答ありがとうございます。 今週・来週でプロトタイプを作ってみて、テストしてみます。 以上 At 19:30 02/06/11 +0900, you wrote: >亀山です. > >In article <5.0.2.5.2.20020611182142.00bbd8f0 @ 133.141.16.40> Takaya >Yamamoto wrotes: > > そこで質問ですが、 > > ・PCが1台故障で停止した場合でも、実行中のプログラムは動きつづけるのでしょうか > > ? > >これは SCore-D の multi user mode でのみ可能です. >(予備の PC はすでに動いていなければなりませんけど...) >scored を再起動して, checkpoint をとったところから restart します. >詳しくは > > http://www.pccluster.org/score/dist/score/html/ja/reference/scored/auto.html >を参照してください. >実行中のプログラムがどこから再開するかは >checkpoint の間隔に依存します. > > > ・故障停止した後予備のPCが動作開始するまで、どのぐらい時間がかかるのでしょう > > か? > >これは sc_watch の監視タイマの時間間隔に依存します. > > > ・実行中のプログラムを止めることなく、停止したPCの接続をはずして、 > >  そのはずした部分に新たに別の予備のPCを接続することは可能でしょうか? > >これは上記の質問に関連したものでしょうか? >それとも独立 (予備マシンを登録していない場合, 新しい PC を接続して >その PC で再開する) でしょうか? > >前者でしたら, その PC は使用されていないので接続することに問題は >無いと思います. >その PC を実際に使用するのは別のタイミングになりますが... 
>後者も sc_watch のやることを手動で行えば良いのでなんとかなると思います. >どちらにしろ PM/Ethernet を使用する場合は ethernet の config file >を修正する必要がありそうですが... > > from Kameyama Toyohisa From peer.ueberholz @ hs-niederrhein.de Tue Jun 18 18:13:10 2002 From: peer.ueberholz @ hs-niederrhein.de (Peer Ueberholz) Date: Tue, 18 Jun 2002 11:13:10 +0200 Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: <200206141304.g5ED4xv25831@yl-dhcp18.is.s.u-tokyo.ac.jp> References: <200206141304.g5ED4xv25831@yl-dhcp18.is.s.u-tokyo.ac.jp> Message-ID: <200206180913.g5I9DBj16163@pc03232.kr.hs-niederrhein.de> Dear Mr. Toyohisa, thank you very much for your fast answer. I would like to send you some more details on our configuration: We have one machine configured withe RedHat 7.2 and Score 5.0.1, called frontend.cluster.domain, which is connected only with Ethernet to the compute hosts. ( see #"define MSGBSERV msgbserv=(frontend.cluster.domain:8764)" in scorehosts.db. The compute-hosts comp4, comp5, comp6 and comp7 are connected via ethernet and myrinet2k. We have the following configuration files in /opt/score/etc: ============================================================================================ [root @ frontend etc]# more pm-myrinet.conf # # Sample configuration file for an 8 node PC Cluster # # Node specification # NodeNumber Hostname switchNumber.portNumber 0 comp4.cluster.domain 0.0 1 comp5.cluster.domain 0.1 2 comp6.cluster.domain 0.2 3 comp7.cluster.domain 0.3 ============================================================================================ [root @ frontend etc]# more pm-ethernet.conf unit 0 maxnsend 8 0 00:30:48:11:EC:DB comp4.cluster.domain 1 00:30:48:11:8B:CE comp5.cluster.domain 2 00:30:48:12:01:4C comp6.cluster.domain 3 00:30:48:12:01:63 comp7.cluster.domain ============================================================================================ [root @ frontend etc]# more scorehosts.db /* * SCore 5.0 scorehosts.db * generated by PCCC EIT 5.0 */ /* PM/Myrinet */ myrinet type=myrinet \ -firmware:file=/opt/score/share/lanai/lanai.mcp \ -config:file=/opt/score/etc/pm-myrinet.conf /* PM/Myrinet */ myrinet2k type=myrinet2k \ -firmware:file=/opt/score/share/lanai/lanaiM2k.mcp \ -config:file=/opt/score/etc/pm-myrinet.conf /* PM/Ethernet */ ethernet type=ethernet \ -config:file=/opt/score/etc/pm-ethernet.conf gigaethernet type=ethernet \ -config:file=/opt/score/etc/pm-ethernet.conf /* PM/Agent */ udp type=agent -agent=pmaudp \ -config:file=/opt/score/etc/pm-udp.conf /* RHiNET */ rhinet type=rhinet \ -firmware:file=/opt/score/share/rhinet/phu_top_0207a.hex \ -config:file=/opt/score/etc/pm-rhinet.conf ## /* PM/SHMEM */ shmem0 type=shmem -node=0 shmem1 type=shmem -node=1 ## #include "/opt/score//etc/ndconf/0" #include "/opt/score//etc/ndconf/1" #include "/opt/score//etc/ndconf/2" #include "/opt/score//etc/ndconf/3" ## #define MSGBSERV msgbserv=(frontend.cluster.domain:8764) comp4.cluster.domain HOST_0 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 MSGBSERV comp5.cluster.domain HOST_1 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 MSGBSERV comp6.cluster.domain HOST_2 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 MSGBSERV comp7.cluster.domain HOST_3 network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 MSGBSERV [root @ frontend etc]# ============================================================================================ We have an 8-port myrinet2k switch and the 4 nodes auf connected to the first 4 ports. 
We have also tried other ports without success. Thank you very much for you help With kind regards, Peer Ueberholz and Alexander Golx Am Freitag, 14. Juni 2002 15:04 schrieb kameyama @ pccluster.org: > In article <001e01c2139e$03a1efa0$6400a8c0 @ leqoq> wrotes: > > 2. problem: > > We tried to run the ./hello example program (i think from MPI) > > and got following error message: > > "...SCore-D: Error no full coverage network" 4 times, because > > running on 4 (previous installed) nodes. > > Ethernet works, and myrinet2k should work. > > Please send us to scorehosts.db and command line log. > I think the problem has scoreboard setting. > SCore-D needs at least one same network in all hosts. > # But rpmtest and scstest don't seach network attribute. > # Those commands seach only PM configuration file. > > For example, scorehosts.db is following setting: > comp0.pccluster.org group=pcc network=ethernet,myrinet2k > comp1.pccluster.org group=pcc network=ethernet > comp2.pccluster.org group=pcc network=myrinet2k > comp3.pccluster.org group=pcc network=myrinet2k > SCore-D cannot running to group pcc, because there is not all > common network. But if pm-myrinet.conf is describe all hosts and > it is correctly, scstest and pmtest are worked. > > from Kameyama Toyohisa _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From s-sumi @ flab.fujitsu.co.jp Tue Jun 18 18:40:19 2002 From: s-sumi @ flab.fujitsu.co.jp (Shinji Sumimoto) Date: Tue, 18 Jun 2002 18:40:19 +0900 (JST) Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: <200206180913.g5I9DBj16163@pc03232.kr.hs-niederrhein.de> References: <200206141304.g5ED4xv25831@yl-dhcp18.is.s.u-tokyo.ac.jp> <200206180913.g5I9DBj16163@pc03232.kr.hs-niederrhein.de> Message-ID: <20020618.184019.294709147.s-sumi@flab.fujitsu.co.jp> Hi. If your cluster nodes are connected to the first 4 ports, the pm-myrinet.conf of your cluster must be as follows: ========================================================= 0 comp4.cluster.domain 0.15 1 comp5.cluster.domain 0.14 2 comp6.cluster.domain 0.13 3 comp7.cluster.domain 0.12 ========================================================= Here is a view of front panel of your myrinet switch. ========================================================================= left right ========================================================================= node4 node5 node6 node7 xx xx xx xx Label 15 14 13 12 ========================================================================= Sinji. From: Peer Ueberholz Subject: Re: [SCore-users-jp] [SCore-users] SCore-installation Date: Tue, 18 Jun 2002 11:13:10 +0200 Message-ID: <200206180913.g5I9DBj16163 @ pc03232.kr.hs-niederrhein.de> peer.ueberholz> peer.ueberholz> Dear Mr. Toyohisa, peer.ueberholz> peer.ueberholz> thank you very much for your fast answer. I would like to send you peer.ueberholz> some more details on our configuration: peer.ueberholz> peer.ueberholz> We have one machine configured withe RedHat 7.2 and Score 5.0.1, peer.ueberholz> called frontend.cluster.domain, which is connected only with peer.ueberholz> Ethernet to the compute hosts. peer.ueberholz> ( see #"define MSGBSERV peer.ueberholz> msgbserv=(frontend.cluster.domain:8764)" in scorehosts.db. The peer.ueberholz> compute-hosts comp4, comp5, comp6 and comp7 are connected via peer.ueberholz> ethernet and myrinet2k. 
peer.ueberholz> peer.ueberholz> We have the following configuration files in /opt/score/etc: peer.ueberholz> ============================================================================================ peer.ueberholz> [root @ frontend etc]# more pm-myrinet.conf peer.ueberholz> # peer.ueberholz> # Sample configuration file for an 8 node PC Cluster peer.ueberholz> # peer.ueberholz> # Node specification peer.ueberholz> # NodeNumber Hostname switchNumber.portNumber peer.ueberholz> 0 comp4.cluster.domain 0.0 peer.ueberholz> 1 comp5.cluster.domain 0.1 peer.ueberholz> 2 comp6.cluster.domain 0.2 peer.ueberholz> 3 comp7.cluster.domain 0.3 peer.ueberholz> ============================================================================================ peer.ueberholz> [root @ frontend etc]# more pm-ethernet.conf peer.ueberholz> unit 0 peer.ueberholz> maxnsend 8 peer.ueberholz> 0 00:30:48:11:EC:DB comp4.cluster.domain peer.ueberholz> 1 00:30:48:11:8B:CE comp5.cluster.domain peer.ueberholz> 2 00:30:48:12:01:4C comp6.cluster.domain peer.ueberholz> 3 00:30:48:12:01:63 comp7.cluster.domain peer.ueberholz> ============================================================================================ peer.ueberholz> [root @ frontend etc]# more scorehosts.db peer.ueberholz> /* peer.ueberholz> * SCore 5.0 scorehosts.db peer.ueberholz> * generated by PCCC EIT 5.0 peer.ueberholz> */ peer.ueberholz> peer.ueberholz> /* PM/Myrinet */ peer.ueberholz> myrinet type=myrinet \ peer.ueberholz> -firmware:file=/opt/score/share/lanai/lanai.mcp \ peer.ueberholz> -config:file=/opt/score/etc/pm-myrinet.conf peer.ueberholz> peer.ueberholz> /* PM/Myrinet */ peer.ueberholz> myrinet2k type=myrinet2k \ peer.ueberholz> peer.ueberholz> -firmware:file=/opt/score/share/lanai/lanaiM2k.mcp \ peer.ueberholz> -config:file=/opt/score/etc/pm-myrinet.conf peer.ueberholz> peer.ueberholz> /* PM/Ethernet */ peer.ueberholz> ethernet type=ethernet \ peer.ueberholz> -config:file=/opt/score/etc/pm-ethernet.conf peer.ueberholz> gigaethernet type=ethernet \ peer.ueberholz> -config:file=/opt/score/etc/pm-ethernet.conf peer.ueberholz> /* PM/Agent */ peer.ueberholz> udp type=agent -agent=pmaudp \ peer.ueberholz> -config:file=/opt/score/etc/pm-udp.conf peer.ueberholz> peer.ueberholz> /* RHiNET */ peer.ueberholz> rhinet type=rhinet \ peer.ueberholz> peer.ueberholz> -firmware:file=/opt/score/share/rhinet/phu_top_0207a.hex \ peer.ueberholz> -config:file=/opt/score/etc/pm-rhinet.conf peer.ueberholz> ## peer.ueberholz> /* PM/SHMEM */ peer.ueberholz> shmem0 type=shmem -node=0 peer.ueberholz> shmem1 type=shmem -node=1 peer.ueberholz> ## peer.ueberholz> #include "/opt/score//etc/ndconf/0" peer.ueberholz> #include "/opt/score//etc/ndconf/1" peer.ueberholz> #include "/opt/score//etc/ndconf/2" peer.ueberholz> #include "/opt/score//etc/ndconf/3" peer.ueberholz> ## peer.ueberholz> #define MSGBSERV msgbserv=(frontend.cluster.domain:8764) peer.ueberholz> peer.ueberholz> comp4.cluster.domain HOST_0 peer.ueberholz> network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 peer.ueberholz> MSGBSERV peer.ueberholz> comp5.cluster.domain HOST_1 peer.ueberholz> network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 peer.ueberholz> MSGBSERV peer.ueberholz> comp6.cluster.domain HOST_2 peer.ueberholz> network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 peer.ueberholz> MSGBSERV peer.ueberholz> comp7.cluster.domain HOST_3 peer.ueberholz> network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 peer.ueberholz> MSGBSERV peer.ueberholz> [root 
@ frontend etc]# peer.ueberholz> ============================================================================================ peer.ueberholz> peer.ueberholz> We have an 8-port myrinet2k switch and the 4 nodes auf connected to peer.ueberholz> the first 4 ports. We have also tried other ports without success. peer.ueberholz> peer.ueberholz> Thank you very much for you help peer.ueberholz> peer.ueberholz> With kind regards, peer.ueberholz> peer.ueberholz> Peer Ueberholz and Alexander Golx peer.ueberholz> peer.ueberholz> peer.ueberholz> peer.ueberholz> Am Freitag, 14. Juni 2002 15:04 schrieb kameyama @ pccluster.org: peer.ueberholz> > In article <001e01c2139e$03a1efa0$6400a8c0 @ leqoq> peer.ueberholz> wrotes: peer.ueberholz> > > 2. problem: peer.ueberholz> > > We tried to run the ./hello example program (i think from MPI) peer.ueberholz> > > and got following error message: peer.ueberholz> > > "...SCore-D: Error no full coverage network" 4 times, because peer.ueberholz> > > running on 4 (previous installed) nodes. peer.ueberholz> > > Ethernet works, and myrinet2k should work. peer.ueberholz> > peer.ueberholz> > Please send us to scorehosts.db and command line log. peer.ueberholz> > I think the problem has scoreboard setting. peer.ueberholz> > SCore-D needs at least one same network in all hosts. peer.ueberholz> > # But rpmtest and scstest don't seach network attribute. peer.ueberholz> > # Those commands seach only PM configuration file. peer.ueberholz> > peer.ueberholz> > For example, scorehosts.db is following setting: peer.ueberholz> > comp0.pccluster.org group=pcc network=ethernet,myrinet2k peer.ueberholz> > comp1.pccluster.org group=pcc network=ethernet peer.ueberholz> > comp2.pccluster.org group=pcc network=myrinet2k peer.ueberholz> > comp3.pccluster.org group=pcc network=myrinet2k peer.ueberholz> > SCore-D cannot running to group pcc, because there is not all peer.ueberholz> > common network. But if pm-myrinet.conf is describe all hosts and peer.ueberholz> > it is correctly, scstest and pmtest are worked. peer.ueberholz> > peer.ueberholz> > from Kameyama Toyohisa peer.ueberholz> peer.ueberholz> peer.ueberholz> _______________________________________________ peer.ueberholz> SCore-users mailing list peer.ueberholz> SCore-users @ pccluster.org peer.ueberholz> http://www.pccluster.org/mailman/listinfo/score-users peer.ueberholz> _______________________________________________ peer.ueberholz> SCore-users-jp mailing list peer.ueberholz> SCore-users-jp @ pccluster.org peer.ueberholz> http://www.pccluster.org/mailman/listinfo/score-users-jp peer.ueberholz> peer.ueberholz> ------ Shinji Sumimoto, Fujitsu Labs _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From kameyama @ pccluster.org Tue Jun 18 18:54:31 2002 From: kameyama @ pccluster.org (=?iso-2022-jp?b?a2FtZXlhbWEgGyRCIXcbKEIgcGNjbHVzdGVyLm9yZw==?=) Date: Tue, 18 Jun 2002 18:54:31 +0900 Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: Your message of "Tue, 18 Jun 2002 11:13:10 JST." <200206180913.g5I9DBj16163@pc03232.kr.hs-niederrhein.de> Message-ID: <200206180954.g5I9sWv25998@yl-dhcp18.is.s.u-tokyo.ac.jp> In article <200206180913.g5I9DBj16163 @ pc03232.kr.hs-niederrhein.de> Peer Ueberholz wrotes: > /* PM/Myrinet */ > myrinet2k type=myrinet2k \ > > -firmware:file=/opt/score/share/lanai/lanaiM2k.mcp \ > -config:file=/opt/score/etc/pm-myrinet.conf If there is blank line, please delete this line. 
> comp4.cluster.domain HOST_0 > network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 > MSGBSERV If this is realy 3 lines, please join the lines. Basically, scorehosts.db consist 1 record to 1 line. If you want to split line to 1 record, you must insert "\" to end of the lines (same as myrinet2k). You can check network setting by scbutil % scbutil network --v This command print network and hosts which have this network. If scoreboard database is right, scbutil output as following: myrinet2k comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain ethernet comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain shmem0 comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain shmem1 comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain from Kameyama Toyohisa _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From master.of.brainless.things @ gmx.net Wed Jun 19 07:06:20 2002 From: master.of.brainless.things @ gmx.net (=?iso-2022-jp?b?bWFzdGVyLm9mLmJyYWlubGVzcy50aGluZ3MgGyRCIXcbKEIgZ214Lm5l?= =?iso-2022-jp?b?dA==?=) Date: Wed, 19 Jun 2002 00:06:20 +0200 Subject: [SCore-users-jp] [SCore-users] SCore-installation References: <200206180954.g5I9sWv25998@yl-dhcp18.is.s.u-tokyo.ac.jp> Message-ID: <004f01c21714$61a84e40$6400a8c0@leqoq> at first we ordered our contacts on the myrinet2k switch like Mr. Sumimoto (thanks for that hint) said: ======================================================= left right ======================================================= | node4 node5 node6 node7 xx xx xx xx label | 15 14 13 12 11 10 9 8 ======================================================= and changed the pm-myrinet.conf to: ------------------------------------------------------- 0 comp4.cluster.domain 0.15 1 comp5.cluster.domain 0.14 2 comp6.cluster.domain 0.13 3 comp7.cluster.domain 0.12 # 4 %s 0.11 # 5 %s 0.10 # 6 %s 0.9 # 7 %s 0.8 ------------------------------------------------------- rpmtests with "-dest 1/2/3 -ping" and "-dest 1/2/3 -vwrite" works now, but here the output of our stress test: [root @ frontend etc]# msgb -group pcc& [1] 30046 [root @ frontend etc]# scout [comp4-7]: SCOUT(5.0.1): Ready. [root @ frontend etc]# scstest -network myrinet2k Host (comp5.cluster.domain) unreachable. Host (comp4.cluster.domain) unreachable. Host (comp4.cluster.domain) unreachable. Host (comp4.cluster.domain) unreachable. what is confusing us mostly, is 1 time comp5..., and 3 times comp4, but nothing else. in scorehosts.db wasn't a blank line in the myrinet2k config line (that was just a copy&paste-error). the same with > comp4.cluster.domain HOST_0 > network=myrinet2k,ethernet,shmem0,shmem1 group=_scoreall_,pcc smp=2 > MSGBSERV its in one line, but its 104 lines long. so we even tried with "\"-seperated splitted lines, but still no effort with stress test. here some more output: [root @ frontend etc]# scbutil network --v myrinet2k comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain ethernet comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain shmem0 comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain shmem1 comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain 4 values, 4 records found. 
we more and more get the idea of an really stupid error in our installation or configuration. please help us, even if it seems to be a very easy problem. Thanks to everyone Peer Ueberholz and Alexander Golks _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From hori @ swimmy-soft.com Wed Jun 19 10:03:54 2002 From: hori @ swimmy-soft.com (Atsushi HORI) Date: Wed, 19 Jun 2002 10:03:54 +0900 Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: <004f01c21714$61a84e40$6400a8c0@leqoq> References: <200206180954.g5I9sWv25998@yl-dhcp18.is.s.u-tokyo.ac.jp> Message-ID: <3107325834.hori0002@mail.bestsystems.co.jp> Hi. >SCOUT(5.0.1): Ready. >[root @ frontend etc]# scstest -network myrinet2k >Host (comp5.cluster.domain) unreachable. >Host (comp4.cluster.domain) unreachable. >Host (comp4.cluster.domain) unreachable. >Host (comp4.cluster.domain) unreachable. > >what is confusing us mostly, is 1 time comp5..., and 3 times comp4, >but nothing else. OK, probably the link description between comp4 and com5 is wrong (and maybe others too). When comp4 checks the connection to comp5, it fails and output the message on comp5. And the others check the connections to comp4, they also fail and output the messages on comp4. These checks take place in the order described in the pm-myrinet.conf. Well, there is no mistery here. The scstest program (and SCore-D) checks the connection (described above) by the syntax only (I mean, no checking on physical connection) at this step. What you have to do is checking the "descriptions." Did you restart /etc/rc.d/init.d/scoreboard after some changes on scorehosts.db ? ---- Atsushi HORI Swimmy Software, Inc. _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From master.of.brainless.things @ gmx.net Thu Jun 20 00:38:29 2002 From: master.of.brainless.things @ gmx.net (=?iso-2022-jp?b?bWFzdGVyLm9mLmJyYWlubGVzcy50aGluZ3MgGyRCIXcbKEIgZ214Lm5l?= =?iso-2022-jp?b?dA==?=) Date: Wed, 19 Jun 2002 17:38:29 +0200 Subject: [SCore-users-jp] [SCore-users] SCore-installation References: <20020619030000.30333.13423.Mailman@www.pccluster.org> Message-ID: <001801c217a7$648350e0$6400a8c0@leqoq> here again a short brief description of our cluster configuration, because we don't know, if we exaplaind that just good enough: ----------------- ----------------- | frontend | | comp4 | | RedHat7.2 | |-----| RH7.2 |---------------| | SCore5.0.1 | | | SCore5.0.1 | | ----------------- | ----------------- | | | | | |-------| | | | ----------------- | | | | comp5 | | | | |-------| RH7.2 |-------| | | | | | SCore5.0.1 | | | ----------------- ----------------- --------------- | 100Mb Ethernet| | myrinet2k | | switch | | switch | ----------------- ----------------- --------------- | | | comp6 | | | | |-------| RH7.2 |-------| | | | SCore5.0.1 | | | ----------------- | | | | | | ----------------- | | | comp7 | | |-------------| RH7.2 |---------------| | SCore5.0.1 | ----------------- (this looks just fine in blockcharacter mode, and i hope this picture will still live after emailing, in outlock xpress it looks horrible) So, not according to the installation guide, we don't install RedHat7.2 with all packages on the frontend. Could this be the hole problem??? After SCore5.0.1 installation on the frontend, we installed the comp4-7 via the eit-bootdisk. 
We don't know if we really got the idea how the scstest works, but as Atsushi HORI (thanks fo that hint) mentioned, the "link-descriptions", whitch should be the pm-myrinet.conf, pm-ethernet.conf,etc. and the scorehosts.db looks just fine. We can't find any problem. or ment you something else with "description"? also, now we restarted scoreboard after changes on scoreboard.db. all config files in /etc seems to be good,too, according to the guide. the "scstest -network myrinet2k" and "scstest -network ethernet" both fails with: >Host (comp5.cluster.domain) unreachable. >Host (comp4.cluster.domain) unreachable. >Host (comp4.cluster.domain) unreachable. >Host (comp4.cluster.domain) unreachable. Our last option would be to install the frontend again (full packages, Score) and all nodes again. But probably that would not solve that problem... Thanks again for your help Peer Ueberholz and Alexander Golks _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From kameyama @ pccluster.org Thu Jun 20 15:31:54 2002 From: kameyama @ pccluster.org (=?iso-2022-jp?b?a2FtZXlhbWEgGyRCIXcbKEIgcGNjbHVzdGVyLm9yZw==?=) Date: Thu, 20 Jun 2002 15:31:54 +0900 Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: Your message of "Wed, 19 Jun 2002 17:38:29 JST." <001801c217a7$648350e0$6400a8c0@leqoq> Message-ID: <200206200631.g5K6Vsv04887@yl-dhcp18.is.s.u-tokyo.ac.jp> In article <001801c217a7$648350e0$6400a8c0 @ leqoq> wrotes: > the "scstest -network myrinet2k" and "scstest -network ethernet" both fails > with: > >Host (comp5.cluster.domain) unreachable. > >Host (comp4.cluster.domain) unreachable. > >Host (comp4.cluster.domain) unreachable. > >Host (comp4.cluster.domain) unreachable. About this error message, my hostname is registerd in PM configuration file (if is not registerd, scstest will print other message), but other hosts is not registered... Please check this: % export PM_DEBUG=3 % rpminit comp4.cluster.domain ethernet 2>&1| grep self This command will print as follow: self comp4.cluster.domain n 0 of 4 nodes comp4.cluster.domain is self hostname. If this hostname is different in config file, please change this hostname. If all hostname seems to correct, please exec % rpminit comp4.cluster.domain ethernet 2>&1| grep /var This command will print as follow: ethernet_open_device(): -config /var/scored/scoreboard/frontend.0000B3000sD- ... /var/scored/... is cache file of pm-ethernet.conf Then, please exec following command in scout environment: % scout cat /var/scored/scoreboard/frontend.0000B3000sD- (Pleach change filename to privious output filename) And please send we this output. This command will be output pm-ethernet.conf. If the all cache file is same, this print only 1 times. Otherwise, scoreboard cache file on some hosts is broken. from Kameyama Toyohisa _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From master.of.brainless.things @ gmx.net Fri Jun 21 04:46:42 2002 From: master.of.brainless.things @ gmx.net (=?iso-2022-jp?b?bWFzdGVyLm9mLmJyYWlubGVzcy50aGluZ3MgGyRCIXcbKEIgZ214Lm5l?= =?iso-2022-jp?b?dA==?=) Date: Thu, 20 Jun 2002 21:46:42 +0200 Subject: [SCore-users-jp] [SCore-users] SCore-installation References: <200206200631.g5K6Vsv04887@yl-dhcp18.is.s.u-tokyo.ac.jp> Message-ID: <000801c21893$3c75afc0$6400a8c0@leqoq> At first; thanks to be so patient with us. 
So i've done how Mr. Toyohisa has advised us, and nwo here's the output: [root @ frontend root]# export PM_DEBUG=3 [root @ frontend root]# rpminit comp4.cluster.domain ethernet 2>&1 | grep self self comp4.cluster.domain n 0 of 4 nodes [root @ frontend root]# rpminit comp5.cluster.domain ethernet 2>&1 | grep self self comp5.cluster.domain n 1 of 4 nodes [root @ frontend root]# rpminit comp6.cluster.domain ethernet 2>&1 | grep self self comp6.cluster.domain n 2 of 4 nodes [root @ frontend root]# rpminit comp7.cluster.domain ethernet 2>&1 | grep self self comp7.cluster.domain n 3 of 4 nodes [root @ frontend root]# rpminit comp4.cluster.domain ethernet 2>&1|grep /var ethernet_open_device(): -config /var/scored/scoreboard/frontend.0000B6002NM4 pmEthernetOpenDevice("/var/scored/scoreboard/frontend.0000B6002NM4", 0xbffffb84): pmEthernetMapEthernet(0, 0xbffff8c8): 0 [root @ frontend root]# scout -g pcc SCOUT: Spawning done. SCOUT: session started. [root @ frontend root]# scout cat /var/scored/scoreboard/frontend.0000B6002NM4 [comp4-7]: unit 0 maxnsend 8 0 00:30:48:11:EC:DB comp4.cluster.domain 1 00:30:48:11:8B:CE comp5.cluster.domain 2 00:30:48:12:01:4C comp6.cluster.domain 3 00:30:48:12:01:63 comp7.cluster.domain and one more command, i've tried, in this debug-mode, but i don't really think, that this would be helpful!? [root @ frontend root]#scstest -network myrinet2k . . . myri_is_reachable(0x83deeb0, 0x40157008): myriGetNodeByName(0x83dcd98, 1075146760, 0xbffff738): No route to host(113) Host (comp4.cluster.domain) unreachable. any idea? thanks for all, Peer Ueberholz and Alex Golks _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From hori @ swimmy-soft.com Fri Jun 21 08:05:02 2002 From: hori @ swimmy-soft.com (Atsushi HORI) Date: Fri, 21 Jun 2002 08:05:02 +0900 Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: <000801c21893$3c75afc0$6400a8c0@leqoq> References: <200206200631.g5K6Vsv04887@yl-dhcp18.is.s.u-tokyo.ac.jp> Message-ID: <3107491502.hori0000@mail.bestsystems.co.jp> >any idea? Well, try the following and let us know all of the output. ----- # scout -g pcc # scout officialname `scorehosts -g pcc` ----- Here (`) is a back quote. ----- ---- Atsushi HORI Swimmy Software, Inc. _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From kameyama @ pccluster.org Fri Jun 21 09:53:22 2002 From: kameyama @ pccluster.org (=?iso-2022-jp?b?a2FtZXlhbWEgGyRCIXcbKEIgcGNjbHVzdGVyLm9yZw==?=) Date: Fri, 21 Jun 2002 09:53:22 +0900 Subject: [SCore-users-jp] [SCore-users] SCore-installation In-Reply-To: Your message of "Fri, 21 Jun 2002 08:05:02 JST." <3107491502.hori0000@mail.bestsystems.co.jp> Message-ID: <200206210053.g5L0rMv09764@yl-dhcp18.is.s.u-tokyo.ac.jp> In article <3107491502.hori0000 @ mail.bestsystems.co.jp> Atsushi HORI wrotes: > >any idea? > > Well, try the following and let us know all of the output. > > ----- > # scout -g pcc > # scout officialname `scorehosts -g pcc` > ----- Here (`) is a back quote. ----- Sorry, officialname install only server host in this version. So officialname is not found on compute host. 
Please exec following command to copy officialname to compute host: # rsh-all -g pcc -norsh rdist -c /opt/score/bin/officialname /opt/score/bin/bin.*/officialname @host: from Kameyama Toyohisa _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From master.of.brainless.things @ gmx.net Wed Jun 26 23:27:55 2002 From: master.of.brainless.things @ gmx.net (=?iso-2022-jp?b?bWFzdGVyLm9mLmJyYWlubGVzcy50aGluZ3MgGyRCIXcbKEIgZ214Lm5l?= =?iso-2022-jp?b?dA==?=) Date: Wed, 26 Jun 2002 16:27:55 +0200 Subject: [SCore-users-jp] [SCore-users] (no subject) Message-ID: <000f01c21d1d$b003eea0$6400a8c0@leqoq> sorry, for coming so late with this return mail: We don't know if we got the way you want to do what, but we think like this: at first copy officialname to all compute hosts: >[root @ frontend bin]# rsh-all -g pcc -norsh rdist -c /opt/score/bin/officialname /opt/score/bin/bin.*/officialname @host: >comp4.cluster.domain >comp5.cluster.domain >comp6.cluster.domain >comp7.cluster.domain >comp4.cluster.domain: comp4.cluster.domain: updating host comp4.cluster.domain >comp5.cluster.domain: comp5.cluster.domain: updating host comp5.cluster.domain >comp6.cluster.domain: comp6.cluster.domain: updating host comp6.cluster.domain >comp7.cluster.domain: comp7.cluster.domain: updating host comp7.cluster.domain >comp4.cluster.domain: comp4.cluster.domain: LOCAL ERROR: /opt/score/bin/bin.*/officialname: lstat failed: No such file or directory >comp4.cluster.domain: comp4.cluster.domain: updating of comp4.cluster.domain finished >comp5.cluster.domain: comp5.cluster.domain: LOCAL ERROR: /opt/score/bin/bin.*/officialname: lstat failed: No such file or directory >comp5.cluster.domain: comp5.cluster.domain: updating of comp5.cluster.domain finished >comp6.cluster.domain: comp6.cluster.domain: LOCAL ERROR: /opt/score/bin/bin.*/officialname: lstat failed: No such file or directory >comp6.cluster.domain: comp6.cluster.domain: updating of comp6.cluster.domain finished >comp7.cluster.domain: comp7.cluster.domain: LOCAL ERROR: /opt/score/bin/bin.*/officialname: lstat failed: No such file or directory >comp7.cluster.domain: comp7.cluster.domain: updating of comp7.cluster.domain finished some errors, and this isn't really good, isn't it?! >[root @ frontend bin]# scout -g pcc >SCOUT: Spawning done. >SCOUT: session started. >[root @ frontend bin]# scout officialname `scorehosts -g pcc` >4 hosts found. >[comp4-7]: >bash: officialname: command not found now, we realize, that officialname is a symbolic link to .wrapper. on the nodes officialname in this state was a broken link to .wrapper. so we run >[root @ frontend bin]# rsh-all -g pcc -norsh rdist -c /opt/score/bin/.wrapper /opt/score/bin/bin.*/.wrapper @host: there still is missing something, so we copy the whole bin directory to the hosts: [root @ frontend bin]# rsh-all -g pcc -norsh rdist -c /opt/score/bin/* @host: now running: [root @ frontend bin]# scout officialname `scorehosts -g pcc` 4 hosts found. [comp4]: comp4.cluster.domain client05.cluster.domain client06.cluster.domain client07.cluster.domain [comp5]: client04.cluster.domain comp5.cluster.domain client06.cluster.domain client07.cluster.domain [comp6]: client04.cluster.domain client05.cluster.domain comp6.cluster.domain client07.cluster.domain [comp7]: client04.cluster.domain client05.cluster.domain client06.cluster.domain comp7.cluster.domain now we see, what's the problem (which is a little bit stupid). 
it seems that yp commands are wrong interpreted, so that the old /etc/hosts is used to determine the hostnames. so we updated the yp database, and modified hosts, and now: [root @ frontend etc]# scout officialname `scorehosts -g pcc` 4 hosts found. [comp4-7]: comp4.cluster.domain comp5.cluster.domain comp6.cluster.domain comp7.cluster.domain [root @ frontend etc]# scstest -network myrinet2k SCSTEST: BURST on myrinet2k(chan=0,ctx=0,len=16) 50 K packets. 100 K packets. 150 K packets. 200 K packets. 250 K packets. 300 K packets. 350 K packets. 400 K packets. 450 K packets. 500 K packets. . . . and the mpi demos work, too. Really big, big thanks to everyone (especially Mr. Kameyama, Mr. Hori and Mr. Sumimoto) for helping and supporting us. And we hope, that this group will dure a long time to help other score-"newbie's" like us. Peer Ueberholz & Alex Golks _______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users From chisaki @ cs.kumamoto-u.ac.jp Thu Jun 27 13:48:46 2002 From: chisaki @ cs.kumamoto-u.ac.jp (Yoshifumi CHISAKI) Date: Thu, 27 Jun 2002 13:48:46 +0900 Subject: [SCore-users-jp] Ver 5.0.1 Message-ID: <20020627044847.32082@vivaldi.cs.kumamoto-u.ac.jp> 苣木です。 毎度お世話になっております。 さて, http://www.pccluster.org/score/dist/SCore5.html から入手した, score-5.0.1-redhat7.2.i386.iso.gz をCD-ROMにして,中身を確認したら, /mnt/cdrom/score.rpm以下は, bininstall binuninstall j2re-1.3.1_02-1.i386.rpm score5.0.0-bench-5.0.0-1.i386.rpm score5.0.0-common-5.0.0-2.i386.rpm score5.0.0-comp-5.0.0-2.i386.rpm score5.0.0-demo-5.0.0-1.i386.rpm score5.0.0-doc-5.0.0-2.i386.rpm score5.0.0-example-5.0.0-1.i386.rpm score5.0.0-mpich-chscore-gnu-5.0.0-2.i386.rpm score5.0.0-mpich-chscore-gnu1ul-5.0.0-2.i386.rpm score5.0.0-mpich-common-5.0.0-2.i386.rpm score5.0.0-pbs-common-5.0.0-1.i386.rpm score5.0.0-pbs-server-5.0.0-1.i386.rpm score5.0.0-pbs-user-5.0.0-1.i386.rpm score5.0.0-pvm-common-5.0.0-1.i386.rpm score5.0.0-pvm-user-5.0.0-1.i386.rpm score5.0.0-scash-5.0.0-1.i386.rpm score5.0.0-server-5.0.0-2.i386.rpm score5.0.0-taco-5.0.0-1.i386.rpm score5.0.0-user-5.0.0-2.i386.rpm score5.0.0-utils-common-5.0.0-2.i386.rpm score5.0.0-utils-user-5.0.0-2.i386.rpm という状況です。 READMEをみても, Core Cluster Development Kit Release 5.0.0 2002/03/19 Copyright (C) 2002 PC Cluster Consortium Copyright (c) 2001, 2000, 1999 Real World Computing Partnership The SCore Cluster System Software License Agreement specifies the terms and conditions for use. see doc/html/en/license/LICENSE.html となっています。 score5.0.0-* は,実際は5.0.1なのでしょうか? 既にinstallしておりますので,確認する方法を お教えいただけませんでしょうか? #ちなみに,/opt/score5.0.0となっています。 ダウンロードする場所を間違えているようでしたら, お教え願えないでしょうか。 よろしくお願いいたします。 では。 From tsuchiya @ prologj.com Thu Jun 27 13:58:19 2002 From: tsuchiya @ prologj.com (Naohisa TSUCHIYA) Date: Thu, 27 Jun 2002 13:58:19 +0900 Subject: [SCore-users-jp] Ver 5.0.1 In-Reply-To: <20020627044847.32082@vivaldi.cs.kumamoto-u.ac.jp> Message-ID: 土屋@東清SIです。 on 02.6.27 1:48 PM, Yoshifumi CHISAKI at chisaki @ cs.kumamoto-u.ac.jp wrote: > score5.0.0-* は,実際は5.0.1なのでしょうか? > > 既にinstallしておりますので,確認する方法を > お教えいただけませんでしょうか? > #ちなみに,/opt/score5.0.0となっています。 > 私は5.0.1をwebからD/Lして使っていますが、scrunを実行すると、 SCore-D 5.0.1 conneted. <0:0> SCORE: 8 nodes (4x2) ready. と表示されます。 たしかに/optのディレクトリはscore5.0.0ですが。 ---------------------------------------------------------------- 土 屋 尚 久 (Naohisa TSUCHIYA, tsuchiya @ prologj.com) Tosei System Integrations Inc. 
http://www.tosei-si.com From kameyama @ pccluster.org Thu Jun 27 14:01:05 2002 From: kameyama @ pccluster.org (=?iso-2022-jp?b?a2FtZXlhbWEgGyRCIXcbKEIgcGNjbHVzdGVyLm9yZw==?=) Date: Thu, 27 Jun 2002 14:01:05 +0900 Subject: [SCore-users-jp] Ver 5.0.1 In-Reply-To: Your message of "Thu, 27 Jun 2002 13:48:46 JST." <20020627044847.32082@vivaldi.cs.kumamoto-u.ac.jp> Message-ID: <200206270501.g5R515v23385@yl-dhcp18.is.s.u-tokyo.ac.jp> 亀山です. お騒がせして申し訳ありません. In article <20020627044847.32082 @ vivaldi.cs.kumamoto-u.ac.jp> Yoshifumi CHISAKI wrotes: > http://www.pccluster.org/score/dist/SCore5.html > から入手した, > score-5.0.1-redhat7.2.i386.iso.gz > をCD-ROMにして,中身を確認したら, SCore 5.0.1 は SCore 5.0.0 の bug fix 版です. rpm 的には 違った version の混在を許そう ということで, version によって, install directory を変更するため score5.0.0-* という名前にしたのですが, 5.0.1 に関しては bug fix ということで, この対象にしませんでした. また, bug fix の対象ではない rpm も更新していません. そのため, rpm の名前はこのままで, release 番号だけを変更することにしました. 5.0.0 と 5.0.1 の見分け方ですが... > score5.0.0-common-5.0.0-2.i386.rpm ~ のように, いくつかの rpm の release 番号が 2 になっています. (5.0.0 の場合は 1 になっています.) また, install したあとでは /opt/score/etc/version が 5.0.1 になっています. また, scout 中で % scout コマンドを打ったときや scrun を動かしたときも 5.0.1 と 出力されます. from Kameyama Toyohisa From chisaki @ cs.kumamoto-u.ac.jp Thu Jun 27 14:03:04 2002 From: chisaki @ cs.kumamoto-u.ac.jp (Yoshifumi CHISAKI) Date: Thu, 27 Jun 2002 14:03:04 +0900 Subject: [SCore-users-jp] Ver 5.0.1 In-Reply-To: References: Message-ID: <20020627050304.11823@vivaldi.cs.kumamoto-u.ac.jp> 苣木です。 Naohisa TSUCHIYA wrote to 02.6.27 13:58: >土屋@東清SIです。 >私は5.0.1をwebからD/Lして使っていますが、scrunを実行すると、 >SCore-D 5.0.1 conneted. ><0:0> SCORE: 8 nodes (4x2) ready. >と表示されます。 >たしかに/optのディレクトリはscore5.0.0ですが。 あー,確かに, SCore-D 5.0.1 conneted. と表示されました。 ということは,5.0.1なんですね。 ありがとうございました。 From hartke @ phc.uni-kiel.de Fri Jun 28 03:00:25 2002 From: hartke @ phc.uni-kiel.de (Bernd Hartke) Date: Thu, 27 Jun 2002 20:00:25 +0200 (METDST) Subject: [SCore-users-jp] [SCore-users] mpirun/scrun error Message-ID: Dear score users, I have a cluster with the SCore system. I did all the tests described in the documentation, section installation guide, subsection system test, listed under scout test procedure and pm test procedure (scorehosts, sceptic, scout commands, and various rpmtest trials, and the stress test with scstest), without detecting any errors. So, it seems to me that the hardware, its interconnection, and the basic SCore intercommunication seems to work (?). But there is something wrong with application programs, compiled with mpif77, mpif90, or mpicc. When I try to run a simple mpi program (e.g. the cpi.c example program or the Fortran equivalent of it (taken from the Gropp/Lusk/Skjellum book "Using MPI")), I reprocibly get the following error message: <8> ULT: Exception Signal (11) without anything else happening; then the system appears to "hang", and I can get out of it only with Ctrl-C and by quitting the scout environment. I have no clue where this error message could come from. What is going wrong here? Best regards, Bernd Hartke --- Prof. Dr. 
Bernd Hartke                        e-mail: hartke @ phc.uni-kiel.de
Theoretical Chemistry               phone : +49-431-880-2753
Institute for Physical Chemistry    fax   : +49-431-880-1758
University of Kiel                  http://www.theochem.uni-stuttgart.de/~hartke
Olshausenstrasse 40
24098 Kiel
GERMANY

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From hori @ swimmy-soft.com Fri Jun 28 10:01:06 2002
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Fri, 28 Jun 2002 10:01:06 +0900
Subject: [SCore-users-jp] Re: [SCore-users] mpirun/scrun error
Message-ID: <3108103266.hori0000@mail.bestsystems.co.jp>

Hi.

> I reprocibly get the following
>error message:
>
><8> ULT: Exception Signal (11)

Please run the following command in a scout environment.

% scout ls -l /opt/score/deploy/i386-redhat7-linux/scored*

If all the binaries look the same, then it is OK; if not, you have to copy
the SCore-D binary so that all hosts have the same binary files.

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From hartke @ phc.uni-kiel.de Fri Jun 28 17:44:58 2002
From: hartke @ phc.uni-kiel.de (Bernd Hartke)
Date: Fri, 28 Jun 2002 10:44:58 +0200 (METDST)
Subject: [SCore-users-jp] [SCore-users] I/O problem?
In-Reply-To: 
Message-ID: 

Dear SCore users,

there is a little Fortran program for the calculation of the number pi; it
is the first example in the book "Using MPI" by Gropp/Lusk/Skjellum, and it
is contained in many MPI(CH) distributions. I installed MPICH on my Linux
PC some time ago, and this program ran just fine.

Now, I tried to run this program on my cluster under SCore. To my surprise,
it dies with the following error when compiled with mpif77:

 enter number of intervals (0 quits):
list in: end of file
apparent state: unit 5 (unnamed)
last format: list io
lately reading direct formatted external IO
<0> SCORE: Program signaled (SIGABRT).

When compiled with mpif90, I get the same error situation with slightly
different error messages:

 enter number of intervals (0 quits):
PGFIO-F-217/list-directed read/unit=5/attempt to read past end of file.
File name = stdin formatted, sequential access record = 1
In source file pi.f, at line number 13

(in this case, I have to stop the program with Ctrl-C).

The apparently offending part of the code is this:

      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world,myid,ierr)
      call mpi_comm_size(mpi_comm_world,numprocs,ierr)
 10   if (myid.eq.0) then
         write(6,*)"enter number of intervals (0 quits):"
         read(5,*)n
      end if
      call mpi_bcast(n,1,mpi_integer,0,mpi_comm_world,ierr)

(Using "print *" and "read(*,*)" instead does not change anything.)

Is there some construction in SCore that prevents this type of I/O on just
one node?

Bernd Hartke

---
Prof. Dr. Bernd Hartke              e-mail: hartke @ phc.uni-kiel.de
Theoretical Chemistry               phone : +49-431-880-2753
Institute for Physical Chemistry    fax   : +49-431-880-1758
University of Kiel                  http://www.theochem.uni-stuttgart.de/~hartke
Olshausenstrasse 40
24098 Kiel
GERMANY

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From hori @ swimmy-soft.com Fri Jun 28 17:54:22 2002
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Fri, 28 Jun 2002 17:54:22 +0900
Subject: [SCore-users-jp] Re: [SCore-users] I/O problem?
Message-ID: <3108131662.hori0006@mail.bestsystems.co.jp>

Hi, again,

>Now, I tried to run this program on my cluster under SCore.
>To my surprise, it dies with the following error when compiled with mpif77:
>
> enter number of intervals (0 quits):
>list in: end of file
>apparent state: unit 5 (unnamed)
>last format: list io
>lately reading direct formatted external IO
><0> SCORE: Program signaled (SIGABRT).

SCore does not support standard input directly. Do the following:

% scrun scatter -node 0 == a.out

----
Atsushi HORI
Swimmy Software, Inc.

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
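Independent of whatever scrun options are available, one generic MPI-level workaround is to take the parameter from a small input file instead of standard input: rank 0 reads the file and broadcasts the value, just as pi.f already broadcasts n. In the sketch below the file name pi.in and the unit number 10 are purely illustrative assumptions, not anything taken from the SCore documentation.

      program piread
c     Sketch of a stdin-free variant of the pi example: rank 0 reads
c     the interval count from a file and broadcasts it.  File name
c     and unit number are illustrative assumptions.
      implicit none
      include 'mpif.h'
      integer myid, numprocs, ierr, n
      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world, myid, ierr)
      call mpi_comm_size(mpi_comm_world, numprocs, ierr)
      if (myid .eq. 0) then
         open(10, file='pi.in', status='old')
         read(10,*) n
         close(10)
      end if
      call mpi_bcast(n, 1, mpi_integer, 0, mpi_comm_world, ierr)
c     ... compute the partial sums and reduce them as in pi.f ...
      call mpi_finalize(ierr)
      end

The rest of the pi computation (the partial sums and the final reduction) would stay exactly as in the original example.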