From nrcb @ streamline-computing.com Sun Feb 2 03:17:25 2003
From: nrcb @ streamline-computing.com (Nick Birkett)
Date: Sat, 1 Feb 2003 18:17:25 +0000
Subject: [SCore-users-jp] [SCore-users] Copper Myrinet pm problems
Message-ID: <200302011817.h11IHPK10399@zeralda.streamline.com>

Hi, we have just upgraded one of our older clusters to SCore 5.0.1 from 4.x
(I think it was the first SCore to support Myrinet 2000, from 18 months ago -
RedHat 6.2 dist). The cluster was working more or less OK under the old SCore
system. The entire system has been re-installed as RedHat 7.2 + SCore 5.0.1.

Hardware: copper-based Myrinet2k (May 2001) and dual 866MHz Pentium III
SuperMicro 1U Superservers.

I have run rpmtest and scstest -network myrinet2k for many hours over all
compute nodes without problems. I have run gm-1.6.3 codes (e.g. PMB) and they
work fine.

SCore PM codes are having problems over Myrinet - e.g. running PMB:

<6:0> SCORE:WARNING MPICH/SCore [buffer=0x8951498, type=1025, from=11, size=262144, offset=189520]
<6:0> SCORE:WARNING MPICH/SCore: receive-message-queue:
<6:0> SCORE:WARNING MPICH/SCore (empty)
<6:0> SCORE:WARNING MPICH/SCore: received-fragment:
<6:0> SCORE:WARNING MPICH/SCore [buffer=0x40066180, type=1025, from=11, size=262144, fragment_size=8240, offset=189521]
<6:0> SCORE:WARNING MPICH/SCore: queued-message:
<6:0> SCORE:WARNING MPICH/SCore [buffer=0x8951498, type=1025, from=11, size=262144, offset=189520]
<6:0> SCORE:WARNING MPICH/SCore: received an invalid fragment (mismatched offset)
<6:0> SCORE:PANIC MPICH/SCore: critical error on message transfer
<6:0> Trying to attach GDB (DISPLAY=localhost:10.0): PANIC
SCORE: Program aborted.
SCOUT: Session done.

Lots of buffer mismatch errors. The same binary runs fine over ethernet or
gigabit on the same hardware (i.e. if I add the -network=ethernet option then
all is OK), so it is a Myrinet problem.

We would like to keep SCore, as the cluster has some new Xeon Gigabit nodes,
but we will have to convert to GM if we cannot resolve this.

It looks like a hardware problem (the same code runs fine over SCore 5.0.1
and fibre-optic Myrinet 2k on Intel Xeon systems).

Thanks,

Nick

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From s-sumi @ bd6.so-net.ne.jp Sun Feb 2 09:30:18 2003
From: s-sumi @ bd6.so-net.ne.jp (Shinji Sumimoto)
Date: Sun, 02 Feb 2003 09:30:18 +0900 (JST)
Subject: [SCore-users-jp] Re: [SCore-users] Copper Myrinet pm problems
In-Reply-To: <200302011817.h11IHPK10399@zeralda.streamline.com>
References: <200302011817.h11IHPK10399@zeralda.streamline.com>
Message-ID: <20030202.093018.730554912.s-sumi@bd6.so-net.ne.jp>

Hi.

Could you check the CRC errors of the cluster nodes?

Shinji.

From: Nick Birkett
Subject: [SCore-users] Copper Myrinet pm problems
Date: Sat, 1 Feb 2003 18:17:25 +0000
Message-ID: <200302011817.h11IHPK10399 @ zeralda.streamline.com>

nrcb> [...]

-----
Shinji Sumimoto  E-Mail: s-sumi @ bd6.so-net.ne.jp

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
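A quick way to act on Shinji's suggestion is to sweep every compute node and grep the Myrinet board's status output for CRC or error fields. The sketch below is only illustrative: the node names are placeholders, the GM tools are assumed to live under /usr/gm/bin (they come with the gm-1.6.3 install mentioned above), and the exact utility and counter names differ between GM releases (gm_board_info is common; some builds also ship a dedicated counter dump such as gm_counters), so check what your installation actually provides.

  #!/bin/sh
  # sweep the compute nodes and report Myrinet error/CRC counters (sketch)
  for n in comp00 comp01 comp02 comp03; do
      echo "=== $n ==="
      rsh $n /usr/gm/bin/gm_board_info | egrep -i 'crc|bad|error'
  done

A node whose counters keep growing under load points at a cable, connector or NIC problem rather than at PM/MPICH itself.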
From ishikawa @ is.s.u-tokyo.ac.jp Mon Feb 3 09:53:15 2003
From: ishikawa @ is.s.u-tokyo.ac.jp (Yutaka Ishikawa)
Date: Mon, 03 Feb 2003 09:53:15 +0900 (JST)
Subject: [SCore-users-jp] Announcement of the 2nd PC Cluster Symposium
Message-ID: <20030203.095315.884014234.ishikawa@is.s.u-tokyo.ac.jp>

Dear SCore users,

The deadline for advance registration for the symposium is now one week away.
If you plan to attend and have not registered yet, please register on the web
right away. At the symposium you can hear, free of charge:
  - the latest information on SCore (distribution of SCore 5.4)
  - the latest PC cluster activities of member companies

Ishikawa, Chair of the PC Cluster Consortium

=================================
        Invitation to the 2nd PC Cluster Symposium

PC clusters have so far been used mainly at research institutes in fields of
computational science and computational chemistry that require large-scale
computation, such as structural analysis, fluid dynamics, molecular dynamics
and genome information processing. Recently PC clusters have also spread into
industry; in the automobile industry, for example, they are used for body
design and crash simulation.

At the 2nd PC Cluster Symposium, hosted by the PC Cluster Consortium, member
companies will present case studies of PC cluster deployment in industry, and
there will be company exhibits, bringing you the latest trends in PC clusters.
We look forward to your participation.

* Dates: Thursday, February 20 - Friday, February 21, 2003, 10:00-17:00
* Venue: National Museum of Emerging Science and Innovation (Miraikan), 7th floor
    February 20 (Thu): workshop  - Miraikan CAN Hall
    February 21 (Fri): symposium - Miraikan CAN Hall
    Accompanying company exhibits: Innovation Hall
  Note that the workshop on February 20 (Thu) is open to consortium members only.
* Fee: free of charge
* Registration: please apply at http://www.pccluster.org/
* Capacity: 300 people
* Organizer: PC Cluster Consortium
    TEL: 03-3263-6474, FAX: 03-3263-7537
    e-mail: sec @ pccluster.org
    URL: http://www.pccluster.org/
=================================

From terasawa @ nssnet.co.jp Mon Feb 3 18:06:42 2003
From: terasawa @ nssnet.co.jp (Terasawa)
Date: Mon, 03 Feb 2003 18:06:42 +0900
Subject: [SCore-users-jp] How to install PETSc
In-Reply-To: <20021219042516.A994920046@neal.il.is.s.u-tokyo.ac.jp>
References: <20021219042516.A994920046@neal.il.is.s.u-tokyo.ac.jp>
Message-ID: <200302030914.SAA08680@nss-ntsv4.nssnet.co.jp>

Hello, this is Terasawa from NSS.

Some time ago the following statement appeared on this mailing list:

> Since SCore only provides static-link versions of its libraries, a library
> that uses SCore functions/variables apparently cannot be linked correctly
> when it is built as a shared library...

In our case, we want to switch at run time between computation modules that
are linked against a parent module which does the MPI communication, so we
would like the computation module to be a dynamically loaded library
(zzz.so). Inside the computation module library (zzz.so) neither SCore
functions nor MPI are used.

Does this conflict with the SCore environment? It affects our system design,
so we are concerned. We would be grateful if someone could advise us.
Thank you in advance.

--
Terasawa  mailto:terasawa @ nssnet.co.jp
From kameyama @ pccluster.org Tue Feb 4 09:07:00 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Tue, 04 Feb 2003 09:07:00 +0900
Subject: [SCore-users-jp] How to install PETSc
In-Reply-To: Your message of "Mon, 03 Feb 2003 18:06:42 JST." <200302030914.SAA08680@nss-ntsv4.nssnet.co.jp>
Message-ID: <20030204000700.2A17F2004F@neal.il.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <200302030914.SAA08680 @ nss-ntsv4.nssnet.co.jp> Terasawa writes:
> In our case, we want to switch at run time between computation modules that
> are linked against a parent module which does the MPI communication, so we
> would like the computation module to be a dynamically loaded library
> (zzz.so). Inside the computation module library (zzz.so) neither SCore
> functions nor MPI are used.
>
> Does this conflict with the SCore environment?

I think it will work (as long as it links), but please note the following:

1. The compile drivers provided by SCore (mpicc etc.) link statically by
   default. When you use dynamic linking, please add the -nostatic option.
2. Dynamically linked executables cannot be checkpointed/restarted.
3. Because SCore programs are executed on the compute hosts, zzz.so must also
   be installed on the compute hosts. The executable itself is copied by
   scrun/mpirun, so it does not have to be visible from the compute hosts,
   but the shared library does.

from Kameyama Toyohisa
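A minimal sketch of the build flow Kameyama describes, assuming a computation module zzz.c and an MPI parent parent.c (both file names are made up for illustration, as are the compute-host names): the module is built as an ordinary shared object, the parent is linked with -nostatic (plus -ldl if it loads the module with dlopen() at run time), and the .so is copied to every compute host.

  # build the computation module as a shared library
  gcc -fPIC -shared -o libzzz.so zzz.c

  # link the MPI parent dynamically; -nostatic disables SCore's default static
  # link, -ldl is needed if the parent dlopen()s libzzz.so itself
  mpicc -nostatic -o parent parent.c -ldl

  # the shared object must exist on every compute host (directory is an example)
  for n in comp0 comp1 comp2 comp3; do
      rcp libzzz.so $n:/usr/local/lib/
  done

As noted in the reply above, an executable linked this way can no longer be checkpointed/restarted.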
From k-egg @ gmx.de Tue Feb 4 23:45:17 2003
From: k-egg @ gmx.de (Andreas Vitz)
Date: Tue, 4 Feb 2003 15:45:17 +0100
Subject: [SCore-users-jp] [SCore-users] pooma
Message-ID: <200302041437.h14EbUS09209@pccluster.org>

Hello everybody,

Did anybody ever try out POOMA (www.codesourcery.com) with SCore?
Is it possible at all?

Yours, Andreas Vitz

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From M.Newiger @ deltacomputer.de Fri Feb 7 04:08:07 2003
From: M.Newiger @ deltacomputer.de (Martin Newiger)
Date: Thu, 6 Feb 2003 20:08:07 +0100
Subject: [SCore-users-jp] [SCore-users] Changing IPs after SCore-Installation

Hi,

is it possible to change the IPs of the master and the nodes after a
successful SCore installation? I need to change them from routable
(194.x.x.x) into non-routable (192.168.x.x) addresses. If it is possible,
which files exactly must I edit?

Regards,
Martin Newiger

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Fri Feb 7 09:49:09 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Fri, 07 Feb 2003 09:49:09 +0900
Subject: [SCore-users-jp] Re: [SCore-users] Changing IPs after SCore-Installation
In-Reply-To: Your message of "Thu, 06 Feb 2003 20:08:07 JST."
Message-ID: <20030207004909.1A37820050@neal.il.is.s.u-tokyo.ac.jp>

In article Martin Newiger writes:
> is it possible to change the IPs of the master and the nodes after a
> successful SCore installation? I need to change them from routable
> (194.x.x.x) into non-routable (192.168.x.x) addresses. If it is
> possible, which files exactly must I edit?

The SCore configuration files do not contain IP addresses (except for PM/UDP
and PM/Agent/UDP, but these are obsolete). So if you only want to change the
IP addresses, you do not need to change the SCore configuration files.
(You probably have to change /etc/sysconfig/network and
/etc/sysconfig/network-scripts/ifcfg-* to change the IP addresses.)

If you want to change the hostnames, you must change the following SCore
configuration files on the server host:

  /opt/score/etc/scorehosts.db
  /opt/score/etc/scorehosts.defects
  /opt/score/etc/pm-ethernet.conf    if you use PM/Ethernet
  /opt/score/etc/pm-myrinet.conf     if you use PM/Myrinet
    (If you use more PM networks, for example PM/Ethernet trunking, you must
     change more files.)
  /etc/profile.d/score.*             to change the SCBDSERV environment variable

And you must change /etc/hosts.equiv and /root/.rhosts on the server host and
on all compute hosts.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
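If you do rename hosts, a quick way to catch any file that still refers to an old name is to grep the files Kameyama lists. A sketch, run on the server host, with "oldnode0" standing in for one of the old hostnames (repeat the /etc/hosts.equiv and /root/.rhosts check on each compute host as well):

  OLD=oldnode0    # old hostname, placeholder
  for f in /opt/score/etc/scorehosts.db \
           /opt/score/etc/scorehosts.defects \
           /opt/score/etc/pm-ethernet.conf \
           /opt/score/etc/pm-myrinet.conf \
           /etc/profile.d/score.* \
           /etc/hosts.equiv /root/.rhosts; do
      [ -f "$f" ] && grep -l "$OLD" "$f"
  done

Every file the loop prints still needs editing before SCore is restarted.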
From linux @ jcsn.co.jp Fri Feb 7 15:59:02 2003
From: linux @ jcsn.co.jp (linux)
Date: Fri, 07 Feb 2003 15:59:02 +0900
Subject: [SCore-users-jp] SCBD
In-Reply-To: <20030127105356.160F.SUGAWARA@mlab.jks.ynu.ac.jp>
References: <20030127013533.38A3220050@neal.il.is.s.u-tokyo.ac.jp> <20030127105356.160F.SUGAWARA@mlab.jks.ynu.ac.jp>
Message-ID: <20030207154917.AEF9.LINUX@jcsn.co.jp>

Hello, this is Shibata from JCS (first post to this list).

> Just to be sure I did the server installation once more, but it still does
> not work properly.
> Also, pm-ethernet.conf again ended up containing only the first line,
> "unit 0", and what I had written into it before was gone.

I have also been building a cluster recently and it has not gone well even
once. I am building it at a different site, so I have forgotten the finer
details, but the files under nd... are not generated correctly. It seems the
machine added last gets forgotten; I include the server machine as a compute
host, but because the necessary files are not created, the mesg-related
daemons do not start and everything takes a long time.

The machines are Intel 2U, Xeon 2.8GHz x2, 512MB memory, 18GB SCSI HDD,
on-board Gigabit, RedHat 7.3 + SCore 5.2.

When I built 5.0.1 on a different system before there was no problem, but
since I started building on this system it seems to have stopped working.
I have no other system, so I cannot tell which part is at fault; if I find
out anything I will mail the list again.

-- shiabta @ jcsn.co.jp --
-- linux @ jcsn.co.jp --

From emile.carcamo @ nec.fr Mon Feb 10 23:17:15 2003
From: emile.carcamo @ nec.fr (Emile CARCAMO)
Date: Mon, 10 Feb 2003 15:17:15 +0100
Subject: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
Message-ID: <200302101417.h1AEHFmq009995@emilepc.ess.nec.fr>

Dear List Members,

As far as I know, a new version of SCore is going to be released by the end
of this month. I was just wondering whether the PC Cluster Consortium is
going to provide us with a set of MPI libraries compiled with the Intel
ICC/IFC release 7.0 compilers? And what about Intel 7.0 compiler support in
general - do we need to recompile SCore or not? Many thanks for your help
and support. Regards,

--
Emile_CARCAMO       NEC High Performance      http://www.hpce.nec.com
System Engineer     Computing Europe          mailto:ecarcamo @ hpce.nec.com
(+33)6-8063-7003 GSM
(+33)1-3930-6601 FAX   / Your mouse has moved. Windows NT must be restarted \
(+33)1-3930-6613 PHONE \ for the change to take effect. Reboot now? [ OK ]  /

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From bogdan.costescu @ iwr.uni-heidelberg.de Mon Feb 10 23:51:49 2003
From: bogdan.costescu @ iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Mon, 10 Feb 2003 15:51:49 +0100 (CET)
Subject: [SCore-users-jp] Re: [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
In-Reply-To: <200302101417.h1AEHFmq009995@emilepc.ess.nec.fr>

On Mon, 10 Feb 2003, Emile CARCAMO wrote:
> Do we need to recompile SCore or not ?

Well, I don't care about recompilation, provided that it actually works
without the large amount of fixes/patches that were needed for the present
version...

--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu @ IWR.Uni-Heidelberg.De

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From M.Newiger @ deltacomputer.de Tue Feb 11 00:16:50 2003
From: M.Newiger @ deltacomputer.de (Martin Newiger)
Date: Mon, 10 Feb 2003 16:16:50 +0100
Subject: [SCore-users-jp] [SCore-users] No start

Hi,

when I try to start SCore with scout -g pcc it doesn't start. Any ideas how
I can solve this?

Regards,
M.Newiger

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From M.Newiger @ deltacomputer.de Tue Feb 11 01:36:03 2003
From: M.Newiger @ deltacomputer.de (Martin Newiger)
Date: Mon, 10 Feb 2003 17:36:03 +0100
Subject: [SCore-users-jp] [SCore-users] (no subject)

Hi,

when I try to start SCore with scout -g pcc it doesn't start. After a while
a message saying "permission denied" appears. Any ideas how I can solve this?

Regards,
M.Newiger

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
From s-sumi @ bd6.so-net.ne.jp Tue Feb 11 09:50:16 2003
From: s-sumi @ bd6.so-net.ne.jp (Shinji Sumimoto)
Date: Tue, 11 Feb 2003 09:50:16 +0900 (JST)
Subject: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
In-Reply-To: <200302101417.h1AEHFmq009995@emilepc.ess.nec.fr>
References: <200302101417.h1AEHFmq009995@emilepc.ess.nec.fr>
Message-ID: <20030211.095016.730552515.s-sumi@bd6.so-net.ne.jp>

Hi.

The next SCore version, 5.4, will also provide MPICH/SCore library objects
built with Intel compiler version 7, in addition to Intel compiler version 6.

Shinji.

From: Emile CARCAMO
Subject: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
Date: Mon, 10 Feb 2003 15:17:15 +0100
Message-ID: <200302101417.h1AEHFmq009995 @ emilepc.ess.nec.fr>

emile.carcamo> [...]

-----
Shinji Sumimoto  E-Mail: s-sumi @ bd6.so-net.ne.jp

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From s-sumi @ bd6.so-net.ne.jp Tue Feb 11 09:53:25 2003
From: s-sumi @ bd6.so-net.ne.jp (Shinji Sumimoto)
Date: Tue, 11 Feb 2003 09:53:25 +0900 (JST)
Subject: [SCore-users-jp] [SCore-users] No start
Message-ID: <20030211.095325.846938538.s-sumi@bd6.so-net.ne.jp>

Hi.

How did you install SCore, with EIT or from RPMs? And on which host are you
trying to run scout?

Shinji.

From: Martin Newiger
Subject: [SCore-users-jp] [SCore-users] No start
Date: Mon, 10 Feb 2003 16:16:50 +0100

M.Newiger> when I try to start SCore with scout -g pcc it doesn't start. Any
M.Newiger> ideas how I can solve this?

-----
Shinji Sumimoto  E-Mail: s-sumi @ bd6.so-net.ne.jp

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From francois.courteille @ nec.fr Tue Feb 11 18:41:47 2003
From: francois.courteille @ nec.fr (Francois Courteille)
Date: Tue, 11 Feb 2003 10:41:47 +0100
Subject: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
References: <200302101417.h1AEHFmq009995@emilepc.ess.nec.fr> <20030211.095016.730552515.s-sumi@bd6.so-net.ne.jp>
Message-ID: <004201c2d1b1$cfce5e20$03e874c1@VERSAFC>

Dear Shinji-san,

thank you for the information. I have 3 other questions:

1/ Does SCore 5.4 support Itanium2 (IA-64) and Linux IA-64? How far has SCore
been tested on the IA-64 platform?

2/ Is Quadrics supported by SCore? What about Infiniband?

3/ I would like to have your advice about the following: we were trying to
read hardware performance counters. For that we tried to install the kernel
patch perfctr-2.4.5.tar.gz, which is required to use lperfex, but we got an
error message when trying to generate a new kernel (see appendix below -
apparently there is an irq usage conflict with SCore).

Have you an idea how to fix this problem?
Thank you for your help,

With best regards,

Francois Courteille

APPENDIX

make -C perfctr
make[2]: Entering directory `/work/linux-2.4.18score/drivers/perfctr'
make all_targets
make[3]: Entering directory `/work/linux-2.4.18score/drivers/perfctr'
gcc -D__KERNEL__ -I/work/linux-2.4.18score/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -DKBUILD_BASENAME=x86_setup -DEXPORT_SYMTAB -c x86_setup.c
x86_setup.c: In function `set_cpus_allowed':
x86_setup.c:32: warning: implicit declaration of function `BUG_ON'
gcc -D__KERNEL__ -I/work/linux-2.4.18score/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -DKBUILD_BASENAME=init -c -o init.o init.c
In file included from /work/linux-2.4.18score/include/asm/perfctr.h:147,
                 from /work/linux-2.4.18score/include/linux/perfctr.h:9,
                 from init.c:12:
/work/linux-2.4.18score/include/asm/hw_irq.h:228: parse error before `irq_desc'
/work/linux-2.4.18score/include/asm/hw_irq.h:228: warning: type defaults to `int' in declaration of `irq_desc'
/work/linux-2.4.18score/include/asm/hw_irq.h:228: warning: data definition has no type or storage class
make[3]: *** [init.o] Error 1
make[3]: Leaving directory `/work/linux-2.4.18score/drivers/perfctr'
make[2]: *** [first_rule] Error 2
make[2]: Leaving directory `/work/linux-2.4.18score/drivers/perfctr'
make[1]: *** [_subdir_perfctr] Error 2
make[1]: Leaving directory `/work/linux-2.4.18score/drivers'
make: *** [_dir_drivers] Error 2

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
From s-sumi @ bd6.so-net.ne.jp Tue Feb 11 20:46:16 2003
From: s-sumi @ bd6.so-net.ne.jp (Shinji Sumimoto)
Date: Tue, 11 Feb 2003 20:46:16 +0900 (JST)
Subject: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
In-Reply-To: <004201c2d1b1$cfce5e20$03e874c1@VERSAFC>
References: <200302101417.h1AEHFmq009995@emilepc.ess.nec.fr> <20030211.095016.730552515.s-sumi@bd6.so-net.ne.jp> <004201c2d1b1$cfce5e20$03e874c1@VERSAFC>
Message-ID: <20030211.204616.607958486.s-sumi@bd6.so-net.ne.jp>

Hi.

From: "Francois Courteille"
Subject: Re: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
Date: Tue, 11 Feb 2003 10:41:47 +0100
Message-ID: <004201c2d1b1$cfce5e20$03e874c1 @ VERSAFC>

francois.courteille> I have 3 other questions:
francois.courteille>
francois.courteille> 1/ Does SCore 5.4 support Itanium2 (IA-64) and Linux IA-64?
francois.courteille> How far has SCore been tested on the IA-64 platform?

Yes, SCore 5.4 supports Itanium2; however, we have only a few nodes for
testing SCore on IA-64, so the testing is limited. We hope SCore users will
try SCore 5.4 on Itanium2 and give us feedback about bugs.

francois.courteille> 2/ Is Quadrics supported by SCore?
francois.courteille> What about Infiniband?

We have no schedule for Quadrics support. Infiniband is planned, but we do
not know yet when it will be included.

francois.courteille> 3/ I would like to have your advice about the following:
francois.courteille> we were trying to read hardware performance counters. [...]
francois.courteille>
francois.courteille> Have you an idea how to fix this problem?

Is the patch for the original Linux kernel, or for some distribution? The
SCore 5.2 kernel is based on 2.4.18, but some functions such as ACPI are
added. If the patch is for the original Linux kernel, we can provide a
minimum patch for the 2.4.18 kernel.

Please give him some advice > Kameyama-san.

Shinji.

francois.courteille> [...]

-----
Shinji Sumimoto  E-Mail: s-sumi @ bd6.so-net.ne.jp

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
From kameyama @ pccluster.org Wed Feb 12 09:24:59 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Wed, 12 Feb 2003 09:24:59 +0900
Subject: [SCore-users-jp] [SCore-users] Intel 7.0 compiler support with mpich-1.2.4
In-Reply-To: Your message of "Tue, 11 Feb 2003 20:46:16 JST." <20030211.204616.607958486.s-sumi@bd6.so-net.ne.jp>
Message-ID: <20030212002459.C3DA120050@neal.il.is.s.u-tokyo.ac.jp>

In article <20030211.204616.607958486.s-sumi @ bd6.so-net.ne.jp> Shinji Sumimoto writes:
> francois.courteille> 3/ I would like to have your advice about the following:
> francois.courteille> we were trying to read hardware performance counters.
> francois.courteille> For that we tried to install the kernel patch
> francois.courteille> perfctr-2.4.5.tar.gz, which is required to use lperfex,
> francois.courteille> but we got an error message when trying to generate a
> francois.courteille> new kernel.
>
> Is the patch for the original Linux kernel, or for some distribution? The
> SCore 5.2 kernel is based on 2.4.18, but some functions such as ACPI are
> added.

The SCore 5.2 kernel is based on 2.4.18 + the IA64 patch (ACPI is added by
the IA64 patch). Because some files of the original kernel are changed by
the IA64 patch, some patches made against the original IA32 kernel fail to
apply.

If you want to use perfctr with the current SCore 5.2, please take an
original 2.4.18 kernel and apply linux2.4.18_minimal.patch (in the score.rpm
directory of the SCore 5.2 CD-ROM) to it.

In SCore 5.4 we separate the IA32 kernel and the IA64 kernel. The kernel
patch will be provided as two files, one for IA32 and one for IA64. The IA32
kernel is based on the original 2.4.19; the IA64 kernel is based on the
original 2.4.19 + the IA64 patch (the IA64 kernel patch includes the IA64
patch). So you will be able to install the perfctr patch on the SCore 5.4
(IA32) kernel.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
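A sketch of the sequence Kameyama describes for the current (5.2) kernel: start from a vanilla 2.4.18 tree, apply SCore's minimal patch, then apply the perfctr patch and rebuild. The CD-ROM mount point, the patch levels and the name of the patch file inside the perfctr tarball are assumptions (perfctr's own INSTALL describes how it expects to be applied); only linux2.4.18_minimal.patch and the score.rpm directory are named in the mail above.

  cd /usr/src
  tar xzf linux-2.4.18.tar.gz
  cd linux-2.4.18
  patch -p1 < /mnt/cdrom/score.rpm/linux2.4.18_minimal.patch   # SCore's minimal changes
  tar xzf /tmp/perfctr-2.4.5.tar.gz
  patch -p1 < perfctr-2.4.5/patches/patch-kernel-2.4.18        # illustrative file name
  make oldconfig && make dep bzImage modules                   # rebuild as usual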
From hori @ swimmy-soft.com Wed Feb 12 09:31:57 2003
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Wed, 12 Feb 2003 09:31:57 +0900
Subject: [SCore-users-jp] Re: [SCore-users] (no subject)
Message-ID: <3127887117.hori0000@swimmy-soft.com>

Hi,

> when I try to start SCore with scout -g pcc it doesn't start. After a
> while a message saying permission denied appears. Any ideas how I can
> solve this?

Try the rsh command from the server host to one of the cluster hosts. I
guess you will see the same phenomenon. Check the /etc/hosts,
/etc/hosts.equiv and /root/.rhosts files.

----
Atsushi HORI
Swimmy Software, Inc.

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Wed Feb 12 09:49:49 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Wed, 12 Feb 2003 09:49:49 +0900
Subject: [SCore-users-jp] [SCore-users] (no subject)
In-Reply-To: Your message of "Mon, 10 Feb 2003 17:36:03 JST."
Message-ID: <20030212004949.BCBA520050@neal.il.is.s.u-tokyo.ac.jp>

In article Martin Newiger writes:
> Hi,
>
> when I try to start SCore with scout -g pcc it doesn't start. After a
> while a message saying permission denied appears. Any ideas how I can
> solve this?

Please check /etc/hosts.equiv (and /root/.rhosts if you run it as root) on
the compute hosts.

If you have 4 compute hosts (comp0, comp1, comp2, comp3) and you run scout
on the server, scout connects in the following order:

  server -> comp0 -> comp1 -> comp2 -> comp3

This means each compX must accept access from comp(X-1). If you scout to
only comp0 and comp2, then comp2 must accept access from comp0, and so on.
In the end, every compute host must accept access from the server and from
all the other compute hosts.

So /etc/hosts.equiv on every compute host must include all the compute hosts
and the server host (and if you want to run from other hosts as well, please
add those hosts too).

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
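A small sketch that builds such an /etc/hosts.equiv (server plus every compute host) and copies it to each node. The host names and the use of rcp are placeholders to adapt to your cluster; run it as root from the server host, and keep /root/.rhosts in step the same way if you run scout as root.

  SERVER=server.example.com
  NODES="comp0 comp1 comp2 comp3"
  ( echo $SERVER; for n in $NODES; do echo $n; done ) > /tmp/hosts.equiv
  for n in $NODES; do
      rcp /tmp/hosts.equiv $n:/etc/hosts.equiv
  done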
From m-kawaguchi @ pst.fujitsu.com Thu Feb 13 14:45:14 2003
From: m-kawaguchi @ pst.fujitsu.com (Kawaguchi Mitsugu)
Date: Thu, 13 Feb 2003 14:45:14 +0900
Subject: [SCore-users-jp] The scrun command
Message-ID: <20030213144514E.m-kawaguchi@pst.fujitsu.com>

This is Kawaguchi from Fujitsu Prime Software Technologies. Thank you for
your continued support.

I would like to do the following with the scrun command; could you tell me
whether it is possible?

- In an environment where 4 nodes are registered in the group "PC1", run a
  job as follows:

    % scrun -group=PC1,nodes=2x1 ./a.out

  In this state I want to run another job on the 2 nodes that are not running
  a job, and I want those 2 nodes to be assigned dynamically:

    % scrun ?????,nodes=2x1 ./a2.out

Statically, I think this can be achieved by identifying the idle nodes myself
and specifying them with scrun options (e.g. -group=PC1~node1~node2). But is
it possible to run the second job without having to know which nodes are
free? (Please leave the sc_qsub command out of consideration.)

Thank you in advance.

---
Kawaguchi  mail => m-kawaguchi @ pst.fujitsu.com

From kameyama @ pccluster.org Thu Feb 13 18:22:28 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Thu, 13 Feb 2003 18:22:28 +0900
Subject: [SCore-users-jp] The scrun command
In-Reply-To: Your message of "Thu, 13 Feb 2003 14:45:14 JST." <20030213144514E.m-kawaguchi@pst.fujitsu.com>
Message-ID: <20030213092228.9760B2004F@neal.il.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <20030213144514E.m-kawaguchi @ pst.fujitsu.com> Kawaguchi Mitsugu writes:
> - In an environment where 4 nodes are registered in the group "PC1", run a
>   job as follows:
>
>     % scrun -group=PC1,nodes=2x1 ./a.out
>
>   In this state I want to run another job on the 2 nodes that are not
>   running a job, and I want those 2 nodes to be assigned dynamically.
>
> Statically, I think this can be achieved by identifying the idle nodes
> myself and specifying them with scrun options (e.g. -group=PC1~node1~node2).

Since the first scrun occupies the whole PC1 group, I do not think this can
be done as long as the first scrun specifies the group...

> But is it possible to run the second job without having to know which nodes
> are free? (Please leave the sc_qsub command out of consideration.)

It is possible if you use multi-user mode, but I do not think it can be done
in single-user mode.

from Kameyama Toyohisa

From hori @ swimmy-soft.com Thu Feb 13 18:46:56 2003
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Thu, 13 Feb 2003 18:46:56 +0900
Subject: [SCore-users-jp] The scrun command
In-Reply-To: <20030213092228.9760B2004F@neal.il.is.s.u-tokyo.ac.jp>
References: <20030213144514E.m-kawaguchi@pst.fujitsu.com>
Message-ID: <3128006816.hori0002@swimmy-soft.com>

This is Hori.

>> Statically, I think this can be achieved by identifying the idle nodes
>> myself and specifying them with scrun options (e.g. -group=PC1~node1~node2).
>
> Since the first scrun occupies the whole PC1 group, I do not think this can
> be done as long as the first scrun specifies the group...

Only briefly, as I am short of time: it would be possible by making clever
use of the msgblock command newly introduced in SCore 5.4, but thinking about
it now there seems to be a small problem - a bit of hacking would be needed.

----
Hori Atsushi
Swimmy Software, Inc.

From m-kawaguchi @ pst.fujitsu.com Thu Feb 13 21:31:54 2003
From: m-kawaguchi @ pst.fujitsu.com (Kawaguchi Mitsugu)
Date: Thu, 13 Feb 2003 21:31:54 +0900
Subject: [SCore-users-jp] The scrun command
In-Reply-To: <20030213092228.9760B2004F@neal.il.is.s.u-tokyo.ac.jp>
References: <20030213144514E.m-kawaguchi@pst.fujitsu.com> <20030213092228.9760B2004F@neal.il.is.s.u-tokyo.ac.jp>
Message-ID: <20030213213154T.m-kawaguchi@pst.fujitsu.com>

Kameyama-san, Hori-san,

This is Kawaguchi from Fujitsu Prime Software Technologies. Thank you for
the answers.

> Since the first scrun occupies the whole PC1 group, I do not think this can
> be done as long as the first scrun specifies the group...

I made a small mistake there, sorry. If the first scrun specifies the number
of nodes within the group, and the next scrun specifies the nodes other than
those, it can be done statically:

  % scrun -group=PC1:2 ./a.out

and then, after identifying the nodes on which that job is running:

  % scrun -group=PC1~node1~node2 ./a.out

But it does indeed seem that the nodes cannot be assigned dynamically.
Thank you.

> It is possible if you use multi-user mode, but I do not think it can be
> done in single-user mode.

---
Kawaguchi  mail => m-kawaguchi @ pst.fujitsu.com
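To summarize the static workaround from this thread in one place (node1 and node2 are placeholders for whichever nodes the first job actually received): the first job takes only part of the group, and the second job explicitly excludes those nodes. Dynamic assignment of the free nodes needs SCore-D multi-user mode, as Kameyama notes above.

  % scrun -group=PC1:2 ./a.out                # first job: 2 hosts out of group PC1
  % scrun -group=PC1~node1~node2 ./a2.out     # second job: PC1 minus the busy nodes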
From s-sumi @ flab.fujitsu.co.jp Sat Feb 15 23:20:18 2003
From: s-sumi @ flab.fujitsu.co.jp (Shinji Sumimoto)
Date: Sat, 15 Feb 2003 23:20:18 +0900 (JST)
Subject: [SCore-users-jp] Announcement of the 2nd PC Cluster Symposium
In-Reply-To: <20030203.095315.884014234.ishikawa@is.s.u-tokyo.ac.jp>
References: <20030203.095315.884014234.ishikawa@is.s.u-tokyo.ac.jp>
Message-ID: <20030215.232018.730558638.s-sumi@flab.fujitsu.co.jp>

Dear SCore users,

The company exhibition at this PC Cluster Consortium symposium is packed with
interesting exhibits, including an Itanium2 cluster and Opteron machines. The
exhibition can be visited without registration not only on the 21st but also
on the 20th, so we look forward to seeing everyone who is interested.

http://www.pccluster.org/event/symp/2002/exhibits.html

ishikawa> [...]

------
Shinji Sumimoto, Fujitsu Labs

From salzmann @ mpch-mainz.mpg.de Wed Feb 19 02:55:53 2003
From: salzmann @ mpch-mainz.mpg.de (Marc Salzmann)
Date: Tue, 18 Feb 2003 18:55:53 +0100 (CET)
Subject: [SCore-users-jp] [SCore-users] multiuser mode

Hi everybody,

When I was using SCore-D (5.2.0) in multiuser mode I could run a 'hello
world' test program if I logged in as root, but not if I logged in as
another user. The error message was:

  FEP:ERROR Unable to open secure port.

The problem also occurred after having started scored with the -nosecure
option. So I decided to re-compile scrun with #define SCORE_NOT_SECURE and
this solved the problem.

Is there any better (and maybe safer?) way to solve the problem than this?
I'm a newbie and would be grateful for any advice.

best regards,

Marc

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Marc Salzmann                    Max Planck Institute for Chemistry
tel: +49/(0)6131/305311          Department of Atmospheric Chemistry / NWG
fax: +49/(0)6131/305577          Postfach 3060
55020 Mainz, Germany             e-mail: salzmann @ mpch-mainz.mpg.de
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Wed Feb 19 09:14:31 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Wed, 19 Feb 2003 09:14:31 +0900
Subject: [SCore-users-jp] Re: [SCore-users] multiuser mode
In-Reply-To: Your message of "Tue, 18 Feb 2003 18:55:53 JST."
Message-ID: <20030219001431.9245720050@neal.il.is.s.u-tokyo.ac.jp>

In article Marc Salzmann writes:
> When I was using SCore-D (5.2.0) in multiuser mode I could
> run a 'hello world' test program if I logged in as root but not
> if I logged in as another user.
> The error message was: FEP:ERROR Unable to open secure port.

Please check the permissions of scrun:

  % ls -l /opt/score/bin/bin.*/scrun.exe
  -rwsr-xr-x 1 root root 1194841 Oct 24 11:04 /opt/score/bin/bin.i386-redhat7-linux2_4/scrun.exe

scrun.exe must be owned by root and must have the set-uid bit set.
Note that scored must be run by root.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
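If the permissions do not look like Kameyama's example, restoring them takes two commands. The architecture directory below is simply the one shown in his listing and will differ on other installations; mode 4755 corresponds to the rwsr-xr-x shown above.

  chown root /opt/score/bin/bin.i386-redhat7-linux2_4/scrun.exe
  chmod 4755 /opt/score/bin/bin.i386-redhat7-linux2_4/scrun.exe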
From vegeta @ nerv-center.de Thu Feb 20 19:36:48 2003
From: vegeta @ nerv-center.de (Henryk Feider)
Date: 20 Feb 2003 11:36:48 +0100
Subject: [SCore-users-jp] [SCore-users] Root password after EIT installation on nodes
Message-ID: <1045737414.1135.2.camel@belldandy>

Hi,

I installed a SCore cluster with EIT 2.0, but I cannot log into the nodes
because I do not know the root password. What password does EIT set, or
where can I find this information?

Greetings, Henryk Feider

--
Fight for love and justice!
Public key ID: 8065655F
example command: gpg --recv-keys --keyserver www.keyserver.net 8065655F
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From vegeta @ nerv-center.de Thu Feb 20 21:17:19 2003
From: vegeta @ nerv-center.de (Henryk Feider)
Date: 20 Feb 2003 13:17:19 +0100
Subject: [SCore-users-jp] Re: [SCore-users] Root password after EIT installation on nodes
In-Reply-To: <200302201042.h1KAgKs06088@emilepc.ess.nec.fr>
References: <200302201042.h1KAgKs06088@emilepc.ess.nec.fr>
Message-ID: <1045743440.1135.6.camel@belldandy>

I already tried this, but it does not work. A problem with EIT?

Greetings, Henryk Feider

On Thu, 2003-02-20 at 11:42, Emile CARCAMO wrote:
> vegeta @ nerv-center.de said:
> > I installed a SCore cluster with EIT 2.0. But I cannot log into the nodes
> > because I do not know the root password. What password does EIT set or
> > where can I find this information?
>
> Hello Henryk,
>
> The password is the same as the one you have for
> root on the master machine (where you've run EIT!)
> HTH, and best regards.
> [...]

--
Fight for love and justice!
Public key ID: 8065655F
example command: gpg --recv-keys --keyserver www.keyserver.net 8065655F
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From nrcb @ streamline-computing.com Thu Feb 20 21:32:24 2003
From: nrcb @ streamline-computing.com (Nick Birkett)
Date: Thu, 20 Feb 2003 12:32:24 +0000
Subject: [SCore-users-jp] Re: [SCore-users] Root password after EIT installation on nodes
In-Reply-To: <1045743440.1135.6.camel@belldandy>
References: <200302201042.h1KAgKs06088@emilepc.ess.nec.fr> <1045743440.1135.6.camel@belldandy>
Message-ID: <200302201232.h1KCWOY05889@zeralda.streamline.com>

On Thursday 20 February 2003 12:17 pm, you wrote:
> I already tried this, but it does not work. Problem with EIT?

SCore is set up so you can rsh into the compute nodes as an ordinary user,
but not as root. However, you can still rsh a command or rcp a file as root.

To enable rsh logins as root, please rcp a new /etc/pam.d/rlogin to each
compute node. I use this pam.d/rlogin (i.e. "auth sufficient
/lib/security/pam_rhosts_auth.so" is the first line):

#%PAM-1.0
# For root login to succeed here with pam_securetty, "rlogin" must be
# listed in /etc/securetty.
auth       sufficient   /lib/security/pam_rhosts_auth.so
auth       required     /lib/security/pam_nologin.so
auth       required     /lib/security/pam_securetty.so
auth       required     /lib/security/pam_env.so
auth       required     /lib/security/pam_stack.so service=system-auth
account    required     /lib/security/pam_stack.so service=system-auth
password   required     /lib/security/pam_stack.so service=system-auth
session    required     /lib/security/pam_stack.so service=system-auth

e.g.
  rcp rlogin comp00:/etc/pam.d/rlogin

With this you will then be able to rsh into comp00 etc.

Nick

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
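To push the same pam file to every compute node in one go, a short loop over the node names does it; run it from the server host as root (rcp as root still works, as Nick notes above), and adjust the node names to your cluster.

  for n in comp00 comp01 comp02 comp03; do
      rcp rlogin $n:/etc/pam.d/rlogin
  done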
From kameyama @ pccluster.org Thu Feb 20 23:52:23 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Thu, 20 Feb 2003 23:52:23 +0900
Subject: [SCore-users-jp] Re: [SCore-users] Root password after EIT installation on nodes
In-Reply-To: Your message of "Thu, 20 Feb 2003 12:32:24 JST." <200302201232.h1KCWOY05889@zeralda.streamline.com>
Message-ID: <20030220145223.9652620050@neal.il.is.s.u-tokyo.ac.jp>

In article <200302201232.h1KCWOY05889 @ zeralda.streamline.com> Nick Birkett writes:
> On Thursday 20 February 2003 12:17 pm, you wrote:
> > I already tried this, but it does not work. Problem with EIT?

EIT takes the password from /etc/shadow or /etc/passwd on the server host.
If your root password is not there, probably no root password has been set.

> Score is set up so you can rsh into the compute nodes as an ordinary user
> but not as root.
>
> However you can still rsh a command or rcp a file as root.

If you can rsh and rcp as root, you can run the scout command as root, too.

> To enable rsh logins as root please rcp a new /etc/pam.d/rlogin to each
> compute node. I use this pam.d/rlogin
>
> (ie auth sufficient /lib/security/pam_rhosts_auth.so is first line)
>
> #%PAM-1.0
> # For root login to succeed here with pam_securetty, "rlogin" must be
> # listed in /etc/securetty.

As this comment says, you do not need to replace this file. If you want to
rlogin as root, please issue the following commands:

  # scout -g pcc
  # scout "echo rlogin >> /etc/securetty"
  # exit

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From vegeta @ nerv-center.de Fri Feb 21 01:11:04 2003
From: vegeta @ nerv-center.de (Henryk Feider)
Date: 20 Feb 2003 17:11:04 +0100
Subject: [SCore-users-jp] Re: [SCore-users] Root password after EIT installation on nodes
In-Reply-To: <20030220145223.9652620050@neal.il.is.s.u-tokyo.ac.jp>
References: <20030220145223.9652620050@neal.il.is.s.u-tokyo.ac.jp>
Message-ID: <1045757464.1135.20.camel@belldandy>

Hi,

The password is set in /etc/passwd on the nodes (I booted with init=/bin/sh
to find out). But even using the keyboard on the node, I cannot log into it.
ssh fails too.

Thanks & Greetings, Henryk Feider

On Thu, 2003-02-20 at 15:52, kameyama @ pccluster.org wrote:
> EIT takes the password from /etc/shadow or /etc/passwd on the server host.
> If your root password is not there, probably no root password has been set.
> [...]
--
Fight for love and justice!
Public key ID: 8065655F
example command: gpg --recv-keys --keyserver www.keyserver.net 8065655F
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From jure.jerman @ rzs-hm.si Fri Feb 21 15:00:12 2003
From: jure.jerman @ rzs-hm.si (Jure Jerman)
Date: Fri, 21 Feb 2003 07:00:12 +0100
Subject: [SCore-users-jp] [SCore-users] problems with jobs locking cluster
Message-ID: <3E55C06C.1040306@rzs-hm.si>

Dear all,

Sometimes I have an annoying problem with SCore: when, for instance, a
deadlock is detected, SCore tries to attach the gdb debugger to the process,
but in my case this fails due to DISPLAY problems (it cannot open the
DISPLAY - even if I set the DISPLAY variable explicitly and am able to run
other X applications, it does not work).

The problem is that such a process locks the whole machine: it just hangs
there, no other jobs are running, no jobs can be submitted, and no jobs can
even be aborted or killed. The only solution is to restart sc_watch.

Does anyone have an idea how to:

1. get gdb working with SCore
2. prevent SCore from attaching the debugger to the process when not desired
   (a cron-submitted job, for example)
3. generally handle the situation where some jobs cannot be killed via
   sc_console.

I am using SCore 5.2.0.

Many thanks in advance,

Jure Jerman

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From hori @ swimmy-soft.com Fri Feb 21 20:14:09 2003
From: hori @ swimmy-soft.com (Atsushi HORI)
Date: Fri, 21 Feb 2003 20:14:09 +0900
Subject: [SCore-users-jp] Re: [SCore-users] problems with jobs locking cluster
In-Reply-To: <3E55C06C.1040306@rzs-hm.si>
References: <3E55C06C.1040306@rzs-hm.si>
Message-ID: <3128703249.hori0001@swimmy-soft.com>

Hi.

> Does anyone have an idea how to:
>
> 1. get gdb working with SCore

I suspect that your X server is not accessible from the cluster compute
hosts. You must do "xhost +" to allow access from any host.

> 2. prevent SCore from attaching the debugger to the process when not desired
>    (a cron-submitted job, for example)

SCore tries to attach the debugger only with the "debug" option. Further, for
jobs scheduled in batch scheduling mode, the attach-debugger feature is
disabled.

> 3. generally handle the situation where some jobs cannot be killed via
>    sc_console.

This could be an SCore bug. Could you tell me the situation in more detail?

----
Atsushi HORI
Swimmy Software, Inc.

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
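A quick way to test Hori's first suggestion from the machine you submit jobs from. The display name and compute-host name below are placeholders, and note that a bare "xhost +" opens your X server to every host, so listing the cluster hosts explicitly is safer if that matters on your network.

  xhost +                      # or: xhost +comp0 +comp1 ... to stay restrictive
  rsh comp0 /usr/X11R6/bin/xterm -display mydesk.example.com:0
  # if the xterm appears, the compute host can reach your display; resubmit the
  # failing job with DISPLAY set to the same value and the gdb attach should work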
From okamoto @ gsport.co.jp Fri Feb 21 23:15:22 2003
From: okamoto @ gsport.co.jp (Okamoto Masafumi)
Date: Fri, 21 Feb 2003 23:15:22 +0900
Subject: [SCore-users-jp] Linking shared libraries with mpic++
Message-ID: <005101c2d9b3$abeeea10$6e0010ac@tommy>

Hello, this is Okamoto from G-Sport.

We are using SCore 5.0.1 on RedHat 7.2. Because the program we are developing
uses XML, we use Xerces (http://xml.apache.org/xerces-c/).

We use Xerces as a shared library, but when we replace g++ with mpic++ in the
Makefile and build, the following error occurs:

  /usr/bin/ld: cannot find -lxerces-c
  collect2: ld returned 1 exit status

When we make with g++ the build succeeds without problems. The only changes
to the Makefile are g++ -> mpic++ and gcc -> mpicc. The link step looks like
this:

  /opt/score/bin/mpic++ -L. -o hoge1.o hoge2.o hoge3.o -L../xerces-c-src2_1_0/lib -lxerces-c

We would appreciate any advice.

From kameyama @ pccluster.org Mon Feb 24 09:28:49 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Mon, 24 Feb 2003 09:28:49 +0900
Subject: [SCore-users-jp] Re: [SCore-users] Root password after EIT installation on nodes
In-Reply-To: Your message of "20 Feb 2003 17:11:04 JST." <1045757464.1135.20.camel@belldandy>
Message-ID: <20030224002849.8059C20054@neal.il.is.s.u-tokyo.ac.jp>

In article <1045757464.1135.20.camel @ belldandy> Henryk Feider writes:
> The password is set in /etc/passwd on the nodes (I booted with init=/bin/sh
> to find out). But even using the keyboard on the node, I cannot log into
> it. ssh fails too.

Please check /etc/passwd and /etc/shadow on the nodes. If the /etc/passwd
entry looks like this:

  root:x:0:0:root:/root:/bin/bash

(the password field is "x"), the root password is stored in /etc/shadow.
I think the password field should be the same as on the server host. If it
is different, please change the password with the passwd command (after
booting with init=/bin/sh).

Can you use "su" from a normal user?

  % su
  Password:
  #

If you can become root using su, please check /etc/securetty.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Mon Feb 24 09:33:49 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Mon, 24 Feb 2003 09:33:49 +0900
Subject: [SCore-users-jp] Linking shared libraries with mpic++
In-Reply-To: Your message of "Fri, 21 Feb 2003 23:15:22 JST." <005101c2d9b3$abeeea10$6e0010ac@tommy>
Message-ID: <20030224003349.7B18C20054@neal.il.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <005101c2d9b3$abeeea10$6e0010ac @ tommy> "Okamoto Masafumi" writes:
> We use Xerces as a shared library, but when we replace g++ with mpic++ in
> the Makefile and build, the following error occurs:
>   /usr/bin/ld: cannot find -lxerces-c
>   collect2: ld returned 1 exit status
> When we make with g++ the build succeeds without problems.

Because SCore's checkpointing assumes statically linked binaries, the compile
drivers link statically by default. When you use a shared library, please add
the -nostatic option:

> /opt/score/bin/mpic++ -L. -o hoge1.o hoge2.o
> hoge3.o -L../xerces-c-src2_1_0/lib -lxerces-c

  /opt/score/bin/mpic++ -nostatic -L. -o hoge1.o hoge2.o hoge3.o -L../xerces-c-src2_1_0/lib -lxerces-c

I think it should look like the above.
from Kameyama Toyohisa

From johannes.werhahn @ imk.fzk.de Tue Feb 25 00:37:15 2003
From: johannes.werhahn @ imk.fzk.de (Johannes Werhahn)
Date: Mon, 24 Feb 2003 16:37:15 +0100
Subject: [SCore-users-jp] [SCore-users] Adding new hosts with eit
Message-ID: <3E5A3C2B.9060900@imk.fzk.de>

Hi there,

we got a ready-configured system with 10 hosts (1 server, 9 compute hosts)
from our distributor and now want to add 2 other compute hosts using eit.
Is this the correct procedure: after pressing the Load button I go through
the Network Configuration etc., leaving them unchanged, until

  Host information
    # of Hosts    12    instead of 10
    Name Prefix   star  (unchanged)
    digit         0
    Figure        1     (unchanged)

If I then press Add, I get

  Duplicate host name star1.cluster.domain

So where should I include star11.cluster.domain to get it ready for the
automatic installation procedure?

And is it possible to configure two groups, one with Ethernet and one with
Myrinet2000 (our new hosts have Myrinet cards, the others not)?

Thanks for any hint and best regards

Johannes Werhahn

--------------------------------------------------------
Institut für Meteorologie und Klimaforschung
Bereich Atmosphärische Umweltforschung
Forschungszentrum Karlsruhe GmbH
Kreuzeckbahnstr. 19
D-82467 Garmisch-Partenkirchen
Germany
Phone +49-8821-183-244
Fax   +49-8821-183-243
Email johannes.werhahn @ imk.fzk.de
WWW   http://www.fzk.de
--------------------------------------------------------

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users

From kameyama @ pccluster.org Tue Feb 25 09:15:55 2003
From: kameyama @ pccluster.org (kameyama @ pccluster.org)
Date: Tue, 25 Feb 2003 09:15:55 +0900
Subject: [SCore-users-jp] Re: [SCore-users] Adding new hosts with eit
In-Reply-To: Your message of "Mon, 24 Feb 2003 16:37:15 JST." <3E5A3C2B.9060900@imk.fzk.de>
Message-ID: <20030225001555.E367820054@neal.il.is.s.u-tokyo.ac.jp>

In article <3E5A3C2B.9060900 @ imk.fzk.de> Johannes Werhahn writes:
> after pressing the Load button I go through the Network Configuration
> etc., leaving them unchanged, until
>   Host information
>     # of Hosts    12    instead of 10
>     Name Prefix   star  (unchanged)
>     digit         0
>     Figure        1     (unchanged)
> If I then press Add, I get
>   Duplicate host name star1.cluster.domain
> So where should I include star11.cluster.domain to get it ready for the
> automatic installation procedure?

Please specify ONLY the new hostnames. Probably: # of hosts is 2, Name Prefix
unchanged, and start is 10.

> And is it possible to configure two groups, one with Ethernet and one with
> Myrinet2000 (our new hosts have Myrinet cards, the others not)?

It is possible. You can add a new group and modify groups in the group list
(the next window). But if you use SCore 5.0.1 or older, EIT generates a wrong
pm-myrinet.conf. Please check /opt/score/etc/pm-myrinet.conf.

from Kameyama Toyohisa

_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users
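Once the new group is defined, the two Myrinet-equipped hosts can be exercised on their own before they join production. The group name "myri" is a placeholder for whatever the new group is called, and the invocation follows the way the PM test programs were used in the Myrinet thread earlier in this archive (scstest is normally run from inside a scout session; check the SCore test-program documentation for your version).

  $ scout -g myri
  $ scstest -network myrinet2k
  $ exit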
From okamoto @ gsport.co.jp Tue Feb 25 11:16:09 2003
From: okamoto @ gsport.co.jp (Okamoto Masafumi)
Date: Tue, 25 Feb 2003 11:16:09 +0900
Subject: [SCore-users-jp] Linking shared libraries with mpic++
In-Reply-To: <20030224003349.7B18C20054@neal.il.is.s.u-tokyo.ac.jp>
References: <005101c2d9b3$abeeea10$6e0010ac@tommy> <20030224003349.7B18C20054@neal.il.is.s.u-tokyo.ac.jp>
Message-ID: <20030225111444.5FC6.OKAMOTO@gsport.co.jp>

This is Okamoto. After adding the -nostatic option it built without problems.
Thank you very much.

kameyama> Because SCore's checkpointing assumes statically linked binaries,
kameyama> the compile drivers link statically by default. When you use a
kameyama> shared library, please add the -nostatic option.
kameyama> [...]

From nrcb @ streamline-computing.com Wed Feb 26 20:49:05 2003
From: nrcb @ streamline-computing.com (Nick Birkett)
Date: Wed, 26 Feb 2003 11:49:05 +0000
Subject: [SCore-users-jp] [SCore-users] score 5.0.1 large memory jobs
Message-ID: <200302261149.h1QBn5L04090@zeralda.streamline.com>

Hi, I got a report from one of our users running very big Pallas benchmark
tests (128 CPUs).

#----------------------------------------------------------------
# Benchmarking Alltoall
# ( #processes = 64 )
# ( 64 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
 #bytes  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
      0          1000       784.39       784.86       784.79
      1          1000       792.01       792.18       792.10
      2          1000       785.42       785.78       785.68
      4          1000       796.99       797.23       797.14
      8          1000       800.98       801.11       801.06
     16          1000       778.84       779.19       779.11
     32          1000       787.78       788.14       788.03
     64          1000       821.54       821.79       821.66
    128          1000       881.18       881.38       881.30
    256          1000       952.46       952.64       952.56
    512          1000      1158.49      1159.00      1158.88
   1024          1000      1640.78      1644.24      1641.00
   2048          1000      3454.18      3454.95      3454.62
   4096          1000      6882.82      6884.97      6883.97
   8192          1000     16088.81     16094.80     16091.81
  16384          1000     33715.59     33732.60     33727.56
  32768          1000     65014.80     65027.62     65023.50
  65536           640    129590.04    129636.99    129623.44
 131072           320    263434.38    263628.56    263587.57
 262144           160    531708.42    532274.39    532124.75
 524288            80   1069253.25   1071251.60   1070571.90
1048576            40   2173875.02   2187574.55   2184477.23
2097152            20   4228944.70   4270372.05   4258162.98
4194304            10   8398147.40   8512784.40   8478838.18

<8> SCore-D:PANIC Network freezing timed out !!
_______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users

From hori @ swimmy-soft.com Wed Feb 26 23:13:27 2003 From: hori @ swimmy-soft.com (Atsushi HORI) Date: Wed, 26 Feb 2003 23:13:27 +0900 Subject: [SCore-users-jp] Re: [SCore-users] score 5.0.1 large memory jobs In-Reply-To: <200302261149.h1QBn5L04090@zeralda.streamline.com> References: <200302261149.h1QBn5L04090@zeralda.streamline.com> Message-ID: <3129146007.hori0002@swimmy-soft.com>

Hi.

> Hi, I got a report from one of our users running very big
> Pallas benchmark tests (128 cpus).

This looks like the timeout value is too small.

BTW, what network are you (is he or she) using? Myrinet or Ethernet (I bet on Ethernet)? The next question is: how big are the maximum messages?

> #----------------------------------------------------------------
> # Benchmarking Alltoall
> # ( #processes = 64 )
> # ( 64 additional processes waiting in MPI_Barrier)
> #----------------------------------------------------------------
> #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
> 0 1000 784.39 784.86 784.79
> 1 1000 792.01 792.18 792.10
> 2 1000 785.42 785.78 785.68
> 4 1000 796.99 797.23 797.14
> 8 1000 800.98 801.11 801.06
> 16 1000 778.84 779.19 779.11
> 32 1000 787.78 788.14 788.03
> 64 1000 821.54 821.79 821.66
> 128 1000 881.18 881.38 881.30
> 256 1000 952.46 952.64 952.56
> 512 1000 1158.49 1159.00 1158.88
> 1024 1000 1640.78 1644.24 1641.00
> 2048 1000 3454.18 3454.95 3454.62
> 4096 1000 6882.82 6884.97 6883.97
> 8192 1000 16088.81 16094.80 16091.81
> 16384 1000 33715.59 33732.60 33727.56
> 32768 1000 65014.80 65027.62 65023.50
> 65536 640 129590.04 129636.99 129623.44
> 131072 320 263434.38 263628.56 263587.57
> 262144 160 531708.42 532274.39 532124.75
> 524288 80 1069253.25 1071251.60 1070571.90
> 1048576 40 2173875.02 2187574.55 2184477.23
> 2097152 20 4228944.70 4270372.05 4258162.98
> 4194304 10 8398147.40 8512784.40 8478838.18
> <8> SCore-D:PANIC Network freezing timed out !!
> _______________________________________________
> SCore-users mailing list
> SCore-users @ pccluster.org
> http://www.pccluster.org/mailman/listinfo/score-users
_______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users

From suga @ sse.co.jp Thu Feb 27 15:50:38 2003 From: suga @ sse.co.jp (Sugano, Mitsukuni) Date: Thu, 27 Feb 2003 15:50:38 +0900 Subject: [SCore-users-jp] SCore 5.4.0 - Itanium2 Message-ID: <3E5DB53E.4A85E7AC@sse.co.jp>

This is Sugano from SSE. Thank you for your support. I am currently building SCore 5.4 from source on Itanium2, and I have run into the following problems and questions.

1. SMP
At run time a warning is shown saying that there is no SHMEM device and that the smp count is reset to 1. We are running in a 2x2 environment; with nodes=2 it runs with only the warning, but with 4 it naturally cannot run, since SMP is unusable. The SHMEM settings in scorehosts.db have already been checked.
Also, the 5.4 bug-fix notes say:
---
Bug Fixes
SCore does not work more than 128 hosts.
The mpich 1.2.4 logviewer command does not have an execution right.
The mpich 1.2.4 Fortran 90 library does not exist.
PM/shmem does not work in IA64.
---
so the "PM/shmem does not work in IA64." problem appears to have been fixed. Is there a source tree different from the normal 5.4.0 source?

2. mandel
The mandel demo cannot run and fails with the following error:
<0:0> SCORE: 2 nodes (2x1) ready.
Worker die, display exit

3. Optional compilers
It seems that changing the site file alone is not enough, so I think MPI etc. have to be recompiled from the SCore source. In that case, can multiple compilers be specified? Also, is it possible to build in a mixed environment where C is gcc only and Fortran is g77, ifc (v6) and ifc (v7), i.e. ifc is available but icc is not? Could you tell me the procedure, including the necessary steps?
From kameyama @ pccluster.org Thu Feb 27 16:54:46 2003 From: kameyama @ pccluster.org (kameyama @ pccluster.org) Date: Thu, 27 Feb 2003 16:54:46 +0900 Subject: [SCore-users-jp] SCore 5.4.0 - Itanium2 In-Reply-To: Your message of "Thu, 27 Feb 2003 15:50:38 JST." <3E5DB53E.4A85E7AC@sse.co.jp> Message-ID: <20030227075446.4C90220054@neal.il.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <3E5DB53E.4A85E7AC @ sse.co.jp> "Sugano, Mitsukuni" writes:
> At run time a warning is shown saying that there is no SHMEM device and that
> the smp count is reset to 1. We are running in a 2x2 environment; with
> nodes=2 it runs with only the warning, but with 4 it naturally cannot run,
> since SMP is unusable.

Please run dmesg on a compute host and check whether messages such as
pmshmem: version = $Id: pm_shmem.c,v ...
pmshmem_init: register pm_shmem as major(124)
appear. Also, please check whether files such as /dev/pmshmem/0 exist.

> Also, the 5.4 bug-fix notes say:
> ---
> Bug Fixes
> SCore does not work more than 128 hosts.
> The mpich 1.2.4 logviewer command does not have an execution right.
> The mpich 1.2.4 Fortran 90 library does not exist.
> PM/shmem does not work in IA64.
> ---
> so the "PM/shmem does not work in IA64." problem appears to have been fixed.
> Is there a source tree different from the normal 5.4.0 source?

IA64 support has been in SCore since SCore 5.2.0. That entry is a bug fixed between 5.2.0 and 5.4.0; in that bug PM/shmem itself was recognized but then produced errors.

> 2. mandel
> The mandel demo cannot run and fails with the following error:
> <0:0> SCORE: 2 nodes (2x1) ready.
> Worker die, display exit

Reproduced. I will investigate.

> 3. Optional compilers
>
> It seems that changing the site file alone is not enough, so I think MPI etc.
> have to be recompiled from the SCore source. In that case, can multiple
> compilers be specified? Also, is it possible to build in a mixed environment
> where C is gcc only and Fortran is g77, ifc (v6) and ifc (v7), i.e. ifc is
> available but icc is not? Could you tell me the procedure, including the
> necessary steps?

Since you mention ifc, is this IA32? (On IA64 the Intel compiler is efc.) For IA32, binaries built with Intel compiler version 6 have been included since SCore 5.2.0. In SCore 5.4.0 the binaries for both Intel compiler version 6 and version 7 are installed by EIT, so I think it should work just by changing the site file. However, these binaries assume that both icc and ifc are present, so the link may fail. (Sorry, I have not tried it, so I do not know.) In that case you will need to recompile MPI, with the following procedure. (IA64 can be installed with the same procedure.)

1. Modify the /opt/score/etc/compiler/site file.
2. Extract the MPI source under /opt/score:
# cd /opt/score
# tar vxzf /mnt/cdrom/score.ssource/score-5.4.0.mpi.tar.gz
3. Compile and install the MPI programs:
# cd /opt/score/score-src/runtime/mpi
# smake
# smake install
4. If there are no errors, it should then be usable.

from Kameyama Toyohisa

From kameyama @ pccluster.org Thu Feb 27 18:19:39 2003 From: kameyama @ pccluster.org (kameyama @ pccluster.org) Date: Thu, 27 Feb 2003 18:19:39 +0900 Subject: [SCore-users-jp] SCore 5.4.0 - Itanium2 In-Reply-To: Your message of "Thu, 27 Feb 2003 16:54:46 JST." <20030227075446.4C90220054@neal.il.is.s.u-tokyo.ac.jp> Message-ID: <20030227091939.BA5BE20054@neal.il.is.s.u-tokyo.ac.jp>

This is Kameyama.

In article <20030227075446.4C90220054 @ neal.il.is.s.u-tokyo.ac.jp> kameyama @ pccluster.org writes:
> > 2. mandel
> > The mandel demo cannot run and fails with the following error:
> > <0:0> SCORE: 2 nodes (2x1) ready.
> > Worker die, display exit
>
> Reproduced. I will investigate.

On IA64, fork() was being ignored in the SCore environment. For the time being, please add
-nockpt
to the LDFLAGS in score-src/program/demo/mandel/Makefile and compile without checkpoint support. (Even then the display is still a bit odd...)

from Kameyama Toyohisa
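A minimal sketch of the workaround described above, assuming the demo Makefile uses an ordinary LDFLAGS variable; the real variable layout in score-src/program/demo/mandel/Makefile may differ:

    # score-src/program/demo/mandel/Makefile (sketch)
    # link without checkpoint support so fork() is usable on IA64
    LDFLAGS += -nockpt

    # then rebuild the demo, e.g.
    #   cd score-src/program/demo/mandel && smake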
From suga @ sse.co.jp Thu Feb 27 19:13:21 2003 From: suga @ sse.co.jp (Sugano, Mitsukuni) Date: Thu, 27 Feb 2003 19:13:21 +0900 Subject: [SCore-users-jp] SCore 5.4.0 - Itanium2 References: <20030227091939.BA5BE20054@neal.il.is.s.u-tokyo.ac.jp> Message-ID: <3E5DE4C1.C1274784@sse.co.jp>

Dear Kameyama-san,

This is Sugano. Thank you for your support. Thanks to your help, all of the problems I asked about earlier have now been resolved. I look forward to working with you in the future.

kameyama @ pccluster.org wrote:
>
> This is Kameyama.
>
> In article <20030227075446.4C90220054 @ neal.il.is.s.u-tokyo.ac.jp> kameyama @ pccluster.org writes:
> > > 2. mandel
> > > The mandel demo cannot run and fails with the following error:
> > > <0:0> SCORE: 2 nodes (2x1) ready.
> > > Worker die, display exit
> >
> > Reproduced. I will investigate.
>
> On IA64, fork() was being ignored in the SCore environment. For the time
> being, please add
> -nockpt
> to the LDFLAGS in score-src/program/demo/mandel/Makefile and compile without
> checkpoint support. (Even then the display is still a bit odd...)
>
> from Kameyama Toyohisa
> _______________________________________________
> SCore-users-jp mailing list
> SCore-users-jp @ pccluster.org
> http://www.pccluster.org/mailman/listinfo/score-users-jp

From nrcb @ streamline-computing.com Thu Feb 27 19:29:11 2003 From: nrcb @ streamline-computing.com (Nick Birkett) Date: Thu, 27 Feb 2003 10:29:11 +0000 Subject: [SCore-users-jp] Re: [SCore-users] score 5.0.1 large memory jobs Message-ID: <200302271029.h1RATB801837@zeralda.streamline.com>

Sorry, the message I sent was truncated and therefore confusing. Here it is again:

SCore 5.0.1, Myrinet 2000 system.

---------message from user -------------------------------

Following the addition of swap on all Snowdon compute nodes, I reran the PALLAS benchmark tests (on 64 nodes running 2 processes per node). The following output was recorded towards the end of the run:

#----------------------------------------------------------------
# Benchmarking Alltoall
# ( #processes = 64 )
# ( 64 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 784.39 784.86 784.79
1 1000 792.01 792.18 792.10
2 1000 785.42 785.78 785.68
4 1000 796.99 797.23 797.14
8 1000 800.98 801.11 801.06
16 1000 778.84 779.19 779.11
32 1000 787.78 788.14 788.03
64 1000 821.54 821.79 821.66
128 1000 881.18 881.38 881.30
256 1000 952.46 952.64 952.56
512 1000 1158.49 1159.00 1158.88
1024 1000 1640.78 1644.24 1641.00
2048 1000 3454.18 3454.95 3454.62
4096 1000 6882.82 6884.97 6883.97
8192 1000 16088.81 16094.80 16091.81
16384 1000 33715.59 33732.60 33727.56
32768 1000 65014.80 65027.62 65023.50
65536 640 129590.04 129636.99 129623.44
131072 320 263434.38 263628.56 263587.57
262144 160 531708.42 532274.39 532124.75
524288 80 1069253.25 1071251.60 1070571.90
1048576 40 2173875.02 2187574.55 2184477.23
2097152 20 4228944.70 4270372.05 4258162.98
4194304 10 8398147.40 8512784.40 8478838.18

<8> SCore-D:PANIC Network freezing timed out !!

And the .e file states:

<0:0> SCORE: 128 nodes (64x2) ready.
<56:1> SCORE:WARNING MPICH/SCore: pmGetSendBuffer(pmc=0x8541db8, dest=37, len=8256) failed, errno=22
<56:1> SCORE:PANIC MPICH/SCore: critical error on message transfer
<56:1> Trying to attach GDB (DISPLAY=snowdon.leeds.ac.uk:18.0): PANIC
SCOUT: Session done.

It looks like the memory allocation is now working fine, but the benchmark is unable to start the next test.

The next test is an all-to-all zero-length message across 128 processes (on 64 nodes). Extrapolating the results, this should take about 1.6 ms.

It appears as if the communications grind to a halt when we try to communicate between 128 processes (and more) when running 2 processes per node.
_______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users
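For reference, a rough sketch of the arithmetic behind that 1.6 ms figure (a back-of-the-envelope reconstruction, not taken from the original mail): the 0-byte Alltoall over 64 processes averages about 785 usec, and at zero message size the cost is dominated by per-destination overhead, which roughly doubles when the process count doubles, giving about 2 x 0.785 ms = 1.6 ms for 128 processes.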
From chen @ mdl.ipc.pku.edu.cn Fri Feb 28 01:40:42 2003 From: chen @ mdl.ipc.pku.edu.cn (Chen Hao) Date: Fri, 28 Feb 2003 00:40:42 +0800 Subject: [SCore-users-jp] [SCore-users] A very strange problem Message-ID: <008201c2de7e$f9b94770$9101a8c0@billgates>

Dear All:

I have a very strange problem when I install GROMACS on my cluster (SCore 5.4.0). First, the configure program reports that mpicc cannot handle assembly files, so I disabled the assembly loops, and then the configure program produced the Makefiles successfully. Then I ran make, and after a few minutes I got the executables. But when I try to run one, the system reports "No such file or directory". What is the matter? BTW, the executable can be opened in vi, and I am running it with the correct path.
-------------- next part --------------
An HTML attachment was scrubbed... URL:

From s-sumi @ flab.fujitsu.co.jp Fri Feb 28 12:59:18 2003 From: s-sumi @ flab.fujitsu.co.jp (Shinji Sumimoto) Date: Fri, 28 Feb 2003 12:59:18 +0900 (JST) Subject: [SCore-users-jp] Re: [SCore-users] score 5.0.1 large memory jobs In-Reply-To: <200302271029.h1RATB801837@zeralda.streamline.com> References: <200302271029.h1RATB801837@zeralda.streamline.com> Message-ID: <20030228.125918.294709392.s-sumi@flab.fujitsu.co.jp>

Hi Nick.

Could you run the benchmark with PM_DEBUG=1? If there are PM/Myrinet problems, some messages will be output.

Ex: (sh)
export PM_DEBUG=1

Shinji.
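A minimal sketch of such a debug run from the server host, assuming the Pallas binary is named PMB-MPI1 and a 64x2 node spec as in the report above; the exact scrun options depend on the local scorehosts.db setup:

    # enable PM-level debug output, then rerun the failing benchmark
    export PM_DEBUG=1
    scrun -nodes=64x2 ./PMB-MPI1 Alltoall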
From: Nick Birkett Subject: Re: [SCore-users] score 5.0.1 large memory jobs Date: Thu, 27 Feb 2003 10:29:11 +0000 Message-ID: <200302271029.h1RATB801837 @ zeralda.streamline.com>

nrcb> Sorry, the message I sent was truncated and therefore confusing.
nrcb>
nrcb> Here it is again:
nrcb>
nrcb> SCore 5.0.1, Myrinet 2000 system.
nrcb>
nrcb> ---------message from user -------------------------------
nrcb>
nrcb> Following the addition of swap on all Snowdon compute nodes, I reran the
nrcb> PALLAS benchmark tests (on 64 nodes running 2 processes per node). The
nrcb> following output was recorded towards the end of the run:
nrcb>
nrcb> #----------------------------------------------------------------
nrcb> # Benchmarking Alltoall
nrcb> # ( #processes = 64 )
nrcb> # ( 64 additional processes waiting in MPI_Barrier)
nrcb> #----------------------------------------------------------------
nrcb> #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
nrcb> 0 1000 784.39 784.86 784.79
nrcb> 1 1000 792.01 792.18 792.10
nrcb> 2 1000 785.42 785.78 785.68
nrcb> 4 1000 796.99 797.23 797.14
nrcb> 8 1000 800.98 801.11 801.06
nrcb> 16 1000 778.84 779.19 779.11
nrcb> 32 1000 787.78 788.14 788.03
nrcb> 64 1000 821.54 821.79 821.66
nrcb> 128 1000 881.18 881.38 881.30
nrcb> 256 1000 952.46 952.64 952.56
nrcb> 512 1000 1158.49 1159.00 1158.88
nrcb> 1024 1000 1640.78 1644.24 1641.00
nrcb> 2048 1000 3454.18 3454.95 3454.62
nrcb> 4096 1000 6882.82 6884.97 6883.97
nrcb> 8192 1000 16088.81 16094.80 16091.81
nrcb> 16384 1000 33715.59 33732.60 33727.56
nrcb> 32768 1000 65014.80 65027.62 65023.50
nrcb> 65536 640 129590.04 129636.99 129623.44
nrcb> 131072 320 263434.38 263628.56 263587.57
nrcb> 262144 160 531708.42 532274.39 532124.75
nrcb> 524288 80 1069253.25 1071251.60 1070571.90
nrcb> 1048576 40 2173875.02 2187574.55 2184477.23
nrcb> 2097152 20 4228944.70 4270372.05 4258162.98
nrcb> 4194304 10 8398147.40 8512784.40 8478838.18
nrcb> <8> SCore-D:PANIC Network freezing timed out !!
nrcb>
nrcb> And the .e file states:
nrcb>
nrcb> <0:0> SCORE: 128 nodes (64x2) ready.
nrcb> <56:1> SCORE:WARNING MPICH/SCore: pmGetSendBuffer(pmc=0x8541db8, dest=37, len=8256) failed, errno=22
nrcb> <56:1> SCORE:PANIC MPICH/SCore: critical error on message transfer
nrcb> <56:1> Trying to attach GDB (DISPLAY=snowdon.leeds.ac.uk:18.0): PANIC
nrcb> SCOUT: Session done.
nrcb>
nrcb> It looks like the memory allocation is now working fine, but the benchmark
nrcb> is unable to start the next test.
nrcb>
nrcb> The next test is an all-to-all zero-length message across 128 processes (on
nrcb> 64 nodes). Extrapolating the results, this should take about 1.6 ms.
nrcb>
nrcb> It appears as if the communications grind to a halt when we try to
nrcb> communicate between 128 processes (and more) when running 2 processes per
nrcb> node.
nrcb> _______________________________________________
nrcb> SCore-users mailing list
nrcb> SCore-users @ pccluster.org
nrcb> http://www.pccluster.org/mailman/listinfo/score-users
nrcb>
------
Shinji Sumimoto, Fujitsu Labs
_______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users

From tyokoi @ jodco.co.jp Fri Feb 28 14:57:40 2003 From: tyokoi @ jodco.co.jp (Takeshi Yokoi) Date: Fri, 28 Feb 2003 14:57:40 +0900 Subject: [SCore-users-jp] MPICH-Score version problem Message-ID: <3E5EFA54.A4A32194@jodco.co.jp>

Hello everyone,

We are currently considering introducing SCore. A question about MPICH-SCore: our existing in-house simulation software comes with a customized MPICH, so is it possible to have it use the existing MPICH rather than SCore's MPICH? (Or is it fine to simply overwrite it?) Sorry for the elementary question.

T.Yokoi

From kameyama @ pccluster.org Fri Feb 28 16:40:06 2003 From: kameyama @ pccluster.org (kameyama @ pccluster.org) Date: Fri, 28 Feb 2003 16:40:06 +0900 Subject: [SCore-users-jp] MPICH-Score version problem In-Reply-To: Your message of "Fri, 28 Feb 2003 14:57:40 JST." <3E5EFA54.A4A32194@jodco.co.jp> Message-ID: <20030228074006.16C8820055@neal.il.is.s.u-tokyo.ac.jp>

This is Kameyama.
In article <3E5EFA54.A4A32194 @ jodco.co.jp> Takeshi Yokoi writes:
> our existing in-house simulation software comes with a customized MPICH,
> so is it possible to have it use the existing MPICH rather than SCore's
> MPICH? (Or is it fine to simply overwrite it?)

It is not entirely clear what you want to do, but I can think of two cases:

1. You want to use MPICH/SCore and another MPI side by side.
2. You want to use the customized features together with MPICH/SCore.

Since I cannot tell which one you mean, I will describe both.

1. Using MPICH/SCore and another MPI side by side

SCore itself is installed under /opt/score. It does not check at all whether another MPI is installed somewhere else. (For example, a full install of Red Hat 7.3 installs LAM; SCore is installed independently of it.) When SCore is installed from RPM, files named /etc/profile.d/score.* are installed, and they add the SCore-related directories to PATH. In other words, an MPI other than SCore's can coexist on the same host, and the user's PATH determines which MPI is used. For example, with a full Red Hat 7.3 install, if /opt/score/bin comes earlier in the user's search path, SCore's MPI is used; if /usr/bin comes earlier, LAM's MPI is used. (You can also select one explicitly by giving the command's full path.) One thing to watch is that the compile and run environments must match: an executable compiled with MPICH/SCore's mpicc etc. must be run with MPICH/SCore's mpirun, and an executable built with another MPI's compile commands must be run with that MPI's mpirun.

2. Using the customized features together with MPICH/SCore

Because the device-dependent parts are built into the MPICH library, the customized binary cannot be used as-is, even if the customized part itself is device-independent. If you have the source of the customized part, it may be possible to merge it. MPICH/SCore is basically the MPICH source with the SCore-dependent parts added (plus our own bug fixes and changes to support multiple compilers and RPM). So if you extract the MPICH/SCore source files and merge the customized part into them, I think it can be used.

from Kameyama Toyohisa

From kameyama @ pccluster.org Fri Feb 28 19:24:40 2003 From: kameyama @ pccluster.org (kameyama @ pccluster.org) Date: Fri, 28 Feb 2003 19:24:40 +0900 Subject: [SCore-users-jp] Re: [SCore-users] A very strange problem In-Reply-To: Your message of "Fri, 28 Feb 2003 00:40:42 JST." <008201c2de7e$f9b94770$9101a8c0@billgates> Message-ID: <20030228102440.2577D20055@neal.il.is.s.u-tokyo.ac.jp>

In article <008201c2de7e$f9b94770$9101a8c0 @ billgates> "Chen Hao" writes:
> I have a very strange problem when I install GROMACS on
> my cluster (SCore 5.4.0).
> First, the configure program reports that mpicc cannot handle assembly files,
> so I disabled the assembly loops, and then the configure program produced the
> Makefiles successfully.

Currently, mpicc cannot handle assembly files. This problem also exists in mpich 1.2.4, but it is fixed in mpich 1.2.5.

> Then I ran make, and after a few minutes I got the executables.
> But when I try to run one, the system reports "No such file or directory".

Please add the following configure options:
--disable-shared --without-motif-libraries
If you want to run mdrun, please use scrun:
% scrun mdrun -np ...

from Kameyama Toyohisa
_______________________________________________ SCore-users mailing list SCore-users @ pccluster.org http://www.pccluster.org/mailman/listinfo/score-users
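A minimal sketch of the build-and-run sequence being suggested above, assuming a GROMACS 3.x-era configure script; beyond the two flags quoted in the reply, the CC=mpicc setting and --enable-mpi option are assumptions, and the assembly loops still need to be disabled as in the original report:

    # configure GROMACS against MPICH/SCore, statically linked, without Motif
    CC=mpicc ./configure --enable-mpi --disable-shared --without-motif-libraries
    make

    # launch the MPI binary through SCore rather than invoking it directly
    % scrun mdrun -np ...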