[SCore-users-jp] Benchmark scores with SCore are lower than with MPICH without SCore
Shinji Sumimoto
s-sumi @ flab.fujitsu.co.jp
Tue, 27 Jan 2004 15:38:51 JST
Dear Mr. Ikebe,
This is Sumimoto from Fujitsu Laboratories.
What does your /opt/score/etc/pm-ethernet.conf look like?
Could you try setting the parameters in this file as follows?
=================================
maxnsend 24
backoff 2400
intreap 1
=================================
The parameters in /opt/score/etc/pm-ethernet.conf are described on the following page. Please refer to it.
http://www.pccluster.org/score/dist/score/html/ja/man/man5/pm-ether-conf.html
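If you prefer to apply the three suggested values with a script rather than by hand, here is a minimal Python sketch. It assumes that pm-ethernet.conf holds one whitespace-separated `key value` pair per line, as in the fragment above. Check pm-ether-conf(5) at the URL above for the actual syntax. The helper `apply_settings` and the sample input are hypothetical. The helper only rewrites lines whose first token is one of the keys. It reports any key it did not find instead of guessing where to add it.

```python
from typing import Dict, List, Tuple

# Values suggested in the reply above.
SUGGESTED: Dict[str, str] = {"maxnsend": "24", "backoff": "2400", "intreap": "1"}


def apply_settings(conf_text: str, settings: Dict[str, str]) -> Tuple[str, List[str]]:
    """Hypothetical helper: set `key value` lines in a pm-ethernet.conf-style text.

    Assumes one whitespace-separated "key value" pair per line. Lines whose
    first token is not a key in `settings` are copied unchanged.
    Returns the rewritten text and the keys that were not present.
    """
    found = set()
    out: List[str] = []
    for line in conf_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] in settings:
            key = tokens[0]
            indent = line[: len(line) - len(line.lstrip())]  # preserve indentation
            out.append(f"{indent}{key} {settings[key]}")
            found.add(key)
        else:
            out.append(line)
    missing = [key for key in settings if key not in found]
    return "\n".join(out) + "\n", missing


if __name__ == "__main__":
    # Hypothetical current contents; the real file will differ.
    current = "maxnsend 8\nbackoff 1000\n"
    new_text, missing = apply_settings(current, SUGGESTED)
    print(new_text, end="")
    print("not found, add by hand:", missing)
```

Try it on a copy of the file first. Any key reported as missing should be added by hand, following the man page.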
From: 池辺 厚慈 <atuyosi @ comp.eng.himeji-tech.ac.jp>
Subject: [SCore-users-jp] Benchmark scores with SCore are lower than with MPICH without SCore
Date: Tue, 27 Jan 2004 15:23:02 +0900
Message-ID: <426FFAEC-5091-11D8-903A-003065AD5970 @ comp.eng.himeji-tech.ac.jp>
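atuyosi> My name is Ikebe, from the Information Control Mechanism Laboratory at Himeji Institute of Technology.
atuyosi> I asked you a couple of questions some time ago. Thank you again for your help then.
atuyosi> I am writing to ask the question below.
atuyosi> I would be grateful for any advice.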
atuyosi>
atuyosi> --- The question begins here.
atuyosi>
atuyosi> When I ran benchmarks with MPICH-SCore in the environment below,
atuyosi> the scores were lower than those of MPICH without SCore on the
atuyosi> same hardware. Is there a problem with my configuration?
atuyosi>
atuyosi> Environment
atuyosi> CPU: AthlonXP 2200+
atuyosi> RAM: PC2700 512MB
atuyosi> HDD: 80GB (SCore configuration only)
atuyosi> NIC: Intel PRO/1000 MT Desktop Adapter
atuyosi> HUB: corega GSW-8
atuyosi> OS: RedHat Linux 7.3
atuyosi> SCore version 5.6.1
atuyosi> MPICH version 1.2.5
atuyosi>
atuyosi> The cluster consists of 17 machines with the configuration above:
atuyosi> 16 compute nodes plus 1 cluster management node.
atuyosi> The compute nodes were installed with EIT.
atuyosi>
atuyosi> Benchmarks used: Poisson FEM-BMT and
atuyosi> Himeno benchmark XP (MPI version, problem size M)
atuyosi> Compiler: g77-2.96, compile options: -O3
atuyosi>
atuyosi> Results (SCore environment)
atuyosi> Poisson FEM-BMT
atuyosi> SCore-D 5.6.1 connected.
atuyosi> <0:0> SCORE: 16 nodes (16x1) ready.
atuyosi> No. of DOFs : 2097152 (n = 128)
atuyosi> No. of PEs : 16
atuyosi>
atuyosi> Initialization ...
atuyosi> Start rehearsal measurement process.
atuyosi>
atuyosi> Number of iterations in CG 10
atuyosi> Loop executed for 1 times
atuyosi> Residual : 0.00053340235
atuyosi> Elapsed time : 3.72145009 sec.
atuyosi> NFLOPS = 914913280.
atuyosi> MFLOPS measured : 245.848595
atuyosi> -----------------------------------------
atuyosi>
atuyosi> Number of iterations in CG 10
atuyosi> Loop executed for 16 times
atuyosi> Residual : 0.00053340235
atuyosi> Elapsed time : 92.4863849 sec.
atuyosi> NFLOPS = 914913280.
atuyosi> MFLOPS measured : 158.278567
atuyosi> -----------------------------------------
atuyosi>
atuyosi> Himeno benchmark XP, MPI version, problem size M
atuyosi> SCore-D 5.6.1 connected.
atuyosi> <0:0> SCORE: 16 nodes (16x1) ready.
atuyosi> Sequential version array size
atuyosi> mimax= 257 mjmax= 129 mkmax= 129
atuyosi> Parallel version array size
atuyosi> mimax= 131 mjmax= 67 mkmax= 35
atuyosi> imax= 129 jmax= 65 kmax= 33
atuyosi> I-decomp= 2 J-decomp= 2 K-decomp= 4
atuyosi>
atuyosi> Start rehearsal measurement process.
atuyosi> Measure the performance in 3 times.
atuyosi> MFLOPS: 3717.79994 time(s): 0.110634089 0.00169377867
atuyosi> Now, start the actual measurement process.
atuyosi> The loop will be excuted in 1626 times.
atuyosi> This will take about one minute.
atuyosi> Wait for a while.
atuyosi> Loop executed for 1626 times
atuyosi> Gosa : 0.000568608928
atuyosi> MFLOPS: 3408.83448 time(s): 65.3985848
atuyosi> Score based on Pentium III 600MHz : 41.1496201
atuyosi>
atuyosi> Results (non-SCore environment)
atuyosi> Poisson FEM-BMT
atuyosi> No. of DOFs : 2097152 (n = 128)
atuyosi> No. of PEs : 16
atuyosi>
atuyosi> Initialization ...
atuyosi> Start rehearsal measurement process.
atuyosi>
atuyosi> Number of iterations in CG 10
atuyosi> Loop executed for 1 times
atuyosi> Residual : 0.000533402352
atuyosi> Elapsed time : 0.934157 sec.
atuyosi> NFLOPS = 914913280.
atuyosi> MFLOPS measured : 979.399906
atuyosi> -----------------------------------------
atuyosi>
atuyosi> Number of iterations in CG 10
atuyosi> Loop executed for 64 times
atuyosi> Residual : 0.000533402352
atuyosi> Elapsed time : 69.241711 sec.
atuyosi> NFLOPS = 914913280.
atuyosi> MFLOPS measured : 845.652843
atuyosi> -----------------------------------------
atuyosi>
atuyosi> Himeno benchmark XP, MPI version, problem size M
atuyosi> Sequential version array size
atuyosi> mimax= 257 mjmax= 129 mkmax= 129
atuyosi> Parallel version array size
atuyosi> mimax= 131 mjmax= 67 mkmax= 35
atuyosi> imax= 129 jmax= 65 kmax= 33
atuyosi> I-decomp= 2 J-decomp= 2 K-decomp= 4
atuyosi>
atuyosi> Start rehearsal measurement process.
atuyosi> Measure the performance in 3 times.
atuyosi> MFLOPS: 4094.68704 time(s): 0.100451 0.00169377949
atuyosi> Now, start the actual measurement process.
atuyosi> The loop will be excuted in 1791 times.
atuyosi> This will take about one minute.
atuyosi> Wait for a while.
atuyosi> Loop executed for 1791 times
atuyosi> Gosa : 0.000530048565
atuyosi> MFLOPS: 4027.27022 time(s): 60.973137
atuyosi> Score based on Pentium III 600MHz : 48.6150475
atuyosi>
atuyosi>
atuyosi> Himeji Institute of Technology, Information Control Mechanism Laboratory
atuyosi> Atsuyoshi Ikebe (池辺 厚慈)
atuyosi> atuyosi @ comp.eng.himeji-tech.ac.jp
atuyosi>
atuyosi> _______________________________________________
atuyosi> SCore-users-jp mailing list
atuyosi> SCore-users-jp @ pccluster.org
atuyosi> http://www.pccluster.org/mailman/listinfo/score-users-jp
atuyosi>
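The MFLOPS figures in the quoted Poisson FEM-BMT logs agree with NFLOPS × loop count ÷ elapsed time ÷ 10^6. This relation is inferred from the printed numbers, not taken from the benchmark's source. The Python sketch below recomputes the main FEM measurements and expresses the SCore result as a fraction of the non-SCore one. The Himeno MFLOPS values are copied from the logs as printed.

```python
# Figures copied from the quoted benchmark logs.
FEM_NFLOPS = 914913280  # "NFLOPS =" line, identical in both runs


def mflops(nflops: float, loops: int, elapsed_sec: float) -> float:
    """MFLOPS as the logs appear to compute it (inferred): total flops / time / 1e6."""
    return nflops * loops / elapsed_sec / 1e6


# Poisson FEM-BMT, main measurement.
fem_score = mflops(FEM_NFLOPS, 16, 92.4863849)  # SCore run: 16 loops
fem_plain = mflops(FEM_NFLOPS, 64, 69.241711)   # non-SCore run: 64 loops

# Himeno benchmark XP (MPI, size M): MFLOPS as printed in the logs.
himeno_score = 3408.83448
himeno_plain = 4027.27022

if __name__ == "__main__":
    print(f"FEM:    SCore {fem_score:.1f} vs non-SCore {fem_plain:.1f} MFLOPS, "
          f"ratio {fem_score / fem_plain:.3f}")
    print(f"Himeno: SCore {himeno_score:.1f} vs non-SCore {himeno_plain:.1f} MFLOPS, "
          f"ratio {himeno_score / himeno_plain:.3f}")
```

With these inputs, the SCore run reaches roughly 19% of the non-SCore throughput on Poisson FEM-BMT but about 85% on the Himeno benchmark. The gap is therefore much larger on the FEM benchmark.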
------
Shinji Sumimoto, Fujitsu Labs