[SCore-users-jp] Lower benchmark scores than MPICH without SCore

Shinji Sumimoto s-sumi @ flab.fujitsu.co.jp
Tue Jan 27 15:38:51 JST 2004


Ikebe-san,

This is Sumimoto from Fujitsu Labs.

What does your /opt/score/etc/pm-ethernet.conf look like?

Could you try setting the parameters in this file as follows?
=================================
maxnsend 24
backoff 2400
intreap 1
=================================
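
If you would rather script the change than edit the file by hand, here is a minimal sketch. It assumes each parameter sits on its own line as "name value", as in the snippet above; set_params is a hypothetical helper, and the sample text and its starting values are made up for illustration, not taken from a real pm-ethernet.conf. Please check the result against the man page before using it.

```python
def set_params(conf_text, params):
    """Return conf_text with each 'name value' line updated.

    Assumption: one parameter per line, name and value separated by
    whitespace. Parameters that are not present are appended at the end.
    Only the first occurrence of each name is rewritten.
    """
    remaining = dict(params)
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] in remaining:
            name = fields[0]
            out.append(f"{name} {remaining.pop(name)}")
        else:
            out.append(line)  # keep every other line untouched
    out.extend(f"{name} {value}" for name, value in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical starting content, for illustration only.
sample = "maxnsend 8\nbackoff 1000\n"
suggested = {"maxnsend": 24, "backoff": 2400, "intreap": 1}
print(set_params(sample, suggested), end="")
```

The function works on text only, so you can try it on a copy of the file's contents and compare the result before replacing anything under /opt/score/etc.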

The parameters in /opt/score/etc/pm-ethernet.conf are described on the following page, for reference.

http://www.pccluster.org/score/dist/score/html/ja/man/man5/pm-ether-conf.html

From: 池辺 厚慈 <atuyosi @ comp.eng.himeji-tech.ac.jp>
Subject: [SCore-users-jp] Lower benchmark scores than MPICH without SCore
Date: Tue, 27 Jan 2004 15:23:02 +0900
Message-ID: <426FFAEC-5091-11D8-903A-003065AD5970 @ comp.eng.himeji-tech.ac.jp>

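atuyosi> My name is Ikebe, from the Information Control Mechanisms Laboratory
atuyosi> at Himeji Institute of Technology.
atuyosi> I asked a few questions here earlier; thank you for your help then.
atuyosi> I am writing to ask for an answer to the question below.
atuyosi> I would greatly appreciate your guidance.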
atuyosi> 
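atuyosi> --- My question starts here.
atuyosi> 
atuyosi> When I ran benchmarks in the MPICH-SCore environment described below,
atuyosi> the scores were lower than those of MPICH without SCore on the same
atuyosi> hardware. Could there be a problem with my configuration?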
atuyosi> 
atuyosi> Environment
atuyosi> CPU: AthlonXP 2200+
atuyosi> RAM: PC2700 512MB
atuyosi> HDD: 80GB (SCore setup only)
atuyosi> NIC: Intel PRO/1000 MT Desktop Adapter
atuyosi> HUB: corega GSW-8
atuyosi> OS: RedHat Linux 7.3
atuyosi> SCore version 5.6.1
atuyosi> MPICH version 1.2.5
atuyosi> 
atuyosi> The cluster consists of 16 compute nodes plus 1 cluster management
atuyosi> node, 17 machines in total, each with the configuration above.
atuyosi> I used EIT to install the compute nodes.
atuyosi> 
atuyosi> Benchmarks used: Poisson FEM-BMT and
atuyosi> Himeno Benchmark XP, MPI version, problem size M
atuyosi> Compiler: g77-2.96, compile option: -O3
atuyosi> 
atuyosi> Results (with SCore)
atuyosi> Poisson FEM-BMT
atuyosi> SCore-D 5.6.1 connected.
atuyosi> <0:0> SCORE: 16 nodes (16x1) ready.
atuyosi>    No. of DOFs : 2097152  (n =  128)
atuyosi>    No. of PEs  : 16
atuyosi> 
atuyosi>    Initialization ...
atuyosi>    Start rehearsal measurement process.
atuyosi> 
atuyosi>    Number of iterations in CG  10
atuyosi>    Loop executed for  1 times
atuyosi>    Residual :  0.00053340235
atuyosi>    Elapsed time :   3.72145009 sec.
atuyosi>    NFLOPS =  914913280.
atuyosi>    MFLOPS measured :        245.848595
atuyosi>   -----------------------------------------
atuyosi> 
atuyosi>    Number of iterations in CG  10
atuyosi>    Loop executed for  16 times
atuyosi>    Residual :  0.00053340235
atuyosi>    Elapsed time :   92.4863849 sec.
atuyosi>    NFLOPS =  914913280.
atuyosi>    MFLOPS measured :        158.278567
atuyosi>   -----------------------------------------
atuyosi> 
atuyosi>   Himeno Benchmark xp, MPI version, problem size M
atuyosi>   SCore-D 5.6.1 connected.
atuyosi> <0:0> SCORE: 16 nodes (16x1) ready.
atuyosi>   Sequential version array size
atuyosi>    mimax= 257 mjmax= 129 mkmax= 129
atuyosi>   Parallel version  array size
atuyosi>    mimax= 131 mjmax= 67 mkmax= 35
atuyosi>    imax= 129 jmax= 65 kmax= 33
atuyosi>    I-decomp=  2 J-decomp=  2 K-decomp=  4
atuyosi> 
atuyosi>    Start rehearsal measurement process.
atuyosi>    Measure the performance in 3 times.
atuyosi>     MFLOPS:  3717.79994  time(s):  0.110634089  0.00169377867
atuyosi>   Now, start the actual measurement process.
atuyosi>   The loop will be excuted in 1626 times.
atuyosi>   This will take about one minute.
atuyosi>   Wait for a while.
atuyosi>    Loop executed for  1626 times
atuyosi>    Gosa :  0.000568608928
atuyosi>    MFLOPS:  3408.83448  time(s):  65.3985848
atuyosi>    Score based on Pentium III 600MHz :  41.1496201
atuyosi> 
atuyosi>    Results (without SCore)
atuyosi>    Poisson FEM-BMT
atuyosi>      No. of DOFs : 2097152  (n =  128)
atuyosi>    No. of PEs  : 16
atuyosi> 
atuyosi>    Initialization ...
atuyosi>    Start rehearsal measurement process.
atuyosi> 
atuyosi>    Number of iterations in CG  10
atuyosi>    Loop executed for  1 times
atuyosi>    Residual :  0.000533402352
atuyosi>    Elapsed time :   0.934157 sec.
atuyosi>    NFLOPS =  914913280.
atuyosi>    MFLOPS measured :        979.399906
atuyosi>   -----------------------------------------
atuyosi> 
atuyosi>    Number of iterations in CG  10
atuyosi>    Loop executed for  64 times
atuyosi>    Residual :  0.000533402352
atuyosi>    Elapsed time :   69.241711 sec.
atuyosi>    NFLOPS =  914913280.
atuyosi>    MFLOPS measured :        845.652843
atuyosi>   -----------------------------------------
atuyosi> 
atuyosi>    Himeno Benchmark xp, MPI version, problem size M
atuyosi>     Sequential version array size
atuyosi>    mimax= 257 mjmax= 129 mkmax= 129
atuyosi>   Parallel version  array size
atuyosi>    mimax= 131 mjmax= 67 mkmax= 35
atuyosi>    imax= 129 jmax= 65 kmax= 33
atuyosi>    I-decomp=  2 J-decomp=  2 K-decomp=  4
atuyosi> 
atuyosi>    Start rehearsal measurement process.
atuyosi>    Measure the performance in 3 times.
atuyosi>     MFLOPS:  4094.68704  time(s):  0.100451  0.00169377949
atuyosi>   Now, start the actual measurement process.
atuyosi>   The loop will be excuted in 1791 times.
atuyosi>   This will take about one minute.
atuyosi>   Wait for a while.
atuyosi>    Loop executed for  1791 times
atuyosi>    Gosa :  0.000530048565
atuyosi>    MFLOPS:  4027.27022  time(s):  60.973137
atuyosi>    Score based on Pentium III 600MHz :  48.6150475
atuyosi> 
atuyosi> 
atuyosi> Himeji Institute of Technology, Information Control Mechanisms Laboratory
atuyosi> 池辺 厚慈
atuyosi> atuyosi @ comp.eng.himeji-tech.ac.jp
atuyosi> 
atuyosi> _______________________________________________
atuyosi> SCore-users-jp mailing list
atuyosi> SCore-users-jp @ pccluster.org
atuyosi> http://www.pccluster.org/mailman/listinfo/score-users-jp
atuyosi> 
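
As a rough cross-check of the figures quoted above, the sketch below recomputes the FEM-BMT numbers and compares the two setups. It assumes that "MFLOPS measured" is NFLOPS times the loop count divided by the elapsed time; that relationship is inferred from the printed values, not taken from the benchmark's documentation. The function name mflops is just for illustration.

```python
# Values copied from the benchmark output quoted above.
NFLOPS = 914913280  # "NFLOPS" line printed by Poisson FEM-BMT


def mflops(loops, elapsed_sec, nflops=NFLOPS):
    # Assumed relationship: MFLOPS = NFLOPS * loops / elapsed time / 1e6.
    return nflops * loops / elapsed_sec / 1e6


fem_score = mflops(16, 92.4863849)  # with SCore, 16 loops
fem_plain = mflops(64, 69.241711)   # without SCore, 64 loops

# Himeno MFLOPS are taken as printed, not recomputed.
himeno_score, himeno_plain = 3408.83448, 4027.27022

print(f"FEM-BMT: with SCore {fem_score:.1f}, without {fem_plain:.1f}, "
      f"ratio {fem_plain / fem_score:.2f}")
print(f"Himeno:  with SCore {himeno_score:.1f}, without {himeno_plain:.1f}, "
      f"ratio {himeno_plain / himeno_score:.2f}")
```

Under that assumption the recomputed values agree with the printed "MFLOPS measured" lines. The FEM-BMT gap is roughly a factor of 5, while the Himeno gap is under a factor of 1.2.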
------
Shinji Sumimoto, Fujitsu Labs


