[SCore-users-jp] [SCore-users] Problems with MPI when running CHARMM

si011015 @ fh-stpoelten.ac.at si011015 @ fh-stpoelten.ac.at
Fri Apr 1 19:32:43 JST 2005


Hello SCore users,

I successfully compiled SCore from source on two dual Pentium III
machines. One acts as the server, and both are compute hosts.
The provided examples run without any errors.
The only deviation from the installation documentation is this sceptic output:

[root @ omega sbin]# sceptic -v -g pcc
omega.mdy.univie.ac.at: scping FAILED
sheet.mdy.univie.ac.at: scping FAILED
sheet.mdy.univie.ac.at: OK
omega.mdy.univie.ac.at: OK
All host responding.

I didn't pay much attention to the "scping FAILED" lines. Should I?


I compiled CHARMM, which our group uses for molecular dynamics
simulations, and linked it against the /opt/score/mpi/mpich-1.2.5/ library.

I start CHARMM with the following command:
scrun -nodes=$n /opt/c32a1_MPICH_CMPI_GENCOMM/exec/gnu/charmm < charmm.inp > test.$n
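
The same command can be driven from a small script so that every node specification is run in the same way. The following is only a sketch: the function names and the optional timeout are illustrative assumptions, while the executable path, the -nodes option, and the file names are taken from the command above.

```python
import subprocess

# Path and node specifications as given in this message.
CHARMM = "/opt/c32a1_MPICH_CMPI_GENCOMM/exec/gnu/charmm"
NODE_SPECS = ["1", "1x2", "2x1", "4"]


def scrun_command(nodes, executable=CHARMM):
    """Return the argument list for one scrun invocation (hypothetical helper)."""
    return ["scrun", f"-nodes={nodes}", executable]


def run_case(nodes, input_file="charmm.inp", timeout=None):
    """Run one configuration with charmm.inp on stdin and stdout in test.<nodes>.

    Returns the exit status of scrun. stderr is left on the terminal,
    as in the original command line.
    """
    with open(input_file, "rb") as stdin, open(f"test.{nodes}", "wb") as stdout:
        result = subprocess.run(
            scrun_command(nodes), stdin=stdin, stdout=stdout, timeout=timeout
        )
    return result.returncode
```

Calling run_case(spec, timeout=600) for each entry of NODE_SPECS would reproduce the four runs; the timeout value is an arbitrary choice, and when it expires subprocess.run raises subprocess.TimeoutExpired instead of blocking forever on a run that hangs.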

If $n=1 --> I run on the local machine with 1 CPU:
everything works fine.

If $n=1x2 --> I run on 2 machines with 1 CPU each:
some communication takes place, but something goes wrong: the results
are bad, and the program aborts with the following error message on
stderr:

<0:0> SCORE: 2 nodes (1x2) ready.
<0:0>SCore: *** SIGNAL EXCEPTION eip=0x08698790, cr2=0x 37daf88 ***
<0:0>SCore: gs=0x0000, fs=0x0000, es=0x002b, ds=0x002b
<0:0>SCore: edi=0x037daf88, esi=0x80000001, ebp=0xbfffda08, esp=0xbfffd6b0
<0:0>SCore: ebx=0xffffffff, edx=0x0d5c4d60, ecx=0x0d5dbd78, eax=0x0d5dbd78
<0:0>SCore: trapno=0x0000000e, err=0x00000004, eip=0x08698790, cs=0x0023
<0:0>SCore: esp_at_signal=0xbfffd6b0, ss=0x002b, oldmask=0x00000000, cr2=0x037daf88
<0:0> Trying to attach GDB (DISPLAY=localhost:11.0): Exception signal (SIGSEGV)
<1:1>SCore: *** SIGNAL EXCEPTION eip=0x08698790, cr2=0x 37daf88 ***
<1:1>SCore: gs=0x0000, fs=0x0000, es=0x002b, ds=0x002b
<1:1>SCore: edi=0x037daf88, esi=0x80000001, ebp=0xbfffda08, esp=0xbfffd6b0
<1:1>SCore: ebx=0xffffffff, edx=0x0d5c4518, ecx=0x0d5db530, eax=0x0d5db530
<1:1>SCore: trapno=0x0000000e, err=0x00000004, eip=0x08698790, cs=0x0023
<1:1>SCore: esp_at_signal=0xbfffd6b0, ss=0x002b, oldmask=0x00000000, cr2=0x037daf88
<0:1> Trying to attach GDB (DISPLAY=localhost:11.0): Exception signal (SIGSEGV)
SCORE: Program aborted.
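
The trapno, err and cr2 fields in this dump can be read as a short description of the fault. The sketch below assumes they are the usual Linux/i386 signal-context values, where trap number 14 (0x0e) is a page fault and the low three bits of err encode its cause. The function name is illustrative only.

```python
# Low bits of the x86 page-fault error code (trap number 14).
PF_PRESENT = 0x1  # set: protection violation; clear: page not present
PF_WRITE = 0x2    # set: write access; clear: read access
PF_USER = 0x4     # set: fault raised in user mode; clear: kernel mode


def describe_page_fault(trapno, err, cr2):
    """Summarise a fault from the trapno, err and cr2 fields of the dump."""
    if trapno != 0x0E:
        return f"trap {trapno:#x} is not a page fault"
    mode = "user-mode" if err & PF_USER else "kernel-mode"
    access = "write to" if err & PF_WRITE else "read from"
    cause = "a protected page" if err & PF_PRESENT else "a non-present page"
    return f"{mode} {access} {cause} at {cr2:#010x}"


# Values copied from the dump above.
print(describe_page_fault(0x0000000E, 0x00000004, 0x037DAF88))
# prints: user-mode read from a non-present page at 0x037daf88
```

Under that assumption, the dump describes a read from an address that is not mapped. Since edi holds the same value as cr2 (0x037daf88), the faulting instruction at eip=0x08698790 was possibly dereferencing edi.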


If $n=2x1 --> I run on the local machine with 2 CPUs:
the result is the same, only the error message is shorter:

<0:0> SCORE: 2 nodes (2x1) ready.
<1:0>SCore: *** SIGNAL EXCEPTION eip=0x08698790, cr2=0x 37daf88 ***
<1:0>SCore: gs=0x0000, fs=0x0000, es=0x002b, ds=0x002b
<1:0>SCore: edi=0x037daf88, esi=0x80000001, ebp=0xbfffda08, esp=0xbfffd6b0
<1:0>SCore: ebx=0xffffffff, edx=0x0d5c4518, ecx=0x0d5db530, eax=0x0d5db530
<1:0>SCore: trapno=0x0000000e, err=0x00000004, eip=0x08698790, cs=0x0023
<1:0>SCore: esp_at_signal=0xbfffd6b0, ss=0x002b, oldmask=0x00000000, cr2=0x037daf88
<1:0> Trying to attach GDB (DISPLAY=localhost:11.0): Exception signal (SIGSEGV)
SCORE: Program aborted.
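
The register dumps can also be compared mechanically rather than by eye. The following is a rough sketch, not an SCore tool: the regular expressions and function names are assumptions based only on the line format shown in this message, and the sample is an excerpt of the 1x2 dump.

```python
import re
from collections import defaultdict

# "<n:m>SCore: ..." lines carry the register dump; other lines are ignored.
LINE = re.compile(r"^<(\d+:\d+)>\s*SCore:\s*(.*)$")
# Matches "name=0x<hex>", tolerating the space in "cr2=0x 37daf88".
FIELD = re.compile(r"(\w+)=0x\s*([0-9a-fA-F]+)")

SAMPLE = """\
<0:0>SCore: *** SIGNAL EXCEPTION eip=0x08698790, cr2=0x 37daf88 ***
<0:0>SCore: ebx=0xffffffff, edx=0x0d5c4d60, ecx=0x0d5dbd78, eax=0x0d5dbd78
<1:1>SCore: *** SIGNAL EXCEPTION eip=0x08698790, cr2=0x 37daf88 ***
<1:1>SCore: ebx=0xffffffff, edx=0x0d5c4518, ecx=0x0d5db530, eax=0x0d5db530
"""


def parse_dumps(text):
    """Map each '<n:m>' tag to a dict of register name -> integer value."""
    dumps = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            tag, rest = match.groups()
            for name, value in FIELD.findall(rest):
                dumps[tag][name] = int(value, 16)
    return dict(dumps)


def differing_registers(a, b):
    """Return the sorted names of registers whose values differ."""
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


dumps = parse_dumps(SAMPLE)
print(differing_registers(dumps["0:0"], dumps["1:1"]))
# prints: ['eax', 'ecx', 'edx']
```

In this excerpt, eip and cr2 are identical in both processes and only eax, ecx and edx differ, which would be consistent with both processes failing at the same instruction on the same address.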


If $n=4 --> I run on 2 machines with 2 CPUs each:
the program hangs before any communication takes place, but scrun doesn't
abort, and all 4 CPUs run at almost 100% load.

I have attached the output files ( test.{1,1x2,2x1,4} ). They might help
anyone who is using CHARMM.

Any help is appreciated! Thank you!

Best regards, Alfred Karl
University of Vienna
University of Applied Sciences St. Poelten
-------------- next part --------------
A non-text attachment was scrubbed...
Name:       test.1
Type:       application/octet-stream
Size:       72356 bytes
Desc:       not available
URL:        <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20050401/a24cbeee/attachment.obj>

-------------- next part --------------
A non-text attachment was scrubbed...
Name:       test.1x2
Type:       application/octet-stream
Size:       16982 bytes
Desc:       not available
URL:        <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20050401/a24cbeee/attachment-0001.obj>

-------------- next part --------------
A non-text attachment was scrubbed...
Name:       test.2x1
Type:       application/octet-stream
Size:       16982 bytes
Desc:       not available
URL:        <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20050401/a24cbeee/attachment-0002.obj>

-------------- next part --------------
A non-text attachment was scrubbed...
Name:       test.4
Type:       application/octet-stream
Size:       16883 bytes
Desc:       not available
URL:        <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20050401/a24cbeee/attachment-0003.obj>

