From jain.peeyush at gmail.com Fri Apr 7 23:07:49 2006
From: jain.peeyush at gmail.com (Peeyush Jain)
Date: Fri, 7 Apr 2006 07:07:49 -0700
Subject: [SCore-users] Armci problem with lammpi
Message-ID: <57334b600604070707w19eb0e82s34b8ebeffa04317e@mail.gmail.com>

I am running ARMCI with lammpi-7.1.1 on two processors. When I run
test.x on one processor, it works fine, but when I run it on more
than one computer, I get the following error:

$mpirun -np 2 -v ./test.x
7605 ./test.x running on n0 (o)
3168 ./test.x running on n1
ARMCI configured for 2 cluster nodes. Network protocol is 'TCP/IP Sockets'.
1:trying connect to host=peeyush, port=37912 t=5 111
trying to connect:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 1:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
0:trying connect to host=localhost, port=32911 t=5 111
trying to connect:: Connection refused
0:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 0:: Connection refused
0:armci_CreateSocketAndConnect: connect failed: -1
-10001(s):armci_AcceptSockAll:timeout waiting for connection: 0
-10001(s):armci_AcceptSockAll:timeout waiting for connection: 0
-10000(s):armci_AcceptSockAll:timeout waiting for connection: 0
-10000(s):armci_AcceptSockAll:timeout waiting for connection: 0
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 7605 failed on node n0 (172.26.117.167) with exit status 1.
-----------------------------------------------------------------------------

I have configured my LAM MPI with the ifort and gcc compilers, and ARMCI
with mpif77 and mpicc.
Can any of you please tell me what the problem is with my TCP connection?

Peeyush

From kameyama at pccluster.org Mon Apr 10 12:30:32 2006
From: kameyama at pccluster.org (kameyama at pccluster.org)
Date: Mon, 10 Apr 2006 12:30:32 +0900
Subject: [SCore-users] Armci problem with lammpi
Message-ID: <20060410033032.0F51B21EF56@neal.il.is.s.u-tokyo.ac.jp>

Note that the SCore-users mailing list supports the SCore cluster system.
SCore does not include LAM or ARMCI.

In article <57334b600604070707w19eb0e82s34b8ebeffa04317e at mail.gmail.com>
"Peeyush Jain" wrote:
> I am doing ARMCI with lammpi-7.1.1 on two processor. When I check
> test.x on one processor, it works fine but when i check it on more
> than two computers, then i get
> the following error:
>
> $mpirun -np 2 -v ./test.x
> 7605 ./test.x running on n0 (o)
> 3168 ./test.x running on n1
> ARMCI configured for 2 cluster nodes. Network protocol is 'TCP/IP Sockets'.
> 1:trying connect to host=peeyush, port=37912 t=5 111

The ARMCI processes exchange the name returned by gethostname() as
their hostname, and each process then tries to connect to the others
using that name. This means every host must be reachable from the
other hosts by the name its own gethostname() returns. Your two hosts
report "peeyush" and "localhost", and neither can be connected to by
that name. Please check the hostname on each host.

from Kameyama Toyohisa
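
The hostname check suggested above can be sketched as a small script to run on each node. This is only an illustration, assuming Python is available on the nodes. The function names `resolves_to_loopback` and `check_local_hostname` are made up for this example and are not part of ARMCI or LAM. It checks only how the node resolves its own name; the same name must also resolve to a reachable address from the other node.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname, resolve=socket.gethostbyname):
    """Return True if hostname resolves to a loopback address (127.0.0.0/8)."""
    address = resolve(hostname)
    return ipaddress.ip_address(address).is_loopback


def check_local_hostname(get_name=socket.gethostname,
                         resolve=socket.gethostbyname):
    """Return (hostname, verdict) for the name this node would advertise.

    get_name and resolve are injectable (hypothetical parameters) so the
    logic can be exercised without touching the real resolver.
    """
    name = get_name()
    try:
        loopback = resolves_to_loopback(name, resolve)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return name, f"does not resolve: {exc}"
    if loopback:
        return name, "resolves to loopback; remote nodes cannot connect to it"
    return name, "resolves to a routable-looking address"


if __name__ == "__main__":
    # Run this on every node and compare the results.
    print(*check_local_hostname(), sep=": ")
```

If a node reports "localhost", or a name that maps to 127.0.0.1, the usual places to look are the system hostname setting and the /etc/hosts entries on that node.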