[SCore-users] Bug found!?
Shinji Sumimoto
s-sumi at bd6.so-net.ne.jp
Mon Sep 23 23:09:34 JST 2002
Hi.
Could you run the HPL program with the PM_DEBUG=1 shell environment
variable set?
$ export PM_DEBUG=1
$ scrun -nodes=128x2,mpi_max_eager_myrinet=2000000 ./xhpl
The error:
amik> <13:0> SCORE:WARNING MPICH/SCore: pmReceive(pmc=0x85bc7a0) failed, errno=5
shows that an error occurred in Myrinet communication.
With PM_DEBUG=1 set, more informative messages are displayed.
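A debug run can print a lot, so it may help to save the whole output and
then pick out the SCore warning and panic lines. This is a minimal sketch,
assuming a POSIX shell with grep. The function name extract_score_errors
and the log file name are made up for illustration, and whether the
PM_DEBUG messages go to stdout or stderr is an assumption (hence the 2>&1
in the usage comment). The format of the PM_DEBUG messages is not shown in
this thread, so the filter only locates the failure; the messages
themselves stay in the full log.

```shell
# Hypothetical helper: print SCore WARNING/PANIC lines from a run log,
# prefixed with their line numbers so nearby debug messages can be found.
# Reads the log on stdin.
extract_score_errors() {
    grep -n -E 'SCORE:(WARNING|PANIC)'
}

# Possible usage on the cluster (not run here; log name is illustrative):
#   export PM_DEBUG=1
#   scrun -nodes=128x2,mpi_max_eager_myrinet=2000000 ./xhpl 2>&1 \
#       | tee hpl_pm_debug.log | extract_score_errors
```

The line numbers it prints point into hpl_pm_debug.log, where the
surrounding PM_DEBUG output can then be read in context.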
Maybe you are using Myrinet2000. Is your Myrinet2000 connected with
fiber links or serial links?
PS: The mpi_max_eager_myrinet option applies only to the ch_score2 MPI device.
Are you using MPICH/SCore2?
Shinji.
From: Amik St-Cyr CFD Lab <amik at cfdlab.mcgill.ca>
Subject: [SCore-users] Bug found!?
Date: 23 Sep 2002 09:40:35 -0400
Message-ID: <1032788436.9725.1156.camel at stan.cfdlab.mcgill.ca>
amik> Hi all,
amik>
amik> First of all, our tech team found the
amik> bug in our setup. Thanks to Mr. Toyohisa
amik> for his help. We had mixed up two cables...
amik>
amik> We then tried to benchmark the machine
amik> with the LINPACK suite. The behaviour of LINPACK
amik> was very strange depending on the parameters given.
amik> For one setup I was able to crank up the
amik> size of the matrix, but then SCORE gave me the
amik> following message:
amik>
amik>
amik>
amik> | amik at stokes 18:22:56 Linux_ATHLON_CBLAS> scrun -nodes=128x2,mpi_max_eager_myrinet=2000000 ./xhpl
amik> SCore-D 5.0.1 connected.
amik> <0:0> SCORE: 256 nodes (128x2) ready.
amik> ============================================================================
amik> HPLinpack 1.0 -- High-Performance Linpack benchmark -- September 27, 2000
amik> Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
amik> ============================================================================
amik>
amik> An explanation of the input/output parameters follows:
amik> T/V : Wall time / encoded variant.
amik> N : The order of the coefficient matrix A.
amik> NB : The partitioning blocking factor.
amik> P : The number of process rows.
amik> Q : The number of process columns.
amik> Time : Time in seconds to solve the linear system.
amik> Gflops : Rate of execution for solving the linear system.
amik>
amik> The following parameter values will be used:
amik>
amik> N : 200000
amik> NB : 100
amik> P : 16
amik> Q : 16
amik> PFACT : Crout
amik> NBMIN : 1
amik> NDIV : 16
amik> RFACT : Right
amik> BCAST : 2ringM
amik> DEPTH : 1
amik> SWAP : Mix (threshold = 16)
amik> L1 : transposed form
amik> U : transposed form
amik> EQUIL : yes
amik> ALIGN : 8 double precision words
amik>
amik> ----------------------------------------------------------------------------
amik>
amik> - The matrix A is randomly generated for each test.
amik> - The following scaled residual checks will be computed:
amik> 1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
amik> 2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
amik> 3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
amik> - The relative machine precision (eps) is taken to be 2.220446e-16
amik> - Computational tests pass if scaled residuals are less than 16.0
amik>
amik> <13:0> SCORE:WARNING MPICH/SCore: pmReceive(pmc=0x85bc7a0) failed, errno=5
amik> <13:0> SCORE:PANIC MPICH/SCore: critical error on message transfer
amik> <13:0> Trying to attach GDB (DISPLAY=localhost:13.0): PANIC
amik> SCORE: Program aborted.
amik> | amik at stokes 23:14:53 Linux_ATHLON_CBLAS>
amik>
amik>
amik> Can I prevent this from happening?
amik>
amik> Here are the details:
amik>
amik> | amik at stokes 09:40:11 Linux_ATHLON_CBLAS> cat HPL.dat
amik> HPLinpack benchmark input file
amik> Innovative Computing Laboratory, University of Tennessee
amik> HPL256.out output file name (if any)
amik> 6 device out (6=stdout,7=stderr,file)
amik> 1 # of problems sizes (N)
amik> 200000 Ns
amik> 1 # of NBs
amik> 100 NBs
amik> 1 # of process grids (P x Q)
amik> 16 Ps
amik> 16 Qs
amik> 16.0 threshold
amik> 1 # of panel fact
amik> 1 PFACTs (0=left, 1=Crout, 2=Right)
amik> 1 # of recursive stopping criterium
amik> 1 NBMINs (>= 1)
amik> 1 # of panels in recursion
amik> 16 NDIVs
amik> 1 # of recursive panel fact.
amik> 2 RFACTs (0=left, 1=Crout, 2=Right)
amik> 1 # of broadcast
amik> 3 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
amik> 1 # of lookahead depth
amik> 1 DEPTHs (>=0)
amik> 2 SWAP (0=bin-exch,1=long,2=mix)
amik> 16 swapping threshold
amik> 0 L1 in (0=transposed,1=no-transposed) form
amik> 0 U in (0=transposed,1=no-transposed) form
amik> 1 Equilibration (0=no,1=yes)
amik> 8 memory alignment in double (> 0)
amik>
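For scale, the values in the HPL.dat above can be turned into a rough
memory figure. This is only a size check and says nothing about the cause
of the pmReceive error. It is a sketch under stated assumptions: 8-byte
doubles, the matrix spread evenly over the P x Q grid, HPL workspace and
MPI buffers not counted, and 128x2 taken to mean two processes per host.
The function name is made up for illustration, and the amount of memory
per host is not stated in this thread.

```shell
# Rough HPL matrix footprint from the HPL.dat values quoted above.
# Assumes 8-byte doubles and an even spread over the P x Q grid;
# HPL workspace and MPI buffers are NOT counted.
hpl_matrix_bytes_per_process() {
    n=$1; p=$2; q=$3
    echo $(( n * n * 8 / (p * q) ))
}

bytes=$(hpl_matrix_bytes_per_process 200000 16 16)
echo "per process: $bytes bytes ($(( bytes / 1048576 )) MiB)"
# Assumption: -nodes=128x2 places two processes on each host.
echo "per host (2 processes): $(( 2 * bytes / 1048576 )) MiB"
```

With N=200000 on a 16 x 16 grid this gives 1250000000 bytes (about
1192 MiB) per process, or about 2384 MiB per host, for the matrix alone.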
amik> Makefile, using the Portland Group compiler and linking against
amik> the MPI in SCORE.
amik>
amik>
amik> | amik at stokes 09:40:47 hpl> cat Make.Linux_ATHLON_CBLAS
amik> #
amik> # -- High Performance Computing Linpack Benchmark (HPL)
amik> # HPL - 1.0 - September 27, 2000
amik> # Antoine P. Petitet
amik> # University of Tennessee, Knoxville
amik> # Innovative Computing Laboratories
amik> # (C) Copyright 2000 All Rights Reserved
amik> #
amik> # -- Copyright notice and Licensing terms:
amik> #
amik> # Redistribution and use in source and binary forms, with or without
amik> # modification, are permitted provided that the following conditions
amik> # are met:
amik> #
amik> # 1. Redistributions of source code must retain the above copyright
amik> # notice, this list of conditions and the following disclaimer.
amik> #
amik> # 2. Redistributions in binary form must reproduce the above copyright
amik> # notice, this list of conditions, and the following disclaimer in the
amik> # documentation and/or other materials provided with the distribution.
amik> #
amik> # 3. All advertising materials mentioning features or use of this
amik> # software must display the following acknowledgement:
amik> # This product includes software developed at the University of
amik> # Tennessee, Knoxville, Innovative Computing Laboratories.
amik> #
amik> # 4. The name of the University, the name of the Laboratory, or the
amik> # names of its contributors may not be used to endorse or promote
amik> # products derived from this software without specific written
amik> # permission.
amik> #
amik> # -- Disclaimer:
amik> #
amik> # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
amik> # ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
amik> # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
amik> # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
amik> # OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
amik> # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
amik> # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
amik> # DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
amik> # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
amik> # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
amik> # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
amik> # ######################################################################
amik> #
amik> # ----------------------------------------------------------------------
amik> # - shell --------------------------------------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> SHELL = /bin/sh
amik> #
amik> CD = cd
amik> CP = cp
amik> LN_S = ln -s
amik> MKDIR = mkdir
amik> RM = /bin/rm -f
amik> TOUCH = touch
amik> #
amik> # ----------------------------------------------------------------------
amik> # - Platform identifier ------------------------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> ARCH = Linux_ATHLON_CBLAS
amik> #
amik> # ----------------------------------------------------------------------
amik> # - HPL Directory Structure / HPL library ------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> TOPdir = $(HOME)/hpl
amik> INCdir = $(TOPdir)/include
amik> BINdir = $(TOPdir)/bin/$(ARCH)
amik> LIBdir = $(TOPdir)/lib/$(ARCH)
amik> #
amik> HPLlib = $(LIBdir)/libhpl.a
amik> #
amik> # ----------------------------------------------------------------------
amik> # - Compilers / linkers - Optimization flags ---------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> CC = pgcc
amik> NOOPT =
amik> CCFLAGS = -O2 -Munroll=c:4 -Mvect=prefetch -Mvect=cachesize:32768
amik>
amik> LINKER = pgcc
amik> LINKFLAGS = $(CCFLAGS)
amik> #
amik> ARCHIVER = ar
amik> ARFLAGS = r
amik> RANLIB = echo
amik> #
amik> # ----------------------------------------------------------------------
amik> # - MPI directories - library ------------------------------------------
amik> # ----------------------------------------------------------------------
amik> # MPinc tells the C compiler where to find the Message Passing library
amik> # header files, MPlib is defined to be the name of the library to be
amik> # used. The variable MPdir is only used for defining MPinc and MPlib.
amik> #
amik> MPdir = /opt/score/mpi/mpich-1.2.0/i386-redhat7-linux2_4
amik> MPinc = -I$(MPdir)/include
amik> MPlib = -Bstatic -L$(MPdir)/lib -L/opt/score/lib/i386-redhat7-linux2_4 -lmpich -lscoreusr -lpm -lult -lscorecommon -lpthread -lscboard -lscwrap -L/usr/pgi/linux86/lib
amik> #
amik> # ----------------------------------------------------------------------
amik> # - F77 / C interface --------------------------------------------------
amik> # ----------------------------------------------------------------------
amik> # You can skip this section if and only if you are not planning to use
amik> # a BLAS library featuring a Fortran 77 interface. Otherwise, it is
amik> # necessary to fill out the F2CDEFS variable with the appropriate
amik> # options. **One and only one** option should be chosen in **each** of
amik> # the 3 following categories:
amik> #
amik> # 1) name space (How C calls a Fortran 77 routine)
amik> #
amik> # -DAdd_ : all lower case and a suffixed underscore (Suns,
amik> # Intel, ...),
amik> # -DNoChange : all lower case (IBM RS6000),
amik> # -DUpCase : all upper case (Cray),
amik> # -Df77IsF2C : the FORTRAN compiler in use is f2c.
amik> #
amik> # 2) C and Fortran 77 integer mapping
amik> #
amik> # -DF77_INTEGER=int : Fortran 77 INTEGER is a C int,
amik> # -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
amik> # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
amik> #
amik> # 3) Fortran 77 string handling
amik> #
amik> # -DStringSunStyle : The string address is passed at the string loca-
amik> # tion on the stack, and the string length is then
amik> # passed as an F77_INTEGER after all explicit
amik> # stack arguments,
amik> # -DStringStructPtr : The address of a structure is passed by a
amik> # Fortran 77 string, and the structure is of the
amik> # form: struct {char *cp; F77_INTEGER len;},
amik> # -DStringStructVal : A structure is passed by value for each Fortran
amik> # 77 string, and the structure is of the form:
amik> # struct {char *cp; F77_INTEGER len;},
amik> # -DCrayStyle : Special option for Cray machines, which uses
amik> # Cray fcd (fortran character descriptor) for
amik> # interoperation.
amik> #
amik> F2CDEFS =
amik> #
amik> # ----------------------------------------------------------------------
amik> # - Linear Algebra library (BLAS or VSIPL) -----------------------------
amik> # ----------------------------------------------------------------------
amik> # LAinc tells the C compiler where to find the Linear Algebra library
amik> # header files, LAlib is defined to be the name of the library to be
amik> # used. The variable LAdir is only used for defining LAinc and LAlib.
amik> #
amik> LAdir = /home/amik/Linux_ATHLON
amik> LAinc = -I$(LAdir)/include
amik> LAlib = $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a
amik> #
amik> # ----------------------------------------------------------------------
amik> # - HPL includes / libraries / specifics -------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
amik> HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
amik> #
amik> # - Compile time options -----------------------------------------------
amik> #
amik> # -DHPL_COPY_L force the copy of the panel L before bcast;
amik> # -DHPL_CALL_CBLAS call the cblas interface;
amik> # -DHPL_CALL_VSIPL call the vsip library;
amik> # -DHPL_DETAILED_TIMING enable detailed timers;
amik> #
amik> # By default HPL will:
amik> # *) not copy L before broadcast,
amik> # *) call the Fortran 77 BLAS interface
amik> # *) not display detailed timing information.
amik> #
amik> HPL_OPTS = -DHPL_CALL_CBLAS
amik> #
amik> # ----------------------------------------------------------------------
amik> #
amik> HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
amik> #
amik> # ----------------------------------------------------------------------
amik>
amik>
amik> --
amik> _____________________________________________________
amik> Dr. A. St-Cyr
amik> Research Associate, CFD Lab
amik> Department of Mechanical Engineering
amik> McGill University
amik> 688 Sherbrooke Street West, 7th floor
amik> Montreal, Qc, Canada H3A 2S6
amik> Tel: +1 (514) 398-1710, Admin. Fax : 2203
amik> amik at cfdlab.mcgill.ca
amik> _____________________________________________________
amik>
-----
Shinji Sumimoto E-Mail: s-sumi at bd6.so-net.ne.jp