[SCore-users] Bug found!?

Shinji Sumimoto s-sumi at bd6.so-net.ne.jp
Mon Sep 23 23:09:34 JST 2002


Hi.

Could you run the HPL program with the PM_DEBUG=1 shell environment
variable set?

$ export PM_DEBUG=1
$ scrun -nodes=128x2,mpi_max_eager_myrinet=2000000 ./xhpl 

The error:
amik> <13:0> SCORE:WARNING MPICH/SCore: pmReceive(pmc=0x85bc7a0) failed, errno=5
shows that an error occurred in Myrinet communication.

With PM_DEBUG=1 set, more informative messages are displayed.
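
When the output is long, it may help to save it and pull out only the
pmReceive failures before posting. The following is a minimal sketch,
assuming a POSIX shell and assuming each warning is printed on one line
in the format quoted above. The function name pm_failures and the log
file name pm_debug.log are hypothetical, not part of SCore.

```shell
# Hypothetical helper, not part of SCore: read a saved scrun log on stdin
# and print "<tag> errno=N" for every pmReceive warning found in it.
# Assumes the one-line message format quoted in this thread.
pm_failures() {
    sed -n 's|^\(<[0-9]*:[0-9]*>\) SCORE:WARNING MPICH/SCore: pmReceive(.*) failed, *errno=\([0-9]*\).*$|\1 errno=\2|p'
}

# Possible usage (assumes scrun writes its messages to stdout or stderr):
#   export PM_DEBUG=1
#   scrun -nodes=128x2,mpi_max_eager_myrinet=2000000 ./xhpl 2>&1 | tee pm_debug.log
#   pm_failures < pm_debug.log
```

Lines that do not match the warning format, including the extra
PM_DEBUG messages, are simply skipped, so the full log should still be
kept for reference.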

Maybe you are using Myrinet2000. Is your Myrinet2000 using a fiber link
or a serial link?

PS: The mpi_max_eager_myrinet option is only for the ch_score2 MPI device.
    Are you using MPICH/SCore2?

Shinji.

From: Amik St-Cyr CFD Lab <amik at cfdlab.mcgill.ca>
Subject: [SCore-users] Bug found!?
Date: 23 Sep 2002 09:40:35 -0400
Message-ID: <1032788436.9725.1156.camel at stan.cfdlab.mcgill.ca>

amik> Hi all,
amik> 
amik> 	First of all, our tech team found the
amik> bug in our setup, thanks to the help of Mr.
amik> Toyohisa. We had mixed up two cables...
amik> 
amik> 	We then tried to benchmark the machine
amik> with the LINPACK suite. The behaviour of LINPACK
amik> was very strange, depending on the parameters given.
amik> Then, for one particular setup, I was able to crank up the
amik> size of the matrix, but SCore gave me the
amik> following message:
amik> 
amik> 
amik> 
amik> | amik at stokes 18:22:56 Linux_ATHLON_CBLAS> scrun -nodes=128x2,mpi_max_eager_myrinet=2000000 ./xhpl
amik> SCore-D 5.0.1 connected.
amik> <0:0> SCORE: 256 nodes (128x2) ready.
amik> ============================================================================
amik> HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
amik> Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
amik> ============================================================================
amik> 
amik> An explanation of the input/output parameters follows:
amik> T/V    : Wall time / encoded variant.
amik> N      : The order of the coefficient matrix A.
amik> NB     : The partitioning blocking factor.
amik> P      : The number of process rows.
amik> Q      : The number of process columns.
amik> Time   : Time in seconds to solve the linear system.
amik> Gflops : Rate of execution for solving the linear system.
amik> 
amik> The following parameter values will be used:
amik> 
amik> N      :  200000 
amik> NB     :     100 
amik> P      :      16 
amik> Q      :      16 
amik> PFACT  :   Crout 
amik> NBMIN  :       1 
amik> NDIV   :      16 
amik> RFACT  :   Right 
amik> BCAST  :  2ringM 
amik> DEPTH  :       1 
amik> SWAP   : Mix (threshold = 16)
amik> L1     : transposed form
amik> U      : transposed form
amik> EQUIL  : yes
amik> ALIGN  : 8 double precision words
amik> 
amik> ----------------------------------------------------------------------------
amik> 
amik> - The matrix A is randomly generated for each test.
amik> - The following scaled residual checks will be computed:
amik>    1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
amik>    2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
amik>    3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
amik> - The relative machine precision (eps) is taken to be          2.220446e-16
amik> - Computational tests pass if scaled residuals are less than           16.0
amik> 
amik> <13:0> SCORE:WARNING MPICH/SCore: pmReceive(pmc=0x85bc7a0) failed, errno=5
amik> <13:0> SCORE:PANIC MPICH/SCore: critical error on message transfer
amik> <13:0> Trying to attach GDB (DISPLAY=localhost:13.0): PANIC
amik> SCORE: Program aborted.
amik> | amik at stokes 23:14:53 Linux_ATHLON_CBLAS>
amik> 
amik> 
amik> Can I prevent this from happening?
amik> 
amik> Here are the details:
amik> 
amik> | amik at stokes 09:40:11 Linux_ATHLON_CBLAS> cat HPL.dat 
amik> HPLinpack benchmark input file
amik> Innovative Computing Laboratory, University of Tennessee
amik> HPL256.out   output file name (if any)
amik> 6            device out (6=stdout,7=stderr,file)
amik> 1            # of problems sizes (N)
amik> 200000         Ns
amik> 1            # of NBs
amik> 100           NBs
amik> 1            # of process grids (P x Q)
amik> 16           Ps
amik> 16           Qs
amik> 16.0         threshold
amik> 1            # of panel fact
amik> 1            PFACTs (0=left, 1=Crout, 2=Right)
amik> 1            # of recursive stopping criterium
amik> 1            NBMINs (>= 1)
amik> 1            # of panels in recursion
amik> 16            NDIVs
amik> 1            # of recursive panel fact.
amik> 2            RFACTs (0=left, 1=Crout, 2=Right)
amik> 1            # of broadcast
amik> 3            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
amik> 1            # of lookahead depth
amik> 1            DEPTHs (>=0)
amik> 2            SWAP (0=bin-exch,1=long,2=mix)
amik> 16           swapping threshold
amik> 0            L1 in (0=transposed,1=no-transposed) form
amik> 0            U  in (0=transposed,1=no-transposed) form
amik> 1            Equilibration (0=no,1=yes)
amik> 8            memory alignment in double (> 0)
amik> 
amik> Makefile using the Portland Group compiler and linking with
amik> the MPI in SCore.
amik> 
amik> 
amik> | amik at stokes 09:40:47 hpl> cat Make.Linux_ATHLON_CBLAS
amik> #  
amik> #  -- High Performance Computing Linpack Benchmark (HPL)                
amik> #     HPL - 1.0 - September 27, 2000                          
amik> #     Antoine P. Petitet                                                
amik> #     University of Tennessee, Knoxville                                
amik> #     Innovative Computing Laboratories                                 
amik> #     (C) Copyright 2000 All Rights Reserved                            
amik> #                                                                       
amik> #  -- Copyright notice and Licensing terms:                             
amik> #                                                                       
amik> #  Redistribution  and  use in  source and binary forms, with or without
amik> #  modification, are  permitted provided  that the following  conditions
amik> #  are met:                                                             
amik> #                                                                       
amik> #  1. Redistributions  of  source  code  must retain the above copyright
amik> #  notice, this list of conditions and the following disclaimer.        
amik> #                                                                       
amik> #  2. Redistributions in binary form must reproduce  the above copyright
amik> #  notice, this list of conditions,  and the following disclaimer in the
amik> #  documentation and/or other materials provided with the distribution. 
amik> #                                                                       
amik> #  3. All  advertising  materials  mentioning  features  or  use of this
amik> #  software must display the following acknowledgement:                 
amik> #  This  product  includes  software  developed  at  the  University  of
amik> #  Tennessee, Knoxville, Innovative Computing Laboratories.             
amik> #                                                                       
amik> #  4. The name of the  University,  the name of the  Laboratory,  or the
amik> #  names  of  its  contributors  may  not  be used to endorse or promote
amik> #  products  derived   from   this  software  without  specific  written
amik> #  permission.                                                          
amik> #                                                                       
amik> #  -- Disclaimer:                                                       
amik> #                                                                       
amik> #  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
amik> #  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
amik> #  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
amik> #  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
amik> #  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
amik> #  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
amik> #  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
amik> #  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
amik> #  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
amik> #  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
amik> #  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
amik> # ######################################################################
amik> #  
amik> # ----------------------------------------------------------------------
amik> # - shell --------------------------------------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> SHELL        = /bin/sh
amik> #
amik> CD           = cd
amik> CP           = cp
amik> LN_S         = ln -s
amik> MKDIR        = mkdir
amik> RM           = /bin/rm -f
amik> TOUCH        = touch
amik> #
amik> # ----------------------------------------------------------------------
amik> # - Platform identifier ------------------------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> ARCH         = Linux_ATHLON_CBLAS
amik> #
amik> # ----------------------------------------------------------------------
amik> # - HPL Directory Structure / HPL library ------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> TOPdir       = $(HOME)/hpl
amik> INCdir       = $(TOPdir)/include
amik> BINdir       = $(TOPdir)/bin/$(ARCH)
amik> LIBdir       = $(TOPdir)/lib/$(ARCH)
amik> #
amik> HPLlib       = $(LIBdir)/libhpl.a 
amik> #
amik> # ----------------------------------------------------------------------
amik> # - Compilers / linkers - Optimization flags ---------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> CC           = pgcc
amik> NOOPT        =
amik> CCFLAGS      = -O2 -Munroll=c:4 -Mvect=prefetch -Mvect=cachesize:32768
amik> 
amik> LINKER       = pgcc
amik> LINKFLAGS    = $(CCFLAGS)
amik> #
amik> ARCHIVER     = ar
amik> ARFLAGS      = r
amik> RANLIB       = echo
amik> #
amik> # ----------------------------------------------------------------------
amik> # - MPI directories - library ------------------------------------------
amik> # ----------------------------------------------------------------------
amik> # MPinc tells the  C  compiler where to find the Message Passing library
amik> # header files,  MPlib  is defined  to be the name of  the library to be
amik> # used. The variable MPdir is only used for defining MPinc and MPlib.
amik> #
amik> MPdir        = /opt/score/mpi/mpich-1.2.0/i386-redhat7-linux2_4
amik> MPinc        = -I$(MPdir)/include
amik> MPlib        = -Bstatic  -L$(MPdir)/lib \
amik>                -L/opt/score/lib/i386-redhat7-linux2_4 -lmpich -lscoreusr -lpm -lult \
amik>                -lscorecommon -lpthread -lscboard -lscwrap -L/usr/pgi/linux86/lib
amik> #
amik> # ----------------------------------------------------------------------
amik> # - F77 / C interface --------------------------------------------------
amik> # ----------------------------------------------------------------------
amik> # You can skip this section  if and only if  you are not planning to use
amik> # a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
amik> # necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
amik> # options.  **One and only one**  option should be chosen in **each** of
amik> # the 3 following categories:
amik> #
amik> # 1) name space (How C calls a Fortran 77 routine)
amik> #
amik> # -DAdd_              : all lower case and a suffixed underscore  (Suns,
amik> #                       Intel, ...),
amik> # -DNoChange          : all lower case (IBM RS6000),
amik> # -DUpCase            : all upper case (Cray),
amik> # -Df77IsF2C          : the FORTRAN compiler in use is f2c.
amik> #
amik> # 2) C and Fortran 77 integer mapping
amik> #
amik> # -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,
amik> # -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
amik> # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
amik> #
amik> # 3) Fortran 77 string handling
amik> #
amik> # -DStringSunStyle    : The string address is passed at the string loca-
amik> #                       tion on the stack, and the string length is then
amik> #                       passed as  an  F77_INTEGER  after  all  explicit
amik> #                       stack arguments,
amik> # -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
amik> #                       Fortran 77  string,  and the structure is of the
amik> #                       form: struct {char *cp; F77_INTEGER len;},
amik> # -DStringStructVal   : A structure is passed by value for each  Fortran
amik> #                       77 string,  and  the  structure is  of the form:
amik> #                       struct {char *cp; F77_INTEGER len;},
amik> # -DCrayStyle         : Special option for  Cray  machines,  which  uses
amik> #                       Cray  fcd  (fortran  character  descriptor)  for
amik> #                       interoperation.
amik> #
amik> F2CDEFS      =
amik> #
amik> # ----------------------------------------------------------------------
amik> # - Linear Algebra library (BLAS or VSIPL) -----------------------------
amik> # ----------------------------------------------------------------------
amik> # LAinc tells the  C  compiler where to find the Linear Algebra  library
amik> # header files,  LAlib  is defined  to be the name of  the library to be
amik> # used. The variable LAdir is only used for defining LAinc and LAlib.
amik> #
amik> LAdir        = /home/amik/Linux_ATHLON
amik> LAinc        = -I$(LAdir)/include
amik> LAlib        = $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a
amik> #
amik> # ----------------------------------------------------------------------
amik> # - HPL includes / libraries / specifics -------------------------------
amik> # ----------------------------------------------------------------------
amik> #
amik> HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
amik> HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
amik> #
amik> # - Compile time options -----------------------------------------------
amik> #
amik> # -DHPL_COPY_L           force the copy of the panel L before bcast;
amik> # -DHPL_CALL_CBLAS       call the cblas interface;
amik> # -DHPL_CALL_VSIPL       call the vsip  library;
amik> # -DHPL_DETAILED_TIMING  enable detailed timers;
amik> #
amik> # By default HPL will:
amik> #    *) not copy L before broadcast,
amik> #    *) call the Fortran 77 BLAS interface
amik> #    *) not display detailed timing information.
amik> #
amik> HPL_OPTS     = -DHPL_CALL_CBLAS
amik> # 
amik> # ----------------------------------------------------------------------
amik> #
amik> HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
amik> #
amik> # ----------------------------------------------------------------------
amik> 
amik> 
amik> -- 
amik> _____________________________________________________
amik> Dr. A. St-Cyr
amik> Research Associate, CFD Lab
amik> Department of Mechanical Engineering
amik> McGill University
amik> 688 Sherbrooke Street West, 7th floor
amik> Montreal, Qc, Canada H3A 2S6
amik> Tel: +1 (514) 398-1710, Admin. Fax : 2203
amik> amik at cfdlab.mcgill.ca
amik> _____________________________________________________
amik> 
-----
Shinji Sumimoto    E-Mail: s-sumi at bd6.so-net.ne.jp


