Questions |
Answers |
|
1. |
The output of "ypcat hosts" differs from the content of the /etc/hosts file. | Please execute the following to reflect the content of /etc/hosts in NIS.
# cd /var/yp
# make
Note: It is not a bug that ypcat prints the same line two or more times.
% ypcat hosts | sort -u
There is no problem if the output of the above command corresponds to the content of the /etc/hosts file. |
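The comparison described above can be sketched as a small shell helper. This is only an illustration: the function names normalize_hosts and same_hosts are hypothetical, and the sketch assumes that comments and whitespace differences between the NIS map and /etc/hosts should be ignored.

```sh
# Normalize hosts-format text read from stdin:
# strip comments, collapse whitespace, drop blank lines and duplicates.
normalize_hosts() {
    sed -e 's/#.*//' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' \
        -e 's/ $//' \
        -e '/^$/d' | sort -u
}

# Succeed if the two hosts-format files have the same normalized content.
same_hosts() {
    a=$(normalize_hosts < "$1")
    b=$(normalize_hosts < "$2")
    [ "$a" = "$b" ]
}
```

For example, save the map with "ypcat hosts > /tmp/nis.hosts" and then run "same_hosts /tmp/nis.hosts /etc/hosts && echo in-sync". If the NIS map intentionally omits some entries, the helper reports a difference even though nothing is wrong.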
2. |
We can't execute the system test "sceptic -v -g pcc". Also, "telnet" from a client node to the server node fails. |
If you are using NIS on the server node, please check whether the localhost address is defined with the server name ("127.0.0.1 server.domain server"). If so, remove this line and add a correct line. Finally, please execute "# cd /var/yp; make" in order to update the NIS database. |
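A quick way to spot the misconfiguration described above is to list every name bound to 127.0.0.1 that is not a localhost name. The function name bad_loopback_names is hypothetical, and the sketch assumes a conventional hosts file layout (address first, names after, "#" starting a comment).

```sh
# Print each name bound to 127.0.0.1 that is not "localhost" or "localhost.<domain>".
# Reads the files given as arguments, or stdin when no file is given.
bad_loopback_names() {
    awk '
        { sub(/#.*/, "") }              # drop trailing comments
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if ($i != "localhost" && $i !~ /^localhost\./)
                    print $i
        }' "$@"
}
```

Running "bad_loopback_names /etc/hosts" on the server node prints nothing when the loopback line is clean. Any name it prints is a candidate for the problem above.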
3. |
"msgbserv" doesn't start up. |
Please confirm the settings of NIS, DNS, and "/etc/hosts". |
4. |
"scout" doesn't start up. |
If you are using a SCore version before "5.0.0",
please set both the FQDN and the simple host name in "/etc/hosts". This problem is solved
in SCore 5.0.1. |
5. |
When executing "scrun", an error
message "No self host error" appears only on one host. |
Please confirm the setting of "/etc/hosts" on the
host where the error message appears. |
Questions |
Answers |
|
1. |
My machine does not have an FDD. Can SCore be installed on a machine without an FDD? How can I boot from a SCore CD? |
You can write the SCore client boot image to a CD-R in order to install SCore. The boot disk images for SCore clients are in /opt/score/ndboot/images/. To create a bootable CD-ROM image:
# mkisofs -b /opt/score/ndboot/images/100Mbps_Ethernet.img -c boot.catalog -o /tmp/score-boot.iso -J -r -T
Then write the image with cdrecord or xcdroast. |
2. |
Can EIT set up Compute Hosts using eth2? | Currently, EIT works only with eth0. If you want to use eth1 or eth2, install SCore using eth0 and then change from eth0 to the interface you want. |
3. |
EIT doesn't start up because
it failed to set "parameter (domainName)". |
Please confirm the setting of NIS. Then please execute
"% ypmatch your_servers_hostname hosts | awk '{print $2}'" and confirm whether
the returned value is the FQDN host name. |
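The FQDN check above can be automated with a simple heuristic. The function name is_fqdn is hypothetical, and the sketch assumes that any name with at least two non-empty, dot-separated labels counts as fully qualified. It does not verify that the domain actually matches your site.

```sh
# Succeed if the argument looks like a fully qualified host name:
# non-empty, contains a dot, no leading or trailing dot, no empty labels.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_fqdn "$(ypmatch your_servers_hostname hosts | awk '{print $2}')" || echo "not an FQDN".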
4. |
While installing SCore using
EIT, I had an error message "Cannot resolve the host clusterus1.dciem.dnd.ca
IP address", and couldn't continue the installation. |
Please confirm the settings of NIS, DNS, and "/etc/hosts". |
5. |
While installing SCore using
EIT, I had an error message "Cannot resolve the server's hostname from IP
address", and couldn't continue the installation. |
Please confirm the settings of NIS, DNS, and "/etc/hosts". |
6. |
An error message "grab failed:
another application has grab" appears when generating a boot disk. |
Another program is using the floppy disk drive. Please
check whether the floppy disk drive is in use by another program. |
7. |
We use the Easy Installation Tool (EIT)
to install. Before creating the boot floppy, an error dialog box appears saying
that it cannot find the /opt/score/setup//RedHat/instimage/compconf/.conf directory. What does this mean? |
Please check the disk space of the /opt partition on the Server Host
as follows:
% df /opt
/opt/score needs about 500 MB. |
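The disk space check above can be scripted as follows. The function names avail_kb and enough_space_mb are hypothetical. The sketch assumes the POSIX output format of "df -Pk", where the fourth column of the second line is the available space in KB.

```sh
# Print the available space in KB from "df -Pk" output read on stdin.
avail_kb() {
    awk 'NR == 2 { print $4 }'
}

# Succeed if the "df -Pk" output on stdin shows at least $1 MB available.
enough_space_mb() {
    need_kb=$(( $1 * 1024 ))
    have_kb=$(avail_kb)
    [ -n "$have_kb" ] && [ "$have_kb" -ge "$need_kb" ]
}
```

For example: df -Pk /opt | enough_space_mb 500 || echo "/opt is short of space".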
8. |
When installing computation nodes
with SCore 5.0.0, anaconda fails to start. |
This problem is solved in SCore5.0.1. |
9. |
We can't install SCore using
EIT with e1000. |
Before SCore 5.2.0, the boot floppy does not include the e1000 driver. If you install SCore 5.2.0 or later, please select 1Gbps_Ethernet in the Select Boot Network Device window. |
10. |
When installing SCore with EIT, the following
message appears and the file transfer stops: The file mnt/source2/RedHat/RPMS/ This is due to a missing file, a bad package, or bad media. Press |
Two causes are possible.
(1) The CD-ROM cannot be read. Please copy the CD-ROM image onto the disk, create a mount point for it, and install from there.
(2) An NFS error occurred. In this case, work around it as follows.
1) Switch to the shell screen (press "Alt + F2").
2) Change directory and list the package.
# cd /mnt/cdrom/RedHat/RPMS/
# ls -l "Package name"
3) Then click the OK button of the error message dialog on the Server Host.
4) If a message like "ls: Package Name: Stale NFS file handle" appears, repeat the ls command until it returns a normal result. |
11. |
We can't boot PC, because a root
file system is too large. Also, we can't set a partition like "/boot" using
EIT. |
Please set the size of the root partition to less than
8 GB. EIT on SCore 5.2.0 or later can set up a /boot partition. |
12. |
When using EIT, do environment variables need to be set up? | No. They are set when you log in to the Server Host. |
13. |
I want more information about "By binary rpm files" in the "SCore Cluster System Software Installation Guide". | The following notes apply.
3. Compute Host Settings
- SCore Linux Kernel Installation: The correct kernel image names in the /etc/lilo.conf file are as follows.
*-2.4.18-3SCOREsmp
*-2.4.18-3SCORE
- SCore System Installation: The ./bininstall command must be executed twice or more; otherwise, not all files may be copied.
4. Server Host Settings
- The location of the sample scorehosts.db file is corrected as follows. doc/html/installation/ -> doc/html/en/installation/
- The /var/log/msgbserv.out file does not exist after msgbserv has started.
- The setting of the PM-II device is executed as follows. # /opt/score/deploy/mkpmethernetconf -speed 100 pm-udp.conf -> /opt/score/deploy/mkpmethernetconf -speed 100 -g pcc + others
- To use the Server Host as a Compute Host, execute the ./bininstall -compute command on the Server Host, and then do the same work as for setting up a Compute Host. |
Questions |
Answers |
|
1. |
What kind of Gigabit Ethernet does
SCore support ? |
PM/Ethernet does not depend on a particular Ethernet NIC
or switch, but its performance does depend on them. Please refer to the recommended
H/W list. |
2. |
Is Network Trunking possible with Gigabit Ethernet? | If you use Network Trunking on Gigabit Ethernet, you
should use Ethernet switches and NICs that support JUMBO FRAMEs on a 66 MHz
64-bit PCI bus in order to achieve high bandwidth, because PCI
DMA bandwidth is otherwise insufficient. On PCI-X or on multiple PCI buses, the performance may increase. Please refer to "PM Communication Performance" in the "SCore Cluster System Software Overview". Note: We have tested SysKonnect 9843 NICs, 3Com 996B-T, and Broadcom 5701 NICs using Network Trunking with JUMBO FRAMEs. We have also tested Intel PRO100T and PRO1000XT, but not with JUMBO FRAMEs. |
3. |
How do I write the configuration file
when connecting two PCs directly, without a Myrinet switch? |
Please set "pm-myrinet.conf" as follows:
0 node0.pccluster.org
1 node1.pccluster.org |
4. |
When executing "etherpmctl", an
error message "resource busy" appears. |
|
5. |
PM/Ethernet communication test such as rpmtest(scstest, rcstest etc) failed. |
PM/Ethernet communication can fail for the following reasons:
|
6. |
When executing PM test, some commands
like "scstest" don't work well. |
Please check whether an IRQ is shared by more than one device.
If you are using automatic IRQ assignment, please set the IRQ manually
in the BIOS. |
7. |
If we try to execute "mandel"
with Ethernet and SMP, the program crashes. |
It's a bug of PM/Ethernet of SCore5.0.0. This
bug is fixed in SCore5.0.1. |
8. |
If we try to execute a SCore program
with "SK-9D21", the program crashes. |
It's a problem caused by this type of NIC. "SysKonnect
SK-984x" and "Intel pro1000/T" achieve high bandwidth and low latency. |
9. |
We are running over Myrinet 2000, but are
now getting some errors with a code that was working OK (DLPOLY chemistry
code):
SCore-D 4.0 connected.
<3> ULT:SYSCALLPANIC(../recv.c:85) PM Error (pmReceive) (32:Broken pipe)
<5> SCore-D:WARNING Some job(s) will not stop (4 more retry)
<5> SCore-D:WARNING Force to stop JOB 1 ...
<5> SCore-D:WARNING Failed to stop job(s).
<5> SCore-D:WARNING Force to kill JOB 1 |
This error means that the Myrinet NIC was reset by a timeout while receiving a packet. If the error does not occur again, you can ignore it. If the error occurs again, it may be caused by a hardware problem. |
10. |
The performance of Ethernet Trunking
is not good. It's the same level as that of one NIC. |
Please confirm "scorehosts.db". |
Questions |
Answers |
|
1. |
Can compilers other than GNU be used for MPICH of SCore? | Yes, it is possible. After editing the site file, extract only the MPI source and perform the following operations.
# cd /opt/score/score-src/runtime/mpi
# smake
# smake install |
2. |
When executing "make" to compile
SCore source codes, we have an error message. |
It's a bug of SCore5.0.0. This bug is fixed in SCore5.0.1. |
3. |
We can't compile mpi using PGI
compiler. |
Please check the path of pgf90 compiler (/opt/pgi/linux86/bin/pgf90). |
4. |
How do I use Intel compiler version 8 on SCore 5.6.1? |
Please execute following step:
|
Questions |
Answers |
|
1. |
When the sample program of MPICH was
compiled with mpicc and executed, the following error message was received:
<8> ULT: Exception Signal (11)
Then the system appeared to "hang". |
Please execute the following command in a scout environment.
% scout ls -l /opt/score/deploy/bin.i386-redhat7-linux2_4/scored*
If all the binaries look the same, it is OK. If not, you have to copy the SCore-D binary so that all hosts have the same binary files. |
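Comparing "ls -l" output by eye only shows sizes and dates. A stricter variant is to compare checksums. The sketch below is an assumption-laden illustration: the function name mismatched_files is hypothetical, it expects md5sum-style lines ("checksum path") collected from all hosts on stdin, and it assumes scout passes each host's output through unmodified, which this FAQ does not confirm.

```sh
# Read "checksum path" lines (md5sum format) from stdin and print every
# path that was reported with more than one distinct checksum.
mismatched_files() {
    awk '
        {
            key = $2 SUBSEP $1
            if (!(key in seen)) { seen[key] = 1; count[$2]++ }
        }
        END { for (p in count) if (count[p] > 1) print p }' | sort
}
```

For example: scout md5sum /opt/score/deploy/bin.i386-redhat7-linux2_4/scored* | mismatched_files. Empty output means that every host reported the same checksum for each file.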
2. |
The "DISPLAY" environment variable
isn't set automatically, although "howtouse/xwindow.html" says it is. |
It isn't set automatically. The document is outdated
and incorrect. |
3. |
The scrun command outputs the following warnings on an SMP cluster system.
$ scrun ./a.out |
This warning says that the nodes have two CPUs but there is no corresponding entry in the scorehosts.db file. To avoid these warnings, check the following in the /opt/score/etc/scorehosts.db file
# /etc/rc.d/init.d/scoreboard reload |
4. |
The SCore demo applications that use the X Window System fail to execute with the following errors:
% scrun -nodes=2 /opt/score/demo/bin/pmandel |
The DISPLAY environment variable is not set, or display access from other hosts is not allowed. Set the DISPLAY variable and allow access using the following commands.
% export DISPLAY=server.pcc.org:0.0 |
5. |
I used the mpirun command and ran one application
(with an '&', i.e. in the background). I was not able to start a new job because it gave me an error message saying "SCOUT busy". |
Please use the SCore-D Multi-User Environment. You must run scored as root. Then you must execute mpirun with the -score scored= option, outside a scout environment. Please refer to "Getting Started" in "How to Use SCore Cluster System Software" and "Executing SCore-D for the Multi-User Environment" in the "SCore Cluster System Software Reference Guide". |
6. |
How does SCore assign jobs to
each CPU ? |
SCore doesn't assign jobs to CPU but to host. Therefore,
you can't specify CPU to execute a job as you like. |
7. |
How can I run a program that needs standard input with SCore? | SCore supports standard input in SCore 5.2.0 or later.
SCore 5.0.1 or earlier does not support standard input directly. Do the following:
% scrun scatter -node 0 == ./a.out |
8. |
When the Spare Hosts function is used, does a running program keep running even if one Compute Host stops because of a failure? | This is possible only in the multi-user mode of SCore-D. scored is restarted, and the program restarts from the point where the last checkpoint was taken. Please refer to "Automatic Operation and High Availability of SCore-D" or the "SCore Cluster System Software Reference Guide" for details. |
Questions |
Answers |
|
1. |
When compiling NPB on OpenMP,
we have an error. |
Please check "make.def", and confirm whether
CLINKFLAGS contains -lm. If you execute this program in a SCASH environment, please
add -omniconfig=scash to CFLAGS and CLINKFLAGS. |
2. |
We can't execute LU of NPB on
OpenMP. |
Please set environment variables OMNI_SCASH_ARGS_SIZE
and OMNI_SCASH_ARGS_SIZE, according to "/opt/omni/doc/omni-scash-status.html". |
Questions |
Answers |
|
1. |
I want to do some performance analysis of an MPI-based program in SCore. | You can use the profiling library in MPE. In MPICH/SCore, you can use the upshot and Jumpshot 3 log viewers. For example, to use Jumpshot 3:
1. Compile and link the MPI program with the -mpilog option.
% mpicc -mpilog foo.c -o foo
2. Set the MPE_LOG_FORMAT environment variable to SLOG.
% setenv MPE_LOG_FORMAT SLOG
3. Execute the program.
% scrun ./foo
The program creates "program_name.slog".
4. View the log file with logviewer.
% logviewer foo.slog
To use Jumpshot 3, please see: /opt/score/doc/mpi/jumpshot/index.html
For more details on the MPE profiling library, please see also the "MPE user guide". |
2. |
How do I use Intel compiler version 8 on SCore 5.6.1? |
Please execute following step:
|
Questions |
Answers |
|
1. |
We can't use "-l" option to sc_qsub. |
You can't use the option in SCore5.0.0. You can use the option in SCore 5.2.0 or later. |
2. |
Can I execute resources_max.walltime
with pbs ? |
You can execute it, but the response is very slow. |
Questions |
Answers |
|
1. |
scstest or rcstest fails on PM/Ethernet | pm-ethernet.conf. Or use the timeout
option, such as:
% scstest -network ethernet -timeout 10
% rcstest node00 ethernet -v -timeout 10
|
2. |
When running a simple self-made MPI program, the time until processing begins gradually gets longer. | Please check the following:
1. Does the IRQ of the Ethernet device overlap with that of another device? Overlapping IRQs can be identified by executing the following command on a Compute Host.
% cat /proc/interrupts
2. Does the switching hub operate normally? Please power the switch off and on once. Also, try connecting to another port, because a specific port may be broken.
If neither of the above is the problem, tune the following parameters in the pm-ethernet.conf file: maxnsend backoff |
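Reading /proc/interrupts by eye is error-prone on hosts with many devices. The sketch below lists the IRQ numbers that more than one device shares. The function name shared_irqs is hypothetical, and the sketch relies on the convention that the kernel separates the names of devices sharing an IRQ with commas, so treat it as a heuristic.

```sh
# Print the number of every IRQ whose device list contains a comma,
# which indicates that several devices share that IRQ.
# Reads the files given as arguments, or stdin when no file is given.
shared_irqs() {
    awk '/^ *[0-9]+:/ && /,/ { sub(/:$/, "", $1); print $1 }' "$@"
}
```

For example, run "shared_irqs /proc/interrupts" on each Compute Host. If the Ethernet or Myrinet device appears on one of the printed IRQs, assign it a separate IRQ.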
3. |
MPICH/SCore on PM/Myrinet achieves
lower performance than MPICH/GM on Myrinet2000. |
MPICH/GM uses zero-copy communication by default, but
MPICH/SCore does not. Please try
the mpi_zerocopy=on scrun option, such as:
% scrun -nodes=4x1,mpi_zerocopy=on a.out |
4. |
MPICH/SCore on PM/Ethernet achieves lower performance than MPICH/p4(LAM) on Ethernet. | The default parameters defined in /opt/score/etc/pm-ethernet.conf
are not optimized. Please tune the maxnsend and backoff parameters
in pm-ethernet.conf, or use the mpi_eager option, such as:
% scrun -nodes=4x1,mpi_eager=1000000 a.out |
Questions |
Answers |
|
1. |
Does SCore depend on the
CPU architecture? |
If the CPU architecture is x86 or Alpha and you set up the
SCore environment correctly, SCore works well. Even if the hosts have different types of
x86 processors, EIT assumes that all hosts have the same processor and
the same performance, and registers them in "/opt/score/etc/scorehosts.db".
|
2. |
Has SCore been ported to PowerPC? | No, SCore has not been ported to PowerPC. |
3. |
How can I execute an MPI program
that was not built for SCore on a SCore cluster? |
All you have to do is install another MPI.
|
4. |
Can a commercial MPI program run on SCore? | This is a rather common situation, and there is of course a workaround. In addition to SCore, you may install a normal, non-optimized MPICH (using TCP over Ethernet or TCP over Myrinet) and run your application with it. |
5. |
Can the Compute Hosts be dual bootable? | If RedHat 7.2 is installed on your cluster, you may install SCore
by binary rpm or by source without a separate partition. (On a Compute Host, SCore requires 50 MB on /opt and 1 GB on /var/scored.) If another distribution is installed on your cluster, you may install RedHat 7.2 on a separate partition and then install SCore by binary rpm or by source. Please look at the "SCore Cluster System Software Installation Guide" for the "By binary rpm files" and "By source" installation methods. With either method, you must build the kernel from source and install it so that the host is dual bootable. How to do this depends on the boot loader. |
6. |
Does SCore 5.0.1 work on RedHat 7.3? |
It does. But if you want to recompile SCore itself, please use SCore 5.2. |
7. |
When Spare Hosts are defined in the scorehosts.db file, may they be given the same group name as other Compute Hosts? | Please do not put Spare Hosts in the same group as
Compute Hosts. However, please match the settings other than the group (network, msgbserv). |
8. |
The Linux kernel hangs when rpmtest
is executed on Myrinet. |
Check the IRQ assignment as follows:
% cat /proc/interrupts
If the IRQ number of Myrinet is the same as that of another device, change
the Myrinet IRQ number by changing the BIOS setting or by moving the Myrinet card to another PCI slot.
|
PC Cluster Consortium |