PM/Myrinet Test Procedure

This test procedure assumes that the X Window System is already running on the server. All commands must be invoked on the server host. For Myrinet cards with LANai 7 (M2M-PCI64A, M2L-PCI64A and M2M-PMC64A), make sure the cables are properly connected before the test.

If your Myrinet network is Myrinet 2000 with a serial or fiber link interface (M3S-PCI64B and M3F-PCI64B), use the myrinet2k network type and the lanaiM2k.mcp firmware. If your Myrinet network is Myrinet XP (M3F-PCIXD), use the myrinetxp network type and the lanaixp.mcp firmware. To change the network type used by rpmtest and scstest, you must modify scorehosts.db on your server. Refer to the SCore System Installation on Server section.

If your network name is not myrinet, replace myrinet with your network name (for example, myrinet2k) in the following commands.
Ex. $ ./rpminit comp0 myrinet2k
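
The card-to-network mapping described above can be captured in a small shell helper. This is only a sketch: the function name myrinet_network_for is hypothetical, and mapping the LANai 7 cards to the default myrinet network name is an assumption based on the commands used in this procedure, not something stated explicitly.

```shell
#!/bin/sh
# Hypothetical helper: print the PM network name for a Myrinet card model.
myrinet_network_for() {
    case "$1" in
        M3S-PCI64B|M3F-PCI64B)            echo myrinet2k ;;  # Myrinet 2000, serial or fiber
        M3F-PCIXD)                        echo myrinetxp ;;  # Myrinet XP
        M2M-PCI64A|M2L-PCI64A|M2M-PMC64A) echo myrinet ;;    # LANai 7 (assumed default name)
        *) echo "unknown card model: $1" >&2; return 1 ;;
    esac
}

# Example: build the rpminit command line for a given card.
net=$(myrinet_network_for M3F-PCI64B) && echo "./rpminit comp0 $net"
```

The last line only prints the command (here ./rpminit comp0 myrinet2k) so that it can be reviewed before being run from /opt/score/sbin.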

  1. Loopback test
    Issue the following commands:
    $ cd /opt/score/sbin
    $ ./rpminit comp0 myrinet
    $ ./rpmtest comp0 myrinet -dest 0 -ping
        Success: Two numbers similar to "8     1.2269e-05" are printed
                 The first number is the data size (in bytes)
                 The second number is the latency (in seconds)
        Failure: Error and dump messages are printed
    
    If this test fails, read the Troubleshooting section.
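
When the ping test is run from a script, the success criterion above (a data size followed by a latency) can be checked mechanically. The sketch below is a hypothetical helper, is_ping_result, that only recognizes the shape of a success line. The format of the error and dump messages is not described here, so anything else is simply treated as "not a result".

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument is "<size> <latency>" with
# an integer size in bytes and a positive latency in seconds.
is_ping_result() {
    printf '%s\n' "$1" | awk '
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ && $2 + 0 > 0 { ok = 1 }
        END { exit !ok }'
}

# Sample line taken from the success description above.
if is_ping_result "8     1.2269e-05"; then
    echo "loopback OK"
else
    echo "loopback FAILED"
fi
```

In practice the argument would be a line captured from ./rpmtest comp0 myrinet -dest 0 -ping rather than a literal string.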

  2. Point-to-Point test (Message)
    Make sure that the PM kernel driver is installed on all hosts. To test the PM communication function, use the rpmtest command. For example, to test communication from comp0 (node 0) to comp1 (node 1), follow the instructions below.

    Issue the following commands:

    $ cd /opt/score/sbin
    $ ./rpmtest comp1 myrinet -reply
    
    Issue the following commands in another window on the server host:
    $ cd /opt/score/sbin
    $ ./rpmtest comp0 myrinet -dest 1 -ping
        Results should be the same as the loopback test
    
    When the test finishes, remember to kill the pmtest process on the server host, because it runs in an infinite loop.
    If this test fails, read the Troubleshooting section.

  3. Point-to-Point test (Zero Copy)
    To test communication from comp0 (node 0) to comp1 (node 1), follow the instructions below.

    Issue the following commands:

    $ cd /opt/score/sbin
    $ ./rpmtest comp1 myrinet -vreply
    
    Issue the following commands in another window on the server host:
    $ cd /opt/score/sbin
    $ ./rpmtest comp0 myrinet -dest 1 -vwrite
        Success: Two numbers similar to "8     913452" are printed
                 The first number is the data size (in bytes)
                 The second number is the bandwidth (in bytes/sec)
        Failure: Error and dump messages are printed
    
    When the test finishes, remember to kill the pmtest process on the server host, because it runs in an infinite loop.
    If this test fails, read the Troubleshooting section.
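
The bandwidth is reported in bytes per second, which is awkward to compare against link speeds. As a small illustration, the hypothetical helper below (bandwidth_mb) converts a result line to MB/s, assuming 1 MB = 10^6 bytes.

```shell
#!/bin/sh
# Hypothetical helper: convert "<bytes> <bytes/sec>" to MB/s (1 MB = 10^6 bytes).
bandwidth_mb() {
    printf '%s\n' "$1" | awk 'NF == 2 { printf "%d bytes: %.3f MB/s\n", $1, $2 / 1000000 }'
}

# Sample line taken from the success description above.
bandwidth_mb "8     913452"
```

For the sample line this prints "8 bytes: 0.913 MB/s".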

  4. Total test
    This is the stress test. All hosts randomly send burst messages to other hosts. Issue the following commands:
    $ cd /opt/score/deploy
    $ scout -g pcc
    SCOUT: Spawn done.   
    SCOUT: session started
    $ ./scstest -network myrinet
    CSTEST: BURST on myrinet
    50 K messages.
    100 K messages.
    150 K messages.
    200 K messages.
    250 K messages.
    300 K messages.
            Success: Message counts like those above are printed.
            Failure: Error messages are printed.
    
    To stop this test, press Ctrl-C or send another interrupt.
    If this test fails, read the Troubleshooting section.
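
If the scstest output is saved to a file, the progress lines can be checked after the fact. The sketch below is a hypothetical helper, check_burst_progress, that reads the output on standard input and assumes the counts always increase, as they do in the transcript above. It is not part of SCore.

```shell
#!/bin/sh
# Hypothetical helper: read scstest output on stdin and report the last count.
# Exits non-zero if no count lines appear or the counts do not increase.
check_burst_progress() {
    awk '
        $2 == "K" && $3 == "messages." {
            if (seen && $1 + 0 <= last) bad = 1
            last = $1 + 0
            seen = 1
        }
        END {
            if (!seen || bad) exit 1
            printf "last count: %d K messages\n", last
        }'
}

# Sample input copied from the transcript above.
printf '%s\n' "CSTEST: BURST on myrinet" "50 K messages." "100 K messages." "150 K messages." |
    check_burst_progress
```

For the sample input this prints "last count: 150 K messages". A saved log could be checked with check_burst_progress < logfile, where logfile is whatever file the output was redirected to.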

Troubleshooting

Pminit causes an error.
Loopback test fails.
Point-to-point test fails.
Total test fails.
Troubleshooting for Myrinet 2000 with serial or fiber links (M3S-PCI64B and M3F-PCI64B).

$Id: pm-testmyrinet.html,v 1.5 2003/10/20 09:22:16 kameyama Exp $