Network Trunking for PM/Ethernet Administrator's Guide

Index

  1. Introduction
  2. Hardware Setup
    1. Hardware Requirements
    2. Hardware Configuration
  3. Configuration File Setup
    1. Making pm-ethernet.conf files
    2. Modifying scorehosts.db file
    3. Modifying /etc/rc.d/init.d/pm_ethernet file
    4. Restarting PM/Ethernet and scoreboard
  4. Test Procedure of Single Network
  5. Test Procedure of Multiple Network
    1. Using rpmtest
    2. Using rcstest
  6. Performance Tuning

1. Introduction

Network trunking is a technique for increasing communication bandwidth by using multiple Ethernet NICs (especially 100Base-T Ethernet) in parallel. Network trunking requires multiple Ethernet NICs in each PC and Ethernet switches for those NICs. A PM/Ethernet configuration file must be prepared and tested for each Ethernet NIC.

PM/Ethernet manages multiple NICs using a unit number, which is defined in pm-ethernet.conf and specified on the etherpmctl command line. Only NICs with the same unit number on the cluster nodes can communicate with each other. Moreover, because PM/Ethernet communication uses Ethernet MAC addresses directly, NICs with the same unit number must be attached to the same Ethernet network so that each node can reach the other nodes directly by Ethernet address. However, unlike the "Beowulf Channel Bonding" technique, NICs with different unit numbers do not have to be connected to the same Ethernet network.

2. Hardware Setup

  1. Hardware Requirements

    Network Interface Cards:

    When you use up to 2 NICs in one PC, a combination of different NIC hardware is acceptable (e.g., tulip + eepro100). When you use more than 2 NICs in one PC, identical NIC hardware is recommended. The following NICs have been tested with network trunking.

    Number of NICs   Tested NICs
    2 NICs           DEC Tulip, Intel EEPRO100, 3Com 3C905B, VIA chipset NICs
    3 NICs           DEC Tulip, Intel EEPRO100, 3Com 3C905B
    4 NICs           DEC Tulip, Intel EEPRO100

    Comments: VIA chipset NICs did not work with more than 2 NICs because of a hardware error. 3Com 3C905B NICs worked with 4 NICs, but bandwidth did not increase. Network trunking using the de4x5 driver causes the system to hang.

    Ethernet Switches:

    To use 3 NICs on an 8-node cluster, 3 eight-port Ethernet switches (or 1 sixteen-port switch and 1 eight-port switch) are needed, and no connections between the switches are needed. If you want to connect the cluster to another network, one of the Ethernet switches must have more than 8 ports.
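
    The port count behind this sizing can be sketched as a small shell function. The name switch_plan and its output format are illustrative only and are not part of SCore. The sketch assumes one switch port per node for each NIC unit and ignores any extra uplink port.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCore): each NIC unit forms its own
# Ethernet network, so every unit needs one switch port per node.
switch_plan() {
    nodes=$1   # number of cluster nodes
    nics=$2    # number of trunked NICs per node
    unit=0
    while [ "$unit" -lt "$nics" ]; do
        echo "unit $unit: $nodes ports"
        unit=$((unit + 1))
    done
    echo "total: $((nodes * nics)) ports"
}

# The 8-node, 3-NIC example from the text.
switch_plan 8 3
```

    For the example in the text this reports 8 ports for each of the 3 units, 24 ports in total, which matches either 3 eight-port switches or 1 sixteen-port switch plus 1 eight-port switch.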

  2. Hardware Configuration

    If you build a new cluster, using the same motherboard in every node is recommended, because the motherboard affects how Ethernet device numbers (the XX in ethXX) are allocated. If you use different motherboards, pay attention to how the Ethernet device numbers are allocated.

3. Configuration File Setup

Network trunking requires one pm-ethernet.conf file for each Ethernet device (eth0, eth1, eth2, ...). This document describes sample configuration files for a 4-node cluster with 4 NICs per node.

  1. The following hosts are used in this cluster.

    Compute hosts
    comp0.score.rwcp.or.jp
    comp1.score.rwcp.or.jp
    comp2.score.rwcp.or.jp
    comp3.score.rwcp.or.jp

  2. Create a configuration file (pm-udp.conf) for PM/UDP (Agent)

    # Configuration file for PM/UDP(Agent)
    0 comp0.score.rwcp.or.jp
    1 comp1.score.rwcp.or.jp
    2 comp2.score.rwcp.or.jp
    3 comp3.score.rwcp.or.jp
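
    For larger clusters the same layout can be generated rather than typed. The following is a minimal sketch, assuming the file needs only the comment line and one "node-number hostname" line per host as shown above. The function name make_pm_udp_conf is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number the host names given as arguments from 0,
# producing lines in the "node-number hostname" layout shown above.
make_pm_udp_conf() {
    echo "# Configuration file for PM/UDP(Agent)"
    n=0
    for host in "$@"; do
        echo "$n $host"
        n=$((n + 1))
    done
}

# Print the sample file for the four compute hosts of this guide;
# redirect the output to pm-udp.conf to save it.
make_pm_udp_conf comp0.score.rwcp.or.jp comp1.score.rwcp.or.jp \
    comp2.score.rwcp.or.jp comp3.score.rwcp.or.jp
```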

  3. Making pm-ethernet.conf files
    1. Create a configuration file (pm-ethernet-0.conf) of PM/Ethernet for the eth0 device using the following command. If you installed SCore 3.2 using EIT, this file is the same as /opt/score/etc/pm-ethernet.conf.
      # mkpmethernetconf -unit 0 -speed 100 -device eth0 pm-udp.conf pm-ethernet-0.conf
      # cat pm-ethernet-0.conf
      unit 0
      maxnsend 8
      0 00:90:CC:0F:B9:A0 comp0.score.rwcp.or.jp
      1 00:90:CC:0F:B9:A3 comp1.score.rwcp.or.jp
      2 00:20:18:58:AC:DA comp2.score.rwcp.or.jp
      3 00:20:18:58:BC:00 comp3.score.rwcp.or.jp

    2. Create a configuration file (pm-ethernet-1.conf) of PM/Ethernet for the eth1 device using the following command:
      # mkpmethernetconf -unit 1 -speed 100 -device eth1 pm-udp.conf pm-ethernet-1.conf
      # cat pm-ethernet-1.conf
      unit 1
      maxnsend 8
      0 00:90:CC:0F:B8:03 comp0.score.rwcp.or.jp
      1 00:90:CC:0F:B9:A9 comp1.score.rwcp.or.jp
      2 00:20:18:58:AC:EE comp2.score.rwcp.or.jp
      3 00:20:18:58:AE:61 comp3.score.rwcp.or.jp

    3. Create a configuration file (pm-ethernet-2.conf) of PM/Ethernet for the eth2 device using the following command:
      # mkpmethernetconf -unit 2 -speed 100 -device eth2 pm-udp.conf pm-ethernet-2.conf
      # cat pm-ethernet-2.conf
      unit 2
      maxnsend 8
      0 00:90:CC:0F:B8:25 comp0.score.rwcp.or.jp
      1 00:90:CC:0F:B9:C1 comp1.score.rwcp.or.jp
      2 00:20:18:58:AC:3E comp2.score.rwcp.or.jp
      3 00:20:18:58:AC:8B comp3.score.rwcp.or.jp

    4. Create a configuration file (pm-ethernet-3.conf) of PM/Ethernet for the eth3 device using the following command:
      # mkpmethernetconf -unit 3 -speed 100 -device eth3 pm-udp.conf pm-ethernet-3.conf
      # cat pm-ethernet-3.conf
      unit 3
      maxnsend 8
      0 00:90:CC:0F:B8:06 comp0.score.rwcp.or.jp
      1 00:90:CC:0F:B9:AD comp1.score.rwcp.or.jp
      2 00:20:18:58:AC:3C comp2.score.rwcp.or.jp
      3 00:20:18:58:AC:EC comp3.score.rwcp.or.jp
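
    The four commands above differ only in the unit number, so they can be produced by a loop. The sketch below prints the commands instead of running them. It assumes that unit N always maps to device ethN and that the speed is 100, as in this example. The function name is hypothetical.

```shell
#!/bin/sh
# Print (rather than run) one mkpmethernetconf command per NIC unit,
# using the same options as the commands shown in this guide.
print_mkconf_commands() {
    nics=$1    # number of NICs per node
    unit=0
    while [ "$unit" -lt "$nics" ]; do
        echo "mkpmethernetconf -unit $unit -speed 100 -device eth$unit pm-udp.conf pm-ethernet-$unit.conf"
        unit=$((unit + 1))
    done
}

print_mkconf_commands 4
```

    On a host where SCore is installed, the printed lines can be reviewed and then piped to sh.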

  4. Copy the configuration files (pm-ethernet-[0123].conf) to /opt/score/etc

    # cp pm-ethernet-[0123].conf /opt/score/etc
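
    Since each NIC should appear in exactly one unit, a quick sanity check of the generated files is to look for MAC addresses that occur more than once. The following is a sketch of such a check, not an SCore tool. It assumes host lines have the form "node-number MAC hostname" as in the listings above.

```shell
#!/bin/sh
# Hypothetical check (not an SCore tool): report any MAC address that
# appears more than once across the given pm-ethernet-N.conf files.
# Lines such as "unit 0" and "maxnsend 8" are skipped because their
# second field is not a colon-separated hexadecimal address.
find_duplicate_macs() {
    awk '$2 ~ /^([0-9A-Fa-f][0-9A-Fa-f]:)+[0-9A-Fa-f][0-9A-Fa-f]$/ {
             mac = toupper($2)
             if (mac in seen)
                 print mac " in " seen[mac] " and " FILENAME
             else
                 seen[mac] = FILENAME
         }' "$@"
}

# Usage: find_duplicate_macs pm-ethernet-[0123].conf
```

    The function prints nothing when every address is unique.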

  5. Modifying scorehosts.db file
    Add the following entries to /opt/score/etc/scorehosts.db, and add the networks (ethernet-0,ethernet-1,ethernet-2,ethernet-3,ethernet-x2,ethernet-x3,ethernet-x4) to the network attribute of each host entry.

    ethernet-0 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-0.conf
    ethernet-1 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-1.conf
    ethernet-2 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-2.conf
    ethernet-3 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-3.conf
    ethernet-x2 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-1.conf \
    	-trunk0:file=/opt/score/etc/pm-ethernet-2.conf
    ethernet-x3 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-2.conf \
    	-trunk0:file=/opt/score/etc/pm-ethernet-1.conf \
    	-trunk1:file=/opt/score/etc/pm-ethernet-0.conf
    ethernet-x4 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-3.conf \
    	-trunk0:file=/opt/score/etc/pm-ethernet-0.conf \
    	-trunk1:file=/opt/score/etc/pm-ethernet-1.conf \
    	-trunk2:file=/opt/score/etc/pm-ethernet-2.conf
    

    # cat /opt/score/etc/scorehosts.db
    
    /* PM/Ethernet */
    ethernet        type=ethernet \
                    -config:file=/opt/score/etc/pm-ethernet.conf
    ethernet-0 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-0.conf
    ethernet-1 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-1.conf
    ethernet-2 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-2.conf
    ethernet-3 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-3.conf
    ethernet-x2 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-1.conf \
    	-trunk0:file=/opt/score/etc/pm-ethernet-2.conf
    ethernet-x3 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-2.conf \
    	-trunk0:file=/opt/score/etc/pm-ethernet-1.conf \
    	-trunk1:file=/opt/score/etc/pm-ethernet-0.conf
    ethernet-x4 type=ethernet \
    	-config:file=/opt/score/etc/pm-ethernet-3.conf \
    	-trunk0:file=/opt/score/etc/pm-ethernet-0.conf \
    	-trunk1:file=/opt/score/etc/pm-ethernet-1.conf \
    	-trunk2:file=/opt/score/etc/pm-ethernet-2.conf
    #include "/opt/score/etc/ndconf/0"
    #include "/opt/score/etc/ndconf/1"
    #include "/opt/score/etc/ndconf/2"
    #include "/opt/score/etc/ndconf/3"
    
    #define MSGBSERV        msgbserv=(server.score.rwcp.or.jp:8764)
    
    comp0.score.rwcp.or.jp NODE_0 \
     network=ethernet,ethernet-0,ethernet-1,ethernet-2,ethernet-3,ethernet-x2,ethernet-x3,ethernet-x4 group=_scoreall_,pccall smp=1 MSGBSERV
    comp1.score.rwcp.or.jp NODE_1 \
     network=ethernet,ethernet-0,ethernet-1,ethernet-2,ethernet-3,ethernet-x2,ethernet-x3,ethernet-x4 group=_scoreall_,pccall smp=1 MSGBSERV
    comp2.score.rwcp.or.jp NODE_2 \
     network=ethernet,ethernet-0,ethernet-1,ethernet-2,ethernet-3,ethernet-x2,ethernet-x3,ethernet-x4 group=_scoreall_,pccall smp=1 MSGBSERV
    comp3.score.rwcp.or.jp NODE_3 \
     network=ethernet,ethernet-0,ethernet-1,ethernet-2,ethernet-3,ethernet-x2,ethernet-x3,ethernet-x4 group=_scoreall_,pccall smp=1 MSGBSERV
    

    In this file, the ethernet-0, ethernet-1, ethernet-2 and ethernet-3
    networks should be used for test purposes only and should be removed
    after the following communication tests are finished, because these
    definitions cause trouble in the SCore-D multiuser environment.
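
    A mistyped path in scorehosts.db is easy to overlook, so it can help to confirm that every referenced configuration file exists. The following sketch is not an SCore tool. It assumes that file references always use the "-config:file=" or "-trunkN:file=" option form shown above, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check (not an SCore tool): print every path named by a
# "-...:file=" option in the given scorehosts.db that does not exist.
missing_conf_files() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^-[a-z0-9]+:file=/) {
                path = $i
                sub(/^[^=]*=/, "", path)   # drop the option name
                sub(/\\$/, "", path)       # drop a glued-on continuation
                print path
            }
    }' "$1" | sort -u | while read -r path; do
        [ -f "$path" ] || echo "missing: $path"
    done
}

# Usage: missing_conf_files /opt/score/etc/scorehosts.db
```

    The function prints nothing when all referenced files are present.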
    

  6. Modifying /etc/rc.d/init.d/pm_ethernet file

    A sample /etc/rc.d/init.d/pm_ethernet file is as follows:
    #!/bin/sh
    #
    # pm_ethernet:  Starts the PM Ethernet driver
    #
    # Version:      @(#) /etc/rc.d/init.d/pm_ethernet 1.00
    #
    # Author:       Shinji Sumimoto (Real World Computing Partnership)
    # chkconfig: 345 90 18
    # description: PM Ethernet driver
    # probe: true
    
    IF=eth0
    UNIT=0
    INTERRUPT_REAPING=on
    
    # Source function library.
    . /etc/rc.d/init.d/functions
    
    # check module
    module=`modprobe -l pm_ethernet_dev.o`
    
    # See how we were called.
    case "$1" in
      start)
            echo
            if [ x$module != x ]; then
                modprobe pm_ethernet_dev
            fi
            ifconfig eth1 up
            ifconfig eth2 up
            ifconfig eth3 up
            /sbin/etherpmctl $IF -pm on -ir $INTERRUPT_REAPING -unit $UNIT -sc off
            /sbin/etherpmctl eth1 -pm on -ir $INTERRUPT_REAPING -unit 1 -sc off
            /sbin/etherpmctl eth2 -pm on -ir $INTERRUPT_REAPING -unit 2 -sc off
            /sbin/etherpmctl eth3 -pm on -ir $INTERRUPT_REAPING -unit 3 -sc off
            touch /var/lock/subsys/pm_ethernet
            ;;
      stop)
            echo -n "Stopping PM/Ethernet: "
            if [ x$module != x ]; then
                rmmod pm_ethernet_dev
            fi
            /sbin/etherpmctl $IF -pm off
            /sbin/etherpmctl eth1 -pm off
            /sbin/etherpmctl eth2 -pm off
            /sbin/etherpmctl eth3 -pm off
            ifconfig eth1 down
            ifconfig eth2 down
            ifconfig eth3 down
            echo
            rm -f /var/lock/subsys/pm_ethernet
            ;;
      status)
            if [ x$module != x ]; then
                /sbin/lsmod
            fi
            ;;
      restart)
            $0 stop
            $0 start
            ;;
      *)
            echo "Usage: $0 {start|stop|status|restart}"
            exit 1
    esac
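
    The etherpmctl lines in the start branch follow one pattern per device, so a cluster with a different number of NICs could generate them with a loop. The sketch below only prints the commands. It assumes unit N is bound to ethN, as in the sample script, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the etherpmctl invocations for a given number of NIC units,
# mirroring the options used in the start branch of the sample script.
print_pm_on_commands() {
    nics=$1
    ir=${2:-on}   # interrupt reaping setting; "on" as in the sample script
    unit=0
    while [ "$unit" -lt "$nics" ]; do
        echo "/sbin/etherpmctl eth$unit -pm on -ir $ir -unit $unit -sc off"
        unit=$((unit + 1))
    done
}

print_pm_on_commands 4
```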
    

  7. Restarting PM/Ethernet and scoreboard

    Send a HUP signal to scoreboard, then execute:
    # /etc/rc.d/init.d/pm_ethernet restart

4. Test Procedure of Single Network

  1. Test sequence of the eth1, eth2 and eth3 networks using rpmtest

    See the PM/Ethernet Test Procedure, and use the network ethernet-1, ethernet-2 or ethernet-3 instead of ethernet.

  2. Test sequence of eth0 network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-0 -v
    starting master 0 : pe=4
    starting slave: 2 3 1.

    testing*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
    .*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
    .*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.
    *.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
    *.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*
    .*.*.*.*.*.*.*.*.*.*.*comp3( 3) Signal: Interrupted system call(4)
    comp0( 0) Signal: Interrupted system call(4)
    comp1( 1) Signal: Interrupted system call(4)
    comp2( 2) Signal: Interrupted system call(4)

    Use Ctrl-C to quit this test program

  3. Test sequence of eth1 network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-1 -v

  4. Test sequence of eth2 network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-2 -v

  5. Test sequence of eth3 network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-3 -v

5. Test Procedure of Multiple Network

  1. Test sequence of 2 NICs trunking network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-x2 -v
    starting master 0 : pe=4
    starting slave: 2 3 1.

    testing*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
    .*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
    .*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.
    *.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
    *.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*
    .*.*.*.*.*.*.*.*.*.*.*comp3( 3) Signal: Interrupted system call(4)
    comp0( 0) Signal: Interrupted system call(4)
    comp1( 1) Signal: Interrupted system call(4)
    comp2( 2) Signal: Interrupted system call(4)

    Use Ctrl-C to quit this test program

  2. Test sequence of 3 NICs trunking network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-x3 -v

  3. Test sequence of 4 NICs trunking network using rcstest

    # /opt/score/sbin/rcstest comp0.score.rwcp.or.jp ethernet-x4 -v

6. Performance Tuning

  1. You can tune network trunking communication performance by changing the maxnsend and backoff values in pm-ethernet.conf.
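
    When trying different values, it is convenient to edit a copy of the configuration file rather than the installed one. The following sketch rewrites only the maxnsend line, whose format appears in the listings above. The backoff setting is not handled because its syntax is not shown in this guide, and no particular value is recommended here. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a pm-ethernet conf file with the
# "maxnsend" line set to a new value, leaving the original untouched.
set_maxnsend() {
    src=$1      # existing configuration file
    value=$2    # new maxnsend value (a number)
    dst=$3      # file to write the modified copy to
    sed "s/^maxnsend[[:space:]].*/maxnsend $value/" "$src" > "$dst"
}

# Usage: set_maxnsend pm-ethernet-0.conf 16 pm-ethernet-0.conf.new
```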

Parallel and Distributed System Software Laboratory
Real World Computing Partnership
score-info@rwcp.or.jp
CREDIT
This document is a part of the SCore cluster system software developed at Real World Computing Partnership, Japan. Copyright (c) 2000, 1999 Real World Computing Partnership.