Known Problems Related to Linux/Hardware


  1. Ethernet drivers - tulip and de4x5
    If your Ethernet driver is tulip or de4x5, the network may freeze under heavy communication traffic. Whether this happens depends on your network card and the driver version. Our experience is as follows:
                                 tulip     de4x5
    Alpha 21164 + DC21140 NIC    STABLE    UNSTABLE
    Pentium Pro + DC21140 NIC    STABLE    STABLE
    Alpha 21264 + DC21143 NIC    UNSTABLE  STABLE
    
  2. Ethernet Driver - eepro100
    The eepro100 driver exists in several versions. Whether problems occur depends on the driver version and the EEPRO100 card revision. Our experience with several EEPRO100 card revisions is as follows:
    Version                        Transmission problem  sk_buff problem
    v1.06 10/16/98 in Redhat 6.1   No Problem            Problem
    v1.08 5/3/99                   Problem               No Problem
    v1.09r2 10/15/99 in SuSE 6.3   No Problem            No Problem
    v1.09j-t 9/29/99               No Problem            No Problem

    The SCore 3.1 distribution contains the v1.09j-t version.
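
    The experience listed above can be kept as a small lookup table, for example to check the version string an installed driver reports. The following is only a sketch: the dictionary and function names are hypothetical, and the data simply restates the results above for the card revisions we tested.

```python
# Hypothetical lookup table restating the experience listed above.
# Each value is (transmission problem seen, sk_buff problem seen).
EEPRO100_EXPERIENCE = {
    "v1.06": (False, True),
    "v1.08": (True, False),
    "v1.09r2": (False, False),
    "v1.09j-t": (False, False),
}


def eepro100_known_problems(version):
    """Return the problems seen with `version`, or None if it was not tested."""
    entry = EEPRO100_EXPERIENCE.get(version)
    if entry is None:
        return None
    transmission, sk_buff = entry
    problems = []
    if transmission:
        problems.append("transmission")
    if sk_buff:
        problems.append("sk_buff")
    return problems


for version in ("v1.06", "v1.08", "v1.09j-t", "v2.00"):
    print(version, eepro100_known_problems(version))
```

    A result of None means only that the version does not appear in the list above, not that it is free of problems.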

    Related to sk_buff:
    With the eepro100 Ethernet driver, sk_buff structures (the communication buffers in the Linux kernel) can consume kernel memory until the system goes down. This phenomenon occurs when the sending buffer of the driver is full. We think that you will not encounter it if your network uses switches instead of hubs and you run no applications that send large messages.

    In any case, it is advisable to watch whether the size of the sk_buff head cache increases. To see the usage, look at /proc/slabinfo:

    	$ cat /proc/slabinfo
    	.....
    	.....
    	skbuff_head_cache     20      105
            .....
    
    If the size increases rapidly, the system will eventually freeze.
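
    The check can be automated along the following lines. This is a minimal sketch, not part of SCore: it assumes that the two numbers after skbuff_head_cache are the active and total object counts, and the function names and the growth threshold are hypothetical choices made for illustration.

```python
def skbuff_head_cache_counts(slabinfo_text):
    """Return (first, second) numeric columns for skbuff_head_cache, or None.

    Assumption: the columns are the active and total object counts.
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "skbuff_head_cache":
            return int(fields[1]), int(fields[2])
    return None


def grows_rapidly(totals, factor=2.0):
    """True if the last sampled total is at least `factor` times the first.

    `factor` is an arbitrary illustrative threshold, not a value from SCore.
    """
    if len(totals) < 2 or totals[0] <= 0:
        return False
    return totals[-1] >= factor * totals[0]


# Sample text in the shape shown above; a real check would read /proc/slabinfo.
sample = """\
.....
skbuff_head_cache     20      105
.....
"""
print(skbuff_head_cache_counts(sample))
```

    In practice one would read /proc/slabinfo at regular intervals, collect the totals in a list, and raise an alarm when grows_rapidly returns True. Reading the file may require root privileges on some systems.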

  3. Pentium II Deschutes
    PM/Myrinet does not run on Pentium II Deschutes stepping 1 chips. We suspect that erratum A37, reported in Intel's documentation, causes our problem on stepping 1. See http://developer.intel.com/design/pentiumii/specupdt/243337.htm for details. The following processors can possibly have Deschutes stepping 1:

    If you have such a processor, check its stepping. If your operating system is Linux, run the following:

    $ cat /proc/cpuinfo
    ....
    model        : Pentium II (Deschutes)
    vendor_id    : .......
    stepping     : 1
    ....
    
    Other Pentium II chips are OK, as far as we have tested.
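
    The same check can be scripted. The sketch below parses text in the /proc/cpuinfo format and reports whether the first processor entry is a Deschutes stepping 1 chip. The function names are hypothetical, and because field names vary between kernel versions, the sketch looks for "Deschutes" in both the model and model name fields, which is an assumption on our part.

```python
def cpuinfo_fields(cpuinfo_text):
    """Parse the 'key : value' lines of the first processor entry into a dict."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            if fields and not line.strip():
                break  # a blank line ends the first processor entry
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def is_deschutes_stepping_1(cpuinfo_text):
    """True if the first processor looks like a Pentium II Deschutes stepping 1."""
    fields = cpuinfo_fields(cpuinfo_text)
    # Assumption: the chip name appears under "model" or "model name".
    model_text = " ".join(fields.get(key, "") for key in ("model", "model name"))
    return "Deschutes" in model_text and fields.get("stepping") == "1"


# Sample text in the shape shown above; a real check would read /proc/cpuinfo.
sample = """\
....
model        : Pentium II (Deschutes)
stepping     : 1
....
"""
print(is_deschutes_stepping_1(sample))
```

    On an SMP machine only the first processor entry is examined, so the remaining entries would need the same check.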

  4. Myrinet
    1. During our testing we noticed some Myrinet cards have a heat problem. Please consult your local distributor if you experience the same problem.
    2. We have tested Myrinet cards made after December, 1998. We found that some cards made before that date do not run correctly.
    3. Communication between LANai 4.X and LANai 7.X or higher does not work. This limitation comes from the PM/Myrinet LANai firmware.

  5. PM/Ethernet on Compaq Alpha SMP
    PM/Ethernet does not achieve good performance on the Compaq Alpha SMP platform because of a problem in the Alpha SMP kernel. We confirmed that a preliminary implementation of PM/Ethernet on the Linux 2.3.99-pre4 kernel achieved good performance. Uniprocessor Alpha systems do not have this problem, and PM/Myrinet does not have this problem on the Alpha SMP kernel.

    If you want to use PM/Ethernet on a Compaq Alpha SMP system, you must use Linux 2.2.16 or later, because older kernels cause the system to hang.
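
    The version requirement can be checked with a short script. The following is a sketch under the assumption that the kernel release string starts with three dot-separated numbers; the function names are hypothetical, and any suffix such as -pre4 is ignored in the comparison.

```python
import re


def kernel_version_tuple(release):
    """Extract the leading 'major.minor.patch' numbers from a release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def meets_alpha_smp_minimum(release, minimum=(2, 2, 16)):
    """True if `release` is Linux 2.2.16 or later (suffixes are ignored)."""
    return kernel_version_tuple(release) >= minimum


for release in ("2.2.14", "2.2.16", "2.3.99-pre4"):
    print(release, meets_alpha_smp_minimum(release))
```

    On a running system, the release string can be obtained with platform.release() from the Python standard library or with the uname -r command.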



CREDIT
This document is a part of the SCore cluster system software developed at PC Cluster Consortium, Japan. Copyright (C) 2003 PC Cluster Consortium.