[SCore-users-jp] Re: [SCore-users] some timeout problem?

David Werner david.werner @ iws.uni-stuttgart.de
Sat, 26 Mar 2005 01:46:22 JST


On Fri, Mar 25, 2005 at 11:06:59AM +0900, Shinji Sumimoto wrote:
> Hi.
> 
> From: David Werner <david.werner @ iws.uni-stuttgart.de>
> Subject: [SCore-users] some timeout problem?
> Date: Thu, 24 Mar 2005 11:04:21 +0100
> Message-ID: <20050324100421.GA3244 @ nalle.bauingenieure.uni-stuttgart.de>
> 
> david.werner> Dear List, 
> david.werner> 
> david.werner> We use SCore with the network-trunking facility over two Ethernet
> david.werner> network cards (100 Mbit/s).  All network cards used by SCore have their
> david.werner> own exclusive interrupt and are also exclusively used
> david.werner> by the pm_ethernet driver.
> david.werner> Every node-computer has a third network card which is only used
> david.werner> for the TCP/IP-traffic.  We run SCore-5.6.0 on a 2.4.21 Linux kernel. 
> david.werner> What I occasionally observe is the following kernel message,
> david.werner> occurring randomly once every few weeks on individual nodes of the cluster:
> david.werner> 
> david.werner> eth2: TX underrun, threshold adjusted.
> david.werner> or
> david.werner> eth1: TX underrun, threshold adjusted.
> 
> Maybe you are using eepro100 NICs, aren't you?
> This is not an error, especially as it occurs only about once per week.

Dear Shinji Sumimoto, Hello again list,

Yes, that's true, we do have eepro100 NICs.
Thank you for pointing me to the description below.

> 
> Here is a description about the problem:
> 
> http://www.ussg.iu.edu/hypermail/linux/kernel/0401.1/0651.html
> ==========================================================================
> This isn't really an error, it's an indicator that the pci-bus doesn't
> really keep up, then the NIC has to increase the threshold (it tries to
> start sending the packet out before it's fully transferred from main
> memory to the NIC, it hopes the rest of the packet will have been
> transferred in time, this message indicates that it wasn't so the NIC
> had to increase the threshold of how much of the packet has to have been
> transferred before it starts sending it out)
> 
> This happens with the eepro100 driver as well but it doesn't tell you
> about it, it just increases the threshold and goes on.
> The e100 driver tells you about it _and_ it actually decreases the
> threshold if there hasn't been any underruns for a while, and when it is
> decreased, the threshold gets too small and you get an underrun
> again....
> ==========================================================================
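That matches what we see.  Just to make sure I understood it, I sketched
the behaviour as a little toy model (my simplification of the description
above, not actual eepro100/e100 driver code; the step size and adjustment
policy are made up):

```python
# Toy model of the adaptive TX threshold described above.  The step size
# and the exact adjustment policy are assumptions of mine, not taken
# from the real driver source.

STEP = 64  # hypothetical threshold adjustment step, in bytes

def send_packet(threshold, packet_size, bus_bytes_during_tx):
    """One transmit attempt.

    TX starts once `threshold` bytes of the packet are in the NIC FIFO;
    the remaining packet_size - threshold bytes must still arrive over
    the PCI bus while the packet is already going out on the wire.
    Returns (underrun_occurred, new_threshold).
    """
    remainder = max(0, packet_size - threshold)
    if bus_bytes_during_tx >= remainder:
        return False, threshold                      # clean send
    # FIFO ran dry mid-packet: raise the threshold, as the driver does
    return True, min(packet_size, threshold + STEP)

def relax(threshold):
    """e100-style: lower the threshold again after a quiet period,
    which is why the underrun message can keep coming back."""
    return max(STEP, threshold - STEP)
```

A higher threshold means less of the packet is left to fetch during
transmission, so underruns become less likely, at the cost of latency.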
> 
> david.werner> We use eth1 and eth2 for pm_ethernet.
> david.werner> Today I observed that its occurrence correlated with a crash of the scored
> david.werner> run by sc_watch.
> david.werner> Is there something I can do to improve the stability of our
> david.werner> score installation?
> 
> How many nodes are you running scored on in multi-user mode? And are
> there any problems with your cluster hardware?

Currently about 36 nodes in multi-user mode.
We have had some hardware problems with the current supply on the
motherboards, but by now only repaired boards are included in the
SCore cluster, and they seem to run fine.
Maybe we should switch to a smaller "maxnsend" parameter in
pm_ethernet-?.conf; currently we have 16 there, and backoff is
20000 (microseconds).  I'll try to run some scs-tests again.

The points below refer to the userland of version 5.6.0, which we
use; I can't say much about 5.8.x.

I mailed a few months back that I tried to set up "stable" operation
with unstable boards via "sc_watch" and the procedures described in
"Automatic Operation and High Availability of SCore-D".
I got fairly far, using my own shell-script version of "sceptic" to
detect failures.
In practice, however, I found that with two or more spare nodes,
whether the system can recover automatically from the failure of a
second node depends strongly on whether that node lies before or
after the first failed node.  If it lies before the first failed
node, it hardly can: the first spare node, the one that had replaced
the first failed node, is now reassigned by SCore to fill the new gap
created by the second failure, and the second spare node is assigned
to fill the gap which the first spare node had filled before.
Thus to SCore it looks like a failure of two nodes at the same time,
which is not recoverable.
I think it may be possible to restart anyway if you edit
scorehosts.db and change the order of the spare nodes according to
the current failure situation.
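To illustrate why the order matters, here is a toy model of the
spare-node reassignment as I understand it from watching the cluster
(this is my reading of the behaviour, not SCore source code; all the
names are made up):

```python
# Toy model: spare nodes fill the gaps left by failed nodes in host
# order, and every new failure triggers a fresh assignment from
# scratch.  My interpretation of the observed behaviour, not SCore code.

def assign_spares(hosts, spares, failed):
    """Map each failed host to a spare, walking the hosts in order."""
    mapping = {}
    free = list(spares)
    for h in hosts:
        if h in failed:
            if not free:
                return None        # not enough spares left
            mapping[h] = free.pop(0)
    return mapping

hosts  = ["node0", "node1", "node2", "node3"]
spares = ["spare0", "spare1"]

after_first  = assign_spares(hosts, spares, {"node2"})
after_second = assign_spares(hosts, spares, {"node1", "node2"})
# node1 (which lies before node2) now takes spare0, so node2's
# replacement silently changes from spare0 to spare1: seen from
# scored, two nodes changed at once, which it cannot recover from.
```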

Another thing I found worth improving is that you cannot pass the
"-restart" option to scored when it is started for the first time
via sc_watch.
In general you can pass arguments to scored when it is started by
sc_watch; the problem is just that sc_watch adds "-reset" to the
scored options, which contradicts "-restart" and is processed first.
I therefore created a modified version which, to keep my patch
simple, always passes "-restart" to scored.
This allows me to "restart" even if I had to stop sc_watch in
between to check the cluster.
Maybe sc_watch should simply not add "-reset" at all.
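For reference, the effective argument rewriting in my patched sc_watch
boils down to something like this (a sketch only; the option names
come from this thread, the function itself is hypothetical):

```python
# Sketch of the argument rewriting; in the real patch this happens
# inside sc_watch itself before it launches scored.

def rewrite_scored_args(argv):
    """Drop the "-reset" that sc_watch would add, force "-restart"."""
    return ["-restart"] + [a for a in argv if a != "-reset"]

# sc_watch would then start scored with the rewritten list, e.g.
#   os.execv(scored_path, [scored_path] + rewrite_scored_args(args))
```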

Another small problem with automatic operation via sc_watch is that
there is no way to perform operations after it has restarted scored
in a failure situation.  What I need is a way to set limits
automatically, which I usually do via sc_console commands in a
script, but I can't do that in the "local command" of sc_watch
because that runs before the restart of scored.
Maybe this could be addressed in the design of scored, so that it
simply sets some defaults when it starts up.


> 
> If there is no problem on your cluster hardware, 
> please try to increase a timeout of sc_watch.
> 
> Kameyama-san, could you explain the method to increase a timeout of
> sc_watch?

I'll increase it.
But as he said, it is 10 minutes (the manpage says so, too), which
already seems quite a long time for restarting a driver.

Greetings, 
	David

> 
> Shinji.
> 
> david.werner> Greetings,
> david.werner> 	David
> david.werner> _______________________________________________
> david.werner> SCore-users mailing list
> david.werner> SCore-users @ pccluster.org
> david.werner> http://www.pccluster.org/mailman/listinfo/score-users
> david.werner> 
> ------
> Shinji Sumimoto, Fujitsu Labs
_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users


