[SCore-users-jp] [SCore-users] Replacement with sparenodes

David Werner david.werner @ iws.uni-stuttgart.de
Mon Mar 15 20:47:49 JST 2004


Hello all,

I have a question regarding replacement with 
spare nodes. Our cluster gives us a lot of trouble due to 
hardware problems, so we defined three nodes as spares 
and let sc_watch run a replacement routine. 
What happened is that first one node was automatically
excluded and replaced because of a problem, and some days later 
another node failed. I would expect that it can be replaced as well, 
but this does not happen. 
The spare nodes are node40, node41 and node42.

First node19 failed. It was replaced by node41. 
Now node12 has failed and scored did not start up.
From syslog I get the following messages: 

	12/Mar/2004 19:58:23 SYSLOG: /opt/score/deploy/scored
	12/Mar/2004 19:58:23 SYSLOG: SCore-D 5.6.0 $Id: init.cc,v 1.69 2003/09/26 07:16:45 hori Exp $
	12/Mar/2004 19:58:23 SYSLOG: Compile option(s):
	12/Mar/2004 19:58:23 SYSLOG: SCore-D network: ethernet-x3/ethernet
	12/Mar/2004 19:58:24 SYSLOG: Cluster[0]: (0..41)x1.i386-redhat7-linux2_4.i686.1800
	12/Mar/2004 19:58:24 SYSLOG:   Memory: 1010[MB], Swap: 1028[MB], Disk: 6046[MB]
	12/Mar/2004 19:58:24 SYSLOG:   Network[0]: ethernet-x3/ethernet
	12/Mar/2004 19:58:24 SYSLOG:   Queue[1] activated, time-sharing scheduling
	12/Mar/2004 19:58:24 SYSLOG:   Queue[2] activated, time-sharing scheduling
	12/Mar/2004 19:58:24 SYSLOG: Session ID: 0
	12/Mar/2004 19:58:24 SYSLOG: Server Host: node31.cluster
	12/Mar/2004 19:58:24 SYSLOG: Backup Host: node7.cluster
	12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node12.cluster is replaced by node41.cluster.
	12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node41.cluster is replaced by node40.cluster.
	12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:ERROR Unable to continue session-0.
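
The two WARNING lines above can be read as a chain: node41 appears once as a
substitute and once as a host being replaced. The following is a minimal
sketch for pulling such pairs out of a syslog excerpt. It only assumes the
message wording shown in the log above; the function names are hypothetical,
and it does not reproduce how SCore-D itself selects spare hosts.

```python
import re

# Assumed message format, taken from the log excerpt above:
#   ... SCore-D:WARNING Host node12.cluster is replaced by node41.cluster.
REPLACED = re.compile(r"Host (\S+) is replaced by (\S+?)\.?$")

def replacements(lines):
    """Return (replaced_host, substitute_host) pairs in log order."""
    pairs = []
    for line in lines:
        m = REPLACED.search(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

def reused_substitutes(pairs):
    """Hosts that show up both as a substitute and as a replaced host."""
    replaced = {old for old, _ in pairs}
    return sorted({new for _, new in pairs if new in replaced})

log = [
    "12/Mar/2004 19:58:24 SYSLOG: Server Host: node31.cluster",
    "12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node12.cluster is replaced by node41.cluster.",
    "12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node41.cluster is replaced by node40.cluster.",
    "12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:ERROR Unable to continue session-0.",
]

pairs = replacements(log)
for old, new in pairs:
    print(f"{old} -> {new}")
print("substitute that is itself replaced:", reused_substitutes(pairs))
```

Run against the excerpt, this flags node41.cluster as the host that is both
a substitute and a replaced host, which is the situation described here.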


When I list the hosts with "scorehosts -g pcc -r", 
I get a list showing that 
node41 is now in place of node12 and node40 is in place of node19. 
This also does not seem logical to me, as I would expect 
node41 to stay in place of node19.
Where I deviated from the documentation is that, in a failure 
situation, we restart all the daemons on the SCore server, 
that is msgbserv, scoreboard, sc_syslog and scbcast. 
I did this because someone reported to me that sc_syslog sometimes
silently disappears when scored is restarted.
Am I wrong to restart all those daemons? (I can imagine that 
one must not restart msgbserv.)
Or do I have to reckon with similar behaviour even if I do not? 
That would render the use of more than one spare node 
useless in many cases.
I'll try to run some tests this afternoon. So far I have only tested 
the replacement with a single host.
Any comments? 
Our scorehosts.db file is attached to this mail.

Greetings, 
	David


-------------- next part --------------
An attachment with no character set specified was saved...
Name: scorehosts.db
URL:  <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20040315/82e99311/attachment.ksh>

