[SCore-users-jp] [SCore-users] Replacement with sparenodes
David Werner
david.werner @ iws.uni-stuttgart.de
Mon Mar 15 20:47:49 JST 2004
Hello all,
I have a question about node replacement with spare nodes.
Our cluster has had a lot of trouble due to hardware problems,
so we defined three nodes as spares and run a replacement
routine via sc_watch.
What happens now is that one node was first excluded and replaced
automatically because of a problem. Some days later another node
failed, and I expected that it would be replaced as well.
But this does not happen:
The spare nodes are node40, node41 and node42.
First node19 failed. It was replaced by node41.
Now node12 has failed and scored did not start up.
From syslog I get the following messages:
12/Mar/2004 19:58:23 SYSLOG: /opt/score/deploy/scored
12/Mar/2004 19:58:23 SYSLOG: SCore-D 5.6.0 $Id: init.cc,v 1.69 2003/09/26 07:16:45 hori Exp $
12/Mar/2004 19:58:23 SYSLOG: Compile option(s):
12/Mar/2004 19:58:23 SYSLOG: SCore-D network: ethernet-x3/ethernet
12/Mar/2004 19:58:24 SYSLOG: Cluster[0]: (0..41)x1.i386-redhat7-linux2_4.i686.1800
12/Mar/2004 19:58:24 SYSLOG: Memory: 1010[MB], Swap: 1028[MB], Disk: 6046[MB]
12/Mar/2004 19:58:24 SYSLOG: Network[0]: ethernet-x3/ethernet
12/Mar/2004 19:58:24 SYSLOG: Queue[1] activated, time-sharing scheduling
12/Mar/2004 19:58:24 SYSLOG: Queue[2] activated, time-sharing scheduling
12/Mar/2004 19:58:24 SYSLOG: Session ID: 0
12/Mar/2004 19:58:24 SYSLOG: Server Host: node31.cluster
12/Mar/2004 19:58:24 SYSLOG: Backup Host: node7.cluster
12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node12.cluster is replaced by node41.cluster.
12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node41.cluster is replaced by node40.cluster.
12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:ERROR Unable to continue session-0.
When I list the hosts with "scorehosts -g pcc -r",
the list shows that node41 now stands in for node12
and node40 stands in for node19.
This does not seem logical to me, as I would expect
node41 to stay in the place of node19.
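The reshuffling can be read directly off the two WARNING lines in the log. The following is only a minimal Python sketch, not part of SCore: the function names are made up for illustration, and it assumes the message format is exactly as in the syslog lines quoted above. It extracts the "is replaced by" pairs and flags any host that serves as a substitute in one line and is itself replaced in another.

```python
import re

# Assumed format, taken from the syslog lines quoted above:
# "... SCore-D:WARNING Host node12.cluster is replaced by node41.cluster."
REPLACED = re.compile(r"Host (\S+) is replaced by (\S+?)\.?$")

def parse_replacements(lines):
    """Return (replaced_host, substitute_host) pairs in log order."""
    pairs = []
    for line in lines:
        m = REPLACED.search(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

def reassigned_substitutes(pairs):
    """Hosts that are a substitute in one line and replaced in another."""
    substitutes = {sub for _, sub in pairs}
    return [old for old, _ in pairs if old in substitutes]

LOG = [
    "12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node12.cluster is replaced by node41.cluster.",
    "12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:WARNING Host node41.cluster is replaced by node40.cluster.",
    "12/Mar/2004 19:58:24 SYSLOG: <27> SCore-D:ERROR Unable to continue session-0.",
]

pairs = parse_replacements(LOG)
print(pairs)
print(reassigned_substitutes(pairs))
```

For the log above, node41.cluster is the only host flagged, which matches the scorehosts listing: node41 is moved to the position of node12 and node40 takes over the position that node41 held for node19.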
Where I deviated from the documentation is that, in a failure
situation, we restart all the daemons on the score server,
that is msgbserv, scoreboard, sc_syslog and scbcast.
I did this because someone reported to me that sc_syslog sometimes
silently disappears when scored is restarted.
Am I wrong to restart all those daemons? (I can imagine that
one must not restart msgbserv.)
Or do I have to reckon with similar behaviour even when I do not
restart them? That would make the use of more than one spare node
useless in many cases.
I'll try to run some tests this afternoon. So far I have only
tested the replacement with a single host.
Any comments?
Our scorehosts.db file is attached to this mail.
Greetings,
David
-------------- next part --------------
An attachment with no character set specified was saved...
Name: scorehosts.db
URL: <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20040315/82e99311/attachment.ksh>