Automatic Operation and High Availability of SCore-D

Automatic Operation

The sc_watch command enables automatic operation of SCore-D. First, sc_watch creates a scout environment and invokes SCore-D in it. It then keeps watching SCore-D's responses in a watchdog fashion. When SCore-D does not respond for more than a few minutes, sc_watch assumes that something has gone wrong on the cluster and tries to reboot the system from the beginning.
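The watchdog behaviour described above can be pictured with a small shell sketch. It is only an illustration of the idea, not the sc_watch implementation: the function name watchdog and its four arguments are hypothetical, and it assumes a limit on the total number of restarts, whereas this document does not state how sc_watch counts boot retries.

```shell
# Illustrative watchdog loop (not the sc_watch implementation).
#   $1  probe command: succeeds while the service still responds (hypothetical)
#   $2  recovery command: run each time the probe fails (hypothetical)
#   $3  seconds to wait between successful probes
#   $4  maximum number of recovery attempts before giving up
watchdog() {
    probe_cmd=$1
    recover_cmd=$2
    interval=$3
    max_retry=$4
    retries=0
    while [ "$retries" -lt "$max_retry" ]; do
        if $probe_cmd; then
            # Service responded: wait and probe again.
            sleep "$interval"
        else
            # No response: count the failure and try to recover.
            retries=$((retries + 1))
            $recover_cmd
        fi
    done
    # Retry limit reached without a lasting recovery.
    return 1
}
```

A call such as watchdog probe_service restart_service 600 10 would probe every ten minutes and give up after ten restarts, where probe_service and restart_service stand for site-specific commands.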

The sc_watch program can invoke a Unix command when it detects a failure. If that command is a shell script that uses a Unix mail command to send a message to the administrator, the administrator will receive an e-mail whenever the system goes down. When sc_watch detects a system failure, it can also invoke a scout command to clean up side effects.
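A notification script of the kind mentioned above might be sketched as follows. The variable names ADMIN_ADDR and MAIL_CMD, the function names, and the wording of the report are assumptions made for illustration; only the use of a Unix mail command comes from this document.

```shell
#!/bin/sh
# Sketch of a failure-notification script for use with sc_watch.
# ADMIN_ADDR, MAIL_CMD and the report wording are illustrative assumptions.
ADMIN_ADDR="${ADMIN_ADDR:-root}"
MAIL_CMD="${MAIL_CMD:-mail}"

# Print a short failure report for the given host group.
failure_report() {
    echo "sc_watch detected an SCore-D failure."
    echo "Host group: $1"
    echo "Reported from: $(uname -n) at $(date)"
}

# Pipe the report to the mail command, addressed to the administrator.
notify_admin() {
    failure_report "$1" | $MAIL_CMD -s "SCore-D failure: $1" "$ADMIN_ADDR"
}
```

A script built this way would end with a call such as notify_admin pcc, where pcc is the host group used in the examples of this document.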

The following is an example of sc_watch execution:

# sc_watch -g pcc scored
[14/Sep/2001,16:31:43] SC_WATCH (4.1) started.
[14/Sep/2001,16:31:43] Interval is set to 10 minutes.
[14/Sep/2001,16:31:43] Local Action  = (none)
[14/Sep/2001,16:31:43] Remote Action = (none)
[14/Sep/2001,16:31:43] Abort action  = (none)
[14/Sep/2001,16:31:43] Boot Retry Max. = 10
[14/Sep/2001,16:31:43] Booting System: scored 
SCOUT: Spawning done.    
14/Sep/2001 16:31:51 SYSLOG: /opt/score/deploy/scored
14/Sep/2001 16:31:51 SYSLOG: SCore-D 4.1 $Id: init.cc,v 1.63 2001/09/07 09:10:26 hori Exp $
14/Sep/2001 16:31:51 SYSLOG: Compile option(s): 
14/Sep/2001 16:31:51 SYSLOG: SCore-D network: myrinet/myrinet2k
14/Sep/2001 16:31:51 SYSLOG: Cluster[0]: (0..15)x2.i386-redhat7-linux2_4.i686.800
14/Sep/2001 16:31:51 SYSLOG:   Memory: 501[MB], Swap: 259[MB], Disk: 3027[MB]
14/Sep/2001 16:31:51 SYSLOG:   Network[0]: myrinet/myrinet2k
14/Sep/2001 16:31:51 SYSLOG:   Network[1]: ethernet/ethernet
14/Sep/2001 16:31:51 SYSLOG: Scheduler initiated: Timeslice = 500 [msec]
14/Sep/2001 16:31:51 SYSLOG:   Queue[0] activated,  exclusive scheduling
14/Sep/2001 16:31:51 SYSLOG:   Queue[1] activated,  time-sharing scheduling
14/Sep/2001 16:31:51 SYSLOG:   Queue[2] activated,  time-sharing scheduling
14/Sep/2001 16:31:51 SYSLOG: Session ID: 0
14/Sep/2001 16:31:51 SYSLOG: Server Host: comp00.pccluster.org
14/Sep/2001 16:31:51 SYSLOG: Backup Host: comp0f.pccluster.org
14/Sep/2001 16:31:51 SYSLOG: Operated by: root
14/Sep/2001 16:31:51 SYSLOG: SCore-D Watcher (server.pccluster.orgf:46514)
14/Sep/2001 16:31:51 SYSLOG: --------- SCore-D (4.1) bootup --------
...

When a system failure is detected, sc_watch tries to terminate SCore-D and then reboot the system.

[14/Sep/2001 16:41:22] System failure detected.
SCOUT: session done
[14/Sep/2001 16:41:24] System has been shutdown.
[14/Sep/2001 16:41:30] Booting System: scored
SCOUT: Spawn done.    
14/Sep/2001 16:41:51 SYSLOG: /opt/score/deploy/scored
14/Sep/2001 16:41:51 SYSLOG: SCore-D 4.1 $Id: init.cc,v 1.63 2001/09/07 09:10:26 hori Exp $
...

Unlike most other SCore commands, sc_watch must be invoked OUTSIDE of the scout environment, because it kills the SCore-D processes running on the cluster via scout. In this example, sc_watch is invoked with a host group option, similar to the scout command.

The last SYSLOG line before the bootup message reports that SCore-D has successfully connected to the SCore-D watcher, that is, the sc_watch process invoked by the user. sc_watch watches SCore-D through this TCP connection.

The sc_watch process terminates when SCore-D shuts down normally or when it is interrupted with ^C (SIGINT).

Automatic Fault Host Replacement

The sc_watch, scoreboard and sceptic programs can cooperate so that SCore-D survives the failure of a single host. The first thing you have to do is specify spare hosts. Spare hosts are specified in the scorehosts.db file. If a host record has an attribute named spare, and the host is also listed in the defects file of the scoreboard command, then the value of the spare attribute is taken as the name of the spare host that replaces the defective host. A spare host must have the same CPU, OS, and network attributes as the host it replaces. Let us assume that a script file named replace.sh is defined as the local action of the sc_watch command. The script file may look like this:

host_group=pcc
install_root=/opt/score
#
$install_root/bin/sceptic -g $host_group >> /opt/score/etc/scorehosts.defects
echo defected hosts
cat /opt/score/etc/scorehosts.defects
echo new host list
$install_root/bin/scorehosts -r $host_group
/etc/rc.d/init.d/scoreboard stop
/etc/rc.d/init.d/scoreboard start
echo scoreboard is restarted.

The sceptic command examines the hosts in the host group given by the host_group shell variable and writes the list of defective hosts to the scorehosts.defects file.

The output of the sceptic command must be appended to the /opt/score/etc/scorehosts.defects file rather than overwriting it. Otherwise, when a defective host is repaired and comes back and then another host goes down, two hosts would eventually be replaced at the same time. In SCore 4.1, a checkpoint file contains parity blocks so that the loss of the file on one host can be recovered. When two hosts are replaced at once, restarting from a checkpoint may fail if the parallel process was running on the replaced hosts.
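The append step could also be written so that a hostname already recorded is not added a second time. The following is a minimal sketch under the assumption that the defects file holds one hostname per line, which this document does not confirm; the function name append_defects is hypothetical.

```shell
# Append hostnames read from stdin to a defects file, skipping names that
# are already listed. Assumes one hostname per line (not confirmed here).
#   $1  path of the defects file
append_defects() {
    defects_file=$1
    # Make sure the file exists so that grep does not fail on first use.
    touch "$defects_file"
    while IFS= read -r host; do
        # Ignore empty lines.
        [ -n "$host" ] || continue
        # Append only if no identical line is present (whole-line, fixed string).
        grep -qxF -- "$host" "$defects_file" || echo "$host" >> "$defects_file"
    done
}
```

In replace.sh, the sceptic line would then become a pipeline into append_defects /opt/score/etc/scorehosts.defects. Existing entries are never removed, which keeps the append-only behaviour that the text requires.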

The next step toward high availability is to modify the /etc/rc.d/init.d/scoreboard script file. You will find the following function in the script.

startsccoreboard() {
	pid=`pidofproc scoreboard`
        [ -n "$pid" ] && ps h $pid >/dev/null 2>&1 && return
	ulimit -c 0
	su nobody -c "$INSTALL_ROOT/sbin/scoreboard -file /opt/score/etc/scorehosts.db -pid" > /var/run/scoreboard.pid && success
}

This function must be modified as follows so that the scoreboard command can locate the file listing the defective hostnames.

startsccoreboard() {
	pid=`pidofproc scoreboard`
        [ -n "$pid" ] && ps h $pid >/dev/null 2>&1 && return
	ulimit -c 0
	su nobody -c "$INSTALL_ROOT/sbin/scoreboard -file /opt/score/etc/scorehosts.db -defects /opt/score/etc/scorehosts.defects -pid" > /var/run/scoreboard.pid && success
}
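Before starting sc_watch, it may be worth confirming that the init script really passes the -defects option. The check below is a small sketch; the function name has_defects_option is hypothetical, and the default path is the one used in this document.

```shell
# Succeed if the given init script starts scoreboard with a -defects option.
#   $1  path of the init script (defaults to the path used in this document)
has_defects_option() {
    grep -q -- 'scoreboard .*-defects ' "${1:-/etc/rc.d/init.d/scoreboard}"
}
```

For example, has_defects_option || echo "scoreboard script not yet modified" prints a warning when the modification is missing.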

Finally, the sc_watch command is invoked on the server host where the scoreboard process is running.

# sc_watch -g pcc -l replace.sh scored
[14/Sep/2001,16:31:43] SC_WATCH (4.1) started.
[14/Sep/2001,16:31:43] Interval is set to 10 minutes.
[14/Sep/2001,16:31:43] Local Action  = replace.sh
[14/Sep/2001,16:31:43] Remote Action = (none)
[14/Sep/2001,16:31:43] Abort action  = (none)
[14/Sep/2001,16:31:43] Boot Retry Max. = 10
[14/Sep/2001,16:31:43] Booting System: scored 
SCOUT: Spawning done.    
...

Every time SCore-D crashes, the sceptic command checks the cluster hosts. If there is a defective host, its name is recorded in the defects file. When the scoreboard process is restarted by the local action script, the defective host is replaced by the host specified by the spare attribute in the scorehosts.db file. Finally, SCore-D is restarted by the sc_watch command. If user parallel processes have been checkpointed, the checkpoint file lost on the defective host is recovered using the parity blocks in the checkpoint files on the other hosts, and execution of the user program is then fully recovered.
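Why parity allows one lost file to be rebuilt, but not two, can be seen in a toy calculation. The sketch below assumes a simple XOR parity over three data blocks; the actual parity scheme used by SCore checkpoint files is not described in this document, and the block values are arbitrary.

```shell
# Toy XOR parity over three data blocks held on three hosts.
# XOR is an assumption for illustration; SCore's scheme is not described here.
d0=23
d1=42
d2=99
parity=$(( d0 ^ d1 ^ d2 ))

# The host holding d1 fails: rebuild its block from the survivors and parity.
recovered=$(( d0 ^ d2 ^ parity ))
echo "recovered=$recovered"
```

With one block missing there is a single unknown and the XOR of the remaining values yields it. With two blocks missing, the same equation has two unknowns and cannot be solved, which matches the warning above about replacing two hosts at once.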
