Resuming SCore-D from an Unexpected Failure

SCore-D checkpoints itself every time a user logs in or out. If SCore-D is re-invoked with the restart option, it tries to recover itself from its most recent checkpoint. The command line is as follows:
scored -restart
Resuming SCore-D also recovers restartable user parallel processes. A user parallel process is considered restartable if it was started with a restart or checkpoint option, or if it was checkpointed with "^\" (SIGQUIT). If a checkpoint image is found for a restartable parallel process, SCore-D tries to resume it from that checkpoint.
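
The restartability rule above can be summarized as a small predicate. The following Python sketch is illustrative only: the class and field names are hypothetical and are not part of SCore-D; they simply mirror the conditions stated in the text.

```python
from dataclasses import dataclass


@dataclass
class ParallelProcess:
    # Hypothetical fields mirroring the conditions described in the text.
    started_with_restart_or_checkpoint_option: bool
    checkpointed_by_sigquit: bool  # user pressed ^\ (SIGQUIT)
    has_checkpoint_image: bool


def is_restartable(p: ParallelProcess) -> bool:
    """A process is restartable if either condition from the text holds."""
    return (p.started_with_restart_or_checkpoint_option
            or p.checkpointed_by_sigquit)


def will_attempt_resume(p: ParallelProcess) -> bool:
    """A resume is attempted only for a restartable process with an image."""
    return is_restartable(p) and p.has_checkpoint_image
```

Under this reading, a process started without either option and never checkpointed with SIGQUIT is not resumed, even if SCore-D itself restarts successfully.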

The following is an example of a successful restart of SCore-D and a user parallel process:
# scored -restart
SYSLOG: Timeslice is set to 500[ms]
SYSLOG: Cluster[0]: comp0.trc.rwcp.or.jp@0...comp3.trc.rwcp.or.jp@3
SYSLOG:   BIN=linux, CPUGEN=pentium-iii, SMP=1, SPEED=500
SYSLOG:   Network[0]: myrinet/myrinet
SYSLOG: SCore-D network: myrinet/myrinet
SYSLOG: Recover: user1@host1.trc.rwcp.or.jp:4681
SYSLOG: SCore-D server: comp3.trc.rwcp.or.jp:9901
If the restart option is not specified when re-invoking SCore-D, previously checkpointed user parallel processes are not restarted and the checkpoint images are lost:
# scored
SYSLOG: Timeslice is set to 500[ms]
SYSLOG: Cluster[0]: comp0.trc.rwcp.or.jp@0...comp3.trc.rwcp.or.jp@3
SYSLOG:   BIN=linux, CPUGEN=pentium-iii, SMP=1, SPEED=500
SYSLOG:   Network[0]: myrinet/myrinet
SYSLOG: SCore-D network: myrinet/myrinet
SYSLOG: Recover canceled by SCore-D: user1@host1.trc.rwcp.or.jp:4672
SYSLOG: SCore-D server: comp3.trc.rwcp.or.jp:9901
If the restart option is specified but the user parallel process has already been killed by the user, then the following messages will be observed:
# scored -restart
SYSLOG: Timeslice is set to 500[ms]
SYSLOG: Cluster[0]: comp0.trc.rwcp.or.jp@0...comp3.trc.rwcp.or.jp@3
<7> SCore-D:WARNING connect_fep(host1.trc.rwcp.or.jp:4679)=111 failed !!
SYSLOG:   BIN=linux, CPUGEN=pentium-iii, SMP=1, SPEED=500
SYSLOG:   Network[0]: myrinet/myrinet
SYSLOG: SCore-D network: myrinet/myrinet
SYSLOG: Recover canceled by user: user1@host1.trc.rwcp.or.jp:4679
SYSLOG: SCore-D server: comp3.trc.rwcp.or.jp:9901
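
The three outcomes shown in the examples above differ only in their "Recover" SYSLOG line, so captured start-up output can be classified mechanically. The following Python sketch assumes the message formats are exactly those shown in the examples; the function name and the outcome labels are hypothetical and are not part of SCore-D.

```python
import re

# Matches the three "Recover" line shapes shown in the examples:
#   SYSLOG: Recover: <job>
#   SYSLOG: Recover canceled by SCore-D: <job>
#   SYSLOG: Recover canceled by user: <job>
_RECOVER = re.compile(
    r"^SYSLOG: Recover(?P<canceled> canceled by (?P<by>SCore-D|user))?: "
    r"(?P<job>\S+)$"
)


def classify_recovery(log_text):
    """Return a list of (job, outcome) pairs found in scored start-up output."""
    results = []
    for line in log_text.splitlines():
        m = _RECOVER.match(line.strip())
        if m is None:
            continue  # not a Recover line (Timeslice, Cluster, warnings, ...)
        if m.group("canceled") is None:
            outcome = "recovered"
        elif m.group("by") == "user":
            outcome = "canceled by user"
        else:
            outcome = "canceled by SCore-D"
        results.append((m.group("job"), outcome))
    return results
```

For instance, feeding it the output of the last example yields a single pair for user1@host1.trc.rwcp.or.jp:4679 with the outcome "canceled by user"; the connect_fep warning line is ignored because it does not match the pattern.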
If the restart option does not work, the SCore-D environment must be reset. Use the reset option in this case. Note that user programs are not restarted when the reset option is specified.
CREDIT
This document is a part of the SCore cluster system software developed at Real World Computing Partnership, Japan. Copyright (c) 2000, 1999 Real World Computing Partnership.