[SCore-users] Re: checkpointing
Atsushi HORI
hori at swimmy-soft.com
Thu Jul 11 10:32:14 JST 2002
Hi, Mike,
(I changed the subject.)
I am Atsushi Hori; I designed and wrote (and am still writing) the
SCore-D stuff.
>but i still wish to know how SCore checkpoints tasks( memory, network
>contexts, etc ). where can i find the description of checkpointing
>process? and, please, do not refer to source code, may be i am stupid,
>but i had not saw it( the scheme of checkpointing ) when looked through
>the SCored's sources( there are too little comments :-(, and i have too
>little experience in "reverse engineering" )
>
>with hope on answer, mike.
Well, SCore-D is very complicated and I am sure nobody else can
understand it :-)
The basic idea of SCore-D checkpointing comes from Network
Preemption. This idea is applied to gang scheduling and checkpointing
in SCore-D.
In current OSes, communication is assumed to be infrequent, so
every time a user process wants to communicate it must issue a system
call. We thought that communication in parallel computation is much
more frequent than in distributed computing, so we designed the system
to let user processes access the network interface without any
system calls. Instead, when user processes are switched, the network
context is saved and restored. The network context includes the status
of the network interface hardware (NIC) and the messages in the network.
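As a rough illustration only (this is not SCore-D code; ToyNic, NetworkContext, and every field name are hypothetical), the save/restore of a network context at a process switch might be modelled like this:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetworkContext:
    """Hypothetical snapshot: NIC status plus messages still in the network."""
    nic_status: Dict[str, int]
    in_flight: List[bytes]


@dataclass
class ToyNic:
    """Toy NIC that the running job drives directly, with no system call."""
    status: Dict[str, int] = field(default_factory=lambda: {"sent": 0})
    in_flight: List[bytes] = field(default_factory=list)

    def send(self, msg: bytes) -> None:
        # User-level send: the message enters the network immediately.
        self.in_flight.append(msg)
        self.status["sent"] += 1

    def save_context(self) -> NetworkContext:
        # Preempt the network: copy NIC status and in-flight messages
        # into a snapshot, then leave the NIC clean for the next job.
        ctx = NetworkContext(dict(self.status), list(self.in_flight))
        self.status = {"sent": 0}
        self.in_flight = []
        return ctx

    def restore_context(self, ctx: NetworkContext) -> None:
        # Put back exactly what was captured when the job was switched out.
        self.status = dict(ctx.nic_status)
        self.in_flight = list(ctx.in_flight)


nic = ToyNic()
nic.send(b"job A: halo exchange")
ctx_a = nic.save_context()      # job A is switched out
nic.send(b"job B: reduce")      # job B uses the clean NIC
ctx_b = nic.save_context()      # job B is switched out
nic.restore_context(ctx_a)      # job A resumes with its messages intact
```

The point of the sketch is only that a job's messages never leak into another job's time slot, because they travel with the saved context.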
In checkpointing, the 'whole context' of a user's parallel
process (a set of processes derived from the same program) consists
of the contexts of the Unix (Linux) processes and the network context.
Once network preemption is implemented, the process contexts and the
network context can be saved to disk and restored when the user's
parallel process is restarted from the checkpoint.
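A minimal sketch of that idea, assuming a made-up on-disk layout (the function names, the JSON format, and the example state are all hypothetical and are not how SCore-D stores checkpoints):

```python
import json
import os
import tempfile


def save_checkpoint(path, process_contexts, network_context):
    """Write the 'whole context' of a parallel job to disk (toy model)."""
    whole = {"processes": process_contexts, "network": network_context}
    with open(path, "w") as f:
        json.dump(whole, f)


def load_checkpoint(path):
    """Read back the process contexts and the network context."""
    with open(path) as f:
        whole = json.load(f)
    return whole["processes"], whole["network"]


# Hypothetical state of a two-process job at the moment of preemption.
procs = [
    {"rank": 0, "pc": 120, "heap": [1, 2, 3]},
    {"rank": 1, "pc": 98, "heap": [4, 5]},
]
net = {"nic_status": {"sent": 7}, "in_flight": ["rank0->rank1: partial sum"]}

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
save_checkpoint(ckpt, procs, net)
restored_procs, restored_net = load_checkpoint(ckpt)
```

A real checkpoint would hold full process images rather than small dictionaries; the sketch only shows that both parts are saved together and restored together.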
Saving and restoring a process context is not new; it is a
well-known technique, and I am sure you can find some papers via web
search engines.
I have attached the paper on network preemption that was presented at SC98.
----
Atsushi HORI
Swimmy Software, Inc.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HORI98.PDF
Type: application/octet-stream
Size: 300411 bytes
Desc: not available
URL: <http://new1.pccluster.org/pipermail/score-users/attachments/20020711/5570615c/attachment.obj>