[SCore-users-jp] [SCore-users] Re: checkpointing

Atsushi HORI hori @ swimmy-soft.com
Thu Jul 11 10:32:14 JST 2002


Hi, Mike,

# The subject has been changed.

I am Atsushi Hori. I designed and wrote (and am still writing) the 
SCore-D stuff.

>but i still wish to know how SCore checkpoints tasks( memory, network 
>contexts, etc ). where can i find the description of checkpointing 
>process? and, please, do not refer to source code, may be i am stupid, 
>but i had not saw it( the scheme of checkpointing ) when looked through 
>the SCored's sources( there are too little comments :-(, and i have too 
>little experience in "reverse engineering" )
>
>with hope on answer, mike.

Well, SCore-D is very complicated and I am sure nobody else can 
understand it :-)

The basic idea of SCore-D checkpointing comes from Network 
Preemption. This idea is applied to gang scheduling and checkpointing 
in SCore-D.

In current OSes, communication is assumed to be infrequent, and 
every time a user process wants to communicate it must issue a system 
call. We thought that communication in parallel computation is much 
more frequent than in distributed computing, so we designed SCore to 
let user processes access the network interface without any system 
calls. Instead, when user processes are switched, the network context 
is saved and restored. The network context includes the state of the 
network interface hardware (NIC) and the messages in the network.
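
To make the save/restore step concrete, here is a toy sketch in 
Python. It is not SCore-D code: ToyNIC, NetworkContext and 
switch_demo are hypothetical names, and a real NIC has far more state 
than a sequence counter. It only illustrates that the saved network 
context holds both the NIC state and the messages still in the 
network.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class NetworkContext:
    """Saved network state of one job: NIC state plus in-flight messages."""
    nic_state: dict
    in_flight: list


class ToyNIC:
    """Hypothetical NIC that user processes drive directly (no system calls)."""

    def __init__(self):
        self.state = {"owner": None, "seq": 0}
        self.wire = deque()  # messages sent but not yet received

    def send(self, msg):
        self.state["seq"] += 1
        self.wire.append(msg)

    def recv(self):
        return self.wire.popleft() if self.wire else None

    def save_context(self):
        # Capture the NIC state and drain the network, so that the next
        # job starts with a clean interface.
        ctx = NetworkContext(dict(self.state), list(self.wire))
        self.wire.clear()
        self.state = {"owner": None, "seq": 0}
        return ctx

    def restore_context(self, ctx):
        # Put the NIC state and the drained messages back.
        self.state = dict(ctx.nic_state)
        self.wire = deque(ctx.in_flight)


def switch_demo():
    nic = ToyNIC()
    nic.state["owner"] = "job_a"
    nic.send("a1")
    nic.send("a2")
    ctx_a = nic.save_context()   # job A is switched out
    nic.state["owner"] = "job_b"
    nic.send("b1")
    got_b = nic.recv()           # job B never sees job A's messages
    nic.save_context()           # job B is switched out
    nic.restore_context(ctx_a)   # job A is switched back in
    return got_b, nic.recv(), nic.recv(), nic.state["seq"]
```

Calling switch_demo() shows that job A's two messages survive the 
switch and are delivered in order once its context is restored.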

In checkpointing, the 'whole context' of a user's parallel process 
(a set of processes derived from the same program) consists of the 
contexts of its Unix (Linux) processes plus the network context. Once 
network preemption is implemented, the process contexts and the 
network context can be saved to disk, and restored when the user's 
parallel process is restarted from the checkpoint.
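
As a rough illustration of that composition (again a toy model, not 
the SCore-D implementation), the sketch below writes a checkpoint 
made of per-process contexts and one network context to a file and 
reads it back. The function names, the dictionary layout and the use 
of pickle are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def write_checkpoint(path, process_contexts, network_context):
    """Save the whole context: every process context plus the network context."""
    snapshot = {"processes": process_contexts, "network": network_context}
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)


def read_checkpoint(path):
    """Load a checkpoint and return (process_contexts, network_context)."""
    with open(path, "rb") as f:
        snapshot = pickle.load(f)
    return snapshot["processes"], snapshot["network"]


def checkpoint_roundtrip():
    # Stand-ins for real process images (registers, memory, and so on).
    procs = [
        {"rank": 0, "pc": 120, "heap": [1, 2]},
        {"rank": 1, "pc": 98, "heap": [3]},
    ]
    # Stand-in for the state captured by network preemption.
    net = {"nic_state": {"seq": 2}, "in_flight": ["msg from 0 to 1"]}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job.ckpt")
        write_checkpoint(path, procs, net)
        return read_checkpoint(path)
```

The point of the sketch is only that a restart needs both parts: 
restoring the processes without the in-flight messages would lose 
data that was already sent.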

Saving and restoring a process context is not new; it is a 
well-known technique, and I am sure you can find some papers via web 
search engines.

I have attached the paper on network preemption presented at SC98.

----
Atsushi HORI
Swimmy Software, Inc.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: HORI98.PDF
Type: application/octet-stream
Size: 300411 bytes
Desc: not available
URL:        <http://new1.pccluster.org/pipermail/score-users-jp/attachments/20020711/5570615c/attachment.obj>

