[SCore-users] sleep or signal problems

Bogdan Costescu bogdan.costescu at iwr.uni-heidelberg.de
Wed Oct 9 21:54:51 JST 2002


On Wed, 9 Oct 2002, Atsushi HORI wrote:

> Aha, now I understand your problem (a little bit).

You gave me all the right clues. Thank you!

> SCore runtime library forks (actually clone) another process which is 
> to have the same memory space. We call this process "shadow process." 

I found it in scoredlib/usr/shadow.c. I first tried to replicate it in my 
test program with fork(2), but that did not behave as you described. I then 
copied the code from shadow.c, which uses clone(2), with its syscalls 
translated to getpid(2) and kill(2) calls, and this worked.

> So, the wait() function in your code returns when the cloned shadow 
> process stops because of SIGSTOP.

Yes, I was able to replicate this in my test program. Whenever the child 
sends itself SIGSTOP, the parent receives SIGCHLD. If another SIGSTOP is 
sent to the child at this point, the parent is not signalled again. 
However, if a SIGCONT is sent to the child, the child "wakes up" and sends 
itself a SIGSTOP again, at which point the parent receives SIGCHLD again. 
So I assume that the SCore scheduler sends SIGCONT every half second to 
all processes belonging to SCore jobs, which makes the parent receive a 
SIGCHLD signal every half second.
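
For readers who want to see the stop/continue cycle in isolation, here is a 
minimal sketch. It is not the test program mentioned above and not the code 
from shadow.c: it assumes a plain fork(2) child and lets the parent observe 
stops through waitpid(2) with WUNTRACED instead of counting SIGCHLD 
deliveries. The names stop_forever and observe_stop_cycles are made up for 
this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body (hypothetical): stop ourselves, and stop again each time
 * somebody continues us. */
static void stop_forever(void)
{
    for (;;)
        raise(SIGSTOP);
}

/*
 * Fork a self-stopping child and run `cycles` stop/continue rounds.
 * Returns the number of stops the parent observed, or -1 if fork failed.
 */
int observe_stop_cycles(int cycles)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        stop_forever();
        _exit(0); /* not reached */
    }

    int stops = 0;
    for (int i = 0; i < cycles; i++) {
        int status;

        /* WUNTRACED makes waitpid() report stopped children too. */
        if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
            break;
        stops++;

        /* A second SIGSTOP to an already stopped child is no new state
         * change, so there is nothing further for the parent to collect. */
        kill(pid, SIGSTOP);
        if (waitpid(pid, &status, WUNTRACED | WNOHANG) != 0)
            break;

        /* SIGCONT wakes the child, which immediately stops itself again. */
        kill(pid, SIGCONT);
    }

    /* Clean up: SIGKILL also works on a stopped process. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return stops;
}
```

Calling observe_stop_cycles(3) should report one stop per SIGCONT round. A 
scheduler that sends SIGCONT periodically would drive the same cycle from 
outside the process.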

So, SIGCHLD is another signal that cannot be used with SCore. It would 
probably be helpful for other developers to mention this somewhere in the 
documentation.

I'll try to modify the ARMCI library from GlobalArrays to prevent it from 
dying when wait(2) returns an error code; if successful, I'll post a patch 
here later for the benefit of everyone trying to get ARMCI/GA to work on 
SCore.
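
As a rough idea of what "not dying when wait(2) returns an error code" could 
look like, here is a sketch. It is not the ARMCI patch and does not use any 
ARMCI code. The function names reap_one_child and install_sigchld_no_stop 
are hypothetical, and treating EINTR and ECHILD as the harmless error codes 
is an assumption, since the actual error value is not stated above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try to reap one terminated child without blocking.
 * Returns 1 if a child terminated (status stored in *status_out),
 *         0 if there is nothing to reap (assumed harmless),
 *        -1 on an unexpected waitpid() error.
 */
int reap_one_child(int *status_out)
{
    for (;;) {
        /* No WUNTRACED here: stopped children are deliberately ignored. */
        pid_t pid = waitpid(-1, status_out, WNOHANG);
        if (pid > 0)
            return 1;
        if (pid == 0)
            return 0;          /* children exist, none has terminated */
        if (errno == EINTR)
            continue;          /* interrupted by a signal, retry */
        if (errno == ECHILD)
            return 0;          /* no waitable child, assumed harmless */
        return -1;
    }
}

/*
 * Install a SIGCHLD handler with SA_NOCLDSTOP, which asks the kernel not
 * to generate SIGCHLD when a child merely stops. Returns sigaction()'s
 * result (0 on success).
 */
int install_sigchld_no_stop(void (*handler)(int))
{
    struct sigaction sa;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

A SIGCHLD handler would call reap_one_child() and treat only a return value 
of 1 as a real child termination. Whether SA_NOCLDSTOP alone is enough under 
SCore is untested here.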

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De



