[SCore-users-jp] Kernel bug?

vqm_mp vqm_mp @ yahoo.co.jp
Tue Sep 12 11:12:16 JST 2006


Thank you for your continued help. This is Suzuki from Meiji University.

> 
> This shows that SCore in the third process (the <2> part;
> the number is 0-origin, so...) got a SIGSEGV.
> If you run something like
>    scrun -nodes=4,scoredtrace=100 ./a.out
> you may be able to see roughly where it is crashing.
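The 0-origin numbering mentioned in the quote can be checked mechanically. Below is a minimal Python sketch that reads the <N> tag at the start of an SCore-D log line and returns the 1-based position of the process that printed it. The helper name process_ordinal is hypothetical and is not part of SCore.

```python
import re

# Matches the "<N>" tag at the start of each SCore-D log line.
RANK_TAG = re.compile(r"^<(\d+)>")


def process_ordinal(line):
    """Return the 1-based ordinal of the process that emitted `line`.

    The tag is 0-origin, so "<2>" means the third process.
    Returns None for lines without a tag (e.g. "SCore-D 5.8.3 connected.").
    """
    m = RANK_TAG.match(line)
    if m is None:
        return None
    return int(m.group(1)) + 1


print(process_ordinal("<2> SCore-D:ERROR No PM device opened."))  # → 3
```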

I tried
 scrun -nodes=4,scoredtrace=100 ./a.out
right away. The program is example/mttl/hello.cc. Sometimes
there was no response at all, and other times I got one of
the following three kinds of output.

*******(Pattern 1)********
<0> SCore-D:DEBUG fd_max(NULL) = 199
<3> SCore-D:DEBUG fd_max(NULL) = 199
<1> SCore-D:DEBUG fd_max(NULL) = 199
<2> SCore-D:DEBUG fd_max(NULL) = 199
<3> SCore-D:TRACE(../fep.cc:458)
<3> SCore-D:TRACE(../fep.cc:468)
<3> SCore-D:DEBUG control=(null)
SCore-D 5.8.3 connected.
<3> SCore-D:DEBUG >> user_control
<0> SCore-D:DEBUG >> createSubjob(JID=1,subjobID=0)
<3> SCore-D:DEBUG isnot_ready_to_run(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG reset_wchan(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG run_fep(jid=1,status=1,kill=0)
<0> SCore-D:DEBUG << createSubjob(JID=1,subjobID=0)
<0> SCORE-D:DEBUG set_process_group_id(14164,14164)
<0> SCore-D:DEBUG set_process_group_id(14164,14164)
<1> SCORE-D:DEBUG set_process_group_id(14156,14156)
<1> SCore-D:DEBUG set_process_group_id(14156,14156)
<3> SCore-D:DEBUG fep_stopped(key=414844649,jid=1,uid=0)
<3> SCore-D:DEBUG isnot_ready_to_run(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG reset_wchan(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG run_fep(jid=1,status=3,kill=0)
<2> SCORE-D:DEBUG set_process_group_id(13212,13212)
<2> SCore-D:DEBUG set_process_group_id(13212,13212)
<3> SCORE-D:DEBUG set_process_group_id(13149,13149)
<3> SCore-D:DEBUG set_process_group_id(13149,13149)
<3> SCore-D:DEBUG TSS timer STARTS (jid=1)
<3> SCore-D:DEBUG wakeup_job(jid=1,ident=1)
<0> SCore-D:TRACE(../idle.cc:628) fd_syscall is closed


*******(Pattern 2)********
<2> SCore-D:WARNING Unable to open PM ethernet/ethernet (error=2).
<2> SCore-D:WARNING   argv[0] -config
<2> SCore-D:WARNING   argv[1] /var/scored/scoreboard/server.Eo0:1Mg7c
<2> SCore-D:ERROR No PM device opened.
<2> SCore-D:DEBUG >> exit_handler()
<2> SCore-D:DEBUG << exit_handler()

*******(Pattern 3)********
<0> SCore-D:DEBUG fd_max(NULL) = 199
<3> SCore-D:DEBUG fd_max(NULL) = 199
<1> SCore-D:DEBUG fd_max(NULL) = 199
<2> SCore-D:DEBUG fd_max(NULL) = 199
<3> SCore-D:TRACE(../fep.cc:458)
<3> SCore-D:TRACE(../fep.cc:468)
<3> SCore-D:DEBUG control=(null)
SCore-D 5.8.3 connected.
<3> SCore-D:DEBUG >> user_control
<0> SCore-D:DEBUG >> createSubjob(JID=1,subjobID=0)
<3> SCore-D:DEBUG isnot_ready_to_run(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG reset_wchan(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG run_fep(jid=1,status=1,kill=0)
<0> SCore-D:DEBUG << createSubjob(JID=1,subjobID=0)
<3> SCore-D:DEBUG fep_stopped(key=1228188647,jid=1,uid=0)
<0> SCORE-D:DEBUG set_process_group_id(14343,14343)
<0> SCore-D:DEBUG set_process_group_id(14343,14343)
<3> SCore-D:DEBUG isnot_ready_to_run(jid=1,wchan=0,gchan=0,temp=0,death=0)
<1> SCORE-D:DEBUG set_process_group_id(14338,14338)
<1> SCore-D:DEBUG set_process_group_id(14338,14338)
<3> SCore-D:DEBUG reset_wchan(jid=1,wchan=0,gchan=0,temp=0,death=0)
<3> SCore-D:DEBUG run_fep(jid=1,status=3,kill=0)
<3> SCORE-D:DEBUG set_process_group_id(13302,13302)
<2> SCORE-D:DEBUG set_process_group_id(13184,13184)
<2> SCore-D:DEBUG set_process_group_id(13184,13184)
<3> SCore-D:DEBUG set_process_group_id(13302,13302)
<3> SCore-D:DEBUG TSS timer STARTS (jid=1)
<3> SCore-D:DEBUG wakeup_job(jid=1,ident=1)
<2> SCore-D:TRACE(../idle.cc:628) fd_syscall is closed
<3> SCore-D:TRACE(../subjob.cc:337) fep_signaled()
<3> SCore-D:DEBUG stop_fep(jid=1,st=2)
<3> SCore-D:DEBUG TSS timer EXPIRES (jid=1)
<3> SCore-D:DEBUG fep_stopped(key=1228188647,jid=1,uid=0)
<0> SCore-D:TRACE(../idle.cc:628) fd_syscall is closed
<1> SCore-D:TRACE(../idle.cc:628) fd_syscall is closed
<3> SCore-D:DEBUG reset_wchan(jid=1,wchan=0,gchan=1,temp=0,death=ffffff)
<3> SCore-D:DEBUG run_fep(jid=1,status=3,kill=1)
<3> SCore-D:TRACE(../idle.cc:628) fd_syscall is closed
<3> SCore-D:DEBUG all_subjob_exited(jid=1,jobstep=1/1,recover=0)
<3> SCore-D:DEBUG fep_stopped(key=1228188647,jid=1,uid=0)
<3> SCore-D:DEBUG <<<<<<<<<<< TERMINATED (jid=1) >>>>>>>>>>>
<3> SCore-D:DEBUG remove_job_file(/var/scored/singleuser/0/job-descs/jid-1)
<3> SCore-D:DEBUG >> free_fep(jid=1,node=2,exit=0xb)
<3> SCore-D:TRACE(../fep.cc:838)  free_fep()
<3> SCore-D:TRACE(../fep.cc:841)  free_fep()
<0> SCore-D:TRACE(../fepio.cc:443) >> flush_fepio()
<0> SCore-D:DEBUG    flush_fepio(status=3)
<0> SCore-D:TRACE(../fepio.cc:466) << flush_fepio()
<3> SCore-D:TRACE(../fepio.cc:443) >> flush_fepio()
<0> SCore-D:TRACE(../subjob.cc:221) >> free_subjob()
<0> SCore-D:DEBUG free_pegroup(flag_dontclear=0)
<0> SCore-D:DEBUG killpg(14343,9)=3
<1> SCore-D:DEBUG free_pegroup(flag_dontclear=0)
<0> SCore-D:DEBUG >> free_pe(scio=0)
<2> SCore-D:DEBUG free_pegroup(flag_dontclear=0)
<1> SCore-D:DEBUG killpg(14338,9)=3
<0> SCore-D:TRACE(../pe.cc:487) >> flush_pe()
<2> SCore-D:DEBUG killpg(13184,9)=3
<2> SCore-D:DEBUG >> free_pe(scio=0)
<1> SCore-D:DEBUG >> free_pe(scio=0)
<1> SCore-D:TRACE(../pe.cc:487) >> flush_pe()
<0> SCore-D:TRACE(../pe.cc:512)    flush_pe()
<0> SCore-D:TRACE(../pe.cc:516) << flush_pe()
<2> SCore-D:TRACE(../pe.cc:487) >> flush_pe()
<2> SCore-D:TRACE(../pe.cc:512)    flush_pe()
<1> SCore-D:TRACE(../pe.cc:512)    flush_pe()
<0> SCore-D:TRACE(../pe.cc:531)    free_pe
<0> SCore-D:TRACE(../pe.cc:535)    free_pe
<0> SCore-D:DEBUG >> close_attach_fds(netset_num=1)
<0> SCore-D:DEBUG    close_attach_fds(dev=1,np=1)
<2> SCore-D:TRACE(../pe.cc:516) << flush_pe()
<0> SCore-D:DEBUG    close_attach_fds(dev=0,np=0,cntxt=0x81e5ed8)
<0> SCore-D:DEBUG << close_attach_fds()
<2> SCore-D:TRACE(../pe.cc:531)    free_pe
<0> SCore-D:TRACE(../pe.cc:539)    free_pe
<0> SCore-D:TRACE(../pe.cc:543)    free_pe
<0> SCore-D:TRACE(../pe.cc:547)    free_pe
<2> SCore-D:TRACE(../pe.cc:535)    free_pe
<2> SCore-D:DEBUG >> close_attach_fds(netset_num=1)
<2> SCore-D:DEBUG    close_attach_fds(dev=1,np=1)
<0> SCore-D:TRACE(../pe.cc:551)    free_pe
<0> SCore-D:TRACE(../pe.cc:574)    free_pe
<2> SCore-D:DEBUG    close_attach_fds(dev=0,np=0,cntxt=0x81e5f38)
<3> SCore-D:DEBUG    flush_fepio(status=3)
<3> SCore-D:TRACE(../fepio.cc:466) << flush_fepio()
<3> SCore-D:TRACE(../fep.cc:843)  free_fep()
<3> SCore-D:TRACE(../fep.cc:845)  free_fep()
<3> SCore-D:TRACE(../fepio.cc:443) >> flush_fepio()
<3> SCore-D:DEBUG    flush_fepio(status=3)
<3> SCore-D:TRACE(../fepio.cc:466) << flush_fepio()
<3> SCore-D:TRACE(../fep.cc:847)  free_fep()
<3> SCore-D:TRACE(../fep.cc:849)  free_fep()
<3> SCore-D:TRACE(../fep.cc:851)  free_fep()
<0> SCore-D:TRACE(../pe.cc:578) << free_pe
<2> SCore-D:DEBUG << close_attach_fds()
<2> SCore-D:TRACE(../pe.cc:539)    free_pe
<2> SCore-D:TRACE(../pe.cc:543)    free_pe
<2> SCore-D:TRACE(../pe.cc:547)    free_pe
<2> SCore-D:TRACE(../pe.cc:551)    free_pe
<2> SCore-D:TRACE(../pe.cc:574)    free_pe
<2> SCore-D:TRACE(../pe.cc:578) << free_pe
<3> SCore-D:DEBUG free_pegroup(flag_dontclear=0)
<3> SCore-D:DEBUG killpg(13302,9)=3
<3> SCore-D:DEBUG >> free_pe(scio=0)
<3> SCore-D:TRACE(../pe.cc:487) >> flush_pe()
<3> SCore-D:TRACE(../pe.cc:512)    flush_pe()
<1> SCore-D:TRACE(../pe.cc:516) << flush_pe()
<1> SCore-D:TRACE(../pe.cc:531)    free_pe
<1> SCore-D:TRACE(../pe.cc:535)    free_pe
<1> SCore-D:DEBUG >> close_attach_fds(netset_num=1)
<1> SCore-D:DEBUG    close_attach_fds(dev=1,np=1)
<1> SCore-D:DEBUG    close_attach_fds(dev=0,np=0,cntxt=0x81e5e80)
<1> SCore-D:DEBUG << close_attach_fds()
<1> SCore-D:TRACE(../pe.cc:539)    free_pe
<1> SCore-D:TRACE(../pe.cc:543)    free_pe
<1> SCore-D:TRACE(../pe.cc:547)    free_pe
<1> SCore-D:TRACE(../pe.cc:551)    free_pe
<1> SCore-D:TRACE(../pe.cc:574)    free_pe
<1> SCore-D:TRACE(../pe.cc:578) << free_pe
<3> SCore-D:TRACE(../pe.cc:516) << flush_pe()
<3> SCore-D:TRACE(../pe.cc:531)    free_pe
<3> SCore-D:TRACE(../pe.cc:535)    free_pe
<3> SCore-D:DEBUG >> close_attach_fds(netset_num=1)
<3> SCore-D:DEBUG    close_attach_fds(dev=1,np=1)
<3> SCore-D:DEBUG    close_attach_fds(dev=0,np=0,cntxt=0x81e5f38)
<3> SCore-D:DEBUG << close_attach_fds()
<3> SCore-D:TRACE(../pe.cc:539)    free_pe
<3> SCore-D:TRACE(../pe.cc:543)    free_pe
<3> SCore-D:TRACE(../pe.cc:547)    free_pe
<3> SCore-D:TRACE(../pe.cc:551)    free_pe
<3> SCore-D:TRACE(../pe.cc:574)    free_pe
<3> SCore-D:TRACE(../pe.cc:578) << free_pe
<0> SCore-D:TRACE(../subjob.cc:225)    free_subjob()
<0> SCore-D:TRACE(../subjob.cc:232)    free_subjob()
<0> SCore-D:DEBUG fepio_close()
<0> SCore-D:TRACE(../subjob.cc:236)    free_subjob()
<0> SCore-D:TRACE(../subjob.cc:241) << free_subjob()
<0> SCore-D:DEBUG >> finalize_host(0)
<0> SCore-D:TRACE(../scoredir.cc:389) cleanup_scored_dir()
<0> SCore-D:DEBUG << finalize_host()
<1> SCore-D:DEBUG >> finalize_host(0)
<1> SCore-D:TRACE(../scoredir.cc:389) cleanup_scored_dir()
<1> SCore-D:DEBUG << finalize_host()
<2> SCore-D:DEBUG >> finalize_host(0)
<3> SCore-D:TRACE(../fep.cc:853)  free_fep()
<2> SCore-D:TRACE(../scoredir.cc:389) cleanup_scored_dir()
<2> SCore-D:DEBUG << finalize_host()
<3> SCore-D:TRACE(../fep.cc:856)  free_fep()
<3> SCore-D:DEBUG fepio_close()
<3> SCore-D:DEBUG fds_select[199]
<3> SCore-D:TRACE(../fep.cc:862)    free_fep(jobc)
<3> SCore-D:TRACE(../fep.cc:866) << free_fep
<3> SCore-D:DEBUG >> finalize_host(0)
<3> SCore-D:TRACE(../scoredir.cc:389) cleanup_scored_dir()
<3> SCore-D:DEBUG << finalize_host()
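In pattern 3 the job is torn down with free_fep(jid=1,node=2,exit=0xb). 0xb is 11, which is the signal number of SIGSEGV on Linux, and node=2 matches the <2> process named in the quoted reply. The minimal Python sketch below decodes such a line. It assumes that the exit field carries the raw signal number; the log itself does not confirm this. The regex and the helper name decode_free_fep are hypothetical and are not part of SCore.

```python
import re
import signal

# Hypothetical pattern for the free_fep() DEBUG line seen in pattern 3.
FREE_FEP_RE = re.compile(
    r"^<(?P<tag>\d+)> SCore-D:DEBUG >> free_fep\("
    r"jid=(?P<jid>\d+),node=(?P<node>\d+),exit=(?P<exit>0x[0-9a-fA-F]+)\)"
)


def decode_free_fep(line):
    """Return (node, ordinal, exit_code, signal_name), or None if no match.

    ASSUMPTION: the exit field holds a raw signal number when the process
    was killed by a signal. signal_name is None if the value is not a
    known signal on this platform.
    """
    m = FREE_FEP_RE.match(line.strip())
    if m is None:
        return None
    node = int(m.group("node"))       # 0-origin node number
    code = int(m.group("exit"), 16)   # "0xb" -> 11
    try:
        name = signal.Signals(code).name
    except ValueError:
        name = None
    return node, node + 1, code, name


print(decode_free_fep("<3> SCore-D:DEBUG >> free_fep(jid=1,node=2,exit=0xb)"))
# → (2, 3, 11, 'SIGSEGV') on Linux
```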


Also, when I run
 scrun -nodes=4,scoredtrace ./a.out
the following output sometimes keeps appearing over and over:
<1> SCore-D:TRACE(../tss.cc:834) TSS scheduler ... idling
<1> SCore-D:TRACE(../tss.cc:834) TSS scheduler ... idling
<1> SCore-D:TRACE(../tss.cc:834) TSS scheduler ... idling
<1> SCore-D:TRACE(../tss.cc:834) TSS scheduler ... idling
<1> SCore-D:TRACE(../tss.cc:834) TSS scheduler ... idling

I would appreciate your help in diagnosing this.





