[SCore-users-jp] [SCore-users] Job can't be killed

PH1303 @ gmx.de
Fri, 15 Aug 2003 15:10:41 JST


Hi,

I installed SCore 5.4.0 on a Red Hat 7.3 cluster of 16 SMP nodes with GE and
shmem. Everything works fine. I started a multi-user environment with:

sc_syslog mysyslogserver /my/syslogfile

sc_watch -g mygroup -l /my/message -f /my/sc_watch.log scored -sysmon mysysmonserver -syslog mysyslogserver

A user submitted a job, and his shell crashed (he was logged out before the
job finished). Now the job cannot be killed or aborted, and no other
sc_console command has any effect on it. Here is the scored.messages log:

---------------------------
14/Aug/2003 18:51:06 SYSLOG: Login accepted: user01 @ comp01.domain.de:33488, JID: 13, Hosts: 1(1x1)@0, Priority: 1, Command: /baw/daten01/user01/benchmarks/telemac/cas21870_tmp/out21870
14/Aug/2003 18:51:45 SYSLOG: Logout: user01 @ comp01.domain.de:33488, JOB-ID: 13, CPU Time: 37.93[S]
14/Aug/2003 18:52:51 CONSOLE: >> info job
14/Aug/2003 18:52:51 CONSOLE: 13 user01 @ comp01:33488 1(1x1)@0.IRL - z 37.93[S]/105.0[S] 89.25[MB] 1[MB] out21870
14/Aug/2003 18:52:51 CONSOLE: 1 jobs.
14/Aug/2003 18:52:57 CONSOLE: >> abort 13
14/Aug/2003 18:52:57 CONSOLE: ERROR: job (13) not found.
14/Aug/2003 18:53:00 CONSOLE: >> kill 13
14/Aug/2003 18:53:26 CONSOLE: >> kill all
14/Aug/2003 18:53:43 CONSOLE: >> info queue
14/Aug/2003 18:53:43 CONSOLE: Queue[0] activated,  0 running exclusively,  0 waiting
14/Aug/2003 18:53:43 CONSOLE: Queue[1] activated,  0 running time-shared
14/Aug/2003 18:53:43 CONSOLE: Queue[2] activated,  0 running time-shared
14/Aug/2003 18:53:43 CONSOLE: 0 job(s) suspended
14/Aug/2003 18:53:43 CONSOLE: 0 job(s) aborted
14/Aug/2003 18:53:43 CONSOLE: 0 job(s) waiting for login
15/Aug/2003 07:44:18 CONSOLE: >> info all
15/Aug/2003 07:44:18 CONSOLE: SCore-D 5.4.0 SCORE_NOT_SECURE
15/Aug/2003 07:44:18 CONSOLE: Cluster[0]: (0..15)x2.i386-redhat7-linux2_4.pentium-iv.2800
15/Aug/2003 07:44:18 CONSOLE:   Memory: 504[MB], Swap: 2048[MB], Disk: 5040[MB]
15/Aug/2003 07:44:18 CONSOLE:   Network[0]: ethernet/ethernet
15/Aug/2003 07:44:18 CONSOLE:      Hostname             Load     Memory     Swap         Disk
15/Aug/2003 07:44:18 CONSOLE: comp01.domain.de @ 0      0.00 1375.95[MB]/2527.11[MB]  2047.96[MB]/2047.96[MB]  7076.59[MB]/10079.13[MB]
15/Aug/2003 07:44:18 CONSOLE: comp02.domain.de @ 1      0.00 102.40[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4562.91[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp03.domain.de @ 2      0.00 102.38[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.50[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp04.domain.de @ 3      0.00 102.63[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.47[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp05.domain.de @ 4      0.02 102.35[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.50[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp06.domain.de @ 5      0.00 102.59[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.50[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp07.domain.de @ 6      0.01 102.59[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.51[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp08.domain.de @ 7      0.00 102.84[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.51[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp09.domain.de @ 8      0.00 102.81[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.76[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp10.domain.de @ 9      0.00 102.65[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.80[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp11.domain.de @ 10     0.00 102.37[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.78[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp12.domain.de @ 11     0.00 102.74[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.78[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp13.domain.de @ 12     0.00 102.58[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.79[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp14.domain.de @ 13     0.02 102.59[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4563.79[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp15.domain.de @ 14     0.00 103.55[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4565.97[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: comp16.domain.de @ 15     0.04 137.74[MB]/503.20[MB]  2047.96[MB]/2047.96[MB]  4565.81[MB]/5039.53[MB]
15/Aug/2003 07:44:18 CONSOLE: no device.
15/Aug/2003 07:44:18 CONSOLE: Queue[0] activated,  0 running exclusively,  0 waiting
15/Aug/2003 07:44:18 CONSOLE: Queue[1] activated,  0 running time-shared
15/Aug/2003 07:44:18 CONSOLE: Queue[2] activated,  0 running time-shared
15/Aug/2003 07:44:18 CONSOLE: 0 job(s) suspended
15/Aug/2003 07:44:18 CONSOLE: 0 job(s) aborted
15/Aug/2003 07:44:18 CONSOLE: 0 job(s) waiting for login
15/Aug/2003 07:44:18 CONSOLE: Cluster        Nodes     Memory       Disk     #Jobs
15/Aug/2003 07:44:18 CONSOLE:  [0]           0..15    504[MB]   5040[MB]     (none)
15/Aug/2003 07:44:18 CONSOLE: Queue        Time    Remain   Memory Disk     Group      Min      Max
15/Aug/2003 07:44:18 CONSOLE:  [0]        (none)   (none)   (none) (none)   (none)   (none)   (none)
15/Aug/2003 07:44:18 CONSOLE:  [1]        (none)   (none)   (none) (none)   (none)   (none)   (none)
15/Aug/2003 07:44:18 CONSOLE:  [2]        (none)   (none)   (none) (none)   (none)   (none)   (none)
15/Aug/2003 07:44:18 CONSOLE: JID:13  CPU time limit: 0.0[m]
15/Aug/2003 07:44:18 CONSOLE: JID:13  Cluster[0].i386-redhat7-linux2_4 Memory limit: (none)  Disk limit: (none)
15/Aug/2003 07:44:18 CONSOLE: 13 user01 @ comp01:33488 1(1x1)@0.IRL - z 37.93[S]/12.88[H] 89.25[MB] 1[MB] out21870
15/Aug/2003 07:44:18 CONSOLE: 1 jobs.
----------------------

After 13 hours (!), the sctop command shows the following output:

-----------------------
16 Hosts, comp01.domain.de .. comp16.domain.de
Up 15.34[H], Load Average: 0.00, 1 Jobs logged in, 13 Jobs accumulated
Host#: 0---------1----1
       0123456789012345
#Jobs: 0000000000000000

JID           User @ Host:Port     Resource P S    TIME(CPU/Elps) Memory  Disk Command
 13 user01 @ comp01:33488 1(1x1)@0.IRL - z 37.93[S]/13.14[H] 89.25[MB] 1[MB] out21870
------------------------

What can I do to get rid of this job?

Best regards,
Patrick




