[SCore-users-jp] [SCore-users] Queue hangs for one user

Michael Koehne kraehe @ copyleft.de
Thu Mar 27 03:10:36 JST 2003


Moin Gurus,

  We have a 40-node/80-CPU SCore system at CLAMV (http://www.clamv.iu-bremen.de/)
  that is used by half a dozen people. We had a CPU/fan problem a few days
  ago, and Ulrich, who noticed it, did the following:

  - removed cell05 from /var/scored/pbs/server_priv/nodes
  - inserted cell05 into /opt/score/etc/scorehosts.defects
  - /etc/rc.d/init.d/pbs_server restart
  - shut down cell05, which saved the CPU

  We got the fan yesterday. I installed it and reverted Ulrich's
  changes. I did not restart the PBS server, as there were jobs
  running that were not mine.

  Now mhoeft has the problem that all of his jobs hang in the
  queue. When he came to me, he was also unable to qdel his jobs,
  so I did the `/etc/rc.d/init.d/pbs_server restart`, as there
  were no other users at that time. Now he is able to submit
  and delete jobs, but his jobs never run; they just sit blocked
  and waiting in the queue.
  
  I can start jobs, schroedi can start jobs, but Matthias's jobs look like this:

  7933.muscle.clu mhoeft   default  flash         --    4  --    -- --  Q   -- 
     cell10+cell10+cell09+cell09+cell08+cell08+cell07+cell07
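
  For checking such records from a script, here is a minimal Python sketch
  that splits a two-line job record like the one above into named fields.
  The column meanings (job ID, user, queue, job name, session ID, node
  count, tasks, memory, requested time, state, elapsed time) are assumed
  from the usual PBS `qstat` summary layout, and `parse_job_record` is a
  hypothetical helper, not part of PBS or SCore.

```python
def parse_job_record(summary_line, nodes_line):
    """Split one two-line qstat job record into a dict.

    Assumes 11 whitespace-separated columns on the summary line and a
    '+'-joined host list on the continuation line (an assumption based
    on the record shown above).
    """
    fields = summary_line.split()
    if len(fields) != 11:
        raise ValueError("expected 11 columns, got %d" % len(fields))
    keys = ["job_id", "user", "queue", "name", "session_id",
            "nodes", "tasks", "memory", "req_time", "state", "elapsed"]
    record = dict(zip(keys, fields))
    # One entry per allocated CPU slot, so hosts may repeat.
    record["hosts"] = [h for h in nodes_line.strip().split("+") if h]
    return record


summary = "7933.muscle.clu mhoeft   default  flash         --    4  --    -- --  Q   -- "
nodes = "   cell10+cell10+cell09+cell09+cell08+cell08+cell07+cell07"
job = parse_job_record(summary, nodes)
print(job["job_id"], job["user"], job["state"], sorted(set(job["hosts"])))
```

  For the record above, the job is in state Q with a host list that is
  already assigned, covering four distinct hosts with two slots each.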

  Now the funny part: if I start a job and immediately look at `qstat -rn`,
  mhoeft's job gets an R status for a fraction of a second and then falls
  back to Q almost immediately. The elapsed time stays at --. Any ideas?
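
  To pin down the brief R state, one could sample the job state repeatedly
  and log only the changes. A minimal Python sketch follows;
  `state_transitions` is a hypothetical helper that works on state letters
  that have already been collected. How they are collected (for example by
  polling `qstat` in a loop) is left out here, and the sample list is
  illustrative, not measured data.

```python
def state_transitions(samples):
    """Return (index, old_state, new_state) for every change in a list
    of sampled job-state letters such as 'Q' or 'R'."""
    changes = []
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            changes.append((i, samples[i - 1], samples[i]))
    return changes


# Illustrative samples only: queued, briefly running, queued again.
observed = ["Q", "Q", "R", "Q", "Q"]
for index, old, new in state_transitions(observed):
    print("sample %d: %s -> %s" % (index, old, new))
```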

Bye Michael
-- 
  mailto:kraehe @ copyleft.de             UNA:+.? 'CED+2+:::Linux:2.4.18'UNZ+1'
  http://www.xml-edifact.org/           CETERUM CENSEO WINDOWS ESSE DELENDAM
_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users


