[SCore-users] Queue hangs for one user

Michael Koehne kraehe at copyleft.de
Thu Mar 27 03:10:36 JST 2003


Hi gurus,

  We have a 40-node/80-CPU SCore system at CLAMV (http://www.clamv.iu-bremen.de/)
  that is used by half a dozen people. We had a CPU/fan problem a few days
  ago, and Ulrich, who noticed it, did the following:

  - removed cell05 from /var/scored/pbs/server_priv/nodes
  - inserted cell05 into /opt/score/etc/scorehosts.defects
  - ran /etc/rc.d/init.d/pbs_server restart
  - shut down cell05, which saved the CPU
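
  For reference, the two file edits in the list above could be scripted
  roughly as follows. This is only a sketch: the function names are made up
  for illustration, and it assumes both files hold one host entry per line
  with the host name at the start of the line.

```shell
#!/bin/sh
# Sketch only: remove_node and mark_defective are hypothetical helpers,
# and the one-host-per-line file layout is an assumption.

# Remove every entry for node $1 from the PBS nodes file $2.
remove_node() {
    node=$1
    file=$2
    tmp=$(mktemp) || return 1
    # grep -v exits 1 when no lines are left to print; that is not an error here.
    grep -Ev "^${node}([[:space:]]|\$)" "$file" > "$tmp" || true
    cat "$tmp" > "$file"   # overwrite in place so owner and mode are kept
    rm -f "$tmp"
}

# Append node $1 to the defects file $2 unless it is already listed.
mark_defective() {
    node=$1
    file=$2
    grep -qxF "$node" "$file" 2>/dev/null || printf '%s\n' "$node" >> "$file"
}

# Example (as root, on the server host):
#   remove_node    cell05 /var/scored/pbs/server_priv/nodes
#   mark_defective cell05 /opt/score/etc/scorehosts.defects
#   /etc/rc.d/init.d/pbs_server restart
```

  Nothing runs when the file is sourced; the calls in the trailing comment
  are what would be typed by hand. Reversing the change means doing the
  opposite edit in each file.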

  We got the replacement fan yesterday. I installed it and reversed Ulrich's
  changes. I did not restart the PBS server, because jobs belonging to
  other users were running at the time.

  Now mhoeft has the problem that all of his jobs hang in the
  queue. When he came to me, he was also unable to qdel his jobs,
  so I ran `/etc/rc.d/init.d/pbs_server restart`, as there
  were no other users at that time. Now he is able to submit
  and delete jobs, but his jobs never run; they just stay blocked,
  waiting in the queue.
  
  I can start jobs, schroedi can start jobs, but Matthias' jobs look like this:

  7933.muscle.clu mhoeft   default  flash         --    4  --    -- --  Q   -- 
     cell10+cell10+cell09+cell09+cell08+cell08+cell07+cell07

  Now the funny part: if I start a job and immediately look at `qstat -rn`, the
  job of mhoeft gets an R status for a fraction of a second and then falls
  back to Q almost immediately. The elapsed time stays at --. Any ideas?
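
  To catch that flip without staring at the terminal, something like the
  following could log each state change. It is a sketch under assumptions:
  job_state and watch_job are made-up names, the state is taken to be the
  10th column as in the listing above, and a one-second poll may well miss
  a transition that short.

```shell
#!/bin/sh
# Sketch only: job_state and watch_job are hypothetical helper names.

# Read qstat alternate-format output on stdin and print the state column
# (10th field in the listing above) of the first job whose id starts with $1.
job_state() {
    awk -v id="$1" 'index($1, id) == 1 { print $10; exit }'
}

# Poll qstat and print a timestamped line whenever the state changes.
# $1 = job id prefix, $2 = polling interval in seconds (default 1).
watch_job() {
    last=
    while :; do
        state=$(qstat -n | job_state "$1")
        if [ "$state" != "$last" ]; then
            printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$1" "${state:-gone}"
            last=$state
        fi
        sleep "${2:-1}"
    done
}

# Example: watch_job 7933. 1
```

  The loop uses `qstat -n` rather than `qstat -rn` so that the job is still
  listed while it sits in Q; it runs until interrupted with Ctrl-C.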

Bye Michael
-- 
  mailto:kraehe at copyleft.de             UNA:+.? 'CED+2+:::Linux:2.4.18'UNZ+1'
  http://www.xml-edifact.org/           CETERUM CENSEO WINDOWS ESSE DELENDAM


