[SCore-users-jp] [SCore-users] Kernel oops
Bogdan Costescu
bogdan.costescu @ iwr.uni-heidelberg.de
Fri Mar 7 04:58:42 JST 2003
Dear SCore developers,
I've postponed testing GM on our nodes because I have observed that
whenever SCoreD crashes and takes one node down with it, an Oops is also
displayed on that node. This is with the SCore 4.2.1 kernel patch applied
to the Red Hat 2.4.18-24 kernel, so it might be an error that I introduced
myself, but the behaviour (SCoreD taking down one node) is the same with
SCore 5.4 and kernel 2.4.19-1SCORE, which I plan to test tomorrow.
So, the (decoded) Oops looks like this:
EIP is at __wake_up [kernel] 0x3c (2.4.18-24SCORE)
eax: c041c998 ebx: c25a4d80 ecx: 00000000 edx: 00000000
esi: 00000001 edi: c041c994 ebp: c25abf1c esp: c25abf08
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c25ab000)
Stack: 00000282 00000001 c041c96c c041c840 c041c994 00000002 c019ec9a c25a4d80
00000001 dbdfa015 00000010 c010a6e3 00000010 c041c840 c25abf7c c25abf7c
c0398000 00000010 c25a4d80 c010a872 00000010 c25abf7c c25a4d80 00000001
Call trace: [<c019ec9a>] myri_pm_intr [kernel] 0x7a (0xc25abf20))
[<c010a63e>] handle_IRQ_event [kernel] 0x5e (0xc25abf34))
[<c010a872>] do_IRQ [kernel] 0xc2 (0xc25abf54))
[<c0106e60>] default_idle [kernel] 0x0 (0xc25abf68))
[<c0106e60>] default_idle [kernel] 0x0 (0xc25abf74))
[<c010d098>] call_do_IRQ [kernel] 0x5 (0xc25abf78))
[<c0106e60>] default_idle [kernel] 0x0 (0xc25abf7c))
[<c0106e60>] default_idle [kernel] 0x0 (0xc25abf90))
[<c0106e89>] default_idle [kernel] 0x29 (0xc25abfa4))
[<c0106f02>] cpu_idle [kernel] 0x32 (0xc25abfb0))
[<c011dafb>] call_console_drivers [kernel] 0xeb (0xc25abfd0))
[<c011dca9>] printk [kernel] 0x129 (0xc25abffc))
Code: 8b 02 85 45 f0 74 ed 6a 00 52 e8 75 f0 ff ff 5a 85 c0 59 74
Using defaults from ksymoops -t elf32-i386 -a i386
Trace; c019ec9a <myri_pm_intr+7a/90>
Trace; c010a63e <handle_IRQ_event+5e/90>
Trace; c010a872 <do_IRQ+c2/110>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c010d098 <call_do_IRQ+5/d>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e89 <default_idle+29/40>
Trace; c0106f02 <cpu_idle+32/50>
Trace; c011dafb <call_console_drivers+eb/100>
Trace; c011dca9 <printk+129/140>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 02 mov (%edx),%eax
Code; 00000002 Before first symbol
2: 85 45 f0 test %eax,0xfffffff0(%ebp)
Code; 00000005 Before first symbol
5: 74 ed je fffffff4 <_EIP+0xfffffff4> fffffff4 <END_OF_CODE+1f463075/????>
Code; 00000007 Before first symbol
7: 6a 00 push $0x0
Code; 00000009 Before first symbol
9: 52 push %edx
Code; 0000000a Before first symbol
a: e8 75 f0 ff ff call fffff084 <_EIP+0xfffff084> fffff084 <END_OF_CODE+1f462105/????>
Code; 0000000f Before first symbol
f: 5a pop %edx
Code; 00000010 Before first symbol
10: 85 c0 test %eax,%eax
Code; 00000012 Before first symbol
12: 59 pop %ecx
Code; 00000013 Before first symbol
13: 74 00 je 15 <_EIP+0x15> 00000015 Before first symbol
<0>Kernel panic: Aiee, killing interrupt handler!
Today I was able to reproduce this Oops several times on different nodes.
The trace is always the same, except for the line(s) after cpu_idle, which
can be replaced by:
[<c0105000>] stext [kernel] 0x0 (...))
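To check mechanically that Oopses from different nodes share the same trace, something like the following Python sketch could be used. It is only an illustration: the names `parse_trace`, `common_prefix`, `NODE_A` and `NODE_B` are made up for this example, the two traces are abbreviated copies of the frames quoted in this message (the repeated default_idle frames are left out), and the regular expression assumes the exact line layout shown above.

```python
import re

# Matches frames such as "[<c019ec9a>] myri_pm_intr [kernel] 0x7a (0xc25abf20))".
# The trailing stack address is ignored, so an elided "(...))" is accepted too.
TRACE_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\s+\[kernel\]\s+0x([0-9a-f]+)")


def parse_trace(oops_text):
    """Return the (function, offset) pairs of an Oops in call-trace order."""
    return [(m.group(2), int(m.group(3), 16)) for m in TRACE_RE.finditer(oops_text)]


def common_prefix(trace_a, trace_b):
    """Return the leading frames that both traces have in common."""
    shared = []
    for frame_a, frame_b in zip(trace_a, trace_b):
        if frame_a != frame_b:
            break
        shared.append(frame_a)
    return shared


# Abbreviated example input, hypothetical variable names.
NODE_A = """
Call trace: [<c019ec9a>] myri_pm_intr [kernel] 0x7a (0xc25abf20))
[<c010a63e>] handle_IRQ_event [kernel] 0x5e (0xc25abf34))
[<c010a872>] do_IRQ [kernel] 0xc2 (0xc25abf54))
[<c0106f02>] cpu_idle [kernel] 0x32 (0xc25abfb0))
[<c011dafb>] call_console_drivers [kernel] 0xeb (0xc25abfd0))
"""

NODE_B = """
Call trace: [<c019ec9a>] myri_pm_intr [kernel] 0x7a (0xc25abf20))
[<c010a63e>] handle_IRQ_event [kernel] 0x5e (0xc25abf34))
[<c010a872>] do_IRQ [kernel] 0xc2 (0xc25abf54))
[<c0106f02>] cpu_idle [kernel] 0x32 (0xc25abfb0))
[<c0105000>] stext [kernel] 0x0 (...))
"""

if __name__ == "__main__":
    shared = common_prefix(parse_trace(NODE_A), parse_trace(NODE_B))
    for name, offset in shared:
        print(f"{name}+0x{offset:x}")
```

With these two inputs the shared part ends at cpu_idle, which matches what I see by eye: only the frame(s) after cpu_idle differ between nodes.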
I looked a bit through the code, but I don't understand Myrinet
programming well enough, so maybe this gives you some idea. Spurious
interrupts? Lost interrupts? I'm still not comfortable with the
interrupt state on my machines, as they have Tyan 760MP boards, which are
known for instabilities.
Anyway, as I said, I plan to try SCore 5.4 and kernel 2.4.19-1SCORE
tomorrow to see whether the lock-ups there are also associated with such
Oopses.
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu @ IWR.Uni-Heidelberg.De
_______________________________________________
SCore-users mailing list
SCore-users @ pccluster.org
http://www.pccluster.org/mailman/listinfo/score-users