  1. #1

    RaQ3 OS550 random rebooting

    Hi all,

    I'm investigating a problem on my RaQs.

    I have some RaQ3s upgraded to OS550, hosting about 40 to 50 PHP/MySQL websites migrated from a RaQ3.

    The kernel is upgraded:
    #> uname -a
    Linux myserver.net 2.4.19C13_III #1 Tue Apr 13 20:41:16 PDT 2004 i586 unknown

    The big problem is that every 1 or 2 days the system reboots itself.

    I checked /var/log/messages, but there's nothing abnormal in it.
    I've discovered that the RaQ hangs for some minutes before rebooting itself. It responds to ping, but no other service responds. The front panel doesn't respond either, so I have to hard-reboot the machine to get the system back up sooner.

    Any help is much appreciated.
    Thanks
    Marco

  2. #2
    Join Date: Nov 2002 | Location: Michigan | Posts: 695
    I have found that RaQs "randomly" reboot mainly because of memory issues.

    Is yours rebooting in the AM during the cron job processing? Or at other times during the day?

    Sounds like you might have some runaway jobs. If you can get into the server, try running "top" to see what's going on.

    If it seems to be at all regular as to when it goes down, you could temporarily set up a cron job to append "ps -afx" output to a temp file every 2 minutes or something. Then you could go back later and see what was running.
    http://www.lamphowto.com/ - LAMP and LAMP+SSL HowTo
    http://www.cobaltfaqs.com/ - Cobalt FAQs and HowTos
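The periodic process snapshot suggested in the post above could be sketched roughly as follows. This is only an illustration: it assumes Python 3 is installed on the machine (a stock RaQ may not have it, in which case a crontab entry that calls ps directly does the same job), and the function names and file paths are made up for the example.

```python
#!/usr/bin/env python3
"""Append a timestamped process listing to a log file (meant to be run from cron)."""
import subprocess
import sys
import time


def snapshot(command):
    """Run `command` and return its output under a timestamp header."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
        body = result.stdout
    except OSError as exc:
        # Record the failure instead of dying, so gaps in the log are explained.
        body = "could not run {}: {}\n".format(command[0], exc)
    return "===== {} =====\n{}\n".format(stamp, body)


def append_snapshot(path, command=("ps", "-afx")):
    """Append one snapshot of `command` output to the file at `path`."""
    with open(path, "a") as log:
        log.write(snapshot(list(command)))


# Only act when a log path is given on the command line, e.g. from cron.
if __name__ == "__main__" and len(sys.argv) == 2:
    append_snapshot(sys.argv[1])
```

A crontab line such as `*/2 * * * * /usr/bin/python3 /root/ps_snapshot.py /tmp/ps-snapshots.log` (both paths hypothetical) would take a snapshot every two minutes. After a reboot, the last few entries in the file show what was running just before the machine went down.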

  3. #3
    Hi Bruce,

    The server has 512 MB of RAM and it is all in use, but with swap there is more space available.

                 total       used       free     shared    buffers     cached
    Mem:        514500     501664      12836          0          0     206260
    -/+ buffers/cache:     295404     219096
    Swap:       524536      42076     482460

    The reboots don't happen at any particular time.

    I also noticed that the servers that reboot frequently all run the 2.4.19C13_III kernel. Other servers with the previous kernel version (2.4.16C12_III) don't have the same issue. Maybe the kernel has a bug?

    Thanks again.
    I'll try the cron job with "ps -afx".

  4. #4
    Your memory should show as all or mostly "used": Unix likes to cache things in RAM. Why pay for RAM and leave it unused, right?
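To make the point about cache concrete, here is a small worked calculation using the figures from the "Mem:" row of the free output posted earlier in the thread. It is only a sketch; the function name is invented for the example, and it simply reproduces the arithmetic behind free's "-/+ buffers/cache" row.

```python
def memory_without_cache(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used_by_programs, effectively_free) in kB.

    Buffers and page cache can be reclaimed when programs need memory,
    so they are subtracted from "used" and added to "free".
    """
    used_by_programs = used_kb - buffers_kb - cached_kb
    effectively_free = free_kb + buffers_kb + cached_kb
    return used_by_programs, effectively_free


# Figures from the "Mem:" row posted in the thread (kB).
total_kb = 514500
used, free = memory_without_cache(501664, 12836, 0, 206260)
print(used, free)  # 295404 219096
print("{:.0%} of RAM held by programs".format(used / total_kb))  # 57% ...
```

So although the "Mem:" row shows only about 12 MB free, roughly 219 MB is available to programs once the cache is counted as reclaimable.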

    Because you are using a little of your swap, that does indicate to me that something is occasionally trying to use more RAM than what you have. Could be almost anything; I know the 4 AM log rotate/parse takes up a lot of RAM, especially for a busy server with large log files.

    I'm not sure on the kernel bug issue; I've got servers (real 550s and RaQ 4 with 550 OS) running the most up-to-date kernels without any problems.

    Try running a memory tester; perhaps there's one bad location in RAM and when it gets hit, the server goes down. You can get memtest at

    ftp://ftp-eng.cobalt.com/pub/experim...Tester-1.0.pkg

  5. #5
    Hi Bruce,

    - The memory tests completed without finding any problem.
    - I've updated Qpopper to 4.05 to check whether it was causing some overload, but that didn't help.

    This is an extract of /var/log/kernel from just before the reboot, which happened at 4:01 AM.

    Any ideas?
    Thanks.
    Marco

    ################ START
    Aug 17 03:50:30 rq10 kernel: printing eip:
    Aug 17 03:50:30 rq10 kernel: c012a9c0
    Aug 17 03:50:30 rq10 kernel: *pde = 00000000
    Aug 17 03:50:30 rq10 kernel: Oops: 0000
    Aug 17 03:50:30 rq10 kernel: CPU: 0
    Aug 17 03:50:30 rq10 kernel: EIP: 0010:[vfree+40/104] Not tainted
    Aug 17 03:50:30 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:50:30 rq10 kernel: EIP: c012a9c0 vfree+0x28
    Aug 17 03:50:30 rq10 kernel: eax: e0930000 ebx: 00000200 ecx: d806a000 edx: caa9956c
    Aug 17 03:50:30 rq10 kernel: esi: d806a000 edi: d806a000 ebp: d806bf7c esp: d806bf64
    Aug 17 03:50:30 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:50:30 rq10 kernel: Process httpd (pid: 1569, stackpage=d806b000)
    Aug 17 03:50:30 rq10 kernel: Stack: c498c000 c011815e e0930000 c498c000 0000690e 0000690e 00000000 c0118e46
    Aug 17 03:50:30 rq10 kernel: c498c000 d806a000 00000000 00000000 bffff854 d806a000 d806bfb0 00000000
    Aug 17 03:50:30 rq10 kernel: d806a000 00000000 00000000 00000000 d806a000 d806a0ac d806a0ac c01088b3
    Aug 17 03:50:30 rq10 kernel: Call Trace: c011815e release_task+0x3e
    Aug 17 03:50:30 rq10 kernel: c0118e46 sys_wait4+0x332
    Aug 17 03:50:30 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:50:30 rq10 kernel:
    Aug 17 03:50:30 rq10 kernel:
    Aug 17 03:50:30 rq10 kernel: Code: 39 43 04 75 20 8b 43 0c 89 02 8b 43 08 50 8b 43 04 50 e8 d9
    Aug 17 03:51:31 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:51:31 rq10 kernel: printing eip:
    Aug 17 03:51:31 rq10 kernel: c012a9c0
    Aug 17 03:51:31 rq10 kernel: *pde = 00000000
    Aug 17 03:51:31 rq10 kernel: Oops: 0000
    Aug 17 03:51:31 rq10 kernel: CPU: 0
    Aug 17 03:51:31 rq10 kernel: EIP: 0010:[vfree+40/104] Not tainted
    Aug 17 03:51:31 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:51:31 rq10 kernel: EIP: c012a9c0 vfree+0x28
    Aug 17 03:51:31 rq10 kernel: eax: e092a000 ebx: 0000001b ecx: 00000000 edx: caa9976c
    Aug 17 03:51:31 rq10 kernel: esi: db92a000 edi: db92a000 ebp: db92bf7c esp: db92bf64
    Aug 17 03:51:31 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:51:31 rq10 kernel: Process sendmail (pid: 1057, stackpage=db92b000)
    Aug 17 03:51:31 rq10 kernel: Stack: c4dc2000 c011815e e092a000 c4dc2000 00006939 00000000 bfffd964 c0118e46
    Aug 17 03:51:31 rq10 kernel: c4dc2000 db92a000 00000000 bfffd964 bfffd940 db92a000 db92bfb0 00000000
    Aug 17 03:51:31 rq10 kernel: db92a000 00000000 00000000 00000000 db92a000 db92a0ac db92a0ac c01088b3
    Aug 17 03:51:31 rq10 kernel: Call Trace: c011815e release_task+0x3e
    Aug 17 03:51:31 rq10 kernel: c0118e46 sys_wait4+0x332
    Aug 17 03:51:31 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:51:31 rq10 kernel:
    Aug 17 03:51:31 rq10 kernel:
    Aug 17 03:51:31 rq10 kernel: Code: 39 43 04 75 20 8b 43 0c 89 02 8b 43 08 50 8b 43 04 50 e8 d9
    Aug 17 03:52:05 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:52:05 rq10 kernel: printing eip:
    Aug 17 03:52:05 rq10 kernel: c012a9c0
    Aug 17 03:52:05 rq10 kernel: *pde = 00000000
    Aug 17 03:52:05 rq10 kernel: Oops: 0000
    Aug 17 03:52:05 rq10 kernel: CPU: 0
    Aug 17 03:52:05 rq10 kernel: EIP: 0010:[vfree+40/104] Not tainted
    Aug 17 03:52:05 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:52:05 rq10 kernel: EIP: c012a9c0 vfree+0x28
    Aug 17 03:52:05 rq10 kernel: eax: e0926000 ebx: 0000001b ecx: 00000000 edx: caa9976c
    Aug 17 03:52:05 rq10 kernel: esi: c252a000 edi: c252a000 ebp: c252bf7c esp: c252bf64
    Aug 17 03:52:05 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:52:05 rq10 kernel: Process init (pid: 1, stackpage=c252b000)
    Aug 17 03:52:05 rq10 kernel: Stack: c86ba000 c011815e e0926000 c86ba000 00006937 00000000 bffff530 c0118e46
    Aug 17 03:52:05 rq10 kernel: c86ba000 c252a000 00000000 bffff530 bffff4f4 c252a000 c252bfb0 00000000
    Aug 17 03:52:05 rq10 kernel: c252a000 00000000 00000000 00000000 c252a000 c252a0ac c252a0ac c01088b3
    Aug 17 03:52:05 rq10 kernel: Call Trace: c011815e release_task+0x3e
    Aug 17 03:52:05 rq10 kernel: c0118e46 sys_wait4+0x332
    Aug 17 03:52:05 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:52:05 rq10 kernel:
    Aug 17 03:52:05 rq10 kernel:
    Aug 17 03:52:05 rq10 kernel: Code: 39 43 04 75 20 8b 43 0c 89 02 8b 43 08 50 8b 43 04 50 e8 d9
    Aug 17 03:52:05 rq10 kernel: <0>Kernel panic: Attempted to kill init!
    Aug 17 03:55:31 rq10 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:55:31 rq10 kernel: printing eip:
    Aug 17 03:55:31 rq10 kernel: c012a94a
    Aug 17 03:55:31 rq10 kernel: *pde = 00000000
    Aug 17 03:55:31 rq10 kernel: Oops: 0000
    Aug 17 03:55:31 rq10 kernel: CPU: 0
    Aug 17 03:55:31 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:55:31 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:55:31 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:55:31 rq10 kernel: eax: 2d648c2f ebx: 2d646c2f ecx: 0000001b edx: 2d646c2f
    Aug 17 03:55:31 rq10 kernel: esi: caa99760 edi: 00002000 ebp: caa9976c esp: c5529f54
    Aug 17 03:55:31 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:55:31 rq10 kernel: Process sendmail (pid: 27078, stackpage=c5529000)
    Aug 17 03:55:31 rq10 kernel: Stack: 00001000 00000001 00000001 bfffe6e4 ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:55:31 rq10 kernel: 00000000 00000001 00000001 bfffe6e4 00000000 c5528000 bfffe6a8 bfffe63c
    Aug 17 03:55:31 rq10 kernel: c5528000 bfffe6ac 00000002 bfffe6a4 bfffe6a8 c011fe11 00000080 000001f2
    Aug 17 03:55:31 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:55:31 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:55:31 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:55:31 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:55:31 rq10 kernel: printing eip:
    Aug 17 03:55:31 rq10 kernel: c012a94a
    Aug 17 03:55:31 rq10 kernel: *pde = 00000000
    Aug 17 03:55:31 rq10 kernel: Oops: 0000
    Aug 17 03:55:31 rq10 kernel: CPU: 0
    Aug 17 03:55:31 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:55:31 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:55:31 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:55:31 rq10 kernel: eax: 2d648c2f ebx: 2d646c2f ecx: 0000001b edx: 2d646c2f
    Aug 17 03:55:31 rq10 kernel: esi: caa993c0 edi: 00002000 ebp: caa9976c esp: c5c11f54
    Aug 17 03:55:31 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:55:31 rq10 kernel: Process sendmail (pid: 27079, stackpage=c5c11000)
    Aug 17 03:55:31 rq10 kernel: Stack: 00001000 00000001 00000001 bfffe6e4 ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:55:31 rq10 kernel: 00000000 00000001 00000001 bfffe6e4 00000000 c5c10000 bfffe6a8 bfffe63c
    Aug 17 03:55:31 rq10 kernel: c5c10000 bfffe6ac 00000002 bfffe6a4 bfffe6a8 c011fe11 00000080 000001f2
    Aug 17 03:55:31 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:55:31 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:55:31 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:55:31 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:55:31 rq10 kernel: printing eip:
    Aug 17 03:55:31 rq10 kernel: c012a94a
    Aug 17 03:55:31 rq10 kernel: *pde = 00000000
    Aug 17 03:55:31 rq10 kernel: Oops: 0000
    Aug 17 03:55:31 rq10 kernel: CPU: 0
    Aug 17 03:55:31 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:55:31 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:55:31 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:55:31 rq10 kernel: eax: 2d648c2f ebx: 2d646c2f ecx: 0000001b edx: 2d646c2f
    Aug 17 03:55:31 rq10 kernel: esi: caa99560 edi: 00002000 ebp: caa9976c esp: c4b4bf54
    Aug 17 03:55:31 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:55:31 rq10 kernel: Process sendmail (pid: 27080, stackpage=c4b4b000)
    Aug 17 03:55:31 rq10 kernel: Stack: 00001000 00000001 00000001 bfffe6e4 ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:55:31 rq10 kernel: 00000000 00000001 00000001 bfffe6e4 00000000 c2326000 c032fd48 00000000
    Aug 17 03:55:31 rq10 kernel: 00000000 c4b4a000 c0366000 d7cbd340 c4b4a000 c011fe11 00000080 000001f2
    Aug 17 03:55:31 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:55:31 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:55:31 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:55:31 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:55:31 rq10 kernel: printing eip:
    Aug 17 03:55:31 rq10 kernel: c012a94a
    Aug 17 03:55:31 rq10 kernel: *pde = 00000000
    Aug 17 03:55:31 rq10 kernel: Oops: 0000
    Aug 17 03:55:31 rq10 kernel: CPU: 0
    Aug 17 03:55:31 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:55:31 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:55:31 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:55:31 rq10 kernel: eax: 2d648c2f ebx: 2d646c2f ecx: 0000001b edx: 2d646c2f
    Aug 17 03:55:31 rq10 kernel: esi: caa99760 edi: 00002000 ebp: caa9976c esp: c5913f54
    Aug 17 03:55:31 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:55:31 rq10 kernel: Process sendmail (pid: 27081, stackpage=c5913000)
    Aug 17 03:55:31 rq10 kernel: Stack: 00001000 00000001 00000001 bfffe6e4 ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:55:31 rq10 kernel: 00000000 00000001 00000001 bfffe6e4 00000000 c5912000 bfffe6a8 bfffe63c
    Aug 17 03:55:31 rq10 kernel: c5912000 bfffe6ac 00000002 bfffe6a4 bfffe6a8 c011fe11 00000080 000001f2
    Aug 17 03:55:31 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:55:31 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:55:31 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel:
    Aug 17 03:55:31 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:55:53 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001f
    Aug 17 03:55:53 rq10 kernel: printing eip:
    Aug 17 03:55:53 rq10 kernel: c012a94a
    Aug 17 03:55:53 rq10 kernel: *pde = 00000000
    Aug 17 03:55:53 rq10 kernel: Oops: 0000
    Aug 17 03:55:53 rq10 kernel: CPU: 0
    Aug 17 03:55:53 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:55:53 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:55:53 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:55:53 rq10 kernel: eax: 2d648c2f ebx: 2d646c2f ecx: 0000001b edx: 2d646c2f
    Aug 17 03:55:53 rq10 kernel: esi: d5572600 edi: 00002000 ebp: caa9976c esp: c5529f54
    Aug 17 03:55:53 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:55:53 rq10 kernel: Process in.qpopper (pid: 27098, stackpage=c5529000)
    Aug 17 03:55:53 rq10 kernel: Stack: 00001000 00000001 00000001 bfffc5ec ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:55:53 rq10 kernel: 00000000 00000001 00000001 bfffc5ec 00000000 c5528000 bfffc5b0 bfffc544
    Aug 17 03:55:53 rq10 kernel: c5528000 bfffc5b4 00000002 bfffc5ac bfffc5b0 c011fe11 00000080 000001f2
    Aug 17 03:55:53 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:55:53 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:55:53 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:55:53 rq10 kernel:
    Aug 17 03:55:53 rq10 kernel:
    Aug 17 03:55:53 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:55:56 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000000f
    Aug 17 03:55:56 rq10 kernel: printing eip:
    Aug 17 03:55:56 rq10 kernel: c012a94a
    Aug 17 03:55:56 rq10 kernel: *pde = 00000000
    Aug 17 03:55:56 rq10 kernel: Oops: 0000
    Aug 17 03:55:56 rq10 kernel: CPU: 0
    Aug 17 03:55:56 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:55:56 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:55:56 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:55:56 rq10 kernel: eax: 2d648c2f ebx: 2d646c2f ecx: 0000000b edx: 2d646c2f
    Aug 17 03:55:56 rq10 kernel: esi: caa99560 edi: 00002000 ebp: caa9976c esp: c5529f54
    Aug 17 03:55:56 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:55:56 rq10 kernel: Process in.qpopper (pid: 27099, stackpage=c5529000)
    Aug 17 03:55:56 rq10 kernel: Stack: 00001000 00000001 00000001 bfffc5ec ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:55:56 rq10 kernel: 00000000 00000001 00000001 bfffc5ec 00000000 c5528000 bfffc5b0 bfffc544
    Aug 17 03:55:56 rq10 kernel: c5528000 bfffc5b4 00000002 bfffc5ac bfffc5b0 c011fe11 00000080 000001f2
    Aug 17 03:55:56 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:55:56 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:55:56 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:55:56 rq10 kernel:
    Aug 17 03:55:56 rq10 kernel:
    Aug 17 03:55:56 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:56:16 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000000f
    Aug 17 03:56:16 rq10 kernel: printing eip:
    Aug 17 03:56:16 rq10 kernel: c012a94a
    Aug 17 03:56:16 rq10 kernel: *pde = 00000000
    Aug 17 03:56:16 rq10 kernel: Oops: 0000
    Aug 17 03:56:16 rq10 kernel: CPU: 0
    Aug 17 03:56:16 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:56:16 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:56:16 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:56:16 rq10 kernel: eax: 00002000 ebx: 00000000 ecx: 0000000b edx: 00000000
    Aug 17 03:56:16 rq10 kernel: esi: caa993c0 edi: 00002000 ebp: caa9976c esp: c5529f54
    Aug 17 03:56:16 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:56:16 rq10 kernel: Process in.qpopper (pid: 27100, stackpage=c5529000)
    Aug 17 03:56:16 rq10 kernel: Stack: 00001000 00000001 00000001 bfffc5ec ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:56:16 rq10 kernel: 00000000 00000001 00000001 bfffc5ec 00000000 c5528000 bfffc5b0 bfffc544
    Aug 17 03:56:16 rq10 kernel: c5528000 bfffc5b4 00000002 bfffc5ac bfffc5b0 c011fe11 00000080 000001f2
    Aug 17 03:56:16 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:56:16 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:56:16 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:56:16 rq10 kernel:
    Aug 17 03:56:16 rq10 kernel:
    Aug 17 03:56:16 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:56:22 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000000f
    Aug 17 03:56:22 rq10 kernel: printing eip:
    Aug 17 03:56:22 rq10 kernel: c012a94a
    Aug 17 03:56:22 rq10 kernel: *pde = 00000000
    Aug 17 03:56:22 rq10 kernel: Oops: 0000
    Aug 17 03:56:22 rq10 kernel: CPU: 0
    Aug 17 03:56:22 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:56:22 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:56:22 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:56:22 rq10 kernel: eax: 00002000 ebx: 00000000 ecx: 0000000b edx: 00000000
    Aug 17 03:56:22 rq10 kernel: esi: caa993c0 edi: 00002000 ebp: caa9976c esp: c9c69f54
    Aug 17 03:56:22 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:56:22 rq10 kernel: Process in.qpopper (pid: 27104, stackpage=c9c69000)
    Aug 17 03:56:22 rq10 kernel: Stack: 00001000 00000001 00000001 bfffc5ec ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:56:22 rq10 kernel: 00000000 00000001 00000001 bfffc5ec 00000000 c9c68000 bfffc5b0 bfffc544
    Aug 17 03:56:22 rq10 kernel: c9c68000 bfffc5b4 00000002 bfffc5ac bfffc5b0 c011fe11 00000080 000001f2
    Aug 17 03:56:22 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:56:22 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:56:22 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:56:22 rq10 kernel:
    Aug 17 03:56:22 rq10 kernel:
    Aug 17 03:56:22 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69
    Aug 17 03:56:37 rq10 kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000000f
    Aug 17 03:56:37 rq10 kernel: printing eip:
    Aug 17 03:56:37 rq10 kernel: c012a94a
    Aug 17 03:56:37 rq10 kernel: *pde = 00000000
    Aug 17 03:56:37 rq10 kernel: Oops: 0000
    Aug 17 03:56:37 rq10 kernel: CPU: 0
    Aug 17 03:56:37 rq10 kernel: EIP: 0010:[get_vm_area+102/180] Not tainted
    Aug 17 03:56:37 rq10 kernel: EFLAGS: 00010206
    Aug 17 03:56:37 rq10 kernel: EIP: c012a94a get_vm_area+0x66
    Aug 17 03:56:37 rq10 kernel: eax: 00002000 ebx: 00000000 ecx: 0000000b edx: 00000000
    Aug 17 03:56:37 rq10 kernel: esi: caa993c0 edi: 00002000 ebp: caa9976c esp: c9c69f54
    Aug 17 03:56:37 rq10 kernel: ds: 0018 es: 0018 ss: 0018
    Aug 17 03:56:37 rq10 kernel: Process in.qpopper (pid: 27118, stackpage=c9c69000)
    Aug 17 03:56:37 rq10 kernel: Stack: 00001000 00000001 00000001 bfffc5ec ffffa000 c012aa3d 00002000 00000002
    Aug 17 03:56:37 rq10 kernel: 00000000 00000001 00000001 bfffc5ec 00000000 c9c68000 bfffc5b0 bfffc544
    Aug 17 03:56:37 rq10 kernel: c9c68000 bfffc5b4 00000002 bfffc5ac bfffc5b0 c011fe11 00000080 000001f2
    Aug 17 03:56:37 rq10 kernel: Call Trace: c012aa3d __vmalloc+0x3d
    Aug 17 03:56:37 rq10 kernel: c011fe11 sys_setgroups+0x71
    Aug 17 03:56:37 rq10 kernel: c01088b3 system_call+0x33
    Aug 17 03:56:37 rq10 kernel:
    Aug 17 03:56:37 rq10 kernel:
    Aug 17 03:56:37 rq10 kernel: Code: 8b 51 04 39 d0 76 15 8b 59 08 01 d3 3b 5c 24 10 77 27 8d 69

    >>>>>>>>>>>>>>>REBOOT>>>>>>>>>>>

    Aug 17 04:01:51 rq10 kernel: klogd 1.3-3, log source = /proc/kmsg started.
    Aug 17 04:01:51 rq10 kernel: Inspecting /boot/System.map
    Aug 17 04:01:52 rq10 kernel: Loaded 16974 symbols from /boot/System.map.
    Aug 17 04:01:52 rq10 kernel: Symbols match kernel version 2.4.19.
    Aug 17 04:01:52 rq10 kernel: Loaded 124 symbols from 3 modules.
    Aug 17 04:01:52 rq10 kernel: Linux version 2.4.19C13_III ([email protected]) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Tue Apr 13 20:41:16 PDT 2004
    Aug 17 04:01:52 rq10 kernel: BIOS-provided physical RAM map:
    Aug 17 04:01:52 rq10 kernel: BIOS-e801: 0000000000000000 - 000000000009f000 (usable)
    Aug 17 04:01:52 rq10 kernel: BIOS-e801: 0000000000100000 - 0000000020000000 (usable)
    Aug 17 04:01:52 rq10 kernel: 512MB LOWMEM available.
    Aug 17 04:01:52 rq10 kernel: On node 0 totalpages: 131072
    Aug 17 04:01:52 rq10 kernel: zone(0): 4096 pages.
    Aug 17 04:01:52 rq10 kernel: zone(1): 126976 pages.
    Aug 17 04:01:52 rq10 kernel: zone(2): 0 pages.
    Aug 17 04:01:52 rq10 kernel: Kernel command line: console=ttyS0,115200 debug ip=off
    Aug 17 04:01:52 rq10 kernel: Initializing CPU#0
    Aug 17 04:01:52 rq10 kernel: Detected 448.211 MHz processor.
    Aug 17 04:01:52 rq10 kernel: Calibrating delay loop... 894.56 BogoMIPS
    Aug 17 04:01:52 rq10 kernel: Memory: 512232k/524288k available (1470k kernel code, 9492k reserved, 979k data, 92k init, 0k highmem)
    Aug 17 04:01:52 rq10 kernel: Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
    Aug 17 04:01:52 rq10 kernel: Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
    Aug 17 04:01:52 rq10 kernel: Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
    Aug 17 04:01:52 rq10 kernel: Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
    Aug 17 04:01:52 rq10 kernel: Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
    Aug 17 04:01:52 rq10 kernel: CPU: Before vendor init, caps: 008021bf 808029bf 00000000, vendor = 2
    Aug 17 04:01:52 rq10 kernel: CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
    Aug 17 04:01:52 rq10 kernel: CPU: After vendor init, caps: 008021bf 808029bf 00000000 00000002
    Aug 17 04:01:52 rq10 kernel: CPU: After generic, caps: 008021bf 808029bf 00000000 00000002
    Aug 17 04:01:52 rq10 kernel: CPU: Common caps: 008021bf 808029bf 00000000 00000002
    Aug 17 04:01:52 rq10 kernel: CPU: AMD-K6(tm) 3D processor stepping 0c
    Aug 17 04:01:52 rq10 kernel: Checking 'hlt' instruction... OK.
    Aug 17 04:01:52 rq10 kernel: POSIX conformance testing by UNIFIX
    Aug 17 04:01:52 rq10 kernel: mtrr: v1.40 (20010327) Richard Gooch ([email protected])
    Aug 17 04:01:52 rq10 kernel: mtrr: detected mtrr type: AMD K6
    Aug 17 04:01:52 rq10 kernel: PCI: Using configuration type 1
    Aug 17 04:01:52 rq10 kernel: PCI: Probing PCI hardware
    Aug 17 04:01:52 rq10 kernel: PCI: Probing PCI hardware (bus 00)
    Aug 17 04:01:52 rq10 kernel: Unknown bridge resource 0: assuming transparent
    Aug 17 04:01:52 rq10 kernel: Unknown bridge resource 1: assuming transparent
    Aug 17 04:01:52 rq10 kernel: Unknown bridge resource 2: assuming transparent
    Aug 17 04:01:52 rq10 kernel: Linux NET4.0 for Linux 2.4
    Aug 17 04:01:52 rq10 kernel: Based upon Swansea University Computer Society NET3.039
    Aug 17 04:01:52 rq10 kernel: Initializing RT netlink socket
    Aug 17 04:01:52 rq10 kernel: Starting kswapd
    Aug 17 04:01:52 rq10 kernel: VFS: Disk quotas vdquot_6.5.1
    Aug 17 04:01:52 rq10 kernel: SGI XFS snapshot 2.4.19-2002-09-27_04:22_UTC with ACLs, realtime, quota, no debug enabled
    ################ END
    Last edited by kommand; 08-17-2005 at 05:14 AM.

  6. #6
    I had the same problem with a raq4 and thought it was a software problem as well. I checked all logfiles and monitored all processes and load. Couldn't find a thing.

    In the end it turned out to be a fan problem. Check your CPU temperature and fan speed.
    Last edited by tommienbp; 08-17-2005 at 10:52 AM.

  7. #7
    Hi guys,

    Maybe we're getting closer to a solution for this issue.

    I found an article that discusses the crash problem that causes the reboot.

    Go to cobaltsupport.com/updates/raq550/ and download the kernel patch RaQ550-csUPDATE12_III - Kernel for RaQ3 and RaQ4 converted to RaQ550.

    It describes flaws found in all RaQ kernels:

    Critical kernel update for RaQ3 and RaQ4 running RaQ550 OS (Gen III machines). This update addresses:

    1. Denial of Service
    linuxreviews.org/news/2004/06/11_kernel_crash/

    2. Privilege escalation - sys_uselib() race vulnerability [CAN-2004-1235]
    isec.pl/vulnerabilities/isec-0021-uselib.txt

    You can use the C program from link 1 to test whether your kernel is affected by the crash.

    Copy and paste the code from link 1 into a file, then compile and run it:
    #> vi script.c
    #> gcc script.c -o kcheck
    #> ./kcheck

    The result looks just like the problem on our RaQs: ping responds, but no other service responds at all.

    After I patched the RaQ, the kcheck program still crashed the kernel, so the only way to recover was a hard reboot.

    Maybe the patch is not the right solution?

    Has anyone already tested it?

    Bye
    Marco

  8. #8
    I have seen a real RaQ 550 reboot as well. It seems to have something to do with qpopper and/or sendmail. If you look at the logfile you posted, most of the kernel oops messages name one or the other in the "Process" line of the block.
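One way to check which processes show up in the oops entries is to tally the "Process" lines in the kernel log. The sketch below assumes the log format shown in the extract posted earlier in the thread; the function name is made up, and the sample lines are copied from that extract.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: Process sendmail (pid: 1057, stackpage=db92b000)"
PROCESS_RE = re.compile(r"kernel: Process (\S+) \(pid: (\d+)")


def tally_processes(lines):
    """Count how many oops entries name each process."""
    counts = Counter()
    for line in lines:
        match = PROCESS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few "Process" lines taken from the posted extract.
sample = [
    "Aug 17 03:50:30 rq10 kernel: Process httpd (pid: 1569, stackpage=d806b000)",
    "Aug 17 03:51:31 rq10 kernel: Process sendmail (pid: 1057, stackpage=db92b000)",
    "Aug 17 03:52:05 rq10 kernel: Process init (pid: 1, stackpage=c252b000)",
    "Aug 17 03:55:53 rq10 kernel: Process in.qpopper (pid: 27098, stackpage=c5529000)",
]

for name, count in tally_processes(sample).most_common():
    print(name, count)
```

To run it over the whole log, open /var/log/kernel and pass the file object to `tally_processes`. A tally dominated by one or two daemons would support the suspicion above, while an even spread across unrelated processes would point more toward the kernel or the hardware.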

    I still have not found out what causes this problem. The real 550 I admin that was doing this hasn't done it in a long time; I have not touched it either.

    What makes it so frustrating is that it is so intermittent. Double-check that the RAM is OK, and make sure your fans are all good as well.
