
  1. #1
    Join Date
    Sep 2008
    Location
    Melbourne
    Posts
    405

    IBM x3550 M2 Diagnostics

    Hi all,

    Happy New Year!

    Just last week I had a few issues with an old IBM System x3550 M2 we've been colocating. It had been running pretty well for quite some time, except on two occasions when it "just died".

    By "just died" I mean it's still powered on, but network connectivity seems to be lost and the "Remote Control" feature of the RSA II shows only a blank screen. On both occasions I happened to be out of the country, so I had no chance to troubleshoot it physically.

    The only fix turned out to be a full power cycle (powered off for a few minutes). A reboot did not work: it would get through the BIOS, system diagnostics, RAID controller, etc., but just before it started booting the OS the screen would go blank again.

    Crash "Just Died" -> Reboot -> BIOS -> Blank Screen
    Crash "Just Died" -> Power Off -> Wait 5 min -> Power On -> BIOS -> Boots OS

    I've never seen anything like this before, and I'm more concerned about why it happened and what I can do to prevent it from happening again.

    The CentOS logs reported problems accessing the drives right before it "crashed". Could it be a power issue? The server has dual redundant power supplies, and my other servers in the same rack weren't affected.

    The first time it happened, someone on site took a photo of it for me (attached) before doing a hard reboot (pulling the power plug).

    Unfortunately it's an old server, well past any support from IBM.

    Any help or suggestions would be much appreciated.

    Cheers,
    Andrew.
    Attached Thumbnails: photo.JPG

  2. #2
    Join Date
    Sep 2008
    Location
    Melbourne
    Posts
    405
    Last log messages before the crash:

    Dec 25 11:36:34 hv01 kernel: device-mapper: multipath: Failing path 8:16.
    Dec 25 11:36:34 hv01 multipathd: SServeRA_DATA_CE438969: sdb - directio checker reports path is down
    Dec 25 11:36:34 hv01 multipathd: checker failed path 8:16 in map SServeRA_DATA_CE438969
    Dec 25 11:36:34 hv01 multipathd: SServeRA_DATA_CE438969: remaining active paths: 0
    Dec 25 11:36:35 hv01 kernel: Buffer I/O error on device dm-4, logical block 61035358
    Dec 25 11:36:35 hv01 kernel: lost page write due to I/O error on dm-4
    Dec 25 11:36:35 hv01 kernel: Buffer I/O error on device dm-4, logical block 61081038
    Dec 25 11:36:35 hv01 kernel: lost page write due to I/O error on dm-4
    Dec 25 11:36:35 hv01 kernel: Buffer I/O error on device dm-4, logical block 61081039
    Dec 25 11:36:35 hv01 kernel: lost page write due to I/O error on dm-4
    Dec 25 11:36:35 hv01 kernel: Buffer I/O error on device dm-4, logical block 63217125
    Dec 25 11:36:35 hv01 kernel: lost page write due to I/O error on dm-4
    Dec 25 11:36:35 hv01 kernel: XFS (dm-4): metadata I/O error: block 0x1d12f800 ("xlog_iodone") error 5 buf count 262144
    Dec 25 11:36:35 hv01 kernel: XFS (dm-4): xfs_do_force_shutdown(0x2) called from line 1052 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa0539131
    Dec 25 11:36:35 hv01 kernel: XFS (dm-4): Log I/O Error Detected. Shutting down filesystem
    Dec 25 11:36:35 hv01 kernel: XFS (dm-4): Please umount the filesystem and rectify the problem(s)
    Dec 25 11:36:39 hv01 multipathd: SServeRA_DATA_CE438969: sdb - directio checker reports path is down
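    If you want to check whether the same chain of events shows up before every hang, a small script can pull it out of a longer log: the only multipath path (8:16, which the log itself names as sdb) fails, multipathd reports zero remaining paths, writes to dm-4 start failing, and XFS shuts the filesystem down. Below is a minimal Python sketch under those assumptions. The event names and the helper sd_name are hypothetical, the regular expressions only match the message formats shown in this post, and the major:minor decoding assumes the standard Linux SCSI disk numbering (major 8, 16 minors per disk).

```python
import re
from collections import Counter

# A few lines copied from the log excerpt above (standard syslog format).
SAMPLE_LOG = """\
Dec 25 11:36:34 hv01 kernel: device-mapper: multipath: Failing path 8:16.
Dec 25 11:36:34 hv01 multipathd: SServeRA_DATA_CE438969: remaining active paths: 0
Dec 25 11:36:35 hv01 kernel: Buffer I/O error on device dm-4, logical block 61035358
Dec 25 11:36:35 hv01 kernel: Buffer I/O error on device dm-4, logical block 61081038
Dec 25 11:36:35 hv01 kernel: XFS (dm-4): Log I/O Error Detected. Shutting down filesystem
"""

# Hypothetical event names; the patterns only cover the messages seen in this thread.
EVENTS = [
    ("path_failed", re.compile(r"Failing path (\d+:\d+)")),
    ("no_active_paths", re.compile(r"remaining active paths: 0\b")),
    ("buffer_io_error", re.compile(r"Buffer I/O error on device (\S+),")),
    ("xfs_shutdown", re.compile(r"XFS \((\S+)\): .*Shutting down filesystem")),
]


def sd_name(dev: str) -> str:
    """Translate a "major:minor" pair such as "8:16" into a SCSI disk name.

    Only major 8 (sda..sdp) is handled: each disk owns 16 minors, and
    minor % 16 is the partition number (0 means the whole disk).
    """
    major, minor = (int(part) for part in dev.split(":"))
    if major != 8:
        return dev  # outside the range this sketch understands
    disk = "sd" + chr(ord("a") + minor // 16)
    partition = minor % 16
    return disk if partition == 0 else f"{disk}{partition}"


def summarise(lines):
    """Count each event, remember when it first appeared, and list failed disks."""
    counts = Counter()
    first_seen = {}
    failed_disks = []
    for line in lines:
        stamp = line[:15]  # syslog timestamp prefix, e.g. "Dec 25 11:36:34"
        for event, pattern in EVENTS:
            match = pattern.search(line)
            if match is None:
                continue
            counts[event] += 1
            first_seen.setdefault(event, stamp)
            if event == "path_failed":
                failed_disks.append(sd_name(match.group(1)))
    return counts, first_seen, failed_disks


if __name__ == "__main__":
    counts, first_seen, failed = summarise(SAMPLE_LOG.splitlines())
    print("failed paths:", failed)
    for event, n in counts.items():
        print(f"{first_seen[event]}  {event}: {n}")
```

    On CentOS the same function can be fed the lines of /var/log/messages instead of the sample. If every hang is preceded by the single path to the ServeRAID logical drive disappearing, that points more towards the controller or disks than towards the power supplies.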

  3. #3
    Join Date
    Nov 2005
    Location
    BC, Canada
    Posts
    776
    I would suggest asking your host to check the front LEDs again. If the one furthest to the right is lit (amber alert), they can push the blue tab and the light path diagnostics module will slide out. That module shows specifically where on the server the alert is. A physical hardware failure will likely show up as BRD, whereas power supply failures will show as PS1 or PS2 depending on the configuration. I can't recall which light path LED lights up for a drive or controller fault, though in my experience those only appear when a drive has failed outright and dropped out of the array (the drive's own orange LED also lights up). You could also install the Adaptec "arcconf" utility and run "arcconf getconfig 1" to get the health status of the controller and disks.

Similar Threads

  1. IBM X3550
    By mnm00 in forum Colocation, Data Centers, IP Space and Networks
    Replies: 2
    Last Post: 09-23-2011, 03:59 PM
  2. FS: IBM x3550 2xQC Lot
    By fraghost in forum Web Hosting Hardware
    Replies: 0
    Last Post: 12-13-2010, 05:29 PM
  3. SAS + SATA II on RAID 1 , IBM x3550 M2
    By batoo in forum Dedicated Server
    Replies: 0
    Last Post: 08-07-2010, 03:51 PM
  4. IBM x3550 rails for 2 post rack
    By WickedShark in forum Colocation, Data Centers, IP Space and Networks
    Replies: 5
    Last Post: 11-30-2008, 10:58 PM
  5. LAMP Diagnostics
    By M2ESoftworks in forum Hosting Security and Technology
    Replies: 3
    Last Post: 09-30-2005, 12:26 PM
