Some of our servers have had a ridiculous number of hardware failures lately; all of them are either P4 3.06GHz or dual Xeon machines from popular providers.
For example, the server that hosts our main site, vonetwork.net (also .com), has now been down for 14 hours! Unbelievable!

While I don't blame anybody for this, I question the expertise of the techs involved, and they only gave us a proper update after 13 hours!

(I know hardware is bound to fail, since I run a computer peripheral shop myself, and I also know how to troubleshoot hardware.)

Here is the story behind it; I thought I should share it:


We have rebooted the server. But the server didn't come up. Our senior tech is looking into the issue. We will update you soon.

I cannot find the problem. I can redo the OS, or we can wait until a Level 3 tech is available.
[Tech: Atjeudude, Status: Tech Updated] Tue Mar 2, 2004 - 10:21 AM

While I was waiting for a Level 3 tech, I decided to get the hammer out. The server is up now.

[Tech: Atjeudude, Status: Tech Updated] Tue Mar 2, 2004 - 11:43 AM

OK, it looks good now. We had to replace a network card. Sorry for the delay.

[Tech: Atjeudude, Status: Closed] Tue Mar 2, 2004 - 11:43 AM

Your ticket appears to be completed. I am therefore closing this ticket.

If you need any further help, just let us know. Thanks, Atjeu.

Here is a rundown so far. A new kernel was put on earlier today, and the new kernel does NOT have the drivers needed for the existing NIC. The tech had to go through about 5 NICs before he found one that would work with the kernel.

The client has a backup drive. That drive came from another server that was never turned over to its previous client, so it was reused here as the backup. Either our techs just installed the drive and let you push the buttons in WHM to format it and set it up as /backup, or our techs did that themselves, not realizing the drive had come from another server where it was the primary drive (again, that customer never paid for the server, so they never got it).

The problem there is that we put 3 partitions on primary drives: the first is bootable and is the /boot partition, the 2nd is swap, and the 3rd is / for everything else. In this case the partitions were not overwritten. The 2nd was swap, so it wasn't used, and the 3rd became this client's backup partition. Usually when backups are set up, the drive is wiped the moment you push the format-new-drive button in WHM. Obviously that didn't happen in this case. With the new kernel, for whatever reason, the server booted from hdc3, which was the client's backup partition but still contained a filesystem from before. That's why the client thought it wasn't his.

This can be overcome by hardcoding fstab and grub.conf, which we have done, and now the server is back up just fine. We can obviously blast the 2nd drive and make sure it's done correctly this time, but there is another problem.

While trying to debug all of this, the tech pulled the NIC out (because the server had booted but wasn't pinging), thinking it was related to this morning's issues, but it wasn't. Then, on boot, kudzu removed the drivers and configuration for the existing NIC. Now the kernel and kudzu will not recognize the NIC, so we are attempting to force the driver into the kernel to get the network back up. Again, the client's data is there, cPanel is there, and everything's fine now; we just have to get the server responding on the network again. Will update you shortly.