
server hardware - mobo
AlaskanWolf 02-10-2002, 04:57 PM I've got two servers with identical hardware (mobo, etc.) and identical symptoms: both die at sporadic times when there is little or no load, and both seem to love waking me up between 3 and 7 am to do a hard reboot.
Because it's happening to both servers, I suspect the hardware. I think it stems in part from the eepro100 drivers, which I reverted back to e100 on both, but the same problem still appears to exist.
On the newest server, before we moved our customers onto it, we were having issues, so we sent it back to Amax. They said it was bad RAM and changed it out; file transfers returned to normal, with no seg faults.
We then had issues with Apache, so the admin working on it recompiled a "custom" Apache build, not like the normal cPanel Apaches. Since then (and I think this is a different problem from the above) I am getting
/etc/rc.d/init.d/httpd: line 187: 21560 Segmentation fault $HTTPD -DSSL
on Apache restart. The cPanel admins are going to look into that.
Others say it's likely a problem with mod_ssl.
xxxxxxxxxxx
Now my question
I'd rather not spend $4k on 2 new servers and sell these ones on eBay. I think it has a lot to do with the motherboards, so can't I just swap out the motherboard and put a new one in? What would I have to do other than swap out the mobos? Would my existing hard drives work with the new mobo, or would weird errors pop up on reboot?
I would rather spend $300 on a trip down to CA, put in the mobos, and come back to Anchorage, if I know that will solve my problems, than buy a new rack, send it down, move customers off, etc. That's just bucks I don't have at the moment.
ReliableServers 02-10-2002, 06:18 PM What are the specs of your servers? Might help if someone has/had a similar problem.
AlaskanWolf 02-10-2002, 07:14 PM Server 1
(*) P3 866 MHz
(*) 1 GB RAM
(*) Motherboard - not known
(*) NIC: Intel Ethernet Pro 100 (82557)
Server 2
(*) P3 1 GHz x 2
(*) 1 GB RAM
(*) Motherboard - not known (identical to #1)
(*) NIC: Intel Ethernet Pro 100 (82557)
They were bought from Amax.com. My ex-partner thought they were a great deal, so we bought them. Wrong choice, that's for sure.
I called Amax and asked where they get the mobos, and they gave me some crap about how it's all "in house" (YEAH RIGHT). They have to have a distributor, and these must be commercially available mobos.
The URL for the server we got is:
http://www.amaxit.com/x1-1Userver.htm
bitserve 02-10-2002, 08:41 PM Sounds like you're colocating then? When the servers become unavailable, can the techs at the data center do anything besides reboot them?
You haven't seen anything in your logs mentioning any type of errors except for the httpd?
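One way to act on the log question is to scan the system log for lines that commonly accompany hardware trouble. The following is only a minimal sketch in modern Python 3: the keyword list and the helper name `find_suspect_lines` are illustrative assumptions, not anything the posters used, and the patterns should be adjusted to what the kernel and daemons on the machine actually log.

```python
import re

# Hypothetical keyword list (an assumption for illustration); extend or
# trim it to match what your own kernel and daemons write to the log.
SUSPECT_PATTERNS = [
    re.compile(r"segmentation fault", re.IGNORECASE),
    re.compile(r"\boops\b", re.IGNORECASE),
    re.compile(r"\bpanic\b", re.IGNORECASE),
    re.compile(r"i/o error", re.IGNORECASE),
]


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, pass an open log file, for example `find_suspect_lines(open("/var/log/messages"))`; that path is the usual default on Red Hat style systems but is an assumption here.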
It's going to be hard replacing the motherboards if you don't know what's in them right now. What OS/version are you running?
AlaskanWolf 02-10-2002, 09:04 PM Other than a reboot, that's about all they can do.
Linux 7.1 on one and 7.2 on the other.
I will be calling Amax on Monday to get a clear idea of the motherboard information.
Have any of you done a complete mobo swap before with a completely different type/model of mobo?
I think I am going to put Linux on one of my desktops, go buy a $50 mobo from the computer store, and see what it does, just for the heck of it.
Alan - Vox 02-10-2002, 11:02 PM Have you tried running it without apache to see if it still crashes?
Gernot 02-11-2002, 09:35 AM Segmentation Faults (Signal 11) often indicate a hardware failure. Mostly, it's the memory.
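To check whether a process really died from signal 11 rather than exiting on its own, the exit status can be decoded. This is a small sketch, assuming modern Python 3 on a POSIX system, where the `subprocess` module reports a child killed by signal N as return code -N. The helper name `describe_exit` is made up for illustration.

```python
import signal


def describe_exit(returncode):
    """Explain a subprocess-style return code (negative means killed by a signal)."""
    if returncode < 0:
        number = -returncode
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return f"killed by signal {number} ({name})"
    return f"exited with status {returncode}"
```

On Linux, SIGSEGV is signal 11, so a return code of -11 from `subprocess.run(...)` corresponds to the "Segmentation fault" message the shell prints.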
From the page you supplied (at amaxit.com) I can see that this server seems to be composed of a lot of cheap pieces of hardware. You've got a VIA 133A chipset, which is one of the cheapest you can get, but it has a lot of issues.
Ask your supplier for the motherboard brand / model number. This will allow you to try something else besides swapping the motherboard -> a BIOS upgrade. Boards with VIA chipsets especially seem to need some BIOS upgrades before they start to work reasonably reliably. If, after upgrading the BIOS to the latest version, you still experience the same problems, you should definitely do the following:
a) Replace your RAM
b) Replace your motherboard with a good one
Before doing this I would check your server's CPU temperature though.
If you have SCSI hard drives I would suggest that you get a SuperMicro P3TDLR board which is rock-stable and not so expensive (only about $500 I think). For IDE drives you should get a Supermicro P3TDEI board which costs about $400. Both seem to be compatible with your current setup so after swapping out the motherboards you should not have to do any further configuration tasks.
For the future, get SuperMicro boards wherever possible. They're really good pieces of hardware, in my opinion.
bitserve 02-12-2002, 08:50 PM I'd hate to think it's something stupid like some power-saving BIOS settings that are putting something to sleep, but it sounds like a possibility.
I don't know of any other reasons why a machine would die when idle, just because it was idle.
You should start some debugging scripts, like running ps every minute and dumping the output somewhere, or just keep a shell connection open and run top until it dies.
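The "ps every minute" idea can be sketched as a small logger. This is a minimal sketch, assuming modern Python 3.7+ and the standard `ps aux` command. The function names, the log path, and the 60-second interval are illustrative choices, not something from the thread. Each snapshot is flushed and synced to disk so that the last entry before a hang has a good chance of surviving the hard reboot.

```python
import os
import subprocess
import time
from datetime import datetime


def snapshot(log_path, command=("ps", "aux")):
    """Append one timestamped snapshot of `command` output to log_path."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
        body = result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of stopping the logger.
        body = f"snapshot failed: {exc}\n"
    with open(log_path, "a") as handle:
        handle.write(f"===== {stamp} =====\n{body}\n")
        handle.flush()
        os.fsync(handle.fileno())  # push the entry to disk before a possible crash
    return stamp


def run_forever(log_path, interval_seconds=60):
    """Take a snapshot every interval_seconds until the process is killed."""
    while True:
        snapshot(log_path)
        time.sleep(interval_seconds)
```

Calling `run_forever("/var/tmp/ps-snapshots.log")` (a hypothetical path) from a background session leaves a file whose last timestamp shows roughly when the machine stopped responding and what was running just before.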
How do the fsck checks go every morning? Are you using the IDE RAID?
By the way, I think it's going to be nearly impossible to replace the motherboards without knowing the specs of the old ones. I'd call the vendor up and demand that they provide replacements if they are indeed defective.
AlaskanWolf 02-12-2002, 11:39 PM Hi Gernot and bitserve
Thanks for the advice. The funny thing is we have a few value 1U servers, bought from Interpro Microsystems, that we use for "high resource" customers. These value servers are, to say the least, 10x better than the two servers I got from Amax.
I have decided just to cut my losses, get a few new servers, and move all the customers from server to server.
I called Amax and all they could tell me was "they are custom made boards" (oh, like you're THAT rich, to have your own custom-made mobos). The purple books they sent with both servers had servelinux.com plastered all over them, so I checked them out, and they don't have any more info than Amax does.
I have been looking at some new servers and came up with a few. You said to buy Supermicros, and I found one at Interpro Micro (P3TDDE) with RAID 1.
I also found another one that seems very popular with racksaver, qsol.com, etc.: the Tyan S2505T board with built-in RAID 0/1.
Talking with both Interpro and John @ qsol, they both stated that this board's RAID will not work with RAID 1.
So I am thinking of reverting to my 2nd choice, the Supermicro P3TDDE.
The 2nd server, which is basically brand new, had some bad RAM, which Amax replaced, and we are also seeing SCSI controller failures (IDE drives, though).
So it looks like it's clearly hardware. I figure since I can't get my money back, I am going to have the servers shipped back to me, have all the hardware tested and replaced with higher-quality items, and likely sell them so I can try to recoup the money I paid for them.
AlaskanWolf 02-12-2002, 11:50 PM Configurations I am considering
(*) Q12.i 1U - IDE 250W PS
(*) Tyan Dual CPU ATA VIA S2505
(*) Pentium III 1.00GHz x 2
(*) 512MB REG ECC RAM x 2
(*) HD 1: 80GB IDE 7200RPM x 1
(*) HD 2: 80GB IDE 7200RPM x 1
(*) PCI 1: Empty PCI Slot
(*) Slim Line Teac 24X CD-ROM
(*) Slim Line Teac Floppy Drive
(*) Red Hat Linux 7.2
(*) Standard 3 Year Warranty
hot swap...
*Notes: not clear on the Red Hat 7.2 issues; everyone states they do not support 7.2 until Tyan releases drivers for it
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Supermicro P3TDDE
Dual P3 1.13 GHz, 512k
512 MB x 2 SDRAM (non-ECC...?)
2 x 80 GB / 7200 rpm IDE
24x cdrom
floppy
Trident 9880 8MB PCI video driver
1u case, hot swap
*Notes: this is likely the preferred option, since I am running 7.1 and 7.2 and the Tyan board does not seem to support them
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Gernot 02-13-2002, 02:01 PM I would get the second configuration. It seems that you already use the specs from Interpro Micro, so your vendor is also a very good choice. They actually know what they're doing, unlike your old vendor.