-
01-28-2004, 03:59 PM #101 Web Hosting Guru
- Join Date
- Sep 2002
- Posts
- 333
I have a dual xeon (for about 30 days) with RHE and haven't had any problems whatsoever. Is this a problem when the machine gets a load? Just wondering if I have something to look forward to.
-
01-28-2004, 04:10 PM #102 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
Right now, I am unsure what the issue is. I am hoping it's the ATA controller, as mentioned earlier. This happens on boxes with very low loads, and sometimes on brand-new servers, so no, it's not a load issue.
And it's not an RHE issue either, for the doubters: if you look at the RackShack thread, the issue actually exists in 7.2 and later. Hopefully it's fixed in RHE and it's a minor hardware problem that needs fixing (fingers crossed).
Clustered Hosting With Continuous Data Protection (CDP)
http://www.solidinternet.com
8 Years of hosting excellence!
-
01-28-2004, 04:18 PM #103 Web Hosting Guru
- Join Date
- Mar 2002
- Posts
- 288
load
Originally posted by 0utlier
I have a dual xeon (for about 30 days) with RHE and haven't had any problems whatsoever. Is this a problem when the machine gets a load? Just wondering if I have something to look forward to.
JDT
▄▀▄ Jeremy, CEO - Batcave Network
▄▀▄ Deals on Hosting, VPS, Domains, Dedicated Servers since 1997
▄▀▄ http://www.batcave.net
▄▀▄ We always have a deal going - hosting for cheaper than a domain!
-
01-28-2004, 04:18 PM #104 Junior Guru Wannabe
- Join Date
- Jan 2001
- Location
- MN, USA
- Posts
- 97
Originally posted by 0utlier
I have a dual xeon (for about 30 days) with RHE and haven't had any problems whatsoever. Is this a problem when the machine gets a load? Just wondering if I have something to look forward to.
Here is a snap of TOP right before my first crash with one site on the box:
Code:
11:08:54 up 2 days, 7:48, 1 user, load average: 68.40, 54.12, 28.85
270 processes: 269 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 1.3% 0.0% 0.6% 0.1% 0.3% 97.5% 0.0%
cpu00 1.7% 0.0% 0.7% 0.5% 0.1% 96.8% 0.0%
cpu01 1.9% 0.0% 0.5% 0.0% 0.0% 97.5% 0.0%
cpu02 1.0% 0.0% 0.8% 0.0% 0.0% 98.0% 0.0%
cpu03 0.5% 0.0% 0.5% 0.0% 1.0% 97.8% 0.0%
Mem: 1028480k av, 1009156k used, 19324k free, 0k shrd, 154628k buff
769784k actv, 180228k in_d, 5316k in_c
Swap: 1052248k av, 350420k used, 701828k free 666272k cached
Code:
14:17:23 up 16 min, 1 user, load average: 0.29, 0.63, 0.62
229 processes: 228 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 1.8% 0.0% 0.0% 0.0% 0.2% 5.6% 92.0%
cpu00 0.5% 0.0% 0.3% 0.1% 0.7% 7.1% 90.8%
cpu01 0.7% 0.0% 0.0% 0.0% 0.0% 4.9% 94.2%
cpu02 3.3% 0.0% 0.0% 0.0% 0.0% 6.3% 90.2%
cpu03 2.7% 0.0% 0.0% 0.0% 0.1% 3.9% 93.0%
Mem: 1028480k av, 1000200k used, 28280k free, 0k shrd, 47936k buff
753020k actv, 166272k in_d, 5904k in_c
Swap: 1052248k av, 0k used, 1052248k free 561344k cached
~Gertiebeth
-
01-28-2004, 04:22 PM #105 Junior Guru Wannabe
- Join Date
- Jan 2001
- Location
- MN, USA
- Posts
- 97
Originally posted by MN-Robert
Right now, I am unsure as to what the issue is. I am hoping it's the ATA controller as mentioned earlier. This happens on boxes with very low loads and sometimes brand new servers so no its not a load issue.
And it's not a rhe issue either for those doubters if you look at the rackshack thread the issue actually exists in 7.2 + Hopefully its fixed in rhe and its a minor hardware problem that needs fixing (fingers crossed)
My fingers are crossed with you!
~Gertiebeth
-
01-28-2004, 04:29 PM #106 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Update...
Since last night, with the ata fix and the stress tests, she's still up...
CPU states: cpu user nice system irq softirq iowait idle
total 10.4% 0.0% 9.9% 0.4% 0.0% 6.4% 72.6%
cpu00 9.9% 0.0% 12.8% 0.9% 0.0% 9.9% 66.3%
cpu01 11.0% 0.0% 7.0% 0.0% 0.0% 3.0% 79.0%
So that's 16 hours.. I've been taking a 'siesta' today, after all of those loooong nights, here's hoping that I can continue the siesta
-David
-
01-28-2004, 04:39 PM #107 Junior Guru Wannabe
- Join Date
- Jan 2001
- Location
- MN, USA
- Posts
- 97
Originally posted by thedavid
Update...
Since last night, with the ata fix and the stress tests, she's still up...
CPU states: cpu user nice system irq softirq iowait idle
total 10.4% 0.0% 9.9% 0.4% 0.0% 6.4% 72.6%
cpu00 9.9% 0.0% 12.8% 0.9% 0.0% 9.9% 66.3%
cpu01 11.0% 0.0% 7.0% 0.0% 0.0% 3.0% 79.0%
So that's 16 hours.. I've been taking a 'siesta' today, after all of those loooong nights, here's hoping that I can continue the siesta
-David
I tried troubleshooting this last night with a tech, before I found this thread, and s/he had no idea what was wrong.
Thanks! Enjoy your tranquility.
~Gertiebeth
-
01-28-2004, 05:16 PM #108 Aspiring Evangelist
- Join Date
- Aug 2003
- Posts
- 433
Re: load
Originally posted by batcavenet
well from my experience with maybe closing on 50 crashes now lol is - high mysql load or disk load will crash it - but it also can crash just at random times but mostly under a high load
JDT
Our server (SuperResellerz) has been more or less idle for over a month, as we were just trying to install sites. We could not install any sites except just 2 domains with a few pages, as the POS just kept going down at random. There was hardly any traffic. Anyway, as we know, an empty server doesn't get much traffic.
So again....load is not a factor.
Thanks.
www.QuickDate.com
www.MatchPedia.com --> The largest Dating and MatchMaking network of more than 100 sites!
-
01-28-2004, 05:53 PM #109 Junior Guru Wannabe
- Join Date
- Jan 2001
- Location
- MN, USA
- Posts
- 97
Re: Re: load
Originally posted by rickkumar
I don't think load is a factor.
Our server (SuperResellerz) has been more or less idle for over a month, as we were just trying to install sites. We could not install any sites except just 2 domains with a few pages, as the POS just kept going down at random. There was hardly any traffic. Anyway, as we know, an empty server doesn't get much traffic.
So again....load is not a factor.
Thanks.
~Gertiebeth
-
01-28-2004, 07:00 PM #110 Web Hosting Guru
- Join Date
- Mar 2002
- Posts
- 288
smartctl
This solution was provided by a tech guru at ServerMatrix, and so far so good.
In this directory:
cd /usr/local/cpanel/3rdparty/bin
mv smartctl smartctl.bak
Then create a new file, put the following in it (without the START and END lines), and save it as smartctl:
---------------------START
#!/usr/local/bin/perl
#print "S.M.A.R.T. Sense: Okay\n";
#exit 0;
# Declare the disk before it is used in the message below.
my $disk = "/dev/sda";
my $msg = "";
print "Checking $disk....";
my $result = `/usr/local/cpanel/3rdparty/bin/smartctl -c $disk 2>&1`;
if ($result !~ /S.M.A.R.T. Sense: Okay/ && $result !~ /Check S.M.A.R.T. Passed/ && $result !~ /Device not configured/i && $result !~ /does not support/ && $result !~ /Log Sense failed: Success/) {
    # Failure soon: back up all data now.
    $msg .= "Disk Failure soon on $disk\n\n$result\n";
    print "Failed\n";
} else {
    print "Success\n";
}
exit 0;
----------------------END
Be careful of the spacing - things that wrap should be on one line and not two
then exit and
chmod 755 smartctl
chown root:wheel smartctl
----------------------------
Then you want to disable smartcheck - I'm not sure which way they did it but search for smartcheck on cpanel.net forums
--------- this is what I tried from cpanel forums
you can do this by logging in as root and typing
"touch /var/cpanel/disablesmartcheck"
----------
My server is a SCSI box, and it was crashing each day because of /scripts/upcp. I do not recommend that anyone use this script unless you know Perl and know what you are doing. It is probably best to just ask ServerMatrix to disable smartcheck and install the smartctl patch.
My boxes have not gone down since this was installed, but it is not a sure thing yet; I just thought someone would like to know. This thing is supposed to just let smartctl think it's happy.
This fix may only apply to nightly crashes during the cPanel update scripts. Thanks.
JDT
Last edited by batcavenet; 01-28-2004 at 07:23 PM.
-
01-28-2004, 07:02 PM #111 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Siesta ended - server dropped again, and just before it did I caught this off of top:
06:25:10 up 3:30, 1 user, load average: 342.53, 304.15, 192.53
638 processes: 634 sleeping, 2 running, 1 zombie, 1 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 3.8% 0.0% 1.7% 0.1% 0.5% 93.6% 0.0%
cpu00 6.5% 0.0% 0.9% 0.5% 1.8% 90.0% 0.0%
cpu01 1.4% 0.0% 2.0% 0.0% 0.0% 96.4% 0.0%
cpu02 6.5% 0.0% 2.0% 0.0% 0.0% 91.4% 0.0%
cpu03 0.9% 0.0% 2.1% 0.0% 0.1% 96.7% 0.0%
Mem: 1028480k av, 1003728k used, 24752k free, 0k shrd, 383424k buff
732140k actv, 183060k in_d, 4780k in_c
Swap: 1052248k av, 428860k used, 623388k free 293012k cached
So it hit the iowaits hardcore, let the processes build up since it was doing nothing but that, and then dropped offline.
-David
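A minimal sketch, in Python, of one way to record the iowait build-up described above before a box drops offline. It assumes a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq, steal in that order (the layout documented in proc(5) for current kernels; whether the kernels on the servers in this thread match is an assumption). The function names are hypothetical.

```python
def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep the first 8 counters only; later guest fields are
            # already included in 'user' on current kernels.
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Assumed field order: user nice system idle iowait irq softirq steal
    return 100.0 * deltas[4] / total


def sample(path="/proc/stat"):
    """Read the current aggregate CPU counters from the given file."""
    with open(path) as fh:
        return read_cpu_times(fh.read())
```

Calling sample() twice a few seconds apart and passing both results to iowait_percent() gives a figure comparable to the iowait column in top. Appending it to a log file, together with the first field of /proc/loadavg, leaves a trail to read after the hard reboot.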
-
01-28-2004, 07:51 PM #112 Junior Guru Wannabe
- Join Date
- Jan 2001
- Location
- MN, USA
- Posts
- 97
load average: 342.53, 304.15, 192.53 Dude!
I've crashed twice since my last post, but I never reached those levels.
~Gertiebeth
-
01-28-2004, 07:54 PM #113 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
Dammit, had my hopes up, David.
Back to square one?
-
01-28-2004, 07:56 PM #114 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
gertiebeth, I don't think we have the same issue then, as ours drops offline with very little IO or load. I am not sure what triggers it, but I would like to work it out.
-
01-28-2004, 07:56 PM #115 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Here's what I'm doing to resolve this, after all the tests, workarounds, etc..
I'm moving all the larger customers off of this server, stat! I'm moving them to a Red Hat 9 server, a P4 2.8 GHz machine with a gig of RAM. This server was intended for new customers only, but it's going to be a life raft now. It's been up 100% of the time since the beginning of the month, FWIW.
I'm also contacting them to see if they'll turn the Xeon into a duplicate of the P4 server.
Then I'll have them split the IP blocks that used to be on the Xeon across those servers (there are a few blocks of 32 IPs there). That way folks won't have to update their nameserver IPs.
-
01-28-2004, 07:57 PM #116 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
Do you have the rest of that top? I would like to know what process state Apache, MySQL, etc. were in.
-
01-28-2004, 07:59 PM #117 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Originally posted by gertiebeth
load average: 342.53, 304.15, 192.53 Dude!
I've crashed twice since my last post, but I never reached those levels.
-David
-
01-28-2004, 08:00 PM #118 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
Anyone find it odd that virtually all of us have the same servers? What about the people running singles, any issues?
-
01-28-2004, 08:02 PM #119 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Originally posted by MN-Robert
Do you have the rest of that top? I would like to know what process state Apache, MySQL, etc. were in.
11 root 15 0 0 0 0 SW 5.2 0.0 2:01 2 kswapd
6 root 15 0 0 0 0 SW 1.0 0.0 0:26 1 keventd
15263 root 15 0 1288 1256 464 R 0.5 0.1 0:40 0 top
25258 root 15 0 1488 1488 652 R 0.5 0.1 0:00 2 top
24259 rimvis2 15 0 3260 2604 2084 S 0.3 0.2 0:00 3 php
25038 dtz 15 0 5248 4916 2364 S 0.3 0.4 0:00 0 php
12 root 15 0 0 0 0 SW 0.2 0.0 0:04 0 kscand
5374 named 25 0 9768 4956 1032 S 0.2 0.4 0:22 2 named
24749 root 15 0 5588 4424 856 D 0.2 0.4 0:00 3 whostmgr
The above matches expectations: it appears something caused swap usage to grow to over 400 megs.
-David
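A minimal sketch, in Python, of reading swap usage straight from the kernel rather than from a top screenshot. It assumes /proc/meminfo exposes SwapTotal and SwapFree lines in kB, as current Linux kernels do; the function name is hypothetical.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB), computed as SwapTotal minus SwapFree."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key in ("SwapTotal", "SwapFree") and parts:
            values[key] = int(parts[0])
    return values["SwapTotal"] - values["SwapFree"]
```

Fed the totals from the top output posted earlier (1052248k available, 623388k free), this returns 428860, matching the "428860k used" line. On a live box the input would be the contents of /proc/meminfo.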
-
01-28-2004, 08:04 PM #120 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Originally posted by MN-Robert
Anyone find it odd that virtually all of us have the same servers? What about the people running singles, any issues?
-David
-
01-28-2004, 08:04 PM #121 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
So the issue at this time is:
Not RHE (confirmed: the problem exists on other systems)
Not APF
Not iptables
Not the wrong channel on the SATA drive
Not HT
What we know:
Loads can be low and it still drops off
Ports remain open but do not respond
Machine stays pingable during the entire outage
Needs a manual hard reboot to get going again
Dies randomly
Users now complaining of high load after it "dies"
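The "ports remain open but do not respond" symptom can be told apart from a plain outage with a small probe. A minimal sketch in Python, assuming the service either sends a banner on connect (SMTP, FTP, SSH) or answers the optional request bytes; the function name and the result labels are hypothetical.

```python
import socket


def probe(host, port, timeout=5.0, request=b""):
    """Classify a TCP service as 'refused', 'unreachable', 'silent',
    'closed' or 'responding'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"          # nothing is listening on the port
    except OSError:
        return "unreachable"      # connect timed out or no route
    with sock:
        sock.settimeout(timeout)
        try:
            if request:
                sock.sendall(request)
            data = sock.recv(1)
        except socket.timeout:
            return "silent"       # handshake completed, but no reply
        except OSError:
            return "unreachable"  # connection reset or similar
        return "responding" if data else "closed"
```

For a web server, something like probe("192.0.2.10", 80, request=b"HEAD / HTTP/1.0\r\n\r\n") would be the call (the address is a placeholder). A "silent" result while ping still answers matches the hang described in the list above.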
-
01-28-2004, 08:10 PM #122 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Add in 'not the kernel' as well, though that's somewhat related to 'not rhe'.
I think, honestly, it's something fubar'd with a particular revision of their motherboards that they got for their dual xeons. I can't think of anything else that it might be, really.
-
01-28-2004, 08:13 PM #123 Web Hosting Master
- Join Date
- Nov 2001
- Posts
- 5,383
If we can confirm that no one on a non-dual-Xeon box is seeing this issue, then that would be the problem: something hardware-wise on the dual Xeon.
-
01-28-2004, 08:33 PM #124 Web Hosting Master
- Join Date
- Nov 2002
- Location
- Hot, hot Michigan...
- Posts
- 3,506
Don't know - the p4 2.8 I'm preparing to move people to has been up all the time, since the thing was brought online (except for reboots for kernel upgrades). No matter what I throw at it, it stays up..
-
01-28-2004, 08:42 PM #125 Web Hosting Guru
- Join Date
- Mar 2002
- Posts
- 288
hmm
Well, the only time I have seen weird issues like this, a long time ago, it was something related to the proc filesystem.
For example, when I run a "w" command it will hang there and not respond, and other basic commands like that will not respond either. You are right that sockets stay open but do not respond. They all say SYN_RECV at some point in the final stages, when the box is still up and nothing works.
I am still thinking it is a kernel thing myself, but I guess we will see, as they are working hard on finding a solution.
JDT
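A minimal sketch, in Python, of counting sockets per TCP state so that a pile-up of half-open connections shows up as a number rather than a scroll of netstat output. It assumes the /proc/net/tcp text layout of current Linux kernels, where the fourth column is the state as a hex code (03 is SYN_RECV); the function name is hypothetical.

```python
from collections import Counter

# Hex state codes as used by the Linux kernel in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # Skip the header row; each remaining row describes one socket.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

Logging count_tcp_states(open("/proc/net/tcp").read())["SYN_RECV"] once a minute would show whether that count climbs in the run-up to a hang.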