Page 5 of 24
Results 101 to 125 of 582
  1. #101
    Join Date
    Sep 2002
    Posts
    333
    I have a dual xeon (for about 30 days) with RHE and haven't had any problems whatsoever. Is this a problem when the machine gets a load? Just wondering if I have something to look forward to.

  2. #102
    Join Date
    Nov 2001
    Posts
    5,383
    Right now, I am unsure as to what the issue is. I am hoping it's the ATA controller as mentioned earlier. This happens on boxes with very low loads and sometimes on brand new servers, so no, it's not a load issue.

    And for the doubters, it's not an RHE issue either: if you look at the RackShack thread, the issue actually exists in 7.2+. Hopefully it's fixed in RHE and it's a minor hardware problem that needs fixing (fingers crossed).
    Clustered Hosting With Continuous Data Protection (CDP)
    http://www.solidinternet.com
    8 Years of hosting excellence!

  3. #103

    load

    Originally posted by 0utlier
    I have a dual xeon (for about 30 days) with RHE and haven't had any problems whatsoever. Is this a problem when the machine gets a load? Just wondering if I have something to look forward to.
    Well, from my experience, with maybe close to 50 crashes now lol: high MySQL load or disk load will crash it. It can also crash at random times, but mostly under high load.

    JDT
    ▄▀▄ Jeremy, CEO - Batcave Network
    ▄▀▄ Deals on Hosting, VPS, Domains, Dedicated Servers since 1997
    ▄▀▄ http://www.batcave.net
    ▄▀▄ We always have a deal going - hosting for cheaper than a domain!

  4. #104
    Join Date
    Jan 2001
    Location
    MN, USA
    Posts
    97
    Originally posted by 0utlier
    I have a dual xeon (for about 30 days) with RHE and haven't had any problems whatsoever. Is this a problem when the machine gets a load? Just wondering if I have something to look forward to.
    Yes, I believe the problem escalates when the load is high. I noticed the high iowait percentage while I was configuring the server, then needed to do a graceful reboot once I had one site on the box. I waited 24 hours and everything looked good, so I added the rest of the sites to the box. Now I'm crashing hourly.

    Here is a snap of TOP right before my first crash with one site on the box:

    Code:
    11:08:54  up 2 days,  7:48,  1 user,  load average: 68.40, 54.12, 28.85
    270 processes: 269 sleeping, 1 running, 0 zombie, 0 stopped
    CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
               total    1.3%    0.0%    0.6%   0.1%     0.3%   97.5%    0.0%
               cpu00    1.7%    0.0%    0.7%   0.5%     0.1%   96.8%    0.0%
               cpu01    1.9%    0.0%    0.5%   0.0%     0.0%   97.5%    0.0%
               cpu02    1.0%    0.0%    0.8%   0.0%     0.0%   98.0%    0.0%
               cpu03    0.5%    0.0%    0.5%   0.0%     1.0%   97.8%    0.0%
    Mem:  1028480k av, 1009156k used,   19324k free,       0k shrd,  154628k buff
                        769784k actv,  180228k in_d,    5316k in_c
    Swap: 1052248k av,  350420k used,  701828k free                  666272k cached
    Here is a snap after a reboot 15 minutes ago:

    Code:
     14:17:23  up 16 min,  1 user,  load average: 0.29, 0.63, 0.62
    229 processes: 228 sleeping, 1 running, 0 zombie, 0 stopped
    CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
               total    1.8%    0.0%    0.0%   0.0%     0.2%    5.6%   92.0%
               cpu00    0.5%    0.0%    0.3%   0.1%     0.7%    7.1%   90.8%
               cpu01    0.7%    0.0%    0.0%   0.0%     0.0%    4.9%   94.2%
               cpu02    3.3%    0.0%    0.0%   0.0%     0.0%    6.3%   90.2%
               cpu03    2.7%    0.0%    0.0%   0.0%     0.1%    3.9%   93.0%
    Mem:  1028480k av, 1000200k used,   28280k free,       0k shrd,   47936k buff
                        753020k actv,  166272k in_d,    5904k in_c
    Swap: 1052248k av,       0k used, 1052248k free                  561344k cached
    We'll see how long she stays up this time...
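
    For anyone who wants to log this before the box drops, here is a rough Python sketch that pulls the 1-minute load average and the total iowait figure out of a top header like the two snaps above. The names `parse_top_header` and `looks_stuck` and the 90% threshold are my own assumptions, and it only understands the multi-CPU "total" row layout shown in this thread.

```python
import re


def parse_top_header(text):
    """Extract the 1-minute load average and total iowait % from a top header.

    Assumes the layout shown in this thread, where the 'total' row lists:
    user nice system irq softirq iowait idle.
    """
    load_match = re.search(r"load average:\s*([\d.]+)", text)
    total_match = re.search(r"^\s*total\s+(.*)$", text, re.MULTILINE)
    if not load_match or not total_match:
        return None
    # Columns after 'total': user, nice, system, irq, softirq, iowait, idle
    fields = [float(v.rstrip("%")) for v in total_match.group(1).split()]
    if len(fields) < 7:
        return None
    return {
        "load1": float(load_match.group(1)),
        "iowait": fields[5],
        "idle": fields[6],
    }


def looks_stuck(stats, iowait_limit=90.0):
    """Hypothetical rule of thumb: nearly all CPU time is spent waiting on I/O."""
    return stats is not None and stats["iowait"] >= iowait_limit


if __name__ == "__main__":
    sample = (
        "11:08:54  up 2 days,  7:48,  1 user,  load average: 68.40, 54.12, 28.85\n"
        "CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle\n"
        "           total    1.3%    0.0%    0.6%   0.1%     0.3%   97.5%    0.0%\n"
    )
    stats = parse_top_header(sample)
    print(stats, looks_stuck(stats))
```

    Feeding it the output of a batch-mode top from cron every minute would at least leave a trail showing how fast the iowait climbs before a crash.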
    ~Gertiebeth

  5. #105
    Join Date
    Jan 2001
    Location
    MN, USA
    Posts
    97
    Originally posted by MN-Robert
    Right now, I am unsure as to what the issue is. I am hoping it's the ATA controller as mentioned earlier. This happens on boxes with very low loads and sometimes on brand new servers, so no, it's not a load issue.

    And for the doubters, it's not an RHE issue either: if you look at the RackShack thread, the issue actually exists in 7.2+. Hopefully it's fixed in RHE and it's a minor hardware problem that needs fixing (fingers crossed).
    So this problem is specifically with SM? I have boxes with RS and SM, but the box I am having trouble with is at SM. I have other Xeons, but they are all hosted at RS with RH9. I have one other box with RHE on it at RS, but it is a P4. So far the only trouble I am having with it is using up2date.

    My fingers are crossed with you!
    ~Gertiebeth

  6. #106
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Update...

    Since last night, with the ata fix and the stress tests, she's still up...

    CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
               total   10.4%    0.0%    9.9%   0.4%     0.0%    6.4%   72.6%
               cpu00    9.9%    0.0%   12.8%   0.9%     0.0%    9.9%   66.3%
               cpu01   11.0%    0.0%    7.0%   0.0%     0.0%    3.0%   79.0%


    So that's 16 hours.. I've been taking a 'siesta' today, after all of those loooong nights, here's hoping that I can continue the siesta

    -David

  7. #107
    Join Date
    Jan 2001
    Location
    MN, USA
    Posts
    97
    Originally posted by thedavid
    Update...

    Since last night, with the ata fix and the stress tests, she's still up...

    CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
               total   10.4%    0.0%    9.9%   0.4%     0.0%    6.4%   72.6%
               cpu00    9.9%    0.0%   12.8%   0.9%     0.0%    9.9%   66.3%
               cpu01   11.0%    0.0%    7.0%   0.0%     0.0%    3.0%   79.0%


    So that's 16 hours.. I've been taking a 'siesta' today, after all of those loooong nights, here's hoping that I can continue the siesta

    -David
    David, I'd like to reference this in my TT at SM. Can I use your name, and who were you working with?

    I tried troubleshooting this last night with a tech, before I found this thread, and s/he had no idea what was wrong.

    Thanks! Enjoy your tranquility.
    ~Gertiebeth

  8. #108

    Re: load

    Originally posted by batcavenet
    Well, from my experience, with maybe close to 50 crashes now lol: high MySQL load or disk load will crash it. It can also crash at random times, but mostly under high load.

    JDT
    I don't think load is a factor.

    Our server (SuperResellerz) has been more or less idle for over a month, since we were just trying to install sites. We could not install anything except 2 domains with a few pages, as the POS just kept going down at random. There was hardly any traffic. Anyway, as we know, an empty server doesn't get much traffic.

    So again....load is not a factor.

    Thanks.
    www.QuickDate.com


    www.MatchPedia.com --> The largest Dating and MatchMaking network of more than 100 sites!

  9. #109
    Join Date
    Jan 2001
    Location
    MN, USA
    Posts
    97

    Re: Re: load

    Originally posted by rickkumar
    I don't think load is a factor.

    Our server (SuperResellerz) has been more or less idle for over a month, since we were just trying to install sites. We could not install anything except 2 domains with a few pages, as the POS just kept going down at random. There was hardly any traffic. Anyway, as we know, an empty server doesn't get much traffic.

    So again....load is not a factor.

    Thanks.
    In my case it seems load IS a factor in accelerating the problem. With only one site on the box, we crashed once; with multiple sites (I think around 20 or so) it crashes hourly.

    ~Gertiebeth

  10. #110

    smartctl

    This solution was provided by a tech guru at servermatrix and so far so good..

    in this directory

    cd /usr/local/cpanel/3rdparty/bin

    mv smartctl smartctl.bak

    then create a new file in that directory, put this in it, and save it as smartctl (leave out the START and END lines)

    ---------------------START
    #!/usr/local/bin/perl
    #print "S.M.A.R.T. Sense: Okay\n";
    #exit 0;
    my $disk="/dev/sda";
    print "Checking $disk....";
    # Note: once this file is saved as smartctl, the path below points back at this same script, not at the renamed smartctl.bak.
    my $result = `/usr/local/cpanel/3rdparty/bin/smartctl -c $disk 2>&1`;
    if ($result !~ /S.M.A.R.T. Sense: Okay/ && $result !~ /Check S.M.A.R.T. Passed/ && $result !~ /Device not configured/i && $result !~ /does not support/ && $result !~ /Log Sense failed: Success/) {
    #failure soon.. backup all data now.
    $msg .= "Disk Failure soon on $disk\n\n$result\n";
    print "Failed\n";
    } else {
    print "Success\n";
    };
    exit 0;

    ----------------------END

    Be careful of the spacing - things that wrap should be on one line and not two

    then exit and

    chmod 755 smartctl
    chown root:wheel smartctl


    ----------------------------

    Then you want to disable smartcheck - I'm not sure which way they did it but search for smartcheck on cpanel.net forums

    --------- this is what I tried from cpanel forums
    you can do this by logging in as root and typing

    "touch /var/cpanel/disablesmartcheck"
    ----------

    My server is a SCSI box - and it was crashing due to /scripts/upcp each day. I do not recommend anyone use this script unless you know perl and what you are doing - probably best to just ask servermatrix to disable smartcheck and install the smartctl patch

    My boxes have not gone down since this was installed, but this is not a sure thing yet; just thought someone would like to know. This thing is supposed to just let smartctl think it's happy.
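
    The pass/fail test in the Perl script above boils down to "treat the output as fine if any one of a handful of known strings shows up in it". Here is a small Python sketch of that same check, in case it helps anyone reading the script. The function name `smart_output_ok` is made up for illustration, the patterns are copied from the script, and this is not cPanel's own code.

```python
import re

# Patterns copied from the Perl script in this post. The unescaped dots in
# "S.M.A.R.T." match any character, exactly as they do in the Perl original.
OK_PATTERNS = [
    re.compile(r"S.M.A.R.T. Sense: Okay"),
    re.compile(r"Check S.M.A.R.T. Passed"),
    re.compile(r"Device not configured", re.IGNORECASE),
    re.compile(r"does not support"),
    re.compile(r"Log Sense failed: Success"),
]


def smart_output_ok(result):
    """Return True if any 'healthy or not applicable' pattern matches.

    The Perl script reports a failure only when none of the patterns match,
    so this is the logical inverse of its 'if' condition.
    """
    return any(pattern.search(result) for pattern in OK_PATTERNS)


if __name__ == "__main__":
    print(smart_output_ok("S.M.A.R.T. Sense: Okay\n"))
    print(smart_output_ok("some unexpected error text"))
```

    Note that only the "Device not configured" pattern is case-insensitive, same as in the Perl.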

    This fix may only apply to nightly crashes during the cPanel update scripts. Thanks.

    JDT
    Last edited by batcavenet; 01-28-2004 at 07:23 PM.

  11. #111
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Siesta ended - server dropped again, and just before it did I caught this off of top:

    06:25:10  up  3:30,  1 user,  load average: 342.53, 304.15, 192.53
    638 processes: 634 sleeping, 2 running, 1 zombie, 1 stopped
    CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
               total    3.8%    0.0%    1.7%   0.1%     0.5%   93.6%    0.0%
               cpu00    6.5%    0.0%    0.9%   0.5%     1.8%   90.0%    0.0%
               cpu01    1.4%    0.0%    2.0%   0.0%     0.0%   96.4%    0.0%
               cpu02    6.5%    0.0%    2.0%   0.0%     0.0%   91.4%    0.0%
               cpu03    0.9%    0.0%    2.1%   0.0%     0.1%   96.7%    0.0%
    Mem:  1028480k av, 1003728k used,   24752k free,       0k shrd,  383424k buff
                        732140k actv,  183060k in_d,    4780k in_c
    Swap: 1052248k av,  428860k used,  623388k free                  293012k cached


    So it hit the iowaits hardcore, let the processes build up since it was doing nothing but that, and then dropped offline.

    -David

  12. #112
    Join Date
    Jan 2001
    Location
    MN, USA
    Posts
    97

    load average: 342.53, 304.15, 192.53 Dude!

    I've crashed twice since my last post, but I never reached those levels.
    ~Gertiebeth

  13. #113
    Join Date
    Nov 2001
    Posts
    5,383
    Dammit, had my hopes up, David

    Back to square one?

  14. #114
    Join Date
    Nov 2001
    Posts
    5,383
    gertiebeth, I don't think we have the same issue then, as ours drops offline with very little IO or load. I am not sure what triggers it, but I would like to work it out.

  15. #115
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Here's what I'm doing to resolve this, after all the tests, workarounds, etc..

    Moving all the larger customers off of this server, stat! What I'm moving them to is a redhat 9 server, a p4 2.8 ghz machine with a gig of ram... This server was intended for new customers only, but it's going to be a liferaft now. It's been up 100% of the time since the beginning of the month, FWIW.

    I'm also contacting them to see if they'll turn the xeon into another duplicate server of the p4.

    What I'll do then is have them split up the ip blocks across those servers from what it used to be on the xeon (there are a few blocks of 32 ip's there). That way folks won't have to update their nameserver IP's..

  16. #116
    Join Date
    Nov 2001
    Posts
    5,383
    Do you have the rest of that top? I would like to know what process state apache/mysql etc. were in.

  17. #117
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Originally posted by gertiebeth
    load average: 342.53, 304.15, 192.53 Dude!

    I've crashed twice since my last post, but I never reached those levels.
    Guess what the load was before the spiral? .95. Yes. Point ninety five.

    -David

  18. #118
    Join Date
    Nov 2001
    Posts
    5,383
    Anyone find it odd that we virtually all have the same servers? What about the people running singles, any issues?

  19. #119
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Originally posted by MN-Robert
    Do you have the rest of that top? I would like to know what process state apache/mysql etc. were in.
    I only have a small section, but here's the first few lines:
       11 root     15  0     0     0     0 SW  5.2  0.0  2:01 2 kswapd
        6 root     15  0     0     0     0 SW  1.0  0.0  0:26 1 keventd
    15263 root     15  0  1288  1256   464 R   0.5  0.1  0:40 0 top
    25258 root     15  0  1488  1488   652 R   0.5  0.1  0:00 2 top
    24259 rimvis2  15  0  3260  2604  2084 S   0.3  0.2  0:00 3 php
    25038 dtz      15  0  5248  4916  2364 S   0.3  0.4  0:00 0 php
       12 root     15  0     0     0     0 SW  0.2  0.0  0:04 0 kscand
     5374 named    25  0  9768  4956  1032 S   0.2  0.4  0:22 2 named
    24749 root     15  0  5588  4424   856 D   0.2  0.4  0:00 3 whostmgr
    Apache called those PHP entries, but that's about it... httpd is nowhere to be found, probably dropped off sometime before the freeze...

    The above matches the expectations, as it appears something caused the swapfile to grow in size to 400 megs of usage.
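
    To answer the "what state were the processes in" question from rows like these, here is a rough Python sketch that tallies the state column. It assumes the column order seen in the rows above (13 fields, state in the 8th, command last); `count_states` is a hypothetical helper name, not part of any tool.

```python
from collections import Counter


def count_states(rows):
    """Tally the first letter of the state column (8th field) per top row."""
    counts = Counter()
    for row in rows:
        fields = row.split()
        if len(fields) < 13 or not fields[0].isdigit():
            continue  # skip headers, blank lines, and truncated rows
        counts[fields[7][0]] += 1  # 'SW' counts as 'S', 'D' as 'D'
    return counts


if __name__ == "__main__":
    rows = [
        "11 root 15 0 0 0 0 SW 5.2 0.0 2:01 2 kswapd",
        "15263 root 15 0 1288 1256 464 R 0.5 0.1 0:40 0 top",
        "24749 root 15 0 5588 4424 856 D 0.2 0.4 0:00 3 whostmgr",
    ]
    print(count_states(rows))
```

    D is uninterruptible sleep, usually a process waiting on disk. On Linux those count toward the load average even though they use no CPU, so a pile of D-state processes would fit a load in the hundreds alongside 90%+ iowait.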

    -David

  20. #120
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Originally posted by MN-Robert
    Anyone find it odd that we virtually all have the same servers? What about the people running singles, any issues?
    None on the p4 2.8 that we have there also, which we're moving people to likely all night tonight. Sucks, I just got rested up from previous bouts of this..

    -David

  21. #121
    Join Date
    Nov 2001
    Posts
    5,383
    So the issue at this time is:

    Not RHE (Confirmed problem exists on other systems)
    Not APF
    Not IP Tables
    Not the wrong channel on the SATA Drive
    Not HT

    What we know

    Loads can be low and it drops off
    Ports remain open but no response from port
    Machine stays pingable during entire outage
    Needs a manual hard reboot to get itself going
    Dies randomly
    Users now complaining of high load after it "dies"
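
    The "pingable, ports open, but nothing answers" symptom in the list above is easy to miss with a plain ping monitor. Here is a minimal Python sketch of a probe that tells the three cases apart, assuming a service that sends a banner on connect (SMTP, FTP, SSH). The function name, the timeout, and the result strings are my own choices for illustration.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP service as 'closed', 'open_no_response', or 'responding'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "closed"  # refused, unreachable, or the connect itself timed out
    try:
        data = sock.recv(128)  # wait for the service banner
    except OSError:
        return "open_no_response"  # connected, but timed out or reset before any data
    finally:
        sock.close()
    # An empty read means the peer closed the connection without sending anything.
    return "responding" if data else "open_no_response"


if __name__ == "__main__":
    # Port 25 on localhost is only an example target.
    print(probe("127.0.0.1", 25, timeout=2.0))
```

    Run from a second machine, a result of "open_no_response" while ping still works would match the hung state described here.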

  22. #122
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Add in 'not the kernel' as well, though that's somewhat related to 'not rhe'.

    I think, honestly, it's something fubar'd with a particular revision of their motherboards that they got for their dual xeons. I can't think of anything else that it might be, really.

  23. #123
    Join Date
    Nov 2001
    Posts
    5,383
    If we can confirm that no one on a non-dual-Xeon box is seeing this issue, then that would point to the problem being something hardware-wise on the dual Xeon.

  24. #124
    Join Date
    Nov 2002
    Location
    Hot, hot Michigan...
    Posts
    3,506
    Don't know - the p4 2.8 I'm preparing to move people to has been up all the time, since the thing was brought online (except for reboots for kernel upgrades). No matter what I throw at it, it stays up..

  25. #125

    hmm

    Well, the only time I have seen weird issues like this, a long time ago, it was something related to the proc filesystem.

    For example, when I run the "w" command it will hang there and not respond, and other basic commands like that will not respond either. You are right that sockets stay open but do not respond. They all say SYN_RECV at some point in the final stages, when the box is still up and nothing works.
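
    If anyone wants to watch for the half-open sockets piling up, here is a rough Python sketch that counts connections per TCP state from the text of /proc/net/tcp. It assumes /proc is still readable at that point, which may well not be the case on a box in this state, and `count_tcp_states` is just a name I picked.

```python
from collections import Counter

# State codes as they appear (in hex) in the 'st' column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state given the full text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields: slot, local address, remote address, state, ...
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts

# Typical use on a live box (not run here):
#     with open("/proc/net/tcp") as fh:
#         print(count_tcp_states(fh.read()))
```

    A SYN_RECV count that keeps growing while nothing gets accepted would line up with what I was seeing.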

    I am still thinking it is a kernel thing myself but I guess we will see as they are working hard on finding a solution

    JDT
