    Apache web server HELP! Extremely laggy pages

    Hello all,

    I work for a company with a high-traffic site (about 100,000 unique visitors per day). We have "MANAGED" hosting through Rackspace, which costs a pretty penny per month. It includes a dedicated back-end/DB server and cloud servers for the front end.

    The setup had been functioning perfectly for 10 months, but this past Monday the site's speed suddenly dropped. Page load times went from 1-2 seconds to 10-20 seconds, and sometimes pages didn't load at all. As far as we know (and as far as Rackspace says), no server settings were modified, and no new code was introduced on our end. It's a mainly static site, with minimal user interaction with the back end.

    Can any expert offer some advice? We've monitored the traffic, checked IPs, etc. We've even turned down several site features to reduce server load. After a server reboot, the number of active threads/processes IMMEDIATELY jumps back up to the maximum. It seems that once our traffic reaches 10MB/s, some kind of queue forms and the delays begin. Rackspace assures us that we're not limited to that.

    The symptoms look and feel similar to those described in this thread, so I wonder if it could be a hardware issue on their end that they're just being quiet about or are unaware of.

    Please advise - thanks! -Jay-

    Some more background info: the site is typically busiest from 7am until 3pm EST. For the past few days, the server has lagged badly from 7am until 9-10pm. At around that 9-10pm mark, however, something changes and the pages go back to loading almost instantly (there is still decent traffic, though). Then at around 7am it slows to a crawl again.

    Rackspace has offered solutions such as spinning up another server and adding their load balancing. They are in the process of doing this, BUT they do NOT think traffic is the issue. At one point they said there was potential packet loss somewhere in the network, but no progress has been made.

    Versions in use:
    OS: CentOS (cloud servers)
    OS: Red Hat (dedicated server)
    Apache: 2.2
    PHP: 5.3
    MySQL: 5.1.69

    Can you share the website name, and a couple of the pages that are loading slowly?
    Since you didn't change anything on the server side, is it possible that a backup is running somewhere between 7 and 9am?
    The second thing I can think of is packet loss, as you mentioned. I assume you have access to all the servers... try logging in to one and pinging the others while the pages are slow.
    Lastly, do you get any errors on the server side?
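    As a rough sketch of that ping check, assuming Linux servers with the iputils ping (whose summary lines look like the samples in the comments), the Python below runs ping and extracts the packet-loss percentage and the average round-trip time. The function names are hypothetical helpers, not part of any tool.

```python
import re
import subprocess
from typing import Optional

# Summary lines printed by Linux iputils ping, e.g.
#   "10 packets transmitted, 9 received, 10% packet loss, time 9012ms"
#   "rtt min/avg/max/mdev = 0.210/0.305/0.512/0.081 ms"
# The optional "packets " also tolerates the BSD/macOS wording.
LOSS_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
RTT_RE = re.compile(r"= [\d.]+/([\d.]+)/[\d.]+/[\d.]+ ms")


def parse_ping_summary(output: str) -> tuple[float, Optional[float]]:
    """Return (loss_percent, avg_rtt_ms) parsed from ping's summary output."""
    m = LOSS_RE.search(output)
    if m is None:
        raise ValueError("no packet summary found in ping output")
    sent, received = int(m.group(1)), int(m.group(2))
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    rtt = RTT_RE.search(output)
    # When every packet is lost, ping prints no rtt line at all.
    avg = float(rtt.group(1)) if rtt else None
    return loss, avg


def ping_host(host: str, count: int = 20) -> tuple[float, Optional[float]]:
    """Ping `host` `count` times and summarize loss and average RTT."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_ping_summary(result.stdout)
```

    For example, run ping_host("10.0.0.2") on each server against the others' private IPs (10.0.0.2 is a placeholder) during the slow 7am-9pm window and again after 9pm, then compare the numbers.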

    P.S. It just crossed my mind that it could be DNS-related as well... maybe your app is trying to resolve the hostnames of the clients' IPs... I have no idea what your website is doing or how it's implemented...
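    To test the DNS theory, a minimal sketch like the one below times reverse lookups with Python's socket.gethostbyaddr(). This only matters if something on the request path does reverse DNS, for example Apache's HostnameLookups directive set to On, or PHP code calling gethostbyaddr(); that is an assumption to verify, not something the thread confirms.

```python
import socket
import time
from typing import Optional


def timed_reverse_lookup(ip: str) -> tuple[Optional[str], float]:
    """Reverse-resolve `ip`; return (hostname or None, seconds the lookup took)."""
    start = time.perf_counter()
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # socket.herror / socket.gaierror (both OSError subclasses):
        # no PTR record, resolver failure, or timeout.
        hostname = None
    return hostname, time.perf_counter() - start
```

    Feeding it a handful of client IPs from the Apache access log shows quickly whether each lookup costs milliseconds or whole seconds.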
    Did you try asking Rackspace to switch from Apache to nginx to see if it improves?
    @silasistefan -

    I can't say the site name, but literally every page on the site is loading slowly. Even our text-only pages are taking 10 seconds. The site itself is just a user login system plus static content; there's not much to it.

    When pinging one server from another, the time averages 100ms. That seems high considering they're probably in the same room (or at least the same building). We actually do get some packet loss pinging between the Rackspace servers (AWESOME!... ugh.)

    We see occasional errors in the log, but VERY few compared to the number of visitors, and we try to fix them ASAP. I did see the DNS question raised elsewhere too; I'm wondering why that would suddenly pop up now after so long.

    @net -

    No, we haven't asked about nginx at all. Honestly, I'm not familiar with it. When we switched to Rackspace earlier this year, we had a conference call, described the site (and traffic) fully, and explained that GoDaddy hosting wasn't cutting it anymore. They made their recommendations and we went from there. I'm not sure what the average person pays for 2-3 servers like these, but we were expecting somewhat better service, especially in a situation like this.

    I would expect 100ms if one server were on the East Coast and the other on the West Coast... but even then, I wouldn't expect any packet loss. From this superficial investigation, I believe this is the root cause.
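    A back-of-the-envelope bound shows why 100ms inside one datacenter is suspicious. The sketch assumes light in fiber covers roughly 200 km per millisecond and uses made-up distances; real coast-to-coast paths are longer than a straight line and add router delay, so observed times sit above this floor.

```python
# Lower bound on round-trip time from signal propagation alone.
# Assumption: light in optical fiber travels at about 2/3 of c,
# i.e. roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Propagation-only RTT (there and back), ignoring routing and queuing."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Hypothetical distances: ~4,000 km coast to coast vs. ~0.5 km in one building.
print(f"coast to coast: {min_rtt_ms(4000):.1f} ms")   # 40.0 ms
print(f"same building:  {min_rtt_ms(0.5):.3f} ms")    # 0.005 ms
```
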

    About the DNS thing: if you have only a couple of servers and their IP addresses don't change often, I would hard-code them in /etc/hosts to test whether this is another issue. Even if it's not, it should shave a couple of ms off every DNS query... Also, check whether MySQL is using "skip-name-resolve" (you should be, if you're not authenticating users by hostname).
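    Both suggestions can be scripted. The sketch below renders /etc/hosts lines for the servers and checks whether a my.cnf enables skip-name-resolve under [mysqld]. The hostnames and IPs in the usage note are hypothetical, and the parser skips !include/!includedir lines rather than following them, so a setting made in an included file will not be seen.

```python
import configparser


def hosts_entries(servers: dict[str, str]) -> str:
    """Render /etc/hosts lines ("IP<tab>hostname") for a name -> IP mapping."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(servers.items()))


def mysqld_skips_name_resolve(my_cnf_text: str) -> bool:
    """True if the [mysqld] section of a my.cnf enables skip-name-resolve."""
    # Drop !include/!includedir directives: configparser can't parse them,
    # and this sketch does not follow them.
    text = "\n".join(
        line for line in my_cnf_text.splitlines()
        if not line.lstrip().startswith("!")
    )
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags like "skip-name-resolve" have no value
        strict=False,         # my.cnf files may repeat options
        interpolation=None,   # don't treat '%' specially
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return False
    # MySQL treats '-' and '_' in option names as equivalent.
    keys = {key.replace("_", "-") for key in parser["mysqld"]}
    return "skip-name-resolve" in keys
```

    For example, hosts_entries({"web1": "10.0.0.2", "db1": "10.0.0.1"}) yields one tab-separated line per server, and mysqld_skips_name_resolve(open("/etc/my.cnf").read()) reports whether the flag is set.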

    Good luck && let us know how it goes with Rackspace.
    If the server you pinged from is in the same datacenter as the one you pinged, then 100ms and packet loss are signs of network issues.

    From the sound of it, I'm wondering if the network card is being maxed out. You mentioned 10MB/s. Is that 10 megabytes (MB) or 10 megabits (Mb) per second? If it's megabytes, then you're at 80 megabits per second, and with overhead and other factors you might be hitting the network card's bandwidth limit, presuming it's a standard 100 Mbps connection. Bandwidth usage would need to be checked. Are they offering a 1000 Mbps (1 Gbit) connection and card?
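    The unit question matters by a factor of eight. The sketch below uses nominal link speeds only and ignores protocol overhead; it shows what 10 MB/s would mean on 10 Mbps, 100 Mbps, and 1 Gbps ports.

```python
BITS_PER_BYTE = 8


def megabytes_to_megabits(mb_per_s: float) -> float:
    """Convert a rate in megabytes/second (MB/s) to megabits/second (Mbps)."""
    return mb_per_s * BITS_PER_BYTE


def link_utilization(rate_mbps: float, link_mbps: float) -> float:
    """Fraction of a link's nominal capacity consumed by `rate_mbps`."""
    return rate_mbps / link_mbps


rate = megabytes_to_megabits(10)            # 10 MB/s -> 80 Mbps
for link in (10, 100, 1000):                # 10 Mbps, 100 Mbps, 1 Gbps ports
    print(f"{link:>5} Mbps link: {link_utilization(rate, link):.0%} used")
```

    Anything above 100% means the traffic simply cannot fit through the port, and requests back up.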

    You mentioned that even a text file takes 10 seconds to load, which means either the network is the bottleneck or Apache is being hit too hard and requests are queuing up.
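    The two explanations feed each other. If a slow or saturated link stretches how long each response takes, every Apache worker stays busy longer, so the pool fills up and new requests wait. Little's law (average requests in the system = arrival rate × time each spends there) makes this concrete. The sketch below uses hypothetical numbers: 50 requests/s and a 256-worker cap such as the MaxClients setting of Apache 2.2's prefork MPM.

```python
# Little's law: L = lambda * W
#   L = average number of requests in the system (busy workers)
#   lambda = arrival rate (requests/s), W = average time per request (s)


def busy_workers(arrivals_per_s: float, seconds_per_request: float) -> float:
    """Average number of workers occupied at a given load and response time."""
    return arrivals_per_s * seconds_per_request


def max_throughput(max_workers: int, seconds_per_request: float) -> float:
    """Requests/s a fixed worker pool can sustain before requests must queue."""
    return max_workers / seconds_per_request


# Hypothetical load of 50 req/s against a 256-worker cap.
for w in (1.5, 10.0):  # response time in seconds: normal vs. degraded
    print(f"W={w:>4}s  busy={busy_workers(50, w):6.0f}  "
          f"cap={max_throughput(256, w):6.1f} req/s")
```

    At 1.5 s per request the pool has plenty of headroom; at 10 s the same traffic would need more workers than the cap allows, which would match processes jumping straight back to their maximum after a reboot.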
    The servers are in the same datacenter, and we believe there is some type of network issue. Rackspace originally told us it looked like a DDoS attack; then they said it was much more than our normal traffic; and then they said they were investigating their hardware. Eventually we decided SOMETHING had to be done, so another server was spun up. Of course it helped immensely... for a price.

    I'm still curious how, out of nowhere, the web server could suddenly no longer handle the load. It's not as if page load times were gradually getting worse; one day they just slowed to a crawl. Here are two charts that show our standard traffic:

    The cap looks to be at or around 10 Mbps. Our IT Director believes we do have a gigabit card in the hardware. Any thoughts on why the traffic has had a nice rollercoaster-shaped chart for the past 8 months, but for 4 days last week (until we load balanced with another server on Friday) it kept plateauing? During those periods pages took 15-20 seconds to load, sometimes timing out, and this lasted a good 10 hours, from 7am through the early evening; then, all of a sudden at 8 or 9pm, the site would start loading instantly again.
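    One way to put a number on "it kept plateauing" is to measure how long the traffic stays pinned near a ceiling. The sketch assumes you can export throughput samples at a fixed interval (for example every 5 minutes, in which case a run of 120 samples would be 10 hours); the 5% tolerance is an arbitrary choice.

```python
def longest_plateau(samples_mbps: list[float], ceiling_mbps: float,
                    tolerance: float = 0.05) -> int:
    """Length of the longest run of consecutive samples at or above
    (1 - tolerance) * ceiling_mbps."""
    threshold = ceiling_mbps * (1 - tolerance)
    best = current = 0
    for sample in samples_mbps:
        # Extend the current run while samples hug the ceiling; reset otherwise.
        current = current + 1 if sample >= threshold else 0
        best = max(best, current)
    return best
```

    A normal "rollercoaster" day should produce only short runs, while a day clamped by a 10 Mbps port would show one long run spanning the busy hours.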

    We don't mind the cost of the additional server much if it's necessary; we just feel there is something else at play here.

    I'd bet $100 that it's an issue on their cloud, probably just a server running a cron task during those times.

    Packet loss would typically mean someone is maxing things out.

    For Wednesday and Thursday it looks like it's being capped. On Friday it went over that 10 Mbps, but only a little. Did you show these graphs to Rackspace? What did they say about them?
    Why not put a big static file on that server and try to fetch it from outside? I'd guess you can do 10 Mbps from home or the office; if not, post the link here.
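    A minimal sketch of that download test, using only Python's standard library: it fetches a URL in chunks and reports the average rate in megabits per second (decimal, 10^6 bits). The URL in the usage note is a hypothetical placeholder for whatever large static file you upload.

```python
import time
import urllib.request


def mbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes transferred over `seconds` to megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


def measure_download(url: str, chunk_size: int = 64 * 1024) -> float:
    """Download `url` completely and return the average rate in Mbps."""
    total = 0
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return mbps(total, time.perf_counter() - start)
```

    For example, measure_download("http://www.example.com/bigfile.bin") from a fast office connection; a result that never gets past roughly 10 Mbps, even off-peak, would point at the link rather than Apache.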
    Determine whether or not the problem is in the back-end database. Add some logging of query times and see how much of the delay is due to the DB. Maybe there is a problem on the DB server.
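    The site itself is PHP/MySQL, so the Python below is only an illustration of the idea (in PHP the same pattern is microtime(true) before and after each query; MySQL's own slow query log, tuned with long_query_time, is another option). It wraps a block of work, logs how long it took, and flags anything slower than a threshold; the label and the 0.5 s threshold are arbitrary.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("query-timing")


@contextmanager
def timed(label: str, slow_threshold_s: float = 0.5):
    """Time the wrapped block, log it, and expose the elapsed seconds."""
    result = {"elapsed": 0.0}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["elapsed"] = time.perf_counter() - start
        # Log slow blocks at WARNING so they stand out from routine timings.
        level = logging.WARNING if result["elapsed"] >= slow_threshold_s else logging.INFO
        log.log(level, "%s took %.3f s", label, result["elapsed"])
```

    Usage: wrap each query as with timed("load user profile"): cursor.execute(...), where cursor stands for whatever DB handle you use. If the logged DB times stay small while pages take 10+ seconds, the database is not the bottleneck.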

    Given how flat the graphs have been since midday Wednesday, it looks like your port has downgraded to 10 Mbps. That could be due to a bad network cable, switch port, or NIC. Check for port errors, and have Rackspace check the port speed and errors on their end too. Run a speed test on the server, for example using this CLI tool against servers:

    A reboot will cause the port to renegotiate, probably back to 100 Mbps. You can also force that without rebooting with ethtool -r eth0 (substituting your interface name).
