I am having trouble with several forums running phpBB that cause MySQL to go down.
I found out that Googlebot causes it by creating an awful lot of session IDs. We created robots.txt files in the forum directories to keep Googlebot from crawling them, but why does the crawling cause mysqld to go down? Since then MySQL has kept running OK.
There are no messages in the MySQL log.
Running RH 7.3, DA, MySQL 4, Celeron 1.7, 1 GB memory.
I can't imagine I am the only one having this, because there are thousands of forums running phpBB, and I don't think everyone uses a robots.txt file to stop Googlebot.
This works for now, but whenever a new forum is created I have to add the robots.txt file again.
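To avoid adding the file by hand for every new forum, something like the following could run from cron. It is only a rough sketch under several assumptions: that a phpBB install can be recognised by the presence of `viewtopic.php` and `viewforum.php`, that the base path you pass in is where the sites live, and that each forum directory is the document root of its own (sub)domain, since crawlers only request `/robots.txt` at the top of a host. The function names and the rule text are made up for illustration.

```python
from pathlib import Path
import sys

# Assumption: a directory holding both of these files is a phpBB install.
PHPBB_MARKERS = ("viewtopic.php", "viewforum.php")

# Assumption: block only Googlebot, and block the whole forum.
ROBOTS_TXT = "User-agent: Googlebot\nDisallow: /\n"


def find_forum_dirs(base):
    """Yield directories under base that look like phpBB installs."""
    for marker in sorted(Path(base).rglob(PHPBB_MARKERS[0])):
        forum_dir = marker.parent
        if all((forum_dir / name).is_file() for name in PHPBB_MARKERS):
            yield forum_dir


def add_missing_robots(base):
    """Write robots.txt into forum dirs that lack one; return the dirs touched."""
    created = []
    for forum_dir in find_forum_dirs(base):
        target = forum_dir / "robots.txt"
        if not target.exists():  # never overwrite a file someone already made
            target.write_text(ROBOTS_TXT)
            created.append(forum_dir)
    return created


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "."
    for forum_dir in add_missing_robots(base):
        print("added robots.txt to", forum_dir)
```

Existing robots.txt files are left alone, so it should be safe to run repeatedly.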
Originally posted by Pilgrim:
Does anyone else have problems with Googlebot taking down the server? One site I host got crawled by Googlebot twice this week, and both times it took down the entire server!
And this is by no means a heavily loaded server (well, not until Googlebot started 4 CGI processes per second, every second).
You have a badly configured server or a badly configured script.
I run PHP and Perl as CGIs, and so far Google has not pushed the server load up noticeably.
The problem program was MT (Movable Type, a blog program). The bot tried to blast its way through thousands of archived pages within a matter of minutes. That would have been no problem had they been plain HTML, but since they are CGI-based the server blew a fuse.
The customer has now edited his robots.txt file to stop the bot from indexing his site again. Luckily he doesn't care much about being indexed.
Can't you lower the allowed requests per second in Apache? You can use directives to target only Googlebot. Our server gets hit a lot by MSNbot and Alexa as well. It used to take our server down once in a while, but since we turned our requests down and limited the max simultaneous connections in the Apache config, the problem seems to have vanished.
I've always found Googlebot to be very well behaved when spidering my sites. Are you sure you're getting spidered by Google, and not just by someone faking their UA header? We've gotten hit a couple of times this month by a spider from mainland China that uses "googlebot" all in lowercase as the user-agent string, doesn't read robots.txt, and requests pages as fast as possible, unlike Google's sedate one page every minute or so.
Just a thought... See where the IPs come from. As far as I know, the IPs Google uses for spidering all have reverse DNS set for something like "crawler10.googlebot.com"...
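If you want to automate that check, here is a minimal sketch: reverse-resolve the IP, see whether the name ends in a Google crawler domain, then resolve that name forward again and confirm it maps back to the same IP. The suffix list and the function name are my own assumptions, so adjust them to whatever you actually see in your logs. The forward lookup used here is IPv4-only.

```python
import socket

# Assumption: genuine crawler hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if ip passes a reverse DNS plus forward-confirm check.

    The resolver functions are parameters so they can be replaced in tests.
    """
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:  # no PTR record, or the lookup failed
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]  # list of IPv4 addresses for the name
    except OSError:
        return False
    # Anyone can set a PTR record on their own IP, so the name must map back.
    return ip in addresses


if __name__ == "__main__":
    import sys
    for address in sys.argv[1:]:
        verdict = "looks genuine" if is_real_googlebot(address) else "not verified"
        print(address, verdict)
```

Run it with the suspicious IPs from your access log as arguments. A user-agent string alone proves nothing, which is why the forward lookup step matters.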