
05-17-2007, 03:31 PM
Junior Guru
Join Date: May 2003
Location: So Cal
Posts: 182
Dreamhost Asking Client To Block Search Engine Bots?
Was just reading up on some search engine news and noticed this story: "DreamHost Blocks Googlebot on Client's Site".
Seems a client received an email from Dreamhost saying:
Quote:
|
This email is to inform you that a few of your sites were getting hammered by Google bot. This was causing a heavy load on the webserver, and in turn affecting other customers on your shared server. In order to maintain stability on the webserver, I was forced to block Google bot via the .htaccess file.
|
And
Quote:
|
You also want to consider making your files be unsearchable by robots and crawlers, as that usually contributes to high number of hits. If they hit a dynamic file, like php, it can cause high memory usage and consequently high load…
|
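The email doesn't quote the actual .htaccess rule DreamHost added, so the following is only a rough Python (WSGI) stand-in for what a user-agent block does: any request whose User-Agent header contains "Googlebot" gets a 403 instead of the page. The names `block_crawlers`, `hello_app` and `status_for` are hypothetical helpers for illustration.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical list of crawler tokens to refuse (the real rule was not published).
BLOCKED_AGENT_TOKENS = ("googlebot",)


def block_crawlers(app, tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app; answer 403 to any request whose User-Agent contains a blocked token."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Trivial stand-in for the customer's site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(app, user_agent):
    """Send a fake request with the given User-Agent and return the status line."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = []
    app(environ, lambda status, headers: captured.append(status))
    return captured[0]


app = block_crawlers(hello_app)
print(status_for(app, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(status_for(app, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Matching on User-Agent is the bluntest option: it turns away every honest Googlebot request, which is exactly the visibility cost raised below, while a badly behaved bot can simply send a different header.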
I know the search bots can be a drain on bandwidth, but outright blocking the bots from your customers' sites seems like a drastic move.
Most of these customers depend on search engines for people to find them, so wouldn't you effectively be killing their business by blocking the bots?
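For comparison, the robots.txt route suggested in the second quote is advisory rather than enforced: well-behaved crawlers fetch the file and skip the listed paths, so those pages stop being crawled (and, usually, stop being found through search). A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical `/scripts/` directory holds the dynamic PHP files:

```python
from urllib import robotparser

# Hypothetical robots.txt: keep crawlers away from a directory of dynamic
# PHP scripts while leaving static pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scripts/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() answers "may this crawler request this path?" for a compliant crawler.
for path in ("/index.html", "/scripts/search.php", "/scripts/feed.php?page=2"):
    print(path, parser.can_fetch("Googlebot", path))
```

The parser only models what a compliant crawler would do; nothing in robots.txt stops a crawler that chooses to ignore it.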
__________________
-Jay-
Watch out where the huskies go and don't you eat that yellow snow!
|

05-17-2007, 03:55 PM
Web Hosting Master
Join Date: Apr 2001
Location: Paradise
Posts: 11,318
This is odd. As far as I know, you can ask Google to make its bots crawl more slowly, which would fix this issue. Not allowing the spiders to browse their sites AT ALL is going to be a big problem for many, many people.
Google is our #1 referral, and I guess that's for many other people too.
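Slowing a crawler down is usually handled per search engine. Some crawlers, Yahoo's Slurp among them, honor a non-standard `Crawl-delay` line in robots.txt; Googlebot is documented as ignoring it, and Google has instead offered a crawl-rate setting in its webmaster tools, which is presumably the request mentioned above. A sketch, assuming a 10-second delay for Slurp only, of how Python's `urllib.robotparser` reads such a file:

```python
from urllib import robotparser

# Hypothetical robots.txt: ask Slurp to wait 10 seconds between requests,
# and place no restrictions on any other crawler.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("Slurp"))      # 10
print(parser.crawl_delay("Googlebot"))  # None: no delay requested for other agents
```

Pass the crawler's product token ("Slurp", "Googlebot") rather than its full User-Agent string; the parser matches on the part before the first "/".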
|

05-17-2007, 03:56 PM
Web Hosting Master
Join Date: May 2005
Location: Behind a linux box
Posts: 685
I can confirm this as the guy that got the mail is a friend of mine.
__________________
Got Fused?
|

05-17-2007, 04:00 PM
& Goliath
Join Date: Oct 2003
Location: San Diego
Posts: 8,803
Well it was either that or suspend his account.
He should consider optimizing his scripts or moving; there's not much choice beyond that!
Google & Yahoo can hammer sites like crazy, so it's not all that drastic as long as it's temporary. Would he have preferred suspension? They notified him upfront, which is acceptable.
|

05-17-2007, 04:06 PM
Too smart for her own good.
Join Date: Feb 2004
Location: Your Screen
Posts: 3,998
Just another example of Dreamhost getting caught with their pants down, IMO...
I look at it this way. I run several dozen of my own sites... little, big, few pages, lots of pages. I look forward to when Google arrives, because that's what assures that my sites are getting put into the Great Google Database.
Not only does Googlebot traffic not increase the load, it doesn't significantly drain the CPU or make anybody's sites load more slowly. And I have these sites spread among hosting client sites, so I am on the same chuck wagon as clients... I get the same ride. It's not like I have set up a separate server for my own stuff.
IMO, this is simply another sign of Dreamhost's servers being oversold. They have put more on a box than the box can comfortably run without performance being affected. Search engine bots are part of the hosting deal, they are not an "extreme" circumstance...
If hosts don't want to be faced with these kinds of issues, they need to take a closer look at how they have engineered the circumstances that have come to be. Plan sizes, bandwidth allocations, etc. end up basically misleading clients into thinking they can get a level of service that, in practical terms, they can't.
And who loses? The hosting customer. Which IMO, sucks.
It keeps boiling down to "you get what you pay for." But it still annoys me that there are hosting companies out there doing this stuff. They are teaching people that they can get a ton of space/bandwidth for pennies, then slapping them on the hand when the customer actually uses those resources in reasonable ways.
What are people supposed to think?
They've been led down the primrose path, only to stumble at the edge of a hidden 300 ft. cliff, and it ain't right.
 Bailey
__________________
Let's Connect on Twitter! @thatsmsgeek2u || Fighting mediocrity one thread at a time.
|

05-17-2007, 04:27 PM
Web Hosting Master
Join Date: May 2005
Location: Behind a linux box
Posts: 685
__________________
Got Fused?
|

05-17-2007, 04:29 PM
Junior Guru
Join Date: May 2003
Location: So Cal
Posts: 182
Quote:
Originally Posted by David
Would he have preferred suspension? They notified him upfront, which is acceptable.
|
I do give them credit for notifying him, but to be honest I would think that by getting into the hosting business you understand going in that search bots are going to be crawling around possibly eating up bandwidth.
I mean, as a host, if you're having trouble with just the search bots, I would be afraid of what might happen if I actually did any promotion of my site.
__________________
-Jay-
Watch out where the huskies go and don't you eat that yellow snow!
|

05-17-2007, 05:29 PM
& Goliath
Join Date: Oct 2003
Location: San Diego
Posts: 8,803
Quote:
Originally Posted by jayaic
I do give them credit for notifying him, but to be honest I would think that by getting into the hosting business you understand going in that search bots are going to be crawling around possibly eating up bandwidth.
I mean, as a host, if you're having trouble with just the search bots, I would be afraid of what might happen if I actually did any promotion of my site.
|
Yes, but bots can still cause load. It's not just bandwidth. An example would be Yahoo's Slurp bot, which was crawling just a single RSS feed on one of my client's sites (admittedly, it's horrible code) but brought the load from a 0.17 average to 5.
If a similar bot were to hit a horribly coded script on a system that's already undergoing some load it could potentially cripple the server.
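Before blocking anything, it helps to confirm which crawler is actually generating the requests. A rough sketch, assuming Apache's "combined" log format (where the User-Agent is the last quoted field); the crawler token list and the log lines are made up for illustration:

```python
import re
from collections import Counter

# The User-Agent is the last double-quoted field of an Apache "combined" log line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical list of crawler name tokens to look for.
CRAWLER_TOKENS = ("Googlebot", "Slurp", "msnbot")


def requests_per_crawler(log_lines):
    """Count requests per crawler token; anything unrecognised is counted as 'other'."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        agent = match.group(1).lower() if match else ""
        name = next((t for t in CRAWLER_TOKENS if t.lower() in agent), "other")
        counts[name] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [17/May/2007:17:01:02 -0700] "GET /feed.php HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.10 - - [17/May/2007:17:01:03 -0700] "GET /feed.php?page=2 HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '198.51.100.7 - - [17/May/2007:17:01:05 -0700] "GET /index.html HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [17/May/2007:17:01:09 -0700] "GET /index.html HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"',
]

print(requests_per_crawler(SAMPLE_LOG).most_common())
```

Counting requests only shows who is busiest; pairing it with per-URL timings would show which script (an RSS feed, say) is the expensive one.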
|

05-17-2007, 05:43 PM
Web Hosting Master
Join Date: Apr 2006
Posts: 2,200
It would be interesting to see a response from Dreamhost, and also to see if this is a hoax or not. (It is the first time I've heard of Google being blocked by an ISP in this fashion.)
|

05-17-2007, 05:47 PM
Owner of the net for a day
Join Date: Jun 2002
Location: Waco, TX
Posts: 4,550
I can tell you there ARE some real runaway bots. One of them, called SMbot, runs on the Amazon EC2 network, and it WILL kill your site, your server, etc.
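Between blocking a runaway bot outright and letting it run loose, a host could throttle each client instead. A minimal sliding-window sketch, assuming requests are keyed by some client identifier; the key "SMbot" and the limits are made up, and this is a generic technique rather than something anyone in the thread says they used:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any rolling window_seconds period."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so the example can use a fake clock
        self.history = defaultdict(deque)   # client key -> times of accepted requests

    def allow(self, client):
        now = self.clock()
        timestamps = self.history[client]
        # Forget requests that have slid out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over the limit: the caller might answer 503 with Retry-After
        timestamps.append(now)
        return True


# Example with a fake clock: 3 requests per 10 seconds for a hypothetical bot key.
fake_now = [0.0]
limiter = SlidingWindowLimiter(3, 10, clock=lambda: fake_now[0])
first_burst = [limiter.allow("SMbot") for _ in range(5)]
print(first_burst)              # [True, True, True, False, False]
fake_now[0] = 10.0
print(limiter.allow("SMbot"))   # True: the earlier requests have aged out
```

Rejected requests are not recorded, so a bot that keeps hammering regains access as soon as its accepted requests age out of the window.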
|

05-17-2007, 06:00 PM
Web Hosting Master
Join Date: Nov 2003
Location: Newport Beach, CA
Posts: 2,921
I personally think they're missing the problem.
I've got a site that used to generate a large amount of load (considering it's on a dual-core, dual-CPU server running by itself). It gets a little over 700,000 page views per day now.
When we analyzed the traffic and the load, a lot of the problem appeared to be Google.
Not Googlebot, but Google just the same.
The load on the server was roughly 5 or 6: pretty high, but not fatal. We then noticed the problem with Google was not their search bots or crawlers; it was their friggin' cache crawlers. You know how you can click the 'Cached' link next to your results in Google? The bots that were doing all the caching were loading up the server. We blocked the range they used, and the load dropped back down to under 2.
This did NOT stop any Googlebots, just the caching service. Upon investigation, it turns out this is a completely different service and does not even use the same IP range as their search bots.
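Blocking one service's address range while leaving Googlebot alone comes down to a range-membership test. A sketch with Python's standard `ipaddress` module; the networks below are RFC 5737 documentation ranges standing in for whatever ranges show up in your own logs, not Google's real addresses:

```python
import ipaddress

# Hypothetical blocked ranges (RFC 5737 documentation networks, NOT Google's).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers 198.51.100.0 - 198.51.100.127
]


def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("192.0.2.77"))      # True
print(is_blocked("198.51.100.200"))  # False: outside the /25
print(is_blocked("203.0.113.5"))     # False
```

In practice the block itself would live in the firewall or web-server configuration; the snippet only shows the matching logic.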
So while it's easy to badmouth Dreamhost, they have a very valid point. They just may not know where the real problem is.
P.S. Don't try using a robots.txt file to tell bots not to index certain files. Those are precisely the files rogue bots will look for, and you're pointing right at the files you don't want anyone to see.
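To see why, note that robots.txt is an ordinary public file: any client can fetch it and read the Disallow lines as a list of places worth a look. A sketch of how little effort that takes; the robots.txt content and paths are hypothetical:

```python
def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that "hides" sensitive files and thereby advertises them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/db-dump.sql  # please don't index this
Disallow:
"""

print(disallowed_paths(ROBOTS_TXT))
```

Files that must stay private need real access control (authentication, or keeping them outside the web root), not a robots.txt entry.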
__________________
Show your reciprocal links on your website. eReferrer
Last edited by fastnoc; 05-17-2007 at 06:04 PM.
|

05-17-2007, 06:11 PM
Owner of the net for a day
Join Date: Jun 2002
Location: Waco, TX
Posts: 4,550
To add to what fastnoc said, many of those "cache bots" are actually serving up the Google content accelerator, which has been around for a while too.
|

05-17-2007, 06:31 PM
Too smart for her own good.
Join Date: Feb 2004
Location: Your Screen
Posts: 3,998
Quote:
Originally Posted by David
Yes, but bots can still cause load. It's not just bandwidth. An example would be Yahoo's Slurp bot, which was crawling just a single RSS feed on one of my client's sites (admittedly, it's horrible code) but brought the load from a 0.17 average to 5.
If a similar bot were to hit a horribly coded script on a system that's already undergoing some load it could potentially cripple the server.
|
I don't understand how this could happen (load of 0.17 ==> 5) ... can you offer some specs of the site being crawled? If it's a dynamic site, what scripting system (CMS?) is being used? How many pages/files/etc?
It seems that this must be a large site with a lot of pages or files, a lot of complex dynamically-generated content...
And can you please shed some light on the physical specs of the server? As well as how many sites are on it, what else was running at the time (mailings? Other bots? etc.) and what kind of sites are you hosting on it (e.g., lots of Joomla sites?)?
Load is a rather arbitrary measure of what is going on; by itself, it really tells us nothing.
From my experience, if a server's load is jumping from 0.17 to 5+ just because of regular bot traffic, there are three possibilities:
- the server is overloaded for what's being hosted/run on it. (Yes, yes, I know them's fightin' words.)
- the server is not optimized for the activities going on, or
- the client is running highly-dynamic, highly-resource intensive scripting which by all rights should be on a dedicated server.
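A concrete reason the raw number says little: on Unix-like systems the load average counts processes that are running or waiting for a CPU (Linux also counts those stuck in uninterruptible I/O), so the same figure means very different things on a single-core box and an eight-core one. A small sketch that normalises load by core count; the machine sizes are hypothetical:

```python
import os


def load_per_core(load_average, cores):
    """Divide a load average by the number of CPU cores so machines of different sizes compare."""
    return load_average / cores


# The load of 5 from David's example, on hypothetical machines of different sizes.
for cores in (1, 2, 4, 8):
    print(f"load 5.0 on {cores} core(s): {load_per_core(5.0, cores):.2f} per core")

# On Unix-like systems the live 1-, 5- and 15-minute averages are available directly.
if hasattr(os, "getloadavg"):
    one_minute, _, _ = os.getloadavg()
    print(f"this machine: {load_per_core(one_minute, os.cpu_count() or 1):.2f} per core")
```

Even normalised, load only says the CPUs (or disks) are busy, not why; it still has to be read alongside what is running on the box, as the questions above suggest.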
I'm not saying that all shared clients belong on shared servers... by all means, resource-intensive scripts and operations need to be on their own box. I am assuming here that we are talking about shared clients who are running stuff that is appropriate for shared hosting, resource-wise.
Anyway I am just trying to better understand the example you provided, David. I don't consider bot traffic to be extraordinary, nor do we see bot traffic driving up loads on our boxes, so I am trying to understand how others are running into this problem so that we can hopefully avoid it.
Just my 2¢ worth, and your mileage will probably vary.
 Bailey
__________________
Let's Connect on Twitter! @thatsmsgeek2u || Fighting mediocrity one thread at a time.
|

05-17-2007, 08:46 PM
Retired Moderator
Join Date: Oct 2002
Location: EU - east side
Posts: 21,920
Quote:
Originally Posted by David
Well it was either that or suspend his account.
He should consider optimizing his scripts or moving; there's not much choice beyond that!
Google & Yahoo can hammer sites like crazy, so it's not all that drastic as long as it's temporary. Would he have preferred suspension? They notified him upfront, which is acceptable.
|
I can only agree with you, David. Customers (fairly) complain that they're not given details when resource abuse happens. Now they are given them, and the site is not exactly suspended, yet there's still reason to complain, when the typical approach would have been "resource abuse, account suspended, end of story." All this is proof that as long as you do something that restricts usage, you'll have unhappy customers. The explanations matter little. That's hardly an incentive for hosts to expand on their "abuse"-related procedures.
|

05-17-2007, 08:50 PM
Chilling in Pen Island
Join Date: May 2006
Posts: 872
I'm sorry, but if I ran a web hosting company and I saw a site using a lot of resources (shared hosting), I would definitely suspend the account.
I think what Dreamhost did was generous, but a lot of people here (especially the smaller hosts) just disagree with their action... because, well, Dreamhost is a successful company /=
__________________
hosted by HawkHost
I Recommend: LimeStone Networks!
The OverSeller Defender!