Dreamhost Asking Client To Block Search Engine Bots?

  #1  
Old 05-17-2007, 03:31 PM
jayaic jayaic is offline
Junior Guru
 
Join Date: May 2003
Location: So Cal
Posts: 182


Was just reading up on some search engine news and noticed this story: "DreamHost Blocks Googlebot on Clients Site".

It seems a client received an email from Dreamhost saying:

Quote:
This email is to inform you that a few of your sites were getting hammered by Google bot. This was causing a heavy load on the webserver, and in turn affecting other customers on your shared server. In order to maintain stability on the webserver, I was forced to block Google bot via the .htaccess file.
And

Quote:
You also want to consider making your files be unsearchable by robots and crawlers, as that usually contributes to high number of hits. If they hit a dynamic file, like php, it can cause high memory usage and consequently high load…
I know the search bots can be a drain on bandwidth, but outright blocking the bots from your customers' sites seems like a drastic move.

Most of these customers depend on search engines for people to find them, so wouldn't you effectively be killing their business by blocking the bots?

__________________
-Jay-

Watch out where the huskies go and don't you eat that yellow snow!

  #2  
Old 05-17-2007, 03:55 PM
Jedito Jedito is offline
Web Hosting Master
 
Join Date: Apr 2001
Location: Paradise
Posts: 11,318
This is odd. As far as I know, you can ask Google to make its bots crawl more slowly, which would fix this issue. Not allowing the spiders to crawl their sites AT ALL is going to be a big problem for many, many people.
Google is our #1 referrer, and I guess that's true for many other people too.
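
Short of blocking, a server can also push back on its own by answering some crawler requests with HTTP 503 and a Retry-After header, which well-behaved crawlers generally treat as a signal to slow down. Below is a minimal Python sketch of that idea. It is not what Dreamhost or Google is known to use: the `CrawlerThrottle` class, the limits, and the user-agent tokens in `is_search_bot` are all made up for illustration, and a user-agent string alone is easy to fake.

```python
import math
import time
from collections import deque


class CrawlerThrottle:
    """Sliding-window limiter for requests from one crawler (hypothetical helper)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.hits = deque()  # timestamps of recently allowed requests

    def check(self):
        """Return (status, headers) for the next crawler request."""
        now = self.clock()
        # Forget requests that have fallen out of the window.
        while self.hits and now - self.hits[0] >= self.window_seconds:
            self.hits.popleft()
        if len(self.hits) < self.max_requests:
            self.hits.append(now)
            return 200, {}
        # Over the limit: tell the crawler when the oldest hit expires.
        wait = self.window_seconds - (now - self.hits[0])
        return 503, {"Retry-After": str(max(1, math.ceil(wait)))}


def is_search_bot(user_agent):
    """Very rough user-agent match; the token list is an assumption."""
    ua = user_agent.lower()
    return any(token in ua for token in ("googlebot", "slurp"))
```

A request handler would call `is_search_bot()` on the User-Agent header and, for matching requests, return whatever `check()` says instead of running the expensive script. Normal visitors are never throttled in this sketch.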

__________________
Shared Web Hosting - Reseller Hosting - Semi-Dedicated Servers - SolusVM/XEN VPS
LiteSpeed Powered - R1Soft Continuous Data Protection - 24/7 Chat/Phone/Helpdesk Support - US/EU/Asia Servers
Cpanel/WHM - Softaculous - R1soft Backup - Litespeed - Cloudlinux -Site Builder- SSH support - Account Migration
DowntownHost LLC - In Business since 2001 - Toll Free:*1-877-384-6781*

  #3  
Old 05-17-2007, 03:56 PM
Siropel Siropel is offline
Web Hosting Master
 
Join Date: May 2005
Location: Behind a linux box
Posts: 685
I can confirm this as the guy that got the mail is a friend of mine.

__________________
Got Fused?

  #4  
Old 05-17-2007, 04:00 PM
David David is offline
& Goliath
 
Join Date: Oct 2003
Location: San Diego
Posts: 8,803
Well, it was either that or suspend his account.
He should consider optimizing his scripts or moving; there's not much choice beyond that!

Google & Yahoo can hammer sites like crazy so it's not all that drastic as long as it's temporary. Would he have preferred suspension? They notified him upfront which is acceptable.

__________________
David McKendrick
Fused - Quality web hosting
Follow me on twitter
Blessed is the man who walks not in the counsel of the wicked.

  #5  
Old 05-17-2007, 04:06 PM
bithost(NET) bithost(NET) is offline
Too smart for her own good.
 
Join Date: Feb 2004
Location: Your Screen
Posts: 3,998
Just another example of Dreamhost getting caught with their pants down, IMO...

I look at it this way. I run several dozen of my own sites... little, big, few pages, lots of pages. I look forward to when Google arrives, because that's what assures that my sites are getting put into the Great Google Database.

Not only does the Googlebot traffic not cause an increased load, but it does not cause significant drain in CPU nor does it cause anybody's sites to load more slowly. And, I have these sites spread amongst hosting client sites, so I am on the same chuck wagon as clients... I get the same ride. It's not like I have set up a separate server for my own stuff.

IMO, this is simply another sign of Dreamhost's servers being oversold. They have put more on a box than the box can comfortably run without performance being affected. Search engine bots are part of the hosting deal, they are not an "extreme" circumstance...

If hosts don't want to be faced with these kinds of issues, they need to take a closer look at how they have engineered the circumstances that have come to be. Plan sizes, bandwidth allocations, etc. end up basically misleading clients into thinking they can get a level of service that, in practical terms, they can't.

And who loses? The hosting customer. Which IMO, sucks.

It keeps boiling down to "you get what you pay for." But it still annoys me that there are hosting companies out there doing this stuff. They are teaching people that they can get a ton of space/bandwidth for pennies, then slapping them on the hand when the customer actually uses those resources in reasonable ways.

What are people supposed to think?

They've been led down the primrose path, only to stumble at the edge of a hidden 300 ft. cliff, and it ain't right.

Bailey

__________________
Let's Connect on Twitter! @thatsmsgeek2u || Fighting mediocrity one thread at a time.

  #6  
Old 05-17-2007, 04:27 PM
Siropel Siropel is offline
Web Hosting Master
 
Join Date: May 2005
Location: Behind a linux box
Posts: 685
An official response from someone @dreamhost?
http://www.seopedia.org/internet-mar...ock-googlebot/

  #7  
Old 05-17-2007, 04:29 PM
jayaic jayaic is offline
Junior Guru
 
Join Date: May 2003
Location: So Cal
Posts: 182
Quote:
Originally Posted by David View Post
Would he have preferred suspension? They notified him upfront which is acceptable.
I do give them credit for notifying him, but to be honest, I would think that by getting into the hosting business you understand going in that search bots are going to be crawling around, possibly eating up bandwidth.

I mean, if you're a host having trouble with just the search bots, I would be afraid of what might happen if I actually did any promotion of my site.

  #8  
Old 05-17-2007, 05:29 PM
David David is offline
& Goliath
 
Join Date: Oct 2003
Location: San Diego
Posts: 8,803
Quote:
Originally Posted by jayaic View Post
I do give them credit for notifying him, but to be honest, I would think that by getting into the hosting business you understand going in that search bots are going to be crawling around, possibly eating up bandwidth.

I mean, if you're a host having trouble with just the search bots, I would be afraid of what might happen if I actually did any promotion of my site.
Yes, but bots can still cause load. It's not just bandwidth. An example would be a Yahoo Slurp bot that was crawling just a single RSS feed on one of my client's sites (admittedly, it's horrible code) but brought the load from a 0.17 average to 5.

If a similar bot were to hit a horribly coded script on a system that's already undergoing some load it could potentially cripple the server.
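
One common way to take the sting out of a crawler hammering a single expensive page, such as an RSS feed, is to cache the rendered output for a short time so the slow code runs once per interval instead of once per request. A rough Python sketch under stated assumptions: `TTLCache`, `render_feed`, the `/feed.rss` key, and the 300-second lifetime are all hypothetical stand-ins, not anything from the site described above.

```python
import time


class TTLCache:
    """Cache one expensive result per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get_or_build(self, key, build):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = build()
        self.entries[key] = (now + self.ttl_seconds, value)
        return value


render_calls = 0


def render_feed():
    """Stand-in for the slow script that builds the feed (hypothetical)."""
    global render_calls
    render_calls += 1
    return "<rss>...</rss>"


cache = TTLCache(ttl_seconds=300)
for _ in range(1000):  # a crawler requesting the feed 1000 times in a burst
    body = cache.get_or_build("/feed.rss", render_feed)
print(render_calls)  # the slow function ran only once for the whole burst
```

The trade-off is staleness: readers may see a feed that is up to five minutes old, which is usually acceptable for content a bot is polling anyway.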

  #9  
Old 05-17-2007, 05:43 PM
Henrik Henrik is offline
Web Hosting Master
 
Join Date: Apr 2006
Posts: 2,200
It would be interesting to see a response from Dreamhost, and also to see if this is a hoax or not. (It is the first time I've heard of Google being blocked by an ISP in this fashion.)

__________________
Need personalized hosting or consulting?

Twitter


  #10  
Old 05-17-2007, 05:47 PM
(Stephen) (Stephen) is offline
Owner of the net for a day
 
Join Date: Jun 2002
Location: Waco, TX
Posts: 4,550
I can tell you there ARE some real runaway bots. One of them, called SMbot, lives on the Amazon EC2 network; it WILL kill your site, your server, etc.

  #11  
Old 05-17-2007, 06:00 PM
fastnoc fastnoc is offline
Web Hosting Master
 
Join Date: Nov 2003
Location: Newport Beach, CA
Posts: 2,921
I personally think they're missing the problem.

I've got a site that used to generate a lot of load (considering it's on a dual-core, dual-CPU server running by itself). It gets a little over 700,000 page views per day now.

When we analyzed the traffic and the load, a lot of the problem appeared to be Google.

Not Googlebot, but Google just the same.

The load on the server was roughly 5 or 6. Pretty high, but not fatal. We then noticed the problem with Google was not their search bots or crawlers, it was their friggin' cache crawlers. You know how you can click the 'cached' link next to your results in Google? The bots that were doing all the caching were loading up the server. We blocked the range that they used, and the load dropped back down to under 2.

This did NOT stop any Googlebots, just the caching service. Upon investigation, this is a completely different service and does not even use the same IP range as their search bots.
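
For anyone wondering what blocking a range looks like at the application level, here is a minimal Python sketch using the standard `ipaddress` module. The ranges below are placeholders from the reserved documentation address space, NOT the ranges that were actually blocked (those aren't given here), and `is_blocked` is just an illustrative helper name.

```python
import ipaddress

# Placeholder ranges from the documentation address space, NOT real Google ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocked(remote_addr):
    """Return True if the client address falls inside any blocked range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_RANGES)
```

In practice the same check is usually done in the firewall or web server config rather than in a script, since that rejects the request before any dynamic code runs.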

So while it's easy to bad-mouth Dreamhost, they have a very valid point. They just may not know where the real problem is.

P.S. Don't try using a robots.txt file to tell bots not to index certain files. Those are precisely the files that rogue bots will look for, and you're pointing right at the files you don't want anyone to see.
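
To make the robots.txt point concrete, here is a small Python sketch. The file contents, the paths, and the `ExampleBot` name are invented for illustration. It shows both sides: a polite crawler consulting the rules through the standard `urllib.robotparser` module, and how trivially anyone else can pull the "hidden" paths out of the same public file.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-reports/
Disallow: /old-admin/
"""


def disallowed_paths(robots_txt):
    """What anyone, rogue bot included, can read straight out of the file."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# A well-behaved crawler asks the parser before fetching...
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
polite_ok = parser.can_fetch("ExampleBot", "/private-reports/q1.html")

# ...while a rogue one just treats the Disallow lines as a to-do list.
targets = disallowed_paths(ROBOTS_TXT)
```

robots.txt is a request, not access control. Anything that must stay private needs real protection such as authentication.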

__________________
Show your reciprocal links on your website. eReferrer


Last edited by fastnoc; 05-17-2007 at 06:04 PM.
  #12  
Old 05-17-2007, 06:11 PM
(Stephen) (Stephen) is offline
Owner of the net for a day
 
Join Date: Jun 2002
Location: Waco, TX
Posts: 4,550
To add to what fastnoc said, many of those "cache bots" are actually serving the Google content accelerator, which has been around for a bit too.

  #13  
Old 05-17-2007, 06:31 PM
bithost(NET) bithost(NET) is offline
Too smart for her own good.
 
Join Date: Feb 2004
Location: Your Screen
Posts: 3,998
Quote:
Originally Posted by David View Post
Yes, but bots can still cause load. It's not just bandwidth. An example would be a Yahoo Slurp bot that was crawling just a single RSS feed on one of my client's sites (admittedly, it's horrible code) but brought the load from a 0.17 average to 5.

If a similar bot were to hit a horribly coded script on a system that's already undergoing some load it could potentially cripple the server.
I don't understand how this could happen (load of 0.17 ==> 5) ... can you offer some specs of the site being crawled? If it's a dynamic site, what scripting system (CMS?) is being used? How many pages/files/etc?

It seems that this must be a large site with a lot of pages or files, a lot of complex dynamically-generated content...

And can you please shed some light on the physical specs of the server? As well as how many sites are on it, what else was running at the time (mailings? Other bots? etc.) and what kind of sites are you hosting on it (e.g., lots of Joomla sites?)?


Load is a rather arbitrary measure of what is going on; by itself it really tells us nothing.
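
One reason the raw number says so little: a load average roughly counts processes running or waiting to run, so the same figure means very different things depending on how many cores the box has. A minimal Python sketch of putting it in context follows. The function names and the 0.7 / 1.0 thresholds are rules of thumb chosen for illustration, not any standard.

```python
import os


def load_per_core(load_average, cpu_count):
    """Normalize a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def describe(load_average, cpu_count):
    """Label a load figure; thresholds are rules of thumb, not standards."""
    ratio = load_per_core(load_average, cpu_count)
    if ratio < 0.7:
        return "comfortable"
    if ratio <= 1.0:
        return "busy"
    return "saturated"


if hasattr(os, "getloadavg"):  # only available on Unix-like systems
    try:
        one_minute, _, _ = os.getloadavg()
        print(describe(one_minute, os.cpu_count() or 1))
    except OSError:
        pass  # load average could not be obtained on this system
```

By this yardstick a load of 5 is saturation on a single-core machine but comfortable on an eight-core one, which is why the server specs matter for the example being discussed.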


From my experience, if a server's load is jumping from 0.17 to 5+ just because of regular bot traffic, there are three possibilities:
  1. the server is overloaded for what's being hosted/run on it. (Yes, yes, I know them's fightin' words.)
  2. the server is not optimized for the activities going on, or
  3. the client is running highly-dynamic, highly-resource intensive scripting which by all rights should be on a dedicated server.

I'm not saying that all shared clients belong on shared servers... by all means, resource-intensive scripts and operations need to be on their own box. I am assuming here that we are talking about shared clients who are running stuff that is appropriate for shared hosting, resource-wise.


Anyway I am just trying to better understand the example you provided, David. I don't consider bot traffic to be extraordinary, nor do we see bot traffic driving up loads on our boxes, so I am trying to understand how others are running into this problem so that we can hopefully avoid it.

Just my 2¢ worth, and your mileage will probably vary.


Bailey

  #14  
Old 05-17-2007, 08:46 PM
ldcdc ldcdc is offline
Retired Moderator
 
Join Date: Oct 2002
Location: EU - east side
Posts: 21,920
Quote:
Originally Posted by David View Post
Well, it was either that or suspend his account.
He should consider optimizing his scripts or moving; there's not much choice beyond that!

Google & Yahoo can hammer sites like crazy so it's not all that drastic as long as it's temporary. Would he have preferred suspension? They notified him upfront which is acceptable.
I can only agree with you, David. Customers (fairly) complain that they're not given details when resource abuse happens. Now they are given details and the site is not exactly suspended, yet there's still reason to complain, when the typical approach would have been "resource abuse", account suspended, end of story. All this is proof that as long as you do something that restricts usage, you'll have unhappy customers. The explanations matter little. There's hardly any incentive for hosts to expand on their "abuse"-related procedures.

  #15  
Old 05-17-2007, 08:50 PM
40sixty 40sixty is offline
Chilling in Pen Island
 
Join Date: May 2006
Posts: 872
I'm sorry, but if I ran a web hosting company and saw a site using a lot of resources (shared hosting), I would definitely suspend the account.

I think what Dreamhost did was generous, but a lot of people here (especially the smaller hosts) just disagree with their action... because, well, Dreamhost is a successful company /=

__________________
hosted by HawkHost
I Recommend: LimeStone Networks!
The OverSeller Defender!
