  1. #1
    Join Date
    Jan 2002
    Location
    Colombia, South America
    Posts
    521

    Question .htaccess block BaiduSpider mod_rewrite

    I have tried, without success, to block approximately 50 crawlers belonging to Baidu, the Chinese search engine, since this website can only sell magazines to people with U.S. addresses. The top 6 lines of my .htaccess file are at the bottom of this post.

    Someone at OLM Support gave me the WebmasterWorld link below. Since nothing I have tried so far has worked, I would appreciate it if someone knowledgeable about mod_rewrite and .htaccess could tell me whether I can place those 3 lines at the very top of my .htaccess file, or whether they need to go lower in the file (and if so, where).

    Also, I am looking for any other ideas about how to keep the Baidu crawlers out. The site is on shared hosting, so blocking them with iptables or any other server-level firewall is not an option for me. They do not look at the robots.txt file or the sitemap.txt file; they just come in....

    TIA! Lanny



    http://www.webmasterworld.com/search...rs/3667300.htm

    If your site is on an Apache server, you can block using mod_rewrite via .htaccess:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC]
    RewriteRule .* - [F]

    I included the [NC] flag, which makes the match case-insensitive, since at least one of the Baidu bots uses "BaiDuSpider".
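A slightly broader variant, offered as a sketch: some Baidu crawlers send a User-Agent that begins with `Mozilla/5.0 (compatible; Baiduspider/2.0; ...`, which the anchored pattern `^Baiduspider` does not match. Dropping the `^` anchor matches the name anywhere in the header, and `[NC]` still covers spellings like "BaiDuSpider". This assumes the host allows mod_rewrite in .htaccess; per-directory rewriting also requires `Options FollowSymLinks` or `SymLinksIfOwnerMatch` to be in effect.

```apache
RewriteEngine On
# No "^" anchor: match "Baiduspider" anywhere in the User-Agent, in any case
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# "-" leaves the URL unchanged; [F] answers with 403 Forbidden
RewriteRule .* - [F]
```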



    deny from baidu.com
    deny from crawl.baidu.com
    deny from 220.181.7.
    deny from 123.125.66.
    deny from baiduspider-220-181-7-20.crawl.baidu.com
    deny from baiduspider-220-181-7-61.crawl.baidu.com

  2. #2
    Join Date
    Oct 2004
    Location
    Kerala, India
    Posts
    4,750
    Try adding this to your httpd.conf file.

    Code:
    SetEnvIfNoCase User-Agent "^Baidu" bad_bot
    <Directory />
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Directory>
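`<Directory>` sections are not allowed inside .htaccess, so on shared hosting the same idea has to be written without the wrapper. A sketch, assuming Apache 2.2-style `Order`/`Allow`/`Deny` (on Apache 2.4 these come from mod_access_compat) and an `AllowOverride` setting that permits `FileInfo` (for `SetEnvIfNoCase`) and `Limit` (for the access directives). The `^` anchor is dropped so the match also works when "Baidu" appears later in the header.

```apache
# Flag any request whose User-Agent contains "baidu" (case-insensitive)
SetEnvIfNoCase User-Agent "Baidu" bad_bot
# Allow is evaluated first, then Deny, so flagged clients get 403 Forbidden
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

In a file that also contains a `<Limit GET POST>` block with `order deny,allow` and `allow from all`, such as the .htaccess posted later in this thread, that block would likely take precedence for GET and POST requests, so these lines would need to go inside it instead.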
    David | www.cliffsupport.com
    Affordable Server Management Solutions sales AT cliffsupport DOT com
    CliffWebManager | Access WHM from iPhone and Android

  3. #3
    Join Date
    Jan 2002
    Location
    Colombia, South America
    Posts
    521

    David: Thank you. Should I also add the mod_rewrite to .htaccess?

    David: Thank you! I will connect to the control panel and see if I have access to httpd.conf. Question: Should I also add the mod_rewrite rules to .htaccess? TIA! Lanny

  4. #4
    Join Date
    Oct 2004
    Location
    Kerala, India
    Posts
    4,750
    If you add it to the httpd.conf file, you don't need to add it to the .htaccess file as well.

  5. #5
    Join Date
    Jan 2002
    Location
    Colombia, South America
    Posts
    521

    No access to httpd.conf file

    Quote (originally posted by david510):
    If you add in the httpd.conf file, you don't need to add in .htaccess file.

    > Unfortunately no, you do not have access to the httpd.conf file on a shared server.

    The above is from a sysadmin at OLM. If I had a dedicated server or a VPS, I could do it, but not on shared hosting.

    Question: Should I put the mod_rewrite lines at the very top of the .htaccess file, or do they need to go in some particular place? Your time and help are much appreciated!

  6. #6
    Join Date
    Feb 2005
    Location
    Australia
    Posts
    5,842
    If in doubt, post the existing .htaccess so we can see what else it might interfere with. But I can't think of any reason why it should; almost certainly you can put it at the top.
    Chris

    "Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them." - Laurence J. Peter

  7. #7
    Join Date
    Nov 2002
    Location
    Portland, Oregon
    Posts
    2,948
    Lanny,

    If your website is on a cPanel shared server, you should be able to use the "IP Deny Manager", which lets you block individual IPs, domain names, ranges, CIDR blocks, etc. This may help keep the Baidu crawlers from reaching anything on your domain/website, although it does not protect the shared server as a whole. Seeing your .htaccess wouldn't hurt, though.
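For reference, the partial-address form already used in this thread and CIDR notation can name the same ranges. A sketch in Apache 2.2 `Deny from` syntax; use one form or the other, not both. Note that Apache only accepts comments on their own lines, never after a directive.

```apache
# Partial addresses: every IP that starts with 220.181.7. or 123.125.66.
Deny from 220.181.7.
Deny from 123.125.66.

# The same two ranges in CIDR notation (/24 = first three octets fixed)
Deny from 220.181.7.0/24
Deny from 123.125.66.0/24
```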
    | John Edel Jetfire Networks L.L.C. Trusted Hosting Solutions
    | Consistent, Reliable, Stable OpenVZ & KVM Virtual Private Servers
    | SpamWall AV & Full SMTP Filtering
    Now an SSLStore Titanium Partner!

  8. #8
    Join Date
    Oct 2004
    Location
    Kerala, India
    Posts
    4,750
    It seems the OP has already done that from cPanel; see the first post.

    Code:
    deny from baidu.com
    deny from crawl.baidu.com
    deny from 220.181.7.
    deny from 123.125.66.
    deny from baiduspider-220-181-7-20.crawl.baidu.com
    deny from baiduspider-220-181-7-61.crawl.baidu.com

  9. #9
    Join Date
    Jan 2002
    Location
    Colombia, South America
    Posts
    521

    .htaccess file (can't attach, included in post)

    The latest version of the .htaccess file is at the bottom of this post. They are still coming in. The latest things I've put into .htaccess are at the top of the file.

    nwtg: it is not on a cPanel server; it is on an Ensim server. When I move, it will be to cPanel.

    david510 and foobic thank you for all of your ideas and comments. Much appreciated!

    I tried to attach the .txt file to this reply, but it did not work, so I am posting the entire contents of the .htaccess file below. Suggestions? TIA! Lanny

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC]
    RewriteRule .* - [F]
    deny from baidu.com
    deny from crawl.baidu.com
    deny from 220.181.7.
    deny from 123.125.66.
    deny from baiduspider-220-181-7-20.crawl.baidu.com
    deny from baiduspider-220-181-7-61.crawl.baidu.com
    deny from baiduspider-220-181-7-62.crawl.baidu.com
    deny from baiduspider-220-181-7-65.crawl.baidu.com
    deny from baiduspider-220-181-7-66.crawl.baidu.com
    deny from baiduspider-220-181-7-68.crawl.baidu.com
    deny from baiduspider-220-181-7-69.crawl.baidu.com
    deny from baiduspider-220-181-7-70.crawl.baidu.com
    deny from baiduspider-220-181-7-71.crawl.baidu.com
    deny from baiduspider-220-181-7-74.crawl.baidu.com
    deny from baiduspider-220-181-7-77.crawl.baidu.com
    deny from baiduspider-220-181-7-81.crawl.baidu.com
    deny from baiduspider-220-181-7-82.crawl.baidu.com
    deny from baiduspider-220-181-7-85.crawl.baidu.com
    deny from baiduspider-220-181-7-86.crawl.baidu.com
    deny from baiduspider-220-181-7-89.crawl.baidu.com
    deny from baiduspider-220-181-7-91.crawl.baidu.com
    deny from baiduspider-220-181-7-95.crawl.baidu.com
    deny from baiduspider-220-181-7-96.crawl.baidu.com
    deny from baiduspider-220-181-7-98.crawl.baidu.com
    deny from baiduspider-220-181-7-122.crawl.baidu.com
    deny from baiduspider-220-181-7-126.crawl.baidu.com
    deny from baiduspider-220-181-7-128.crawl.baidu.com
    deny from baiduspider-220-181-7-129.crawl.baidu.com
    deny from baiduspider-220-181-7-131.crawl.baidu.com
    deny from baiduspider-123-125-66-57.crawl.baidu.com
    deny from baiduspider-123-125-66-59.crawl.baidu.com
    deny from baiduspider-123-125-66-61.crawl.baidu.com
    deny from baiduspider-123-125-66-62.crawl.baidu.com
    deny from baiduspider-123-125-66-65.crawl.baidu.com
    deny from baiduspider-123-125-66-66.crawl.baidu.com
    deny from baiduspider-123-125-66-67.crawl.baidu.com
    deny from baiduspider-123-125-66-68.crawl.baidu.com
    deny from baiduspider-123-125-66-70.crawl.baidu.com
    deny from baiduspider-123-125-66-72.crawl.baidu.com
    deny from baiduspider-123-125-66-77.crawl.baidu.com
    deny from baiduspider-123-125-66-79.crawl.baidu.com
    deny from baiduspider-123-125-66-80.crawl.baidu.com
    deny from baiduspider-123-125-66-82.crawl.baidu.com
    deny from baiduspider-123-125-66-84.crawl.baidu.com
    deny from baiduspider-123-125-66-85.crawl.baidu.com
    deny from baiduspider-123-125-66-90.crawl.baidu.com
    deny from baiduspider-123-125-66-91.crawl.baidu.com
    deny from baiduspider-123-125-66-94.crawl.baidu.com
    deny from baiduspider-123-125-66-95.crawl.baidu.com
    deny from baiduspider-123-125-66-119.crawl.baidu.com
    deny from baiduspider-123-125-66-122.crawl.baidu.com
    deny from baiduspider-123-125-66-126.crawl.baidu.com
    deny from baiduspider-123-125-66-127.crawl.baidu.com
    deny from baiduspider-220-181-7-130.crawl.baidu.com
    # -FrontPage-

    IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

    <Limit GET POST>
    order deny,allow

    allow from all
    </Limit>
    <Limit PUT DELETE>
    order deny,allow
    deny from all
    </Limit>
    AuthName www.lowcostmagazines.com
    AuthUserFile /home/virtual/site212/fst/var/www/html/_vti_pvt/service.pwd
    AuthGroupFile /home/virtual/site212/fst/var/www/html/_vti_pvt/service.grp

    ErrorDocument 404 /404_Error.html
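One likely reason the `deny from` lines have no effect, offered as a hedged reading of the file above under Apache 2.2 access-control rules: for GET and POST the FrontPage block sets `order deny,allow`, under which Deny is evaluated first and any client that also matches an Allow is let in. Because `allow from all` matches every client, it cancels all of the `deny from` lines for ordinary page requests. A sketch of a replacement for the `<Limit GET POST>` block, which would also make the separate `deny from` lines at the top redundant. The variable name `bad_bot` is arbitrary, and this assumes the host permits `FileInfo` and `Limit` overrides in .htaccess.

```apache
# Flag requests whose User-Agent mentions Baiduspider (any case, any position)
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot

<Limit GET POST>
# allow,deny: Allow is checked first, then Deny, and a matching Deny wins
order allow,deny
allow from all
deny from env=bad_bot
deny from .baidu.com
deny from 220.181.7.
deny from 123.125.66.
</Limit>
```

The `<Limit PUT DELETE>` block can stay as it is. The rewrite rule at the top is handled by mod_rewrite, not by `order`/`allow`/`deny`, so this change does not affect it either way.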

  10. #10
    deny from baidu.com
    deny from crawl.baidu.com
    deny from baiduspider-220-181-7-20.crawl.baidu.com
    deny from baiduspider-220-181-7-61.crawl.baidu.com

    Since a domain-name deny matches every host name that ends in that domain, the above may be abbreviated as
    deny from .baidu.com
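Apache compares a domain-name `Deny` against the end of the client's host name, matching whole name components only, so `.baidu.com` covers names such as `baiduspider-220-181-7-20.crawl.baidu.com`. A sketch of the consolidated list, assuming Apache 2.2 `mod_authz_host` semantics; the address prefixes are kept because a name-based rule only works when the crawler's reverse and forward DNS agree.

```apache
# Any host name ending in .baidu.com. Apache confirms the name with a
# double reverse DNS lookup (IP -> name -> IP) before applying this rule.
Deny from .baidu.com

# Address prefixes need no DNS lookup at all
Deny from 220.181.7.
Deny from 123.125.66.
```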

  11. #11
    "I am looking for any other ideas about how to keep the Baidu crawlers out."

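    If your server supports PHP, you can stall each request from Baidu for
    up to 999 seconds...

    Rename your current default page to home.php,
    then put the code below into a new index.php. Ordinary visitors are redirected to home.php.
    Note that this only checks requests for index.php, not direct links to other pages.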

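    <?php
    // Stall a matching client, send a short refusal, then stop the script.
    // Each stalled request keeps a PHP process busy for the whole delay.
    function f_sleep() {
        sleep(999);
        echo "<html><body> Go away! </body></html>";
        exit; // stop here so the redirect below is not attempted after output
    }

    // Quote the array keys; a bare HTTP_USER_AGENT is an undefined constant
    $age = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    // stripos() is case-insensitive; !== false also catches a match at position 0
    if (stripos($age, "Baidu") !== false) f_sleep();

    // Compare address prefixes (=== 0), so e.g. 1.220.181.5 is not caught
    $ip = $_SERVER['REMOTE_ADDR'];
    if (strpos($ip, "119.63.19") === 0) f_sleep();
    if (strpos($ip, "220.181") === 0) f_sleep();
    if (strpos($ip, "123.125.6") === 0) f_sleep();

    // Everyone else goes to the real home page
    header("Location: home.php");
    exit;
    ?>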

    or similar.
