Page 1 of 2
Results 1 to 25 of 47
  1. #1
    Join Date: Oct 2007 | Location: Greater DC | Posts: 152

    How to block the Yandex bot?

    Greetings -

    I'd like to block the Yandex bot. It eats bandwidth unduly,
    and I doubt that folks in Russia want to listen to my songs.
    [Admittedly, neither do most folk.]

    Here's a sample entry from my visitors log:
    Host: 87.250.255.243
    * /irish.htm
    Http Code: 304 Date: Feb 03 23:39:56 Http Version: HTTP/1.1 Size in Bytes: -
    Referer: -
    Agent: Yandex/1.01.001 (compatible; Win16; I)
    and here are the relevant lines from my .htaccess:
    #banish undesired bots
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^BPImageWalker [NC]
    RewriteRule ^(.*)$ hxxp://sod.off/

    RewriteCond %{HTTP_USER_AGENT} ^Purebot [NC]
    RewriteRule ^(.*)$ hxxp://sod.off/

    RewriteCond %{HTTP_USER_AGENT} ^Yandex [NC]
    RewriteRule ^(.*)$ hxxp://sod.off/
    Doesn't seem to help.

    Suggestions welcome.

    Thanks.

    - Richard
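
    As written, each rule above sends matching requests to another URL rather than refusing them. A minimal sketch of one alternative, assuming mod_rewrite is loaded and .htaccess overrides are permitted (neither is confirmed in the post), chains the three conditions with [OR] and answers with 403 Forbidden:

    ```apache
    # Sketch only: refuse the listed user agents with 403 instead of redirecting.
    # Assumes mod_rewrite is enabled and AllowOverride permits FileInfo here.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^BPImageWalker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Purebot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Yandex [NC]
    # "-" means no substitution; [F] returns 403 Forbidden and stops rewriting.
    RewriteRule ^ - [F]
    ```

    Whether this fixes the original problem is not certain, since the post does not say why the rules had no effect.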

  2. #2
    Join Date: May 2002 | Location: Moscow | Posts: 1,602
    You could use:

    Deny from 87.250.224.0/19
    Deny from 87.250.252.0/22
    Deny from 87.250.255.0/24

    in your .htaccess file. Or you could just use robots.txt to block the Yandex robot.
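
    For readers who want the full context around those Deny lines, here is a minimal sketch in Apache 2.2-style syntax. It assumes mod_authz_host (or mod_access_compat on 2.4) is available. Note that 87.250.224.0/19 spans 87.250.224.0 through 87.250.255.255, so it already covers the /22 and /24 ranges listed above:

    ```apache
    # Sketch only: allow everyone, then deny one address block.
    # With "Order Allow,Deny", Deny directives override Allow directives.
    Order Allow,Deny
    Allow from all
    # This /19 already includes 87.250.252.0/22 and 87.250.255.0/24.
    Deny from 87.250.224.0/19
    ```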
    TK Rustelekom LLC Dedicated server since 2002, RIPE NCC member, LIR

  3. #3
    robots.txt is an ideal way to block the Yandex robot. Create a robots.txt file in your account and insert the following lines:

    User-agent: Yandex
    Disallow: /
    This tells the Yandex robot not to access any file under your account. BTW, there are bad robots which ignore the robots.txt file, in which case you have to block the IPs of those servers manually, either using the method given above by "rustelekom" or server-wide with a third-party firewall or iptables.
    | LinuxHostingSupport.net
    | Server Setup | Security | Optimization | Troubleshooting | Server Migration
    | Monthly and Task basis services.
    | MSN : madaboutlinux[at]hotmail.com | Skype : madaboutlinux

  4. #4
    I found this post when searching for Yandex.ru

    I just got a call from my hoster because my bandwidth usage is insane. We figured out that it's multiple IPs from yandex.ru.

    I take it, most of you on here have blocked this site, and it's not really necessary?

  5. #5
    Originally posted by madaboutlinux:
    robots.txt is an ideal way to block the Yandex robot. Create a file robots.txt in your account and insert the following lines:

    User-agent: Yandex
    Disallow: /

    This will disallow Yandex robot to access any file under your account. BTW, there are bad robots which ignores robots.txt file in which case, you have to manually block the IPs of these servers either using the method stated above by "rustelekom" OR block the IPs server wide using any 3rd party firewall or iptables.
    First post here. I came for the same reason as the above poster, looking for a way to block Yandex.

    First of all, and this is from experience, the above quoted solution only works with robots that actually pay attention to robots.txt.

    But it's like saying "Pretty please, don't do what's on this list."

    Yandex DOES NOT CARE about robots.txt, and doesn't even visit it when they come skulking around your site. Check your own logs to verify this. I have, and it hasn't even LOOKED at robots.txt! That, to me, makes Yandex a "Bad Robot!"

    The ONLY way to keep them out is through .htaccess, and that's what I'm looking for too.

    IP blocking won't work either, or it might, but only for a little while, until they start using more IPs and changing the existing ones.

    Why do I want to block them? On principle. Too much Internet crime originates in Russia and several other Eastern European countries. Call my attitude what you will, it's not prejudice. It's reality.

    For a site that gets as little traffic as mine does, Yandex visits far too often, and for far too long, driving my bandwidth up noticeably! How much worse is it going to get when I actually start getting traffic?

    I'm not waiting around to find out. I'm bound and determined to slam the door on them now, before it gets worse, and before thousands more hackers start collecting info from them on my sites, getting them shut down yet again.
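
    Whether a crawler requests robots.txt can be checked directly in the logs. A minimal sketch, assuming access to the server or virtual-host configuration (CustomLog cannot be used in .htaccess) and that a "combined" log format is already defined, as in the stock httpd.conf. The log file name and the variable name robots_req are made up for this example:

    ```apache
    # Sketch only: write every request for /robots.txt to its own log file.
    # Place in httpd.conf or a <VirtualHost> block, not in .htaccess.
    SetEnvIf Request_URI "^/robots\.txt$" robots_req
    # Only requests that set robots_req are written to this log.
    CustomLog logs/robots_access.log combined env=robots_req
    ```

    Comparing the user agents in that file against the main access log would show which bots fetch robots.txt before crawling.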

  6. #6


    AND THE WINNER IS!!!

    in .htaccess add the following lines:

    SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
    Order Deny,Allow
    Deny from env=bad_bot


    403 403 403 403 403 403 403....





    Life is gewd!!!!
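
    Two notes on the snippet above, offered as a sketch rather than a correction. In the pattern "^Yandex*" the asterisk applies only to the final "x" (zero or more), so plain "^Yandex" matches the same user agents seen in this thread. The Order and Deny directives are Apache 2.2 syntax; on Apache 2.4 an equivalent, assuming mod_setenvif and mod_authz_core are loaded, might look like this:

    ```apache
    # Sketch only, Apache 2.4-style. Flags any User-Agent that begins with
    # "Yandex" (case-insensitive) and refuses those requests with 403.
    SetEnvIfNoCase User-Agent "^Yandex" bad_bot
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
    ```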

  7. #7
    Join Date: May 2010 | Location: Praha, CZ | Posts: 270
    Use robots.txt

    User-agent: Yandex
    Disallow: /
    » NQhost.com - Flexible Solutions
    » Linux VPS, Windows VPS and Xen VDS
    » USA, Germany and Russia Data-centers

  8. #8
    Join Date: May 2010 | Location: Praha, CZ | Posts: 270
    The correct way is to use robots.txt.
    The Yandex bot will then not look at your site, and a lot of "403" errors is not a very nice way to block access.

  9. #9
    Originally posted by NQhost:
    The correct way is to use robots.txt.
    Yandex bot will not look into your site and a lot of "403" errors is not very nice way to block an access.
    Are you kidding me?!

    Look, I USED robots.txt with the exact lines you suggested. For a WEEK.

    Yandex IGNORED it. You're d--n right 403 isn't "nice!"

    When robots do not adhere to the rules in robots.txt, as far as I am concerned, they are not playing fair, or nice.

    Yandex has spiked my bandwidth with a ridiculous number of visits (far, FAR more than Google bots EVER make).

    Before I tried blocking them in robots.txt, they were ignoring all the rules there, exposing much of my content to security risks.

    Not nice? You bet it's not nice! It's my bandwidth. I pay for it! They are MY files, and no one will access them if I don't want them to.

    Not nice? I tried "nice." It didn't work. No more Mr. Nice Guy!

    BTW: The "correct" way to do things is by doing what works! Not by following rules when they aren't working.

    I do use robots.txt

    When a robot breaks the rules in there, they are banned. Period!

  10. #10
    Join Date: May 2010 | Location: Praha, CZ | Posts: 270
    Also, try deleting your URL from the Yandex crawl:

    http://webmaster.yandex.ru/delurl.xml

  11. #11
    Now, that is a method I haven't yet seen posted anywhere.

    Others can try it if they wish.

    I personally wouldn't trust it. If they didn't bother to adhere to robots.txt, what basis do I have to go on to believe that using a list on their site would work either?

    I'm not saying it doesn't. Maybe it does. But I am saying I have no reason to believe it will, and plenty of reason not to trust Yandex based on log entries.

    Every search engine, with the exception of Yandex, visits robots.txt FIRST, then moves on, adhering to the rules.

    When a robot, any robot, doesn't even bother to look at the rules, that's a BAD robot.

    Yandex is not just visiting way too many times per day, they are crawling files that haven't even been on the site since BEFORE they began crawling it. A HUGE number of their "hits" were 404 before I blocked them.

    Why? Why, if the robot has already determined multiple times that a file is not there, does it keep looking for it over and over again? Google, MSN, Yahoo, ASK, and all the other search engines don't do that. They look for them a few times, over the course of a week or so, presumably to see if they reappear, then quit looking.

    Yandex has been hitting 404s for WEEKS, multiple times a day. Most of the time, that's a sign of a hacker. I am NOT saying Yandex is trying to hack my sites. I'm sure they aren't, but that isn't the point.

    The point is, their crawler is woefully inefficient, displays no respect for my rules, and looks for all intents and purposes like it's acting like a hacker.

    That is exactly why there are so many forum posts all over the web from webmasters trying to figure out what it is (it doesn't occur to most that it could be a legitimate search engine, because it doesn't ACT like a legitimate search engine). It is exactly why so many are trying to learn how to block them.

    If you were throwing a party in your home, and told everyone who came, "The patio, living room, dining room and bathrooms are all open to you all, but please don't go in the bedroom," and you found one of your guests in the bedroom, going through your closets, what would you do? Show them the rules again?

    I wouldn't. I'd throw them out of my house, physically!

    There is another, and what I believe to be a very sound reason to block them. Yandex is located smack in the middle of one of the hottest hacking areas on earth. My own sites were hacked from Eastern Europe multiple times recently. Two were destroyed more than once, and one was hijacked with a phishing page for a BANK. It got my account suspended, knocking EVERY site I have off line.

    Yandex was crawling files no legitimate search engine should have any interest in whatsoever. You think I want those files and directories hanging out there where any hacker (even a kiddie) can find them?

    It's behavior like that that convinces me they can't be trusted.

    If I can't trust them, I block them. End of story. Web security isn't a game of "would you please play by my rules."

    It's a war. When somebody acts like the enemy, I'm going to treat them accordingly.

    It could very well be that asking them to remove your site from their lists, actually works. But I have no reason to believe it, and I already have a way to deal with them that works.

    Why change it?


    Yandex is OUT.

  12. #12
    Hello, riverstones! I'm from the Yandex robot developers group. I would be very thankful if you could give me more information about our bot's bad behaviour. Our bot downloads robots.txt every time before crawling, so in your case it may be a bug on our side. Please tell me what your host's name is. Some examples of repeatedly crawled 404s would also be very helpful. Thanks!

    Best regards,
    Yaroslav

  13. #13
    Originally posted by yareg:
    Hello, riverstones! I'm from Yandex robot developers group. I would be very thankful if you give me more information about our bot's bad behaviour. Our bot downloads robots.txt every time before crawling. So in your case it can be some bug on our side. Please, tell me what is your host's name. Also some examples of repeatedly crawled 404's would be very helpful. Thanks!

    Best regards,
    Yaroslav
    With all due respect, it does not visit robots.txt at all when it comes to my sites.

    And no, I'm not getting into who my host is, or sharing any of my logs with anyone I don't already know from previous and lengthy interchanges.

    There are plenty of reports from webmasters all over Internet forums stating the same situation is happening with them.

    It's not my job to explain to Yandex how to do theirs.

    My job is to protect my sites' front end in whatever manner I deem to be appropriate and reasonable.

    Sharing logs with someone, anyone based only on the claims you've made on this forum, would be foolish.

    As far as I am concerned that's just "Web Site Security 101."

    It amazes me how many web masters do post links to their problems, and log entries based only on a request from another user when they have no way of knowing who they really are. (Yes, I know the logs can be edited to hide the site name etc. but that takes time I simply do not have, and I have other matters that need attention.)

    For all I know, you could be the very person who hacked my sites to begin with. I'm not saying you are though, and the likelihood is very low.

    Blocking Yandex may seem to many to be a knee-jerk paranoid reaction.

    I've said it before, and I'll say it again, "It isn't paranoia if somebody really is out to get you!" I don't mean that Yandex is "out to get me." I mean there are actual site crackers that are out to get my sites. They have cracked them, and they are continuing to attempt to crack them.

    That environment is virtual warfare. I am, in effect, the guardian of the "castle walls" under siege.

    That said, there is no way on earth I'm about to share information with anyone outside those walls unless I have sought them out and contacted them myself (Security 101!)

    I appreciate your position, and I'm sure you can appreciate mine.

    Would you send YOUR logs to me if I asked for them? I think not.

    You already have pretty much all the information you need. Your robot is not visiting robots.txt every time it visits every site. That problem has nothing to do with who my host is, or anyone's for that matter. It has to do with the robot's coding. And that, I can't help you with.

    I see that as a problem on my end, and have already done what I can do to prevent the problem. My problem is solved. Likely anyone who uses the above .htaccess code will solve their problems with it too.

    I can appreciate your not wanting that to happen on a widespread basis, but that isn't my problem. My problem is now gone, and that's pretty much all I care about right now.

    A blunt attitude it is, but the Internet is a dog-eat-dog world, and I'm not about to be lunch for anyone, even if that means my actions may, through others taking the same action, damage another company's reputation and/or means of doing business.

    I'm sure Yandex's proverbial heart would not skip one beat if my own enterprises collapsed. Neither would anyone else's for that matter.

    In short, while I see your point of view, and even empathize to some extent:

    "Sorry, but it's just business."

    When you swim with sharks, you have to defend and arm yourself, or be eaten. I will not be eaten.

    In business: "Trust NO ONE."

    I hope you can appreciate my position.

    Regards

    Riverstone

    "Trust has to be earned, and should come only after the passage of time."
    Arthur Ashe

  14. #14
    EDIT: Though I tried to state this in the above post, it appears so far down that it makes the earlier portion seem much stronger than I intended, so I am reiterating it here in the hope that the above post will not be taken the wrong way:

    The above post is in NO WAY intended to be an indictment of Yandex business practices. For all intents and purposes, Yandex appears to me to be a legitimate search engine, and probably a very well intentioned company. I have no reason to believe that Yandex is guilty of any crimes, shady practices or is untrustworthy.

    My comments above are ONLY related to my own personal experience, viewpoint, and based on what I was able to glean from many posts from many different web masters on various forums during about twelve hours of online research.

    I make NO CLAIM that what is true for me, is true for anyone else.

    The entire post is only in relation to me, my sites, and the Yandex robots that visit my sites. It is NOT about whether the things that have happened to me are happening to anyone else.

  15. #15
    Originally posted by yareg:
    Hello, riverstones! I'm from Yandex robot developers group. I would be very thankful if you give me more information about our bot's bad behaviour. Our bot downloads robots.txt every time before crawling. So in your case it can be some bug on our side. Please, tell me what is your host's name. Also some examples of repeatedly crawled 404's would be very helpful. Thanks!

    Best regards,
    Yaroslav
    Hello Yaroslav !
    I have a grudge against Yandex too. While the Google and Yahoo bots visited my page just a few times in the last few days, Yandex was always there! It never rested.
    So, I decided to make a robots.txt file to block its access. I made sure that the file is in UNIX format and it starts with:
    Code:
    # Yandex bot
    User-agent: Yandex
    Disallow: /
    But the bot is still there doing its job. I assume that your search engine is well intended and the bot is doing a legitimate job, but it is INTRUSIVE! I don't want it on my website! Sorry, but educate your bot and teach it to behave, and then it will get our respect.
    Moreover, I tried to delete my URL from Yandex, as NQhost suggested, but it still comes.

    more info:
    spider58.yandex.ru
    IP address: 93.158.145.28
    User agent: Yandex/1.01.001 (compatible; Win16; I)
    Country: Russian Federation
    Region: Moscow City
    City: Moscow

    One more thing: it identifies itself as a client, not as a bot! Google and Yahoo identify themselves as bots!

    Please tell us how to block it!

    Thank you very much, in my name and in the name of all who will benefit from your answer.
    Last edited by aripigus; 05-30-2010 at 07:18 AM.

  16. #16
    Hello, aripigus! Please, tell me what is your host, so I can search it through our logs to see what's wrong. Thanks!

  17. #17
    Join Date: Jan 2006 | Location: Guatemala | Posts: 26
    To all who want to stop Yandex: one of the best ways to block it is with ModSecurity. Write your own rule and block Yandex forever.
    -Sergio
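
    The post does not include a rule, so here is a minimal sketch of what one might look like. It assumes ModSecurity 2.x is installed with SecRuleEngine On, and that the rule is placed in the server configuration, where ModSecurity directives normally live. The rule id 1000001 and the message text are arbitrary placeholders:

    ```apache
    # Sketch only: deny requests whose User-Agent begins with "yandex".
    # t:lowercase normalizes the header so the comparison is case-insensitive.
    SecRule REQUEST_HEADERS:User-Agent "@beginsWith yandex" \
        "id:1000001,phase:1,t:none,t:lowercase,deny,status:403,log,msg:'Blocked Yandex user agent'"
    ```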
    www.HOSTnDOMAINS.com
    Domains, Appraisals, SSL Certificates and more.
    Join us, be a reseller today!

  18. #18
    Join Date: May 2002 | Location: Moscow | Posts: 1,602
    A few important things:

    1) Never send any information about your sites, logs, etc. to unofficial contacts.
    Official contacts can be found here: http://company.yandex.com/general_in...nformation.xml Technical support is unfortunately available only in Russian so far, but I'm sure you will be directed to the right person if you write to the official contact email address.
    2) Some explanation of why you see Yandex robots in your logs: Yandex is expanding its service around the world, foreign users are beginning to use it more actively, and this produces more robot activity.
    3) They had a problem with their robots earlier, but it was resolved a year or so ago, so I don't understand why it is not working for the topic starter (TS).
    4) Full information about robots.txt and how the Yandex robot handles it can be found here: http://help.yandex.ru/webmaster/?id=996567 (unfortunately the page is in Russian only)

    I have already alerted them to this topic, and I think someone from there will come here to explain what is happening and why. But again: you don't need to post any logs here, and you should use only official contacts (note: @yandex.ru is not the official Yandex company mail domain. It is a free email service domain, and anyone can register a mailbox there).
    Hope this helps.

  19. #19
    Originally posted by riverstones:
    AND THE WINNER IS!!!

    in .htaccess add the following lines:

    SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
    Order Deny,Allow
    Deny from env=bad_bot


    403 403 403 403 403 403 403....





    Life is gewd!!!!

    This has worked for me. For two days now, this user agent has stopped its almost constant visits to my login page.

    Thanks.

  20. #20
    Hello everybody! I want to emphasize that we do not need any of your logs. We would be grateful if you sent us only the name of the host where our spider didn't respect the robots.txt file. You may send it to support@search.yandex.com, which is the official support address of Yandex search (check it at help.yandex.com/search/ ). We would appreciate any bug reports. We'll try to help you as fast as possible.
    Thanks!
    Akulov Yaroslav

  21. #21
    Join Date: Oct 2001 | Posts: 1,319
    I think there's language confusion here. I believe Akulov is requesting the hostname of your website being indexed (yoursite.com), *not* the name of your web hosting provider.

    All the best,
    Avi B

  22. #22
    I didn't bother reading all the comments. But Yandex's services are used in a lot of blackhat SEO tools for spamming blogs, forums, etc. It could be that some people have your website on their spam list and are trying to post tons of comments on your site.

  23. #23
    Pretty pissed off today, as I am getting over 400 searches from Yandex search for a keyword which does NOT even exist on my website. I succeeded in blocking Yandex's spider, as it was consuming way too much bandwidth and resources. A week later I am getting BOMBARDED with searches from Yandex (payback?)
    I also blocked visitors from the Yandex search engine, and when I test it I get:
    "Forbidden
    You don't have permission to access / on this server.
    Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request."
    I thought that did the trick, but the hits keep coming.
    I have added the following to my .htaccess:
    SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
    Order Deny,Allow
    Deny from env=bad_bot
    RewriteEngine on
    # Options +FollowSymlinks
    RewriteCond %{HTTP_REFERER} yandex\.com [NC]
    RewriteRule .* - [F]
    I have a list of 300 IPs and some of them are:
    95.26.140.174
    95.26.201.135
    95.26.105.125
    95.27.253.69
    95.26.140.178
    95.27.182.190
    95.27.253.107
    95.24.141.83
    95.26.105.70
    95.26.112.123
    95.26.61.96
    95.26.246.37
    95.26.185.178
    95.27.182.216

    Any help will be appreciated to solve this issue.
    Thanks
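
    Two things in the rules above may be worth a look, though the post does not give enough detail to be sure of either. The referrer pattern yandex\.com does not match visitors arriving from yandex.ru, and the secondary 500 error suggests the custom error page could not be served for the blocked request. A minimal sketch addressing both, assuming AllowOverride FileInfo is permitted for this directory:

    ```apache
    # Sketch only. Use Apache's built-in 403 message so no custom error page
    # has to be fetched (a custom page can itself run into the blocking rules).
    ErrorDocument 403 default
    RewriteEngine On
    # "yandex\." matches referrers on yandex.com and yandex.ru alike
    # (and any other referrer containing "yandex.").
    RewriteCond %{HTTP_REFERER} yandex\. [NC]
    RewriteRule ^ - [F]
    ```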

  24. #24

    Block Yandex

    Hi, I have a dynamic PHP website, although I have also used ASP. I wrote a function that checks each visitor against a list of bad robots and a range of IP addresses. If there is a match, I dish up a 404.html for them.

    Desmond.
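
    The PHP function itself is not shown. As a rough Apache-level analogue (a different technique from the one Desmond describes), the sketch below answers with 404 when either the user agent or the client address matches. It assumes mod_rewrite is enabled; the address prefix is taken from the log entry in the first post and is only an illustration:

    ```apache
    # Sketch only: answer matching robots or addresses with 404 Not Found.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Yandex [NC,OR]
    RewriteCond %{REMOTE_ADDR} ^87\.250\.255\.
    # A non-redirect status in [R=...] drops the substitution and stops rewriting.
    RewriteRule ^ - [R=404,L]
    ```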

  25. #25
    Thanks riverstones! I have to say I am also p... off with Yandex's visits to my site, only a week after it was launched.

    I am using your .htaccess suggestions; hopefully they will stop Yandex from visiting me every day (robots.txt didn't work). My private site is for French speakers only, so I have no problem blocking Yandex.

    If I may add this: I work in a content filtering team at a big company, and I can assure you that our main daily fraud attacks come from Russia, China, and Romania.

    Cheers
