Thread: how to block Yandex bot ??
-
02-06-2010, 02:33 PM #1 (WHT Addict)
- Join Date: Oct 2007
- Location: Greater DC
- Posts: 152
how to block Yandex bot ??
Greetings -
I'd like to block the Yandex bot. It eats bandwidth unduly,
and I doubt that folks in Russia want to listen to my songs.
[Admittedly, neither do most folk.]
Here's a sample entry from my visitors log:
Host: 87.250.255.243
* /irish.htm
Http Code: 304 Date: Feb 03 23:39:56 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: Yandex/1.01.001 (compatible; Win16; I)
And here are the relevant lines from my .htaccess:
#banish undesired bots
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BPImageWalker [NC]
RewriteRule ^(.*)$ hxxp://sod.off/
RewriteCond %{HTTP_USER_AGENT} ^Purebot [NC]
RewriteRule ^(.*)$ hxxp://sod.off/
RewriteCond %{HTTP_USER_AGENT} ^Yandex [NC]
RewriteRule ^(.*)$ hxxp://sod.off/
Doesn't seem to help.
Suggestions welcome.
Thanks.
- Richard
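A quick way to rule out the patterns themselves is to test them against the Agent string from the log entry. The sketch below uses Python's `re` with `re.IGNORECASE` as a stand-in for the `[NC]` flag (Python and Apache's PCRE agree on simple anchored patterns like these); it checks only the regexes, not Apache's configuration.

```python
from __future__ import annotations

import re

# Patterns copied from the RewriteCond lines above; [NC] = case-insensitive.
PATTERNS = ["^BPImageWalker", "^Purebot", "^Yandex"]

# Agent string from the log entry in the post.
LOGGED_AGENT = "Yandex/1.01.001 (compatible; Win16; I)"


def matching_patterns(user_agent: str) -> list[str]:
    """Return the patterns whose RewriteCond would fire for this User-Agent."""
    return [p for p in PATTERNS if re.search(p, user_agent, re.IGNORECASE)]


if __name__ == "__main__":
    print(matching_patterns(LOGGED_AGENT))                        # ['^Yandex']
    print(matching_patterns("Mozilla/4.0 (compatible; MSIE 8.0)"))  # []
```

Because `^Yandex` does match the logged agent, the pattern is not what let this request through. The logged request was also answered with 304 rather than a redirect (a rule whose target is an external URL answers with a redirect), so the rule does not appear to have run for that request; whether the .htaccess file is being read at all is the next thing to check.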
-
02-06-2010, 02:39 PM #2 (Hosting provider)
- Join Date: May 2002
- Location: Moscow
- Posts: 1,602
You could use:
Deny from 87.250.224.0/19
Deny from 87.250.252.0/22
Deny from 87.250.255.0/24
in your .htaccess file. Or you could just use robots.txt to block the Yandex robot.
TK Rustelekom LLC Dedicated server since 2002, RIPE NCC member, LIR
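Before pasting ranges into .htaccess it can help to check what they actually cover. This sketch uses Python's standard `ipaddress` module with the three ranges above; the test addresses are the Host from post #1's log entry and the crawler address quoted in post #15.

```python
import ipaddress

# The three ranges from the Deny lines above.
DENY_RANGES = [
    ipaddress.ip_network("87.250.224.0/19"),
    ipaddress.ip_network("87.250.252.0/22"),
    ipaddress.ip_network("87.250.255.0/24"),
]


def is_denied(ip: str) -> bool:
    """True if the address falls inside any of the Deny ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENY_RANGES)


if __name__ == "__main__":
    print(is_denied("87.250.255.243"))  # True: the Host in post #1's log entry
    print(is_denied("93.158.145.28"))   # False
    # The /22 and /24 lie inside the /19, so the three ranges collapse to one.
    print(list(ipaddress.collapse_addresses(DENY_RANGES)))  # [IPv4Network('87.250.224.0/19')]
```

The collapse shows that `87.250.224.0/19` already spans 87.250.224.0 to 87.250.255.255, so the /22 and /24 lines are redundant. The second result also illustrates a point made later in the thread: a Yandex crawler address outside these ranges is not blocked by them.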
-
02-06-2010, 02:52 PM #3 (Web Hosting Master)
- Join Date: Jul 2009
- Posts: 1,568
robots.txt is an ideal way to block the Yandex robot. Create a file named robots.txt in the root directory of your website and insert the following lines:
User-agent: Yandex
Disallow: /
| LinuxHostingSupport.net
| Server Setup | Security | Optimization | Troubleshooting | Server Migration
| Monthly and Task basis services.
| MSN : madaboutlinux[at]hotmail.com | Skype : madaboutlinux
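To see what these two lines mean to a crawler that does honour robots.txt, Python's standard-library parser (`urllib.robotparser`) can serve as a stand-in. Yandex's own parser may differ in details, and, as later posts point out, nothing in the file forces a crawler to obey it.

```python
from urllib.robotparser import RobotFileParser

# The two lines suggested above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /
"""


def may_fetch(user_agent: str, path: str) -> bool:
    """Ask Python's robots.txt parser whether a compliant crawler may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(may_fetch("Yandex/1.01.001 (compatible; Win16; I)", "/irish.htm"))  # False
    print(may_fetch("Googlebot/2.1", "/irish.htm"))                            # True
```

The record names only Yandex, so every other crawler stays unaffected; that is the main advantage over an IP or agent block in .htaccess, provided the crawler actually reads the file.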
-
04-01-2010, 04:16 PM #4 (New Member)
- Join Date: Feb 2006
- Posts: 3
I found this post when searching for Yandex.ru.
I just got a call from my host because my bandwidth usage is insane. We figured out that the traffic is coming from multiple yandex.ru IPs.
I take it most of you here have blocked this crawler, and that letting it in isn't really necessary?
-
05-24-2010, 11:06 PM #5 (Newbie)
- Join Date: May 2010
- Posts: 10
First post here. I came for the same reason as the above poster, looking for a way to block Yandex.
First of all, and this is from experience, the above quoted solution only works with robots that actually pay attention to robots.txt.
But it's like saying "Pretty please, don't do what's on this list."
Yandex DOES NOT CARE about robots.txt, and doesn't even visit it when they come skulking around your site. Check your own logs to verify this. I have, and it hasn't even LOOKED at robots.txt! That, to me, makes Yandex a "Bad Robot!"
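One way to do the log check suggested here is a small script that finds the Yandex requests in a raw access log and reports whether any of them asked for /robots.txt. This sketch assumes Apache's "combined" log format; the sample lines are made up for illustration (the first reuses the details from post #1, and 192.0.2.10 is a reserved documentation address).

```python
import re

# Apache "combined" log format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_check(lines, bot="yandex"):
    """Count requests whose agent contains `bot`; report whether any was for /robots.txt."""
    hits, fetched_robots = 0, False
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or bot not in m.group("agent").lower():
            continue  # unparseable line, or a different client
        hits += 1
        if m.group("path") == "/robots.txt":
            fetched_robots = True
    return hits, fetched_robots


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '87.250.255.243 - - [03/Feb/2010:23:39:56 -0500] "GET /irish.htm HTTP/1.1" 304 - "-" "Yandex/1.01.001 (compatible; Win16; I)"',
    '192.0.2.10 - - [03/Feb/2010:23:41:02 -0500] "GET /robots.txt HTTP/1.1" 200 32 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    print(robots_check(SAMPLE))  # (1, False)
```

Run over a full day's log, a result of many hits with `False` is the pattern described in this post; a host's control panel may show a reformatted view like the one in post #1, so the raw log file may be needed.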
The ONLY way to keep them out is through .htaccess, and that's what I'm looking for too.
IP blocking won't work either; or it might, but only for a little while, until they start using more IPs and changing the existing ones.
Why do I want to block them? On principle. Too much Internet crime originates in Russia and several other Eastern European countries. Call my attitude what you will, it's not prejudice. It's reality.
For a site that gets as little traffic as mine does, Yandex visits far too often, and for far too long, driving my bandwidth up noticeably! How much worse is it going to get when I actually start getting traffic?
I'm not waiting around to find out. I'm bound and determined to slam the door on them now, before it gets worse, and before thousands more hackers start collecting info from them on my sites, getting them shut down yet again.
-
05-25-2010, 12:30 AM #6 (Newbie)
- Join Date: May 2010
- Posts: 10
AND THE WINNER IS!!!
In .htaccess, add the following lines:
SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
Order Deny,Allow
Deny from env=bad_bot
403 403 403 403 403 403 403....
Life is gewd!!!!
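`SetEnvIfNoCase` does a case-insensitive regular-expression match on the header, so `^Yandex*` behaves slightly differently than it looks: the `*` applies only to the final `x`, and the `^` means the agent must start with "Yande". The sketch below mimics that with Python's `re` (same semantics for a pattern this simple); the Mozilla-style agent in it is a made-up example of a string that contains "Yandex" without starting with it.

```python
import re

LOGGED_AGENT = "Yandex/1.01.001 (compatible; Win16; I)"


def sets_bad_bot(user_agent: str, pattern: str = "^Yandex*") -> bool:
    """Mimic SetEnvIfNoCase: case-insensitive regex search on the header value."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(sets_bad_bot(LOGGED_AGENT))  # True
    # The * only repeats the final "x", so "Yande" alone already matches.
    print(sets_bad_bot("Yande"))       # True
    # Anchored at ^: an agent that merely contains "Yandex" is not caught.
    hypothetical = "Mozilla/5.0 (compatible; YandexBot/3.0)"
    print(sets_bad_bot(hypothetical))            # False
    print(sets_bad_bot(hypothetical, "yandex"))  # True
```

For the agent seen in this thread (`Yandex/1.01.001 ...`) the rule works as posted; an unanchored pattern such as `yandex` is the broader choice if other Yandex agent strings turn up in the logs.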
-
05-25-2010, 08:03 AM #7 (Web Hosting Guru)
- Join Date: May 2010
- Location: Praha, CZ
- Posts: 270
Use robots.txt:
User-agent: Yandex
Disallow: /
» NQhost.com - Flexible Solutions
» Linux VPS, Windows VPS and Xen VDS
» USA, Germany and Russia Data-centers
-
05-25-2010, 08:04 AM #8 (Web Hosting Guru)
- Join Date: May 2010
- Location: Praha, CZ
- Posts: 270
The correct way is to use robots.txt.
With it, the Yandex bot will not crawl your site, and returning lots of "403" errors is not a very nice way to block access.
-
05-25-2010, 12:54 PM #9 (Newbie)
- Join Date: May 2010
- Posts: 10
Are you kidding me?!
Look, I USED robots.txt with the exact lines you suggested. For a WEEK.
Yandex IGNORED it. You're d--n right 403 isn't "nice!"
When robots do not adhere to the rules in robots.txt, as far as I am concerned, they are not playing fair, or nice.
Yandex has spiked my bandwidth with a ridiculous number of visits (far, FAR more than Google bots EVER make).
Before I tried blocking them in robots.txt, they were ignoring all the rules there, exposing much of my content to security risks.
Not nice? You bet it's not nice! It's my bandwidth. I pay for it! They are MY files, and no one will access them if I don't want them to.
Not nice? I tried "nice." It didn't work. No more Mr. Nice Guy!
BTW: The "correct" way to do things is by doing what works! Not by following rules when they aren't working.
I do use robots.txt
When a robot breaks the rules in there, they are banned. Period!
-
05-25-2010, 01:02 PM #10 (Web Hosting Guru)
- Join Date: May 2010
- Location: Praha, CZ
- Posts: 270
Also, try removing your URL from Yandex's crawl:
http://webmaster.yandex.ru/delurl.xml
-
05-25-2010, 01:34 PM #11 (Newbie)
- Join Date: May 2010
- Posts: 10
Now, that is a method I haven't yet seen posted anywhere.
Others can try it if they wish.
I personally wouldn't trust it. If they didn't bother to adhere to robots.txt, what basis do I have to go on to believe that using a list on their site would work either?
I'm not saying it doesn't. Maybe it does. But I am saying I have no reason to believe it will, and plenty of reason not to trust Yandex based on log entries.
Every search engine, with the exception of Yandex, visits robots.txt FIRST, then moves on, adhering to the rules.
When a robot, any robot, doesn't even bother to look at the rules, that's a BAD robot.
Yandex is not just visiting way too many times per day, they are crawling files that haven't even been on the site since BEFORE they began crawling it. A HUGE number of their "hits" were 404 before I blocked them.
Why? Why, if the robot has already determined multiple times that a file is not there, does it keep looking for it over and over again? Google, MSN, Yahoo, ASK, and all the other search engines don't do that. They look for them a few times, over the course of a week or so, presumably to see if they reappear, then quit looking.
Yandex has been hitting 404s for WEEKS, multiple times a day. Most of the time, that's a sign of a hacker. I am NOT saying Yandex is trying to hack my sites. I'm sure they aren't, but that isn't the point.
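The repeated-404 pattern described here (and the kind of examples Yandex asks for later in the thread) can be pulled out of a raw access log with a few lines of Python. This sketch assumes Apache's "combined" log format; the path `/old-page.htm` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed); only path, status and agent are captured.
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_404s(lines, bot="yandex", threshold=3):
    """Return {path: count} for 404s served to `bot` at least `threshold` times."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "404" and bot in m.group("agent").lower():
            counts[m.group("path")] += 1
    return {path: n for path, n in counts.items() if n >= threshold}


# Hypothetical sample: four Yandex misses on the same page, one from another crawler.
SAMPLE = [
    '87.250.255.243 - - [01/May/2010:10:00:00 +0000] "GET /old-page.htm HTTP/1.1" 404 209 "-" "Yandex/1.01.001 (compatible; Win16; I)"'
] * 4 + [
    '192.0.2.10 - - [01/May/2010:10:05:00 +0000] "GET /old-page.htm HTTP/1.1" 404 209 "-" "Googlebot/2.1"'
]

if __name__ == "__main__":
    print(repeated_404s(SAMPLE))  # {'/old-page.htm': 4}
```

Running the same function with another crawler's name gives a direct comparison of how often each one keeps asking for pages that are gone.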
The point is, their crawler is woefully inefficient, displays no respect for my rules, and looks for all intents and purposes like it's acting like a hacker.
That is exactly why there are so many forum posts all over the web from webmasters trying to figure out what it is (it doesn't occur to most that it could be a legitimate search engine, because it doesn't ACT like a legitimate search engine). It is exactly why so many are trying to learn how to block them.
If you were throwing a party in your home and told everyone who came, "The patio, living room, dining room and bathrooms are all open to you, but please don't go in the bedroom," and you found one of your guests in the bedroom, going through your closets, what would you do? Show them the rules again?
I wouldn't. I'd throw them out of my house, physically!
There is another, and what I believe to be a very sound reason to block them. Yandex is located smack in the middle of one of the hottest hacking areas on earth. My own sites were hacked from Eastern Europe multiple times recently. Two were destroyed more than once, and one was hijacked with a phishing page for a BANK. It got my account suspended, knocking EVERY site I have off line.
Yandex was crawling files no legitimate search engine should have any interest in whatsoever. You think I want those files and directories hanging out there where any hacker (even a kiddie) can find them?
It's behavior like that that convinces me they can't be trusted.
If I can't trust them, I block them. End of story. Web security isn't a game of "would you please play by my rules."
It's a war. When somebody acts like the enemy, I'm going to treat them accordingly.
It could very well be that asking them to remove your site from their lists, actually works. But I have no reason to believe it, and I already have a way to deal with them that works.
Why change it?
Yandex is OUT.
-
05-26-2010, 04:28 PM #12 (New Member)
- Join Date: May 2010
- Posts: 3
Hello, riverstones! I'm from the Yandex robot developers group. I would be very thankful if you could give me more information about our bot's bad behaviour. Our bot downloads robots.txt every time before crawling, so in your case it may be a bug on our side. Please tell me your host's name. Also, some examples of repeatedly crawled 404s would be very helpful. Thanks!
Best regards,
Yaroslav
-
05-26-2010, 05:29 PM #13 (Newbie)
- Join Date: May 2010
- Posts: 10
With all due respect, it does not visit robots.txt at all when it comes to my sites.
And no, I'm not getting into who my host is, or sharing any of my logs with anyone I don't already know from previous and lengthy interchanges.
There are plenty of reports from webmasters all over Internet forums stating the same situation is happening with them.
It's not my job to explain to Yandex how to do theirs.
My job is to protect my sites' front end in whatever manner I deem appropriate and reasonable.
Sharing logs with someone, anyone based only on the claims you've made on this forum, would be foolish.
As far as I am concerned that's just "Web Site Security 101."
It amazes me how many web masters do post links to their problems, and log entries based only on a request from another user when they have no way of knowing who they really are. (Yes, I know the logs can be edited to hide the site name etc. but that takes time I simply do not have, and I have other matters that need attention.)
For all I know, you could be the very person who hacked my sites to begin with. I'm not saying you are though, and the likelihood is very low.
Blocking Yandex may seem to many to be a knee-jerk paranoid reaction.
I've said it before, and I'll say it again, "It isn't paranoia if somebody really is out to get you!" I don't mean that Yandex is "out to get me." I mean there are actual site crackers that are out to get my sites. They have cracked them, and they are continuing to attempt to crack them.
That environment is virtual warfare. I am, in effect, the guardian of the "castle walls" under siege.
That said, there is no way on earth I'm about to share information with anyone outside those walls unless I have sought them out and contacted them myself (Security 101!)
I appreciate your position, and I'm sure you can appreciate mine.
Would you send YOUR logs to me if I asked for them? I think not.
You already have pretty much all the information you need. Your robot is not visiting robots.txt every time it visits every site. That problem has nothing to do with who my host is, or anyone's for that matter. It has to do with the robot's coding. And that, I can't help you with.
I see that as a problem on my end, and have already done what I can do to prevent the problem. My problem is solved. Likely anyone who uses the above .htaccess code will solve their problems with it too.
I can appreciate your not wanting that to happen on a widespread basis, but that isn't my problem. My problem is now gone, and that's pretty much all I care about right now.
A blunt attitude it is, but the Internet is a dog-eat-dog world, and I'm not about to be lunch for anyone, even if my actions, as others follow suit, may damage another company's reputation and/or means of doing business.
I'm sure Yandex's proverbial heart would not skip one beat if my own enterprises collapsed. Neither would anyone else's for that matter.
In short, while I see your point of view, and even empathize to some extent:
"Sorry, but it's just business."
When you swim with sharks, you have to defend and arm yourself, or be eaten. I will not be eaten.
In business: "Trust NO ONE."
I hope you can appreciate my position.
Regards
Riverstone
"Trust has to be earned, and should come only after the passage of time."
Arthur Ashe
-
05-26-2010, 05:53 PM #14 (Newbie)
- Join Date: May 2010
- Posts: 10
EDIT: Though I tried to state this in the above post, it appears so far down, it makes the earlier portion seem much stronger than I intended, so I am reiterating here in the hopes that the above post will [hopefully] not be taken the wrong way:
The above post is in NO WAY intended to be an indictment of Yandex business practices. For all intents and purposes, Yandex appears to me to be a legitimate search engine, and probably a very well intentioned company. I have no reason to believe that Yandex is guilty of any crimes, shady practices or is untrustworthy.
My comments above are ONLY related to my own personal experience, viewpoint, and based on what I was able to glean from many posts from many different web masters on various forums during about twelve hours of online research.
I make NO CLAIM that what is true for me, is true for anyone else.
The entire post is only in relation to me, my sites, and the Yandex robots that visit my sites. It is NOT about whether the things that have happened to me are happening to anyone else.
-
05-30-2010, 07:08 AM #15 (New Member)
- Join Date: May 2010
- Posts: 1
Hello Yaroslav !
I have a grudge against Yandex too. While Googlebot and Yahoo visited my page just a few times in recent days, Yandex was always there! It never rested.
So, I decided to make a robots.txt file to block its access. I made sure that the file is in UNIX format and it starts with:
# Yandex bot
User-agent: Yandex
Disallow: /
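One detail worth ruling out with a file like this: in robots.txt a `#` starts a comment that runs to the end of the line, so if the comment and the directives ever end up on a single line, the whole record becomes a comment and the file blocks nothing. The sketch below shows how Python's standard parser (a stand-in, not Yandex's crawler) treats the two layouts; it does not say what happened with this particular file.

```python
from urllib.robotparser import RobotFileParser


def yandex_allowed(robots_lines):
    """Would a compliant parser let the Yandex agent fetch "/" under these lines?"""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("Yandex/1.01.001 (compatible; Win16; I)", "/")


# Three separate lines: the record applies to Yandex and disallows everything.
THREE_LINES = ["# Yandex bot", "User-agent: Yandex", "Disallow: /"]
# The same text on one line: everything after "#" is a comment, so no rules remain.
ONE_LINE = ["# Yandex bot User-agent: Yandex Disallow: /"]

if __name__ == "__main__":
    print(yandex_allowed(THREE_LINES))  # False
    print(yandex_allowed(ONE_LINE))     # True
```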
Moreover, I tried to delete my url from yandex, as NQhost said, but it still comes.
more info:
spider58.yandex.ru
IP address: 93.158.145.28
User agent: Yandex/1.01.001 (compatible; Win16; I)
Country: Russian Federation
Region: Moscow City
City: Moscow
One more thing: it identifies as a client, not as a bot!! Google and Yahoo identified as bots!
Please tell us how to block it!
Thank you very much, in my name and in the name of all who will benefit from your answer.
Last edited by aripigus; 05-30-2010 at 07:18 AM.
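Since the User-Agent can be anything (and, as noted here, this one doesn't even say "bot"), a common way to tell a real crawler from an impostor is forward-confirmed reverse DNS: look up the hostname for the IP, check its domain, then resolve that hostname and make sure it maps back to the same IP. A sketch in Python follows; the accepted domain suffixes are an assumption to be checked against Yandex's own documentation, and the resolver functions are parameters so the check can be exercised offline.

```python
import socket

# Assumption: genuine Yandex crawler hostnames end with one of these suffixes.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_verified_yandex(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: ip -> hostname -> addresses must include ip."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # no PTR record, or the lookup failed
        return False
    if not hostname.endswith(YANDEX_SUFFIXES):
        return False  # the PTR record points outside the expected domains
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses  # guards against a spoofed PTR record


if __name__ == "__main__":
    # Offline demo with stand-in resolvers built from the details in this post.
    def fake_reverse(ip):
        return ("spider58.yandex.ru", [], [ip])

    def fake_forward(host):
        return (host, [], ["93.158.145.28"])

    print(is_verified_yandex("93.158.145.28", fake_reverse, fake_forward))  # True
```

With the default arguments the function does real DNS lookups (`socket.gethostbyaddr`, `socket.gethostbyname_ex`), which are slow, so a site using it per request would want to cache the result per IP.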
-
06-08-2010, 10:17 AM #16 (New Member)
- Join Date: May 2010
- Posts: 3
Hello, aripigus! Please tell me what your host is, so I can search for it in our logs to see what's wrong. Thanks!
-
06-08-2010, 03:42 PM #17 (Newbie)
- Join Date: Jan 2006
- Location: Guatemala
- Posts: 26
To all who want to stop Yandex: one of the best ways to block it is with ModSecurity. Write your own rule and block Yandex forever.
-Sergio
www.HOSTnDOMAINS.com
Domains, Appraisals, SSL Certificates and more.
Join us, be a reseller today!
-
06-08-2010, 07:09 PM #18 (Hosting provider)
- Join Date: May 2002
- Location: Moscow
- Posts: 1,602
A few important things:
1) Never send any information about your sites, logs, etc. to unofficial contacts.
Official contacts can be found here: http://company.yandex.com/general_in...nformation.xml Technical support is unfortunately only available in Russian so far, but I'm sure you will be directed to the right person if you send your message to the official contact email address.
2) Some explanation of why you see Yandex robots in your logs: Yandex has just expanded its service around the world, foreigners have begun using it more actively, and this also produces more robot activity.
3) They had problems with their robots earlier, but these were resolved a year or so ago, so I don't understand why it doesn't work for the topic starter.
4) Full information about robots.txt and how the Yandex robot handles it can be found here: http://help.yandex.ru/webmaster/?id=996567 (unfortunately, the page is in Russian only).
I have already alerted them about this topic, and I think someone from there will come here to explain what happened and why. But again, you don't need to post any logs here, and you should use only official contacts (note: @yandex.ru is not an official Yandex company mail domain. It is a free email service domain, and anyone can register a mailbox there).
Hope this helps.
-
06-13-2010, 11:43 AM #19 (New Member)
- Join Date: Jun 2010
- Posts: 1
-
06-16-2010, 06:21 AM #20 (New Member)
- Join Date: May 2010
- Posts: 3
Hello everybody! I want to emphasize that we do not need any of your logs. We would be grateful if you would send us only the name of the host where our spider didn't respect the robots.txt file. You can send it to support@search.yandex.com, which is the official support address for Yandex search (you can check it at help.yandex.com/search/). We would appreciate any bug reports, and we'll try to help you as fast as possible.
Thanks!
Akulov Yaroslav
-
06-17-2010, 08:04 PM #21 (Web Hosting Master)
- Join Date: Oct 2001
- Posts: 1,319
I think there's some language confusion here. I believe Akulov is requesting the hostname of your website being indexed (yoursite.com), *not* the name of your web hosting provider.
All the best,
Avi B
-
06-18-2010, 01:10 PM #22 (Newbie)
- Join Date: Apr 2010
- Posts: 5
I didn't bother reading all the comments, but Yandex's services are used in a lot of blackhat SEO tools for spamming blogs, forums, etc. It could be that some people have your website on their spam list and are trying to post tons of comments on your site.
-
07-17-2010, 10:11 PM #23 (New Member)
- Join Date: Jul 2010
- Posts: 1
Pretty pissed off today, as I am getting over 400 searches from Yandex search for a keyword which does NOT even exist on my website. I successfully blocked Yandex's spider, as it was consuming way too much bandwidth and resources. A week later I am getting BOMBARDED with searches from Yandex (payback?)
I also blocked visitors coming from the Yandex search engine, and when I test it I get:
"Forbidden
You don't have permission to access / on this server.
Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request."
I thought that did the trick, but the searches keep coming.
I have added the following to my .htaccess:
SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
Order Deny,Allow
Deny from env=bad_bot
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} yandex\.com [NC]
RewriteRule .* - [F]
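The referer condition above only fires when the Referer contains `yandex.com`, so visitors arriving from Yandex's .ru search pages would not be caught by it. The sketch below checks that, along with a hypothetical wider pattern, and pulls the search phrase out of a referer so the mystery keyword can be seen. The referer URL is made up, and the `text` query parameter is an assumption about Yandex's search URLs.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Condition from the post: matches only referers containing "yandex.com".
FROM_POST = re.compile(r"yandex\.com", re.IGNORECASE)
# Hypothetical wider variant that also covers yandex.ru.
WIDER = re.compile(r"yandex\.(com|ru)\b", re.IGNORECASE)


def search_term(referer, param="text"):
    """Pull the search phrase out of a search-engine referer (param name assumed)."""
    values = parse_qs(urlsplit(referer).query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    ref = "http://yandex.ru/yandsearch?text=irish+songs"  # made-up referer
    print(bool(FROM_POST.search(ref)))  # False
    print(bool(WIDER.search(ref)))      # True
    print(search_term(ref))             # irish songs
```

Because the pattern is unanchored, it would also match a referer that merely mentions "yandex.ru" in its query string; for a site's own logs that is usually an acceptable trade-off.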
I have a list of 300 IPs and some of them are:
95.26.140.174
95.26.201.135
95.26.105.125
95.27.253.69
95.26.140.178
95.27.182.190
95.27.253.107
95.24.141.83
95.26.105.70
95.26.112.123
95.26.61.96
95.26.246.37
95.26.185.178
95.27.182.216
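Before turning a list like this into Deny lines, it is worth checking what these addresses are not, and how wide a single rule covering them would be. This sketch uses Python's `ipaddress` with the 14 addresses above and the Yandex ranges posted in #2.

```python
import ipaddress

# The sample addresses listed above.
IPS = [
    "95.26.140.174", "95.26.201.135", "95.26.105.125", "95.27.253.69",
    "95.26.140.178", "95.27.182.190", "95.27.253.107", "95.24.141.83",
    "95.26.105.70", "95.26.112.123", "95.26.61.96", "95.26.246.37",
    "95.26.185.178", "95.27.182.216",
]

# Yandex crawler ranges posted earlier in the thread (post #2).
YANDEX_RANGES = [ipaddress.ip_network(n) for n in
                 ("87.250.224.0/19", "87.250.252.0/22", "87.250.255.0/24")]


def smallest_covering_network(ips):
    """Smallest IPv4 network (longest prefix) that contains every address."""
    addrs = [ipaddress.ip_address(ip) for ip in ips]
    for prefix in range(32, -1, -1):
        net = ipaddress.ip_network(f"{addrs[0]}/{prefix}", strict=False)
        if all(a in net for a in addrs):
            return net


if __name__ == "__main__":
    print(any(ipaddress.ip_address(ip) in net for ip in IPS for net in YANDEX_RANGES))  # False
    cover = smallest_covering_network(IPS)
    print(cover, cover.num_addresses)  # 95.24.0.0/14 262144
```

None of the addresses is in the crawler ranges from post #2, and the smallest network holding all of them is 95.24.0.0/14, which spans 262,144 addresses; one Deny rule for it would block every one of those, not just the addresses on the list. The check says nothing about who the addresses belong to.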
Any help will be appreciated to solve this issue.
Thanks
-
07-19-2010, 05:35 AM #24 (New Member)
- Join Date: Jun 2009
- Posts: 1
Block Yandex
Hi, I have a dynamic website in PHP, although I have also used ASP. I wrote a function that checks the visitor against a list of bad robots and a range of IP addresses. If there's a match, I dish up a 404.html page for them.
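Desmond's PHP/ASP code isn't shown, but the idea (check the visitor's agent against a list of bad robots and their address against blocked ranges, then serve a 404) can be sketched in a few lines. The version below is Python for illustration; the block lists are hypothetical, with entries borrowed from earlier in this thread.

```python
import ipaddress

# Hypothetical block lists; entries are examples taken from this thread.
BAD_AGENT_SUBSTRINGS = ("yandex", "purebot", "bpimagewalker")
BAD_NETWORKS = [ipaddress.ip_network("87.250.224.0/19")]


def status_for(user_agent: str, remote_ip: str) -> int:
    """Return 404 for a listed robot or address range, 200 for everyone else."""
    ua = user_agent.lower()
    if any(bad in ua for bad in BAD_AGENT_SUBSTRINGS):
        return 404
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BAD_NETWORKS):
        return 404
    return 200


if __name__ == "__main__":
    print(status_for("Yandex/1.01.001 (compatible; Win16; I)", "192.0.2.5"))  # 404
    print(status_for("Mozilla/4.0 (compatible; MSIE 8.0)", "192.0.2.5"))      # 200
```

The application would call `status_for` early in request handling and serve its 404.html when it returns 404. Answering 404 rather than 403 makes the page look absent instead of signalling that the client is being blocked.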
Desmond.
-
08-20-2010, 12:20 PM #25 (New Member)
- Join Date: Aug 2010
- Posts: 1
Thanks, riverstones! I have to say I am also p... off with Yandex's visits to my site, only a week after it was launched.
I am using your .htaccess suggestion; hopefully it will stop them from visiting me every day (robots.txt didn't work). My private site is for French speakers only, so I have no problem blocking Yandex.
If I may add this: I work in a content filtering team at a big company, and I can assure you that our main daily fraud attacks come from Russia, China and Romania.
Cheers