Is anyone familiar with this trick? Details,
searchenginegenie dot com/seo-blog/2005/12/webmasterworld-cloaks-robotstxt-file dot html
I'm not clear how to implement it.
Basically it looks like a robots whitelist technique. Before, they were using robots.txt as a blacklist and spending a day a week trying to keep up with the proliferating bad bots. Now they just whitelist a few big engines and reject everything else. But I'm not clear how.
He's a webmaster's webmaster. He runs a webmaster's forum. He used to *write a blog* inside his robots.txt comments. Give me a break. I'm not going to second-guess him. He's been there and done that.
No objection holds water. Remember, robots.txt is *already* meant to enable/disable robots selectively. This 'trick' doesn't change the design, its ethics, or its syntax.
If a webmaster always returns a valid robots.txt format, there is no ethics violation. He just applies different logic to enable or disable crawlers, that's all.
The big boys themselves admit robots.txt is not sufficient! Google and Yahoo have their own custom robots.txt directives, so *they* deviate from the original spec. It gets a little worse, too: Google *has* been caught not obeying robots.txt.
And if you think a blacklist is simpler than a whitelist, I invite you to write rules on the 300,000 user agents listed at botsvsbrowsers dot com .
You just add TXT as an executable extension in your httpd.conf file. Then set the file up as a script, and it gets executed.
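A minimal sketch of that setup, assuming Apache with CGI enabled. The Apache directives in the comments, the whitelist tokens, and the helper name `build_robots_txt` are all illustrative assumptions, not WebmasterWorld's actual configuration or rules:

```python
#!/usr/bin/env python3
# Hypothetical robots.txt-as-a-script. Apache executes this file in place
# of a static robots.txt. Example httpd.conf lines (illustrative):
#   Options +ExecCGI
#   AddHandler cgi-script .txt
import os

# Example whitelist of substrings identifying allowed crawlers --
# made-up sample values, not a definitive list.
WHITELIST = ("Googlebot", "Slurp", "bingbot")

def build_robots_txt(user_agent):
    """Return an allow-all policy for whitelisted bots, deny-all otherwise."""
    if any(token in user_agent for token in WHITELIST):
        return "User-agent: *\nDisallow:\n"    # empty Disallow = allow all
    return "User-agent: *\nDisallow: /\n"      # shut everyone else out

if __name__ == "__main__":
    ua = os.environ.get("HTTP_USER_AGENT", "")
    print("Content-Type: text/plain")
    print()  # blank line ends the CGI headers
    print(build_robots_txt(ua), end="")
```

Every response is still a perfectly valid robots.txt file; only the decision of which policy to serve depends on the requesting user agent.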
The rest of the story:
We ran that robots.txt script for about 6 years. At this point, we have given up on both blacklists and whitelists. The really bad bots ignore robots.txt entirely, and the good bots have extended the standard so far that it no longer resembles the original. The robots.txt system is a complete joke. (Note that the robots.txt "recommendation" was never approved by any web standards organization on the planet; even the search engines don't agree on all syntax and proprietary extensions.)
This is not cloaking. Cloaking is showing visitors one thing and the search engines another. Robots.txt is not intended for humans; it is intended for bots. And if you openly show the code that produces the file, is it really cloaking? No on both counts.
The only real protection you can offer your site is to take advantage of the so-called 'first click free' program. Run an IP tracker and block those that abuse your site after X number of page views.
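The per-IP tracking idea can be sketched like this: count each IP's page views in a sliding time window and refuse service once it passes a threshold. The window length, limit, and function name `allow_request` are made-up examples, not a recommendation for specific values:

```python
# Hypothetical "block after X page views" tracker: sliding-window
# per-IP page-view counter. Limits below are illustrative only.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # look at the last hour of traffic
MAX_VIEWS = 100         # allow this many page views per IP per window

_hits = defaultdict(deque)  # ip -> timestamps of recent page views

def allow_request(ip, now=None):
    """Record a page view for `ip`; return False once it exceeds the limit."""
    now = time.time() if now is None else now
    hits = _hits[ip]
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()              # drop views that fell out of the window
    hits.append(now)
    return len(hits) <= MAX_VIEWS
```

In practice you would call something like this from your page-serving code (or a server module) and return a 403 when it says no; abusers age out of the block on their own as old timestamps expire.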