Web Hosting Talk

Do search engines really follow the permissions set in robots.txt?


cftranslate
07-07-2005, 11:38 AM
If I set up a directory with PDF files that I don't want indexed by search engines, will their robots follow the Disallow directives for the folders I put in robots.txt?

jdmath
07-07-2005, 11:53 AM
The nice ones will (like Google, MSN, etc.), but there are a lot of others that will not.
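
For what it's worth, here is a rough sketch of how a well-behaved crawler decides whether it may fetch a URL, using Python's standard urllib.robotparser module. The /pdfs/ folder, the "ExampleBot" name, and the example.com URLs are made up for illustration. Nothing here forces a crawler to run such a check, which is why only the nice ones comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /pdfs/ folder name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""

def allowed(user_agent, url):
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# A compliant crawler runs a check like this before every request.
print(allowed("ExampleBot", "http://example.com/pdfs/report.pdf"))
print(allowed("ExampleBot", "http://example.com/index.html"))
```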

dynamicnet
07-07-2005, 12:01 PM
Greetings:

Compliance with robots.txt is optional.

And because it is a plain ASCII text file that anyone can read, hackers can use it to determine which directories may contain sensitive data.
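
To illustrate how little effort that takes, here is a minimal sketch in Python that lists every path named in Disallow lines. The sample file contents and directory names are invented for the example.

```python
def disallowed_paths(robots_txt):
    """Collect the path from every non-empty Disallow line."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments, then split "Field: value".
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Hypothetical robots.txt; these directory names are assumptions.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private-reports/  # internal PDFs
"""

print(disallowed_paths(SAMPLE))
```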

Thank you.

BigBison
07-08-2005, 02:14 AM
Originally posted by dynamicnet
The following of robots.txt is optional.

And the pure ASCII text file can be used to determine what directories may contain sensitive data by hackers.

It can be made mandatory. Simply determine which spiders treat robots.txt as a list of places to go rather than places to avoid, or otherwise misbehave, and ban them. For further information, check these links:

http://www.ikt-ret.dk/projects/werd.shtml
http://www.kloth.net/internet/badbots.php
http://www.searchtools.com/robots/
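
One way to find those spiders, sketched in Python under some assumptions: list a trap directory in robots.txt under Disallow, link to it from nowhere, and flag any client that requests it anyway. The /bot-trap/ path, the (ip, path) request format, and the addresses are all hypothetical. In practice you would pull the pairs out of your own access log and feed the result into whatever ban mechanism your server offers.

```python
def find_offenders(requests, disallowed_prefixes):
    """Return the set of client IPs that requested a disallowed path.

    requests: iterable of (ip, path) pairs, e.g. parsed from an access log.
    disallowed_prefixes: path prefixes listed under Disallow in robots.txt.
    """
    offenders = set()
    for ip, path in requests:
        if any(path.startswith(prefix) for prefix in disallowed_prefixes):
            offenders.add(ip)
    return offenders

# Hypothetical traffic; addresses come from the documentation ranges.
SAMPLE_REQUESTS = [
    ("192.0.2.10", "/index.html"),
    ("198.51.100.7", "/robots.txt"),
    ("198.51.100.7", "/bot-trap/secret.html"),
    ("192.0.2.10", "/about.html"),
]

print(find_offenders(SAMPLE_REQUESTS, ["/bot-trap/"]))
```

A curious human who reads robots.txt can trip the same trap, so treat a hit as a strong hint rather than proof.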

ldcdc
07-08-2005, 02:58 AM
Thread moved to Web Design and Content.

Marble
07-08-2005, 03:50 AM
Originally posted by dynamicnet
Greetings:

The following of robots.txt is optional.

And the pure ASCII text file can be used to determine what directories may contain sensitive data by hackers.

Thank you.

Good thing to point out. You can also put sensitive files and directories above "public_html", outside the web root, where they cannot be requested over HTTP at all, and not have to worry about them.