Web Hosting Talk







Warning about robots.txt file


scslawin
05-01-2001, 01:58 PM
I happened on a thread in a search engine forum which said that some search engine spiders will not index a web site if they don't find a "robots.txt" file. I didn't have one on my site, so I decided to add one.

While I was adding the robots.txt file to the root of my domain, I included some "Disallow" entries to prevent indexing of some non-public subdirectories. These directories most likely would not have been indexed anyway, because I don't link to them from anywhere on the site.

HOWEVER, here is when I realized that putting these disallow entries in your robots.txt file actually creates security vulnerabilities for your site. Anyone with a browser can enter:

http://www.yoursite.com/robots.txt

and view your entries in the robots.txt file, INCLUDING a list of the directories you have marked as disallowed. You may not link to those directories, but someone reading the robots.txt file could type them into their browser and get access to your hidden areas.

For instance, if your robots.txt file included:

Disallow: /logs/
Disallow: /members

snoopers could enter:
http://www.yoursite.com/logs/
or http://www.yoursite.com/members/

to potentially view your server logs or your members-only areas.
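To show how little effort this takes, here is a minimal Python sketch that pulls the disallowed paths out of the text of a robots.txt file. The function name disallowed_paths and the sample text are made up for illustration, and the parsing is deliberately simplified (it ignores which User-agent a rule belongs to).

```python
def disallowed_paths(robots_txt):
    """Return every path named in a Disallow line, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value means nothing is blocked
                paths.append(value)
    return paths


# Hypothetical robots.txt content, matching the example above.
sample = """User-agent: *
Disallow: /logs/
Disallow: /members
"""

print(disallowed_paths(sample))  # ['/logs/', '/members']
```

Anyone who can download the file can build the same list and try each path in turn.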

So how do you protect yourself? Simple. Make sure every one of your disallowed directories includes a default server file. On most servers those are: index.html, default.html, index.htm or default.htm. You should know which one your server uses: it's the file that loads when someone enters only your root URL:

http://www.yoursite.com

Your default files in the disallowed directories can be a redirect to the front of your site, a blank file, or a warning -- it's up to you.
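If you have more than a couple of directories, a small script can tell you which ones still lack a default file. This is only a sketch: the helper name missing_default_file is invented here, the list of file names is the one given above, and the demo uses throwaway temporary directories rather than a real site. Note that it only checks that a default file is present; it does nothing for files whose names someone can guess.

```python
import tempfile
from pathlib import Path

# Default file names listed above; your server may use a different one.
DEFAULT_NAMES = ("index.html", "default.html", "index.htm", "default.htm")


def missing_default_file(directories):
    """Return the directories that contain none of the usual default files."""
    missing = []
    for d in directories:
        d = Path(d)
        if not any((d / name).is_file() for name in DEFAULT_NAMES):
            missing.append(d)
    return missing


# Demo with temporary directories standing in for /logs/ and /members/.
with tempfile.TemporaryDirectory() as root:
    logs = Path(root) / "logs"
    members = Path(root) / "members"
    logs.mkdir()
    members.mkdir()
    (logs / "index.html").write_text("")  # blank placeholder page
    result = missing_default_file([logs, members])
    print([p.name for p in result])  # ['members']
```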

Just make sure you protect those directories!

thewitt
05-01-2001, 04:04 PM
If you have an area on your website that you need to have secured, be sure to at least secure it with BASIC authentication techniques.

More than one person has been caught with the /secret/ directory on his site being exposed.

Just "not linking" to the directory is not enough to protect your directories from discovery.
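For anyone wondering why I say "at least": with BASIC authentication the browser just sends the user name and password base64-encoded in a header, not encrypted. A minimal Python sketch of what goes over the wire (the function name basic_auth_header is made up; the credentials are the well-known example pair from the HTTP Basic authentication RFC):

```python
import base64


def basic_auth_header(user, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

It keeps casual snoopers out, which is far more than an unlinked directory does, but anyone who can watch the traffic can decode it.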

-t

(SH)Saeed
05-01-2001, 04:52 PM
Recently I've got a lot of requests for the file robots.txt. Does this mean search engines are scanning my website or what?
:confused:

Btw, if you want to protect some of your directories, why don't you password protect them?

SI-Chris
05-02-2001, 02:20 AM
Originally posted by scslawin
...
HOWEVER, here is when I realized that putting these disallow entries in your robots.txt file actually creates security vulnerabilities for your site. Anyone with a browser can enter:

http://www.yoursite.com/robots.txt

and view your entries in the robots.txt file, INCLUDING a list of the directories you have marked as disallowed. You may not link to those directories, but someone reading the robots.txt file could type them into their browser and get access to your hidden areas.
...
Someone correct me if I'm wrong, but if you don't have links or references to those "hidden" folders in your HTML pages, the spiders aren't going to find those directories in the first place. So why would you need to add them to robots.txt?

fatman
05-02-2001, 10:49 PM
Originally posted by zolbian
Recently I've got a lot of requests for the file robots.txt. Does this mean search engines are scanning my website or what?
:confused:


Search engines requesting the robots.txt file is not a cause for alarm. They do it all the time, with many spiders requesting it numerous times per day. Anyway, good spiders are supposed to request the robots.txt file so that they know which areas (or files) they are not allowed to access.
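Roughly, a well-behaved spider does something like the following before fetching a page. This sketch uses Python's standard urllib.robotparser module; the rules are the example entries from the first post, and the agent name "ExampleBot" and the helper may_fetch are placeholders, not any real crawler's code.

```python
from urllib.robotparser import RobotFileParser

# Feed the parser the rules directly instead of downloading them,
# so the example runs without network access.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /logs/",
    "Disallow: /members",
])


def may_fetch(url, agent="ExampleBot"):
    """Ask whether the given (hypothetical) agent may fetch the URL."""
    return rules.can_fetch(agent, url)


print(may_fetch("http://www.yoursite.com/index.html"))       # True
print(may_fetch("http://www.yoursite.com/logs/access.log"))  # False
```

Keep in mind this is purely voluntary on the spider's side; nothing stops a badly behaved one from ignoring the answer.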

fatman
05-02-2001, 10:54 PM
Originally posted by thewitt
If you have an area on your website that you need to have secured, be sure to at least secure it with BASIC authentication techniques.

More than one person has been caught with the /secret/ directory on his site being exposed.

Just "not linking" to the directory is not enough to protect your directories from discovery.

-t

True. Another way (perhaps even safer than password protecting the area) is to keep the secret directory out of the web directory tree (i.e., put it in a directory above or at the same level as your htdocs tree). :)
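A quick way to double-check that a directory really sits outside the web tree is sketched below in Python. The paths are hypothetical examples and is_inside is just an illustrative helper name; substitute your own document root.

```python
from pathlib import Path


def is_inside(directory, docroot):
    """True if `directory` is the docroot itself or lies beneath it."""
    directory = Path(directory).resolve()
    docroot = Path(docroot).resolve()
    return directory == docroot or docroot in directory.parents


# Hypothetical layout: htdocs is the web root, private sits beside it.
print(is_inside("/home/site/htdocs/secret", "/home/site/htdocs"))  # True
print(is_inside("/home/site/private", "/home/site/htdocs"))        # False
```

Anything that comes back False cannot be reached by typing a URL, unless the server is configured with an alias that points at it.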

archangel777
05-09-2001, 03:51 AM
Darn... I better take the path to my naked picture off of the robots.txt file. No wonder I was getting strange love letters from people.