Web Hosting Talk







Why is there a robots.txt file in /usr/admserv/html


yellowdefend
11-22-2002, 12:26 PM
This is what is there:

# Prevent all robots from visiting this site:

User-agent: *
Disallow: /


It looks to me like this will keep robots from indexing anything on the whole server.
Does it apply only to this directory, or will it override the robots.txt file in my /home/sites/site1 directory, which has:

User-agent: *
Disallow: /certs/
Disallow: /users/
Disallow: /logs/

I just want the spiders to index all the domains.

Alan

CobaltCuban
11-22-2002, 12:56 PM
admserv is a different daemon than the normal http serving daemon.

Run this:

ps ax | grep http

You'll see httpd.admserv. This server is in charge of handling all requests for managing sites (/siteadmin, /admin).

It's okay to tell robots not to spider, or even try to spider, these directories.

BruceT
11-22-2002, 09:27 PM
/usr/admserv is the root directory for all the pieces that make up the administrative UI. The contents have nothing to do with the virtual sites you're hosting.

With the robots exclusion, even if a spider stumbles on your admin stuff (running on port 81 via a separate httpd daemon), none of your admin pages will get cached. No passwords would be visible that way, but user names, DNS info, etc. could be exposed, and you don't really need that...

So don't worry about the file. Just make sure there isn't a robots.txt file in /home/sites/home/web, /home/sites/sitex/web, etc, and all your "user" sites will be spider-able.
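To illustrate why the admin server's file does not affect the hosted sites, here is a minimal Python sketch using the standard urllib.robotparser module. It assumes a spider applies each robots.txt only to the host:port it was fetched from. The hostname www.example.com and the helpers make_parser() and allowed() are made up for illustration; the site rules are the ones quoted in the first post.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_parser(text):
    """Build a parser from the text of one robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# One rule set per host:port. www.example.com is a hypothetical name.
RULES = {
    # Admin UI on port 81 (the file in /usr/admserv/html).
    "www.example.com:81": make_parser("User-agent: *\nDisallow: /\n"),
    # Virtual site on the default port, rules from the first post.
    "www.example.com": make_parser(
        "User-agent: *\n"
        "Disallow: /certs/\n"
        "Disallow: /users/\n"
        "Disallow: /logs/\n"
    ),
}


def allowed(url, agent="*"):
    """Check a URL against the robots.txt of its own host:port only."""
    parser = RULES[urlsplit(url).netloc]
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.com:81/admin/"))        # False
print(allowed("http://www.example.com/index.htm"))        # True
print(allowed("http://www.example.com/logs/access_log"))  # False
```

The blanket "Disallow: /" only blocks URLs on port 81; the site on the default port is judged by its own rules.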

yellowdefend
11-23-2002, 11:50 AM
You said:

"So don't worry about the file. Just make sure there isn't a robots.txt file in /home/sites/home/web, /home/sites/sitex/web, etc, and all your "user" sites will be spider-able."

So do not put any robots.txt under that folder because the user, logs, certs, will be indexed.

So I presume the robots.txt file should always go under the /web folder, which is the same place the index.htm file resides.

Thanks
Alan;)

BruceT
11-23-2002, 01:21 PM
"So do not put any robots.txt under that folder because the user, logs, certs, will be indexed. ...the robots.txt file should always go under the /web folder, which is the same place the index.htm file resides."

Virtual site /user, /certs, and /logs directories are not web-accessible, so you don't need to worry about a spider finding them.
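A rough way to picture this is to map filesystem paths to URL paths. The sketch below assumes site1's document root is /home/sites/site1/web and that nothing outside it is served; it ignores any aliases, symlinks, or per-user web directories the server might be configured with. The function url_path_for() and the file names under /logs and /certs are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed document root for site1 (only files below it get a URL).
DOC_ROOT = PurePosixPath("/home/sites/site1/web")


def url_path_for(fs_path):
    """Return the URL path for a file under DOC_ROOT, or None if outside it."""
    try:
        rel = PurePosixPath(fs_path).relative_to(DOC_ROOT)
    except ValueError:
        # Not under the document root, so no URL reaches it.
        return None
    if rel == PurePosixPath("."):
        return "/"
    return "/" + rel.as_posix()


print(url_path_for("/home/sites/site1/web/robots.txt"))  # /robots.txt
print(url_path_for("/home/sites/site1/web/index.htm"))   # /index.htm
print(url_path_for("/home/sites/site1/logs/access_log")) # None
print(url_path_for("/home/sites/site1/certs/key"))       # None
```

A robots.txt placed in /web ends up at /robots.txt, which is where spiders look for it, while the sibling directories have no URL at all under these assumptions.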

Now, if you have content in or under the /web directory that you don't want spidered, you might want a robots.txt to exclude it. So I guess I misspoke when I said not to put one in the /web directory at all.

Use one if you need to; if you don't use one, the entire /web directory will be spiderable with no action required from you.
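As a small sketch of the two cases, again with Python's urllib.robotparser: an empty rule set stands in for a site with no robots.txt in /web, and the /private/ directory and the helper rules_from() are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def rules_from(text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# No rules at all: stands in for a site with no robots.txt in /web.
no_file = rules_from("")

# Hypothetical rules excluding one directory under /web.
with_file = rules_from("User-agent: *\nDisallow: /private/\n")

for path in ("/index.htm", "/private/notes.htm"):
    print(path, no_file.can_fetch("*", path), with_file.can_fetch("*", path))
# /index.htm True True
# /private/notes.htm True False
```

Keep in mind that robots.txt is only a request that well-behaved spiders honor; it does not block access to the files.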