I am building a new robots.txt, and since Google doesn't like repetitive content, I am wondering how I can disallow my members' replicated sites. These sites are created by a script, so there is no actual "member" folder; if you type a member URL it still renders and tracks the sale. It is all virtual aliases.
My question is: is there a line in robots.txt that will disallow a folder and everything beneath it? What I have in mind is the star pattern, /folder/*
In the shell, the star matches everything below a folder. I know spiders care about this because my Google results for the replicated sites are dropping in a big way. I need to disallow them before my main site gets taken down with them.
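For what it's worth, here is a sketch of the two rule forms (the folder name is just a placeholder). The plain prefix rule is part of the original robots.txt convention; the * wildcard is an extension that Google and some other major crawlers support, not part of the original standard:

```text
User-agent: *
# Prefix rule: matches /members/, /members/abc, /members/abc/page.html, ...
Disallow: /members/

# Wildcard form (Googlebot extension); for a leading path like this it
# behaves the same as the prefix rule above.
Disallow: /members/*
```

Because Disallow already matches by prefix, the wildcard is usually unnecessary for blocking a folder and everything under it.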
Hmm, let me get this right. Here is what my robots.txt looks like. Am I correct that spiders will crawl the entire site except the specified folders? Or does using User-agent: * with Disallow: block the site altogether? Thanks for the input; the last thing anyone wants is to kick the spiders out of the main site lol

User-agent: *
Disallow: /css
Disallow: /videos
Disallow: /templates
Disallow: /templates/sites
Disallow: /members
Cool beans, so if what you are saying is right, my robots.txt is fine. However, the question remains: can you disallow a directory and everything beneath it? There are 2 possibilities:
1. Disallow: /dir1/ blocks that folder and everything beneath it.
2. Disallow: /dir1/ blocks dir1 only and everything beneath it will be crawled.
That is the question. Thinking logically, one would assume that if you block a folder, bots can't go down into it and explore its sub-folders. Then again, bots aren't human, and that's why I am wondering. The sub-folders in my /members folder are dynamic, so I can't disallow them all one by one. I need a wildcard if possibility 2 is how search engines behave.
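For what it's worth, this can be checked with Python's standard-library robots.txt parser, which implements the classic prefix matching: a Disallow rule matches any URL path that starts with the given value, so possibility 1 is how it works and no wildcard is needed. A small sketch (the paths are made up to mirror the /members rule above):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the /members rule discussed above.
rules = """User-agent: *
Disallow: /members
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Disallow matches by URL prefix, so every path beneath /members
# is blocked as well, including dynamically generated sub-folders.
print(rp.can_fetch("*", "/members"))           # blocked
print(rp.can_fetch("*", "/members/abc/page"))  # blocked
print(rp.can_fetch("*", "/index.html"))        # allowed
```

Since the match is by prefix, dynamic sub-folders under /members are covered automatically without listing them one by one.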
Last edited by Online_Currency; 12-23-2006 at 12:03 AM.
I think it prevents crawling of that directory and its subdirectories, but if there is a direct link to a file from your site or another site, I don't think that stops the bare URL from showing up in the index, even though the file itself is in a blocked directory.