  1. #1

    robots.txt >> Disallow: folder/* ??

    I am building a new robots.txt and seeing as google doesn't like repetitive content, I am wondering how I can disallow my member's replicated sites. These sites are created by a script so there is no actual "member" folder though if you type a member URL it will show and it will track the sale. It is all virtual aliases.

    My question is.. Is there a line of code in robots.txt that will disallow a folder and everything beneath it? My example that I am thinking of would be the star command /folder/*

    Star in shell means everything going downwards. I know that spiders care about this because my google results are going down in a big way for my replicated sites. I need to disallow them before my main site gets taken off.

    Any input is most appreciated!

  2. #2
    If you put:

    User-agent: *
    Disallow: /

    the site won't be crawled by Google or any other well-behaved bot.
    Around two decades of web marketing experience & millions of visitors.
    NunoAlex.com explains my expertise & how I can help you.
    Contact me!

  3. #3
    Hmm, let me get this right. Here is what my robots.txt looks like. Am I correct that spiders will crawl the entire site except the specified folders? Or does the rule you posted block the site altogether? Thanks for the input; the last thing anyone wants is to kick the spiders out of the main site, lol.
    User-agent: *
    Disallow: /css
    Disallow: /videos
    Disallow: /templates
    Disallow: /templates/sites
    Disallow: /members

  4. #4
    The code in my last post will block the whole site. You should only use it if you distribute a zip of the sites your members will use. Never put it in your main site!

    To exclude just some directories you should use:

    User-agent: *
    Disallow: /dir1/
    Disallow: /dir2/
    Disallow: /dir3/

    Full explanation here:

    http://www.robotstxt.org/wc/exclusion-admin.html


    Good luck!
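    [Editor's note] The per-directory rules above can be sanity-checked with Python's standard-library robots.txt parser. This is a minimal sketch, not from the thread; the example.com host and page paths are made up:

```python
# Check which paths a robots.txt with per-directory Disallow rules blocks.
# Host and paths are illustrative only.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /dir1/
Disallow: /dir2/
Disallow: /dir3/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The root and unlisted paths stay crawlable; the listed directories do not.
print(rp.can_fetch("*", "http://example.com/"))           # True
print(rp.can_fetch("*", "http://example.com/about.html")) # True
print(rp.can_fetch("*", "http://example.com/dir1/page"))  # False
```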

  5. #5
    Cool beans, what you are saying means that my robots.txt is fine. However, the question remains: can you disallow a directory and everything beneath it? There are two possibilities:

    1. Disallow: /dir1/ blocks that folder and everything beneath it.

    2. Disallow: /dir1/ blocks dir1 only, and everything beneath it will still be crawled.

    That is the question. Thinking logically, one would assume that if you block a folder, bots can't descend into it and explore its sub-folders. Then again, bots aren't human, which is why I'm asking. My sub-folders under /members are dynamic, so I can't disallow them one by one. I need a wildcard if possibility 2 is how search engines behave.
    Last edited by Online_Currency; 12-23-2006 at 12:03 AM.

  6. #6
    It should block everything beneath it. I've never read anything stating the opposite, but if you want to be sure, a search for robots.txt documentation would be the best choice.
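    [Editor's note] Disallow works as a plain path-prefix match, so a rule like Disallow: /members/ already covers every dynamic sub-folder beneath it, with no wildcard needed. A minimal sketch with Python's standard-library parser (the example.com host and member paths are made up):

```python
# Demonstrate that a single directory Disallow covers all nested paths.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /members/"])

# Every path beneath /members/ is blocked by the one prefix rule.
for path in ("/members/", "/members/alice/", "/members/alice/deep/page.html"):
    print(path, rp.can_fetch("*", "http://example.com" + path))  # all False

print(rp.can_fetch("*", "http://example.com/"))  # True: the root is unaffected
```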

  7. #7
    I think it prevents crawling of that directory and its subdirectories, but if there is a direct link to a file from your site or another site, I don't think it will prevent that URL from showing up in the index (as a URL-only listing), even though the file is in a blocked directory.
