  1. #1

    Question Removing a Google index of a geocities site

    I am trying to have Google remove one of the directories on my Geocities site from its index using robots.txt. I am placing the robots.txt file in that directory. The file looks like this:

    User-agent: *
    Disallow: /

    When trying to perform the removal through www.google.com/remove.html, I am getting the following error message:

    The following rule applies to a URL that is outside the jurisdiction of this robots.txt file:
    DISALLOW /

    Does anyone know what I should specify in the Disallow line so that the removal can be performed?

  2. #2
    Join Date
    Aug 2000
    Location
    NYC
    Posts
    6,627
    You can't exclude "/" because that's the server root directory, which I'd guess is www.geocities.com/. So change it to the full path of the directory in question.

    You could also use meta tags in each file:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    Specializing in SEO and PPC management.

  3. #3
    I just tried with

    Disallow: /www.geocities.com/username/folder

    but received the following error:

    The following rule applies to a URL that is outside the jurisdiction of this robots.txt file:
    DISALLOW /www.geocities.com/username/folder

    It doesn't work without the leading '/' either.

    I have put the META tag into the HTML files; the problem is that I also want to exclude the .doc files. Is there a way to do that without the robots.txt file?

  4. #4
    Join Date
    Aug 2001
    Location
    St. Louis, MO
    Posts
    467
    Check out this link:

    Example on Geocities

  5. #5
    Join Date
    Nov 2001
    Location
    Ann Arbor, MI
    Posts
    2,978
    Geocities user directories are actually just subdirectories, and the robots.txt file is supposed to live in the server root. You would probably need to be able to modify the one at http://www.geocities.com to have your requests acknowledged by the spider.

    You'll probably need to use the meta tag method:

    http://www.google.com/remove.html#exclude_pages
    -Mark Adams
    www.bitserve.com - Secure Michigan web hosting for your business.
    Only host still offering a full money back uptime guarantee and prorated refunds.
    Offering advanced server management and security incident response!

  6. #6
    Join Date
    Aug 2000
    Location
    NYC
    Posts
    6,627
    Originally posted by bitserve
    Geocities user directories are actually just subdirectories, and the robots.txt file is supposed to be in the root directory.
    That's certainly true of robots.txt for any "normal" purpose, but Google will check for it in a subdirectory when you request removal from their database. As they explain on the removal page you linked to, in that case they'll only honor the exclusion for a temporary 90-day period; you need an "official" robots.txt in the root directory for a permanent exclusion.

    Disallow: /www.geocities.com/username/folder
    ivec, that's not quite right; you don't want the geocities part in there. For example, if your URL at Geocities is www.geocities.com/labyrinth/maze/6502 (just copied from that "example" page), you should try:

    Disallow: /labyrinth/maze/6502/folderyouwanttoexclude/
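    If you want to sanity-check a rule like that before submitting it to Google's removal tool, Python's standard urllib.robotparser applies the same path-prefix matching that crawlers use. The paths below are just the hypothetical example from this thread, not anyone's real site:

    ```python
    # Sketch: test whether a robots.txt rule would block a given URL,
    # using Python's standard-library urllib.robotparser.
    from urllib.robotparser import RobotFileParser

    rules = """User-agent: *
    Disallow: /labyrinth/maze/6502/folderyouwanttoexclude/
    """

    parser = RobotFileParser()
    parser.parse(rules.splitlines())

    # A URL inside the disallowed folder is blocked:
    print(parser.can_fetch("*", "http://www.geocities.com/labyrinth/maze/6502/folderyouwanttoexclude/file.doc"))  # False

    # A URL outside that folder is still allowed:
    print(parser.can_fetch("*", "http://www.geocities.com/labyrinth/maze/6502/index.html"))  # True
    ```
    
    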

    If it's not a real emergency to get the stuff removed, though, there's another option: as long as those .doc files aren't linked to from anywhere else on the web, they'll eventually be dropped from the index anyway, since the only links to them are from the HTML files that you did exclude.

    You could also try moving or renaming them, in which case any existing links would no longer work. There'd still be the Google "cache" to worry about, though.
    Specializing in SEO and PPC management.

  7. #7
    Originally posted by JayC
    Disallow: /labyrinth/maze/6502/folderyouwanttoexclude/
    Thanks, this worked.

    You could also try moving or renaming them, in which case any existing links would no longer work. There'd still be the Google "cache" to worry about, though.
    I used Google's removal of dead links, which should eventually delete the cache too. I renamed the .doc files in order for this to work.
