I am trying to remove Google indexes on one of the directories on my geocities site using robots.txt. I am placing the robots.txt file in that directory. The file looks like this:
User-agent: *
Disallow: /
When trying to perform the removal through www.google.com/remove.html, I am getting the following error message:
The following rule applies to a URL that is outside the jurisdiction of this robots.txt file:
DISALLOW /
Does anyone know what I should specify in the Disallow line so that the removal can be performed?
You can't exclude from "/" because that's the server root directory, which I'd guess is www.geocities.com/. So, change it to whatever the full path to the directory in question is.
You could also use meta tags in each file:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
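If you want to double-check that a page really carries that tag before asking for removal, here is a rough Python sketch. The class name RobotsMetaFinder and the sample HTML string are made up for illustration; it only uses the standard library's html.parser.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


# Sample page text, assumed for the example.
page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
finder = RobotsMetaFinder()
finder.feed(page)
print(finder.directives)
```

This only covers html files, since the tag has to live inside the page itself.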
I just tried with
Disallow: /www.geocities.com/username/folder
but received the following error:
The following rule applies to a URL that is outside the jurisdiction of this robots.txt file:
DISALLOW /www.geocities.com/username/folder
Doesn't work without the first '/' either.
I have put the META NAME tag into the html files; the problem is that I want to exclude the .doc files too. Is there a way to do this without the robots.txt file?
pgrote
08-27-2002, 07:07 PM
Check out this link:
Example on Geocities (http://thesapphirecat.iwarp.com/weapons/meta.html)
bitserve
08-27-2002, 10:18 PM
Geocities user directories are actually just subdirectories, and the robots.txt file is supposed to be in the root directory. You would probably need to be able to modify the one at http://www.geocities.com to have your requests acknowledged by the spider.
You'll probably need to use the meta tag method:
http://www.google.com/remove.html#exclude_pages
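To make the "root directory" point concrete, here is a small Python sketch of where a crawler would normally look for robots.txt given any page URL. The function name robots_url and the sample URL are just for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Sample URL, assumed for the example.
print(robots_url("http://www.geocities.com/username/folder/file.doc"))
```

Whatever subdirectory the page is in, the result points at the host root, which on a shared host is not a file the individual user controls.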
Originally posted by bitserve
Geocities user directories are actually just subdirectories, and the robots.txt file is supposed to be in the root directory.
That's certainly true of robots.txt for any "normal" purpose, but Google will check for it in a subdirectory when you request removal from their database. As they explain in the removal page you linked to, in that case they'll only honor the exclusion for a temporary 90-day period; you need to use an "official" robots.txt in the root directory for a permanent exclusion.
Disallow: /www.geocities.com/username/folder
ivec, that's not exactly right; you don't want the geocities part to be there. For example, if your URL at geocities is www.geocities.com/labyrinth/maze/6502 (just copied from that "example" page), you should try:
Disallow: /labyrinth/maze/6502/folderyouwanttoexclude/
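If you want to see how a rule like that is matched, here is a quick sketch with Python's standard urllib.robotparser. It is only an approximation of how a crawler reads the file, not Google's actual removal tool, and the file names notes.doc and index.html are made up.

```python
from urllib.robotparser import RobotFileParser

base = "http://www.geocities.com"
doc_url = base + "/labyrinth/maze/6502/folderyouwanttoexclude/notes.doc"
index_url = base + "/labyrinth/maze/6502/index.html"


def parser_for(disallow_path):
    """Build a parser from a one-rule robots.txt (helper invented for this example)."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow_path])
    return parser


# Rule written as a path relative to the host root.
good = parser_for("/labyrinth/maze/6502/folderyouwanttoexclude/")
doc_fetchable = good.can_fetch("*", doc_url)      # file inside the excluded folder
index_fetchable = good.can_fetch("*", index_url)  # file outside it

# Rule with the host name pasted into the path matches no real URL path.
bad = parser_for("/www.geocities.com/labyrinth/maze/6502/folderyouwanttoexclude/")
doc_fetchable_bad_rule = bad.can_fetch("*", doc_url)

print(doc_fetchable, index_fetchable, doc_fetchable_bad_rule)
```

The rule is compared against the path part of the URL only, which is why the host name must not appear in it.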
If it's not a real emergency to get the stuff removed, though, those .doc files should be dropped from the index anyway, as long as the only links to them are from the html files that you did exclude and nothing else on the web links to them.
You could also try moving or renaming them, in which case any existing links would no longer work. There'd still be the Google "cache" to worry about, though.
Originally posted by JayC
Disallow: /labyrinth/maze/6502/folderyouwanttoexclude/
Thanks, this worked.
You could also try moving or renaming them, in which case any existing links would no longer work. There'd still be the Google "cache" to worry about, though.
I used Google's removal of dead links, which should eventually delete the cache too. I renamed the .doc files in order for this to work.