ap0llo_*
02-23-2009, 05:30 PM
Hi there,
I'm trying to redirect some of my dynamic URLs to their simpler rewritten form. I'm doing this to avoid Google penalizing me for duplicate content.
Here's what I mean:
index.php?task=category&id=10&name=Defence&page=1 to /cat/10/Defence/p1.htm
Please could someone show me how to do this using a 301 redirect in Apache?
scaledsolutions
02-24-2009, 06:00 PM
You wouldn't do this with a plain Redirect directive, because it can't match on the query string. You would need to have mod_rewrite installed (no linky - I am too new to the forums - Google is your friend), which lets you rewrite or redirect URLs based on any rules you choose. Talk to your hosting company to find out whether they have the module installed, or whether they can install it for you.
What do you want the sample url to be redirected to?
ap0llo_*
02-24-2009, 06:11 PM
Ah right.
I want a URL like:
index.php?task=category&id=10&name=Defence&page=1
Changed to:
/cat/10/Defence/p1.htm
I've used my .htaccess to rewrite these URLs, but when I go to the dynamic URL, it still loads. As far as I'm aware, Google sees this as duplicate content.
Look at my site: http://thegamecade.com to see what I mean.
scaledsolutions
02-25-2009, 05:11 PM
Can you post the content of your .htaccess file?
ap0llo_*
02-25-2009, 07:03 PM
Certainly, here it is:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^thegamecade\.com$ [NC]
RewriteRule ^(.*)$ http://www.thegamecade.com/$1 [R=301,L]
RewriteRule ^view/([0-9]+)/([0-9a-zA-Z?-]+) index.php?task=view&id=$1&name=$2 [L]
RewriteRule ^cat/([0-9]+)?/([0-9a-zA-Z?-]+)/p([0-9]+) index.php?task=category&id=$1&name=$2&page=$3 [L]
RewriteRule ^profile/([0-9]+)?/([0-9a-zA-Z?-]+) index.php?task=profile&id=$1&name=$2 [L]
RewriteRule ^profile/comments/([0-9]+)?/([0-9a-zA-Z?-]+) index.php?task=users_comments&id=$1&name=$2 [L]
RewriteRule ^page/([0-9]+) index.php?task=view_page&id=$1 [L]
RewriteRule ^task/register index.php?task=register [L]
RewriteRule ^task/lost-password index.php?task=lost_pass [L]
RewriteRule ^task/links index.php?task=links [L]
RewriteRule ^task/allnews index.php?task=news [L]
RewriteRule ^task/members index.php?task=member_list [L]
RewriteRule ^task/messages index.php?task=messages [L]
RewriteRule ^task/search index.php?task=search [L]
RewriteRule ^task/news/item/([0-9]+)/([0-9a-zA-Z?-]+) index.php?task=view_news&id=$1 [L]
RewriteRule ^task/messages index.php?task=messages [L]
scaledsolutions
02-26-2009, 01:46 PM
I think I understand now. You are already using mod_rewrite to convert the "friendly" URL to the longer URL. You want to keep Google from discovering the longer URL when spidering your site.
My first question would be: what application are you using to serve the content on your site? Is it capable of creating and using friendly URLs for you? I have started using Drupal for content management and it has this ability, as do most other CMSs out there.
Secondly, why would Googlebot access the longer URL? It looks like you are using the friendly URLs for links on your site, and the bot should just follow the links. I am not a Google expert, so if I am missing something, let me know.
You may be able to do this with mod_rewrite by creating rules that convert long URLs to friendly URLs when the user agent is Googlebot, the request is for the long URL, and the query string has no entry indicating that it has already been processed, i.e.
RewriteCond %{REQUEST_URI} ^/index\.php$
RewriteCond %{QUERY_STRING} !(^|&)proc=1
RewriteCond %{HTTP_USER_AGENT} .*Googlebot.*
Then rewrite the long URL to the short URL and send the response as a redirect by putting the [R=301,L] flags at the end of the rule (301 marks the redirect as permanent, which is what you asked for).
You would have to change your existing rewrite rules to include the proc=1 param in the query string. Without some flag to indicate that the URL has already been processed, the request will get stuck in an infinite loop and you will get a 500 Server Error response.
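As a rough sketch of a different way around the loop (not the marker approach described above): the condition can be matched against %{THE_REQUEST}, which holds the original request line sent by the client and is not changed by internal rewrites, so no proc=1 marker is needed. The example assumes the view URLs from the posted .htaccess, assumes it is placed before the existing friendly-URL rules, and applies to every client rather than only Googlebot.

```apache
# Sketch only, under the assumptions stated above.
# %{THE_REQUEST} looks like: GET /index.php?task=view&id=5&name=Foo HTTP/1.1
# It reflects what the client asked for, so an internally rewritten
# request to index.php will not match this condition again.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php\?task=view&id=([0-9]+)&name=([0-9a-zA-Z-]+)\sHTTP
# %1 and %2 come from the RewriteCond above; the trailing ? drops the old query string.
RewriteRule ^index\.php$ /view/%1/%2? [R=301,L]
```

The existing ^view/ rule can then stay as it is, because the redirect only fires when the long URL appears in the client's own request.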
scaledsolutions
02-26-2009, 05:38 PM
I hate to reply to myself, but I finally got some time to do a little testing with rules and found that this worked:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/index\.php$
RewriteCond %{HTTP_USER_AGENT} .*Googlebot.*
RewriteCond %{QUERY_STRING} ^task=view&id=([0-9]+)&name=([0-9a-zA-Z?-]+)$
RewriteRule (.*) /view/%1/%2/foo? [R=301,L]
RewriteRule ^view/([0-9]+)/([0-9a-zA-Z?-]+) /index.php?task=view&id=$1&name=$2&proc=1 [L]
The R=301 tells Google that this is a permanent redirect (R=302 would mark it as temporary). Clients never see the proc=1 at the end of the query string because the second rule doesn't issue a redirect; the rewritten URL is only used internally.
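The same pattern should carry over to the category URL from the original question. This is an untested sketch, assuming the .htaccess sits in the document root with RewriteEngine On already set, and using R=301 because the question asked for a permanent redirect. The .htm suffix and the p-prefixed page number follow the example URL in the question.

```apache
# Sketch only: category version of the redirect-plus-marker technique.

# 1) Externally redirect the long category URL to the friendly form.
#    %1, %2 and %3 are captured by the last RewriteCond (the query string);
#    the trailing ? on the target drops the original query string.
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{QUERY_STRING} ^task=category&id=([0-9]+)&name=([0-9a-zA-Z?-]+)&page=([0-9]+)$
RewriteRule ^index\.php$ /cat/%1/%2/p%3.htm? [R=301,L]

# 2) Internally map the friendly URL back to index.php, adding the proc=1
#    marker so the anchored query-string condition above no longer matches.
RewriteRule ^cat/([0-9]+)/([0-9a-zA-Z?-]+)/p([0-9]+) index.php?task=category&id=$1&name=$2&page=$3&proc=1 [L]
```

Dropping the Googlebot condition would send every visitor who requests the long URL to the friendly one, not just the crawler.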
I hope this idea solves your problem.
ap0llo_*
02-26-2009, 08:25 PM
Thanks very much for your time scaledsolutions, have given it a test and works perfectly =]