Does switching to mod_rewrite (or equivalent) mean duplicate content?

colossus

7:25 pm on Jul 1, 2005 (gmt 0)

10+ Year Member



I have a large site with thousands of pages that has been indexed by all the major search engines (Yahoo 23K pages, MSN 17K pages, Google 12K pages). The problem is that we use long, complex querystrings; no thought was given to SEO when the site was designed a year ago. The querystring parameters all use the convention {Entity}ID, and the key values are all GUIDs. We are getting indexed but not ranked very well. On top of that, Google seemingly at random decides that it doesn't like some of the parameters in our querystring and just removes them, which means Google has hundreds of our error pages in its index.

I have been tasked with SEO and have determined that we need to "mask" the URLs to hide their complexity. The problem is that, because so many pages have already been indexed, I have to support the old URL format so that users with bookmarks and links from SERPs don't hit dead links.

My questions are as follows:

-Will my site be penalized for duplicate content when I have two URLs that point to the same page? (Only one of the URLs will be used on our site, but the original URL is already in the SE indexes and will likely be used to determine duplicate content.)
-Is there a way that the SE indexes could be cleared and our site reindexed? (I thought of submitting a removal request followed by a resubmission, but that would take months.)
-Should I support the old URL format or throw it out?
-If I throw out the old URL format, how can I keep from losing traffic?

Here's the old url format:

www.mydomain.com/directory/page.aspx/templatename?State=CA&SSID={GUID}&AreaID={GUID}&SchoolID={GUID}

Here's the new format:

www.mydomain.com/directory/templatename.mxp/California/Boys_Varsity_Football_Fall_04-05/AreaID-{GUID}/SchoolID-{GUID}

NOTE: The new format still has GUIDs, but I'm transforming the URL bit by bit, keeping performance in mind wherever I have to do lookups to translate the text in the URL into key values.
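
To make the mapping between the two formats concrete, here is a rough sketch in Python (not the site's actual ASP.NET code). The lookup tables, the sample GUID, and the function name old_to_new are all made up for illustration; a real implementation would pull the names from the database.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup tables with made-up sample rows; a real site would
# load these from its database.
STATE_NAMES = {"CA": "California"}
SEASON_SLUGS = {
    "11111111-2222-3333-4444-555555555555": "Boys_Varsity_Football_Fall_04-05",
}


def old_to_new(old_url):
    """Translate an old querystring URL into the new path-style URL.

    Returns None when the URL is not in the old format or a lookup fails,
    so the caller can show an error page instead of guessing.
    """
    parts = urlsplit(old_url)
    prefix, marker, template = parts.path.partition("/page.aspx/")
    if not marker or not template:
        return None
    query = parse_qs(parts.query)
    try:
        state = STATE_NAMES[query["State"][0]]
        season = SEASON_SLUGS[query["SSID"][0]]
        area = query["AreaID"][0]      # still a GUID in the new format
        school = query["SchoolID"][0]  # still a GUID in the new format
    except KeyError:
        return None
    return (f"{prefix}/{template}.mxp/{state}/{season}"
            f"/AreaID-{area}/SchoolID-{school}")
```

Returning None for anything that cannot be translated keeps bad or truncated querystrings (like the ones Google produces) from being turned into equally bad new URLs.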

colossus

10:58 pm on Jul 1, 2005 (gmt 0)

10+ Year Member



Well, for anyone who might be interested, I found a solution. The solution to all problems related to moving content is a 301 redirect. Because I wrote my own mod_rewrite-style module, I was able to add a 301 redirect whenever a page is requested using the old format. This should take care of both robots and users: robots will replace the old URL in their index, and browsers will redirect users automatically. In case a browser doesn't follow the redirect, I have a page informing the user what happened.
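
For anyone who wants to see the idea in code, here is a minimal sketch written as a Python WSGI wrapper rather than my actual IIS module. The names make_redirect_app, translate, and fallback_app are hypothetical: translate stands in for whatever converts an old-format request into the new URL and returns None for everything else.

```python
from html import escape


def make_redirect_app(translate, fallback_app):
    """Wrap fallback_app so old-format requests get a 301 to the new URL.

    translate(path, query_string) is a hypothetical callable that returns
    the new URL, or None when the request is not in the old format.
    """
    def app(environ, start_response):
        new_url = translate(environ.get("PATH_INFO", ""),
                            environ.get("QUERY_STRING", ""))
        if new_url is None:
            # Not an old-format request: handle it normally.
            return fallback_app(environ, start_response)
        # This body is only seen by clients that do not follow the redirect.
        link = escape(new_url)
        body = (f'<p>This page has moved to <a href="{link}">{link}</a>.</p>'
                ).encode("utf-8")
        start_response("301 Moved Permanently", [
            ("Location", new_url),
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```

The permanent (301) status is what tells robots to swap the old URL for the new one; the short HTML body covers the rare client that ignores the Location header.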

jdMorgan

2:44 am on Jul 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



colossus,

Welcome to WebmasterWorld!

The usual approach is to *internally rewrite* static URLs to their dynamic equivalent, in order to work with your existing script(s). Then *externally redirect* any direct client requests for dynamic URLs to the static URL.

Even a cursory examination of the simple description above will result in the question, "But won't this cause an 'infinite' redirection loop?", and the answer is, "Yes, it could." A bit of care and trickery will solve the problem, though. As a simple example showing what's needed, let's just take a simple one-parameter dynamic URL:

Friendly static URL: www.example.com/buy/car
Unfriendly dynamic URL: www.example.com/buy.php?what=car


# Internally rewrite static URLs to script
RewriteRule ^buy/(.+) /buy.php?what=$1 [L]
#
# Externally redirect client-requested dynamic URLs to new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /buy\.php\?what=([^ ]+)\ HTTP/
RewriteRule ^buy\.php$ http://www.example.com/buy/%1 [R=301,L]

Here, we use the server variable %{THE_REQUEST}, which holds the request line exactly as the client sent it (for example, GET /buy.php?what=car HTTP/1.1), to examine the URL-path originally requested by the client, rather than the current (and possibly-rewritten) URL-path seen by RewriteRule. By doing so, we avoid the pitfall of an infinite redirection loop.
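
If it helps to see why the loop arises, the following is a simplified Python model of the two rules, not Apache itself. It assumes per-directory (.htaccess) behaviour, where the rules are run again after an internal rewrite; the function name handle and the check_the_request switch are made up for the illustration.

```python
import re

# Same test as the RewriteCond above, applied to the client's request line.
THE_REQUEST_RE = re.compile(r"^[A-Z]{3,9} /buy\.php\?what=([^ ]+) HTTP/")


def handle(request_line, check_the_request=True, max_passes=5):
    """Return ("serve", url) or ("redirect", url) for one client request."""
    path, _, query = request_line.split(" ")[1].partition("?")
    for _ in range(max_passes):
        # Rule 1: internally rewrite /buy/<item> to the script.
        m = re.match(r"^/buy/(.+)", path)
        if m:
            path, query = "/buy.php", "what=" + m.group(1)
            continue  # the rewritten URL goes through the rules again
        # Rule 2: externally redirect dynamic URLs to the static form.
        if path == "/buy.php":
            if check_the_request:
                m = THE_REQUEST_RE.match(request_line)  # original request
            else:
                m = re.match(r"^what=(.+)$", query)     # current, rewritten URL
            if m:
                return ("redirect", "http://www.example.com/buy/" + m.group(1))
        break
    return ("serve", path + ("?" + query if query else ""))
```

In this model, handle("GET /buy/car HTTP/1.1") ends up serving /buy.php?what=car, while the same call with check_the_request=False redirects the client straight back to /buy/car, the URL it just asked for, and that is the loop. A direct request for /buy.php?what=car is redirected to /buy/car in both cases.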

Jim

colossus

1:49 pm on Jul 6, 2005 (gmt 0)

10+ Year Member



jdMorgan,

Thanks, I appreciate the reply. From your response it looks like I am on the right track. Your sample includes the flags [R=301,L], which indicate a 301 redirect whenever a dynamic URL is requested directly (see your snippet below).

# Externally redirect client-requested dynamic URLs to new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /buy\.php\?what=([^ ]+)\ HTTP/
RewriteRule ^buy\.php$ http://www.example.com/buy/%1 [R=301,L]

I forgot to mention that I am using IIS. While there are equivalents to mod_rewrite out there, I chose to write my own (albeit simplified) version so that I can replace complex query values with simple ones by doing lookups (the site was designed to use GUIDs). But again, you have confirmed my findings for me. Thank you.
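
As a rough illustration of the lookup side (in Python rather than the actual IIS code, with a made-up sample row and hypothetical function names), the readable text in a URL can be resolved back to its GUID through a small in-memory cache so that repeat requests skip the database:

```python
from functools import lru_cache

# Made-up sample row; a real module would query the site's database here.
_SEASON_GUIDS = {
    "Boys_Varsity_Football_Fall_04-05": "11111111-2222-3333-4444-555555555555",
}


def _query_database(slug):
    """Stand-in for the real (slow) database lookup."""
    return _SEASON_GUIDS.get(slug)


@lru_cache(maxsize=4096)
def guid_for_slug(slug):
    """Resolve a readable URL segment to its GUID, or None if unknown.

    lru_cache keeps recent results in memory, so only the first request
    for a given slug reaches _query_database.
    """
    return _query_database(slug)
```

One thing to watch with this approach: misses are cached as None too, so guid_for_slug.cache_clear() would need to be called when new rows are added.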

Mark