URL rewrite with 301

troyid

8:00 pm on Feb 17, 2010 (gmt 0)

Here is my situation.

I run a news script which produces pages like http://www.example.com/cgi-bin/news.cgi?a=article&ID=1265144816

I have set up an .htaccess rule that produces an .html page:

RewriteRule news/([0-9]+)\.html$ /cgi-bin/news.cgi?a=article&ID=$1 [L]

The only problem is that it does not 301-redirect the /cgi-bin/news.cgi?a=article&ID=$1 version to /news/1265144816.html, so I might get penalized for duplicate content.

Any help would be appreciated.

[Edited by jdMorgan at 11:05 pm (UTC) on Feb 17, 2010. Edit reason: example.com]

jdMorgan

11:24 pm on Feb 17, 2010 (gmt 0)

There is no duplicate-content "penalty". That's a very pernicious Webmaster Myth. While there may indeed be "penalties" for massive quantities of intentional duplicate content, there is no penalty for minor, accidental duplicate content.

The "penalty" is that you have two (or more) URLs competing with each other for incoming links and PageRank/Link-popularity, and this dilutes the ranking of each of them.

Your RewriteRule "creates" nothing. All it does is tell the server to serve the content from the internal filepath /cgi-bin/news.cgi?a=article&ID=1234 when an HTTP request for the URL /news/1234.html is received from a Web client.

Keeping URLs and filepaths as separate and distinct concepts, associated *only* by the action of a server, will help a lot when thinking about rewrites and redirects.
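
To make the URL/filepath distinction concrete, here is a minimal Python model (not Apache code) of what the internal rewrite does. It assumes .htaccess (per-directory) context, where mod_rewrite tests the pattern against the URL-path with the leading slash removed; the helper name `internal_rewrite` is hypothetical. The client only ever sees the URL it asked for; the return value is what the server uses internally, and no redirect is sent.

```python
import re
from typing import Optional

# Pattern copied from the RewriteRule in the question. In .htaccess context,
# mod_rewrite tests it against the URL-path without the leading slash.
NEWS_URL = re.compile(r"news/([0-9]+)\.html$")


def internal_rewrite(url_path: str) -> Optional[str]:
    """Hypothetical helper: map a requested URL-path to the internal script
    path the server would run, or None if the rule does not match.
    Nothing is sent back to the client; its address bar never changes."""
    match = NEWS_URL.search(url_path.lstrip("/"))
    if match is None:
        return None
    return "/cgi-bin/news.cgi?a=article&ID=" + match.group(1)


if __name__ == "__main__":
    print(internal_rewrite("/news/1234.html"))  # /cgi-bin/news.cgi?a=article&ID=1234
    print(internal_rewrite("/about.html"))      # None
```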

> The only problem is that it does not 301 the /cgi-bin/news.cgi?a=article&ID=$1
> version to /news/1265144816.html so I might get penalized for duplicate content.

Were you expecting it to? That's not what your rule does... So you need a second (and complementary) rule to implement that function:


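# Match only the client's original request line, never an internally rewritten URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/
# Externally redirect (301) direct requests for the script to the /news/ URL
RewriteRule ^cgi-bin/news\.cgi$ http://www.example.com/news/%1.html [R=301,L]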

This new rule should precede your rewrite code posted above, and it should precede your domain canonicalization redirect and any other less-specific redirects. These redirects should then be followed by your existing internal rewrites, again in order from most-specific to least-specific.
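
Why the order matters can be sketched with a toy Python model of per-directory rule processing. This is a simplification, not how Apache is implemented, and it assumes: a redirect ends processing; an internal rewrite flagged [L] ends the current pass, after which the whole rule set runs again on the rewritten URL (as happens in .htaccess); a rewrite to an identical URL is ignored. The rule functions are hypothetical stand-ins, and `canonical_host` is an assumed typical "non-www to www" redirect, not code from this thread. With the rewrite placed first, a request for example.com/news/1234.html ends up redirected to the internal script path, exposing it as a URL.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

CANONICAL_HOST = "www.example.com"


@dataclass
class Request:
    host: str
    the_request: str  # original request line; rewrites never change it
    path: str         # current URL-path, without the leading slash
    query: str        # current query string ("" if none)


def current_url(req: Request) -> str:
    return "/" + req.path + (f"?{req.query}" if req.query else "")


# Each rule returns ("redirect", url), ("rewrite", path, query) or None.
Rule = Callable[[Request], Optional[Tuple[str, ...]]]

THE_REQUEST_RE = re.compile(
    r"^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/"
)


def news_redirect(req: Request) -> Optional[Tuple[str, ...]]:
    m = THE_REQUEST_RE.match(req.the_request)
    if req.path == "cgi-bin/news.cgi" and m:
        # Old query string dropped, as the trailing "?" in the corrected rule does.
        return ("redirect", f"http://{CANONICAL_HOST}/news/{m.group(1)}.html")
    return None


def canonical_host(req: Request) -> Optional[Tuple[str, ...]]:
    # Hypothetical non-www -> www redirect that keeps the current path and query.
    if req.host != CANONICAL_HOST:
        return ("redirect", f"http://{CANONICAL_HOST}{current_url(req)}")
    return None


def news_rewrite(req: Request) -> Optional[Tuple[str, ...]]:
    # The internal rewrite from the question (no redirect).
    m = re.search(r"news/([0-9]+)\.html$", req.path)
    if m:
        return ("rewrite", "cgi-bin/news.cgi", f"a=article&ID={m.group(1)}")
    return None


def process(rules: List[Rule], req: Request, max_passes: int = 10) -> str:
    """Toy model: a redirect ends processing; an internal rewrite ([L])
    ends the current pass and the rule set is run again on the new URL."""
    for _ in range(max_passes):
        for rule in rules:
            result = rule(req)
            if result is None:
                continue
            if result[0] == "redirect":
                return "301 -> " + result[1]
            if (req.path, req.query) == (result[1], result[2]):
                return "serve " + current_url(req)  # unchanged URL: stop
            req.path, req.query = result[1], result[2]
            break
        else:
            return "serve " + current_url(req)  # no rule matched this pass
    return "loop detected"


def demo(rules: List[Rule]) -> str:
    # A request for http://example.com/news/1234.html (non-canonical host).
    req = Request("example.com", "GET /news/1234.html HTTP/1.1",
                  "news/1234.html", "")
    return process(rules, req)


if __name__ == "__main__":
    # Recommended order: redirects first, then internal rewrites.
    print(demo([news_redirect, canonical_host, news_rewrite]))
    # 301 -> http://www.example.com/news/1234.html
    # Rewrite first: the host redirect then exposes the script path as a URL.
    print(demo([news_rewrite, news_redirect, canonical_host]))
    # 301 -> http://www.example.com/cgi-bin/news.cgi?a=article&ID=1234
```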

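The complex RewriteCond is required to differentiate the /cgi-bin/news.cgi?a=article&ID=1234 script-path being directly requested by a client as a URL, as opposed to being internally requested as the result of your existing internal rewrite rule. %{THE_REQUEST} holds the original request line sent by the client (for example, "GET /news/1234.html HTTP/1.1"), and internal rewrites do not change it, so the condition matches only when the client itself asked for the script URL. Without this test, the two rules would unconditionally countermand each other, resulting in an 'infinite' loop.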
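
The effect of testing THE_REQUEST can be shown with a simplified Python model (not Apache code). It assumes a request for /news/1234.html has already been internally rewritten, so the current path and query string point at the script while THE_REQUEST still holds the client's original request line. The "naive" variant, which checks the current query string instead, is a hypothetical counter-example, and `redirect_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# RewriteCond pattern from the thread, applied to %{THE_REQUEST}: the raw
# request line sent by the client, which internal rewrites never modify.
THE_REQUEST_RE = re.compile(
    r"^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/"
)

# Hypothetical "naive" test against the *current* (possibly rewritten) query.
NAIVE_QUERY_RE = re.compile(r"^a=article&ID=([0-9]+)(&.*)?$")


def redirect_id(the_request: str, path: str, query: str,
                naive: bool = False) -> Optional[str]:
    """Return the article ID the redirect rule would send the client to,
    or None if the redirect should not fire."""
    if path != "cgi-bin/news.cgi":  # the RewriteRule pattern ^cgi-bin/news\.cgi$
        return None
    if naive:
        match = NAIVE_QUERY_RE.match(query)
    else:
        match = THE_REQUEST_RE.match(the_request)
    return match.group(1) if match else None


if __name__ == "__main__":
    # 1. Client asks for the script URL directly: the redirect fires.
    print(redirect_id("GET /cgi-bin/news.cgi?a=article&ID=1234 HTTP/1.1",
                      "cgi-bin/news.cgi", "a=article&ID=1234"))  # 1234
    # 2. Client asks for /news/1234.html; the internal rewrite has changed
    #    path and query, but THE_REQUEST is untouched: no redirect.
    print(redirect_id("GET /news/1234.html HTTP/1.1",
                      "cgi-bin/news.cgi", "a=article&ID=1234"))  # None
    # 3. Same request with the naive test: it would redirect back to
    #    /news/1234.html, which is rewritten again, and so on forever.
    print(redirect_id("GET /news/1234.html HTTP/1.1",
                      "cgi-bin/news.cgi", "a=article&ID=1234", naive=True))  # 1234
```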

Jim

troyid

11:58 pm on Feb 17, 2010 (gmt 0)

Hi Jim,

I love your explanation regarding duplicate penalties.

I implemented your rule and it almost worked. When I visit http://www.example.com/cgi-bin/news.cgi?a=article&ID=1265144816 it 301-redirects to http://www.example.com/news/1265144816.html?a=article&ID=1265144816

I just need it to 301 to http://www.example.com/news/1265144816.html

jdMorgan

12:20 am on Feb 18, 2010 (gmt 0)

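Yeah, I forget that almost every time I re-type this code: without a trailing "?" on the substitution, mod_rewrite passes the original query string through to the redirect target.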

It should be:


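# Match only the client's original request line, never an internally rewritten URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/
# The trailing "?" discards the original query string from the redirect target
RewriteRule ^cgi-bin/news\.cgi$ http://www.example.com/news/%1.html? [R=301,L]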
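
What the trailing "?" changes can be modelled in a few lines of Python. This is a simplified sketch of mod_rewrite's default query-string handling, assuming the [QSA] flag is not used: if the substitution contains no "?", the original query string is passed through unchanged; if it does, the query string after the "?" replaces the original, and an empty one means the redirect target carries no query string at all. `build_location` is a hypothetical helper name.

```python
def build_location(substitution: str, original_query: str) -> str:
    """Simplified model of how mod_rewrite forms a redirect target's query
    string when the [QSA] flag is not used."""
    if "?" in substitution:
        # The substitution supplies its own query string (possibly empty),
        # which replaces the original one.
        base, query = substitution.split("?", 1)
    else:
        # No "?": the original query string is passed through unchanged.
        base, query = substitution, original_query
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    qs = "a=article&ID=1265144816"
    print(build_location("http://www.example.com/news/1265144816.html", qs))
    # http://www.example.com/news/1265144816.html?a=article&ID=1265144816
    print(build_location("http://www.example.com/news/1265144816.html?", qs))
    # http://www.example.com/news/1265144816.html
```

The first call reproduces the URL reported above; the second matches the corrected rule. On Apache 2.4.0 and later, the [QSD] (query string discard) flag is an alternative to the trailing "?".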

Jim

troyid

12:32 am on Feb 18, 2010 (gmt 0)

Beautiful! Works a treat. Thanks, Jim.