URL rewrite with 301

   
8:00 pm on Feb 17, 2010 (gmt 0)

10+ Year Member



Here is my situation.

I run a news script which produces pages like http://www.example.com/cgi-bin/news.cgi?a=article&ID=1265144816

I have set up an .htaccess rule that serves each article at a static-looking .html URL.

RewriteRule news/([0-9]+)\.html$ /cgi-bin/news.cgi?a=article&ID=$1 [L]

The only problem is that it does not 301 the /cgi-bin/news.cgi?a=article&ID=$1 version to /news/1265144816.html, so I might get penalized for duplicate content.

Any help would be appreciated.

[edited by: jdMorgan at 11:05 pm (utc) on Feb 17, 2010]
[edit reason] example.com [/edit]

11:24 pm on Feb 17, 2010 (gmt 0)

jdmorgan (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)



There is no duplicate-content "penalty"; that is a very pernicious webmaster myth. While there may indeed be "penalties" for massive quantities of intentional duplicate content, there is no penalty for minor, accidental duplicate content.

The "penalty" is that you have two (or more) URLs competing with each other for incoming links and PageRank/Link-popularity, and this dilutes the ranking of each of them.

Your RewriteRule "creates" nothing. All it does is tell the server to serve the content from the internal filepath /cgi-bin/news.cgi?a=article&ID=1234 when an HTTP request for the URL /news/1234.html is received from a Web client.

Keeping URLs and filepaths as separate and distinct concepts, associated *only* by the action of a server, will help a lot when thinking about rewrites and redirects.

"The only problem is that it does not 301 the /cgi-bin/news.cgi?a=article&ID=$1 version to /news/1265144816.html so I might get penalized for duplicate content."

Were you expecting it to? That's not what your rule does... So you need a second (and complementary) rule to implement that function:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/
RewriteRule ^cgi-bin/news\.cgi$ http://www.example.com/news/%1.html [R=301,L]

This new rule should precede your rewrite code posted above, and it should precede your domain canonicalization redirect and any other less-specific redirects. These redirects should then be followed by your existing internal rewrites, again in order from most-specific to least-specific.

The complex RewriteCond is required to differentiate the /cgi-bin/news.cgi?a=article&ID=1234 script-path being directly requested by a client as a URL, as opposed to being internally-requested as the result of your existing internal rewrite rule. Without this test, the two rules would unconditionally countermand each other, resulting in an 'infinite' loop.
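To make that distinction concrete, here is a rough illustration of what %{THE_REQUEST} holds in each case. The request lines and the ID 1234 are made up for illustration, and DIRECT_NEWS_ID is a hypothetical variable name; this diagnostic variant only sets an environment variable instead of redirecting, so it is a sketch of the test, not the rule itself. It assumes the .htaccess file sits in the document root.

```apache
# Illustrative request lines (not real log data). %{THE_REQUEST} is the
# request line exactly as the client sent it; internal rewrites do not alter it:
#
#   GET /cgi-bin/news.cgi?a=article&ID=1234 HTTP/1.1   client asked for the script: matches
#   GET /news/1234.html HTTP/1.1                       client asked for the friendly URL: no match,
#                                                      even after the internal rewrite to the script
#
# Diagnostic variant: rather than redirecting, only record the captured ID
# in a hypothetical environment variable named DIRECT_NEWS_ID.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/
RewriteRule ^cgi-bin/news\.cgi$ - [E=DIRECT_NEWS_ID:%1]
```

Because the second request line never contains /cgi-bin/news.cgi, the condition fails for internally rewritten requests, which is what keeps a redirect guarded by it from looping.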

Jim

11:58 pm on Feb 17, 2010 (gmt 0)

10+ Year Member



Hi Jim,

I love your explanation regarding duplicate penalties.

I implemented your rule and it almost worked. When I visit http://www.example.com/cgi-bin/news.cgi?a=article&ID=1265144816 it 301's to http://www.example.com/news/1265144816.html?a=article&ID=1265144816

I just need it to 301 to http://www.example.com/news/1265144816.html

12:20 am on Feb 18, 2010 (gmt 0)

jdmorgan (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)



Yeah, I forget that almost every time I re-type this code...

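It should be (note the trailing "?" on the substitution URL, which clears the original query string so that it is not re-appended to the redirect target):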

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/
RewriteRule ^cgi-bin/news\.cgi$ http://www.example.com/news/%1.html? [R=301,L]
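
Putting the pieces of this thread together, the rule ordering described earlier might look roughly like the sketch below. It assumes the .htaccess file is in the document root, that mod_rewrite is enabled, and that /cgi-bin/ is reachable from there. The domain canonicalization rule in step 2 is a hypothetical example, since the poster's actual rule is not shown, and the internal rewrite in step 3 is anchored with ^ here, which the originally posted pattern was not.

```apache
# Sketch only: external redirects first (most specific to least specific),
# then internal rewrites.
RewriteEngine On

# 1. External redirect: a client asked for the script URL directly.
#    THE_REQUEST is the original request line, so this cannot match
#    requests that were only internally rewritten by step 3.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/news\.cgi\?a=article&ID=([0-9]+)(&[^\ ]*)?\ HTTP/
# The trailing "?" clears the original query string from the redirect target.
RewriteRule ^cgi-bin/news\.cgi$ http://www.example.com/news/%1.html? [R=301,L]

# 2. Domain canonicalization (hypothetical example rule): redirect any
#    non-blank hostname other than www.example.com.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 3. Internal rewrite: serve the friendly URL from the script.
RewriteRule ^news/([0-9]+)\.html$ /cgi-bin/news.cgi?a=article&ID=$1 [L]
```

On Apache 2.4 and later, the [QSD] flag is an alternative to the trailing "?" for discarding the query string; the "?" form is the one used in this thread.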

Jim

12:32 am on Feb 18, 2010 (gmt 0)

10+ Year Member



Beautiful! Works a treat. Thanks Jim