
Duplicate Content URLs - is this httpd.conf RewriteRule sufficient?

The use of RewriteRule in the httpd.conf.


helpnow

6:46 pm on Oct 13, 2006 (gmt 0)

10+ Year Member



My long-standing relief at having fixed duplicate content has just turned into a cold sweat: Have I fixed my duplicate content issues sufficiently?

Let's say that for a period of time, I was sloppy, and allowed URLs like the following, which all lead to the same physical page:

mysite/x1/y2/z3
mysite/y2/z3/x1
mysite/z3/x1/y2
etc.

I used a series of RewriteRule lines in my httpd.conf file to force all of the above URLs into a standard format, say:

mysite/x1/y2/z3

i.e. mysite/y2/z3/x1 and mysite/z3/x1/y2 are now converted on the fly by httpd.conf into the one standard format: mysite/x1/y2/z3
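The end result these rules are supposed to produce can be written out directly. Here is a minimal Python sketch of that target mapping. It does not model how mod_rewrite works. It assumes each path segment starts with a one-letter variable name (`x`, `y`, `z`) and that the standard order is x, then y, then z. `CANONICAL_ORDER` and `canonical_path` are hypothetical names.

```python
# Hypothetical standard order of the variable prefixes: x, then y, then z.
CANONICAL_ORDER = ["x", "y", "z"]


def canonical_path(path: str) -> str:
    """Return `path` with its segments reordered into CANONICAL_ORDER.

    Assumes every segment starts with one of the prefixes in CANONICAL_ORDER,
    e.g. "x1", "y2", "z3".
    """
    segments = [s for s in path.strip("/").split("/") if s]
    ordered = sorted(segments, key=lambda seg: CANONICAL_ORDER.index(seg[0]))
    return "/" + "/".join(ordered)


for p in ["/x1/y2/z3", "/y2/z3/x1", "/z3/x1/y2"]:
    print(p, "->", canonical_path(p))  # all three print /x1/y2/z3 on the right
```

Seen this way, the URLs where `canonical_path(url) != url` are exactly the ones that should get a 301. The standard URL itself must be left alone.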

<Sigh> So, my question is: is that good enough? With all this talk about 304s, permanent redirects, etc., I am now worried whether the way I used RewriteRule is the way you fix these things once and for all.

An example of one of my RewriteRules is as follows:

RewriteRule (.*)/variable([0-9]+)/(.*) $1/$3/variable$2 [R=301]

I've got the [R=301] in there. Is that it? Does that make them permanent? Am I missing anything else?
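You can test what this single rule does outside Apache. mod_rewrite uses PCRE, and Python's `re` behaves the same way for a pattern this simple. The only difference is that the substitution uses `\1` where mod_rewrite writes `$1`. The sample paths are made up. They assume the server-config (httpd.conf) context, where the pattern sees the URL path including its leading slash. The `[R=301]` flag does not change what matches. It only makes the response a permanent (301) redirect.

```python
import re

# Pattern and substitution from the RewriteRule above, in Python syntax
# (\1 instead of mod_rewrite's $1).
PATTERN = re.compile(r"(.*)/variable([0-9]+)/(.*)")
SUBSTITUTION = r"\1/\3/variable\2"


def apply_rule(path):
    """Return the rewritten path, or None if the rule does not match."""
    m = PATTERN.search(path)
    return m.expand(SUBSTITUTION) if m else None


for path in ["/x1/variable5/z3", "/variable5/x1/z3", "/x1/z3/variable5"]:
    print(path, "->", apply_rule(path))
```

The last path already ends with `variable5`. Because the pattern needs a `/` after the digits, it does not match again.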

I think I am fine, but like so many others I work in a vacuum here, with no one to bounce my work off, so I would dearly like the opinions of those who know better than I. I am a professional coder, but when it comes to Apache, I am an amateur hack.<grin>

I also hope that by posing this question in such concise detail, the responses of this community may help others with the same question confirm that their own efforts are on the right track.

Please accept my thanks in advance. I often just lurk here, and the advice I have found here has been over-the-top (I really mean that.) Bravo to everyone who takes the time to respond to queries like this from people like me.<grin>

g1smd

7:07 pm on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add the code to the site and then test it. The 301 sets it as a permanent redirect.

I think your example code is too simplistic.

I think it can match in several positions.

I think you need much more code.

If you need the order to be ABC, then you will need to test for, and then correct, each of: ACB, BAC, BCA, CAB and CBA.
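Writing those five rules by hand invites mistakes, so here is a Python sketch that generates them. It assumes three hypothetical variables named `a`, `b` and `c`, each followed by digits (e.g. `/b7/a3/c9`), and the server-config (httpd.conf) context. It also adds `[R=301,L]`, where `L` stops further rewriting once a rule fires. Each rule is anchored with `^...$`, so it matches exactly one wrong order and never the correct ABC URL, and it redirects straight to ABC in one step.

```python
from itertools import permutations

# Hypothetical variable prefixes, in the canonical order ABC.
CANONICAL = ("a", "b", "c")


def rewrite_rules(canonical=CANONICAL):
    """Return one anchored RewriteRule line per non-canonical ordering."""
    rules = []
    for order in permutations(canonical):
        if order == canonical:
            continue  # the correct URL must never be redirected
        pattern = "^/" + "/".join(f"{v}([0-9]+)" for v in order) + "$"
        # $n is the n-th captured group, i.e. the n-th segment in the WRONG order,
        # so each variable's backreference is its position in `order`, plus one.
        target = "".join(f"/{v}${order.index(v) + 1}" for v in canonical)
        rules.append(f"RewriteRule {pattern} {target} [R=301,L]")
    return rules


for line in rewrite_rules():
    print(line)
```

For ACB, this produces `RewriteRule ^/a([0-9]+)/c([0-9]+)/b([0-9]+)$ /a$1/b$3/c$2 [R=301,L]`. The backreferences get renumbered because the groups are captured in the wrong order.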

helpnow

7:47 pm on Oct 13, 2006 (gmt 0)

10+ Year Member



Thank you very much for your reply, g1smd!

The truth is, my final httpd.conf file is 127K. It is full of all of my RewriteConds and RewriteRules... I just extracted one line of it - sorry if that was just too simplistic!<grin>

All that to say, yes, I did test it, and my whole flotilla of rules and conditions does test out the way I need them to work. Together, they solve a bunch of duplicate content and bad URL issues.

Actually, my RewriteRule example above just removes an old variable that is no longer used from the URL. My original post was unintentionally ambiguous in that respect... <embarrassed look>

An example of one of the RewriteRules I used to actually _reorganize_ variables in the URL format is as follows:

RewriteRule (.*)/variable([0-9]+)/(.*) $1/$3/variable$2 [R=301]

With a few of these in a row for different variables, the resulting URLs always come out in the standard format I settled on...
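A series of "move this variable to the end" rules like the one above can be simulated in Python to show the intermediate paths. This sketch assumes three hypothetical variables `x`, `y` and `z`, with one rule each, applied once each in the order they are listed. It records every intermediate path so you can count how many rewrites a URL goes through.

```python
import re

# Hypothetical variables, listed in the order they should end up in.
VARIABLES = ["x", "y", "z"]

# One rule per variable, modelled on
#   RewriteRule (.*)/variable([0-9]+)/(.*) $1/$3/variable$2
# i.e. "move this variable's segment to the end of the path".
RULES = [(re.compile(rf"(.*)/{v}([0-9]+)/(.*)"), rf"\1/\3/{v}\2") for v in VARIABLES]


def apply_in_sequence(path):
    """Apply each rule once, in order, returning the path after every rewrite."""
    steps = [path]
    for pattern, substitution in RULES:
        m = pattern.search(path)
        if m:
            path = m.expand(substitution)
            steps.append(path)
    return steps


for start in ["/y2/z3/x1", "/z3/x1/y2", "/x1/y2/z3"]:
    print(" -> ".join(apply_in_sequence(start)))
```

All three paths end up as `/x1/y2/z3`. However, the already-standard `/x1/y2/z3` is also matched by all three rules and gets rewritten three times before it comes back to itself. That is the kind of over-matching to test for.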

And as for the 301, what I have is fine, then? Have I got it right that the R=301 will remove the duplicates in Google's eyes, so that all the duplicate URLs resolve to the one standard URL and the errant duplicates eventually disappear?

(Thank you again!)

AndyA

7:55 pm on Oct 13, 2006 (gmt 0)

10+ Year Member



I like to follow Googlebot through the latest visitors log. I'm thrilled when I see it hit a URL, see the server return a 301, and then, on the next line, see Googlebot request the correct URL.

I can also see where Googlebot doesn't go to the correct page, but gets a 301, so I know it's checking back to make sure that URL is still gone.
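You can partly automate this kind of log watching. Here is a minimal Python sketch that assumes the server writes Apache's "combined" log format. The sample lines are invented for illustration. The user-agent string can be faked, so a "Googlebot" entry on its own does not prove the request came from Google.

```python
import re

# Apache "combined" log format (an assumption about how this server logs).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_requests(lines):
    """Yield (time, status, path) for lines whose user agent mentions Googlebot."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m.group("agent"):
            yield m.group("time"), m.group("status"), m.group("path")


# Invented sample lines: a 301 for an old URL, then the fetch of the new one.
sample = [
    '66.249.66.1 - - [13/Oct/2006:19:55:00 +0000] "GET /y2/z3/x1 HTTP/1.1" 301 245 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [13/Oct/2006:19:55:01 +0000] "GET /x1/y2/z3 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [13/Oct/2006:19:56:10 +0000] "GET /x1/y2/z3 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for time, status, path in googlebot_requests(sample):
    print(time, status, path)
```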

I am also quite happy to see all the URLs listed at Webmaster Central that are forbidden by robots.txt. Lots of duplicated URLs in there, mostly from my forum which was passing out session IDs like there was no tomorrow! I'll be glad to see all of those pages go away.

And, I've already noticed some of my formerly supplemental pages are dropping the supplemental results badge of dishonor. Still not ranking well, but Googlebot has a lot of old cache dates to still update!

g1smd

8:36 pm on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> RewriteRule (.*)/variable([0-9]+)/(.*) $1/$3/variable$2 [R=301]

My point was that (.*)varB(.*) will match both ABC and CBA, so be careful.
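Python's `re` behaves like mod_rewrite's PCRE for patterns this simple, so it can demonstrate the problem using hypothetical segments `x1`, `y2` and `z3`. An unanchored pattern built around the middle variable matches both the correct order and its reverse. A pattern anchored at both ends, with every segment spelled out, matches only the one order it was written for.

```python
import re

# Unanchored: only says "something, then /y<digits>/, then something".
loose = re.compile(r"(.*)/y([0-9]+)/(.*)")
# Anchored, every segment spelled out: matches only the z/y/x order.
strict = re.compile(r"^/z([0-9]+)/y([0-9]+)/x([0-9]+)$")

for path in ["/x1/y2/z3", "/z3/y2/x1"]:  # ABC and CBA
    print(path, bool(loose.search(path)), bool(strict.search(path)))
```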

You also need to make sure that you get from the wrong URL to the correct URL in just one step, avoiding a redirection chain.
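One way to check for chains is to record where the server sends each old URL, follow those redirects, and flag anything that takes more than one hop. This sketch assumes the observed redirects have been collected into a dict, and the entries shown are hypothetical. It also shows the fix: point every old URL directly at its final destination.

```python
def follow_redirects(url, redirects):
    """Return every URL visited when following `redirects` from `url`."""
    chain = [url]
    while url in redirects:
        url = redirects[url]
        if url in chain:
            raise ValueError("redirect loop: " + " -> ".join(chain + [url]))
        chain.append(url)
    return chain


def multi_hop_chains(redirects):
    """Map each start URL that needs more than one redirect to its full chain."""
    chains = {start: follow_redirects(start, redirects) for start in redirects}
    return {start: chain for start, chain in chains.items() if len(chain) > 2}


def flatten(redirects):
    """Point every old URL straight at its final destination (one 301 each)."""
    return {start: follow_redirects(start, redirects)[-1] for start in redirects}


# Hypothetical observations: where each URL currently redirects to.
redirects = {
    "/y2/z3/x1": "/z3/x1/y2",
    "/z3/x1/y2": "/x1/y2/z3",
}
print(multi_hop_chains(redirects))
print(flatten(redirects))
```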

By the way, your example has a different number of left ( and right ) brackets - mismatched.

helpnow

8:43 pm on Oct 13, 2006 (gmt 0)

10+ Year Member



Hello again!

I need to consider your other points before I respond to them, but first I wanted to ask where you see the mismatch in brackets. I see 3 opened and 3 closed... Am I spooked, is there a mismatch somewhere!? ; )

g1smd

10:44 pm on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heck, I must be going cross-eyed. It's OK. Brackets are fine after all.
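A quick mechanical check settles this kind of question. In Python, you can count the round brackets and compile the pattern. Compiling fails on unbalanced parentheses. The check confirms the rule's pattern has three balanced capturing groups, matching `$1`, `$2` and `$3` in the substitution. The square brackets in `[0-9]` are a character class, not a group.

```python
import re

rule_pattern = r"(.*)/variable([0-9]+)/(.*)"

print(rule_pattern.count("("), rule_pattern.count(")"))  # 3 3
print(re.compile(rule_pattern).groups)                    # 3
```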