Let's say that for a period of time, I was sloppy, and allowed URLs like the following, which all lead to the same physical page:
mysite/x1/y2/z3
mysite/y2/z3/x1
mysite/z3/x1/y2
etc.
I used a series of RewriteRule lines in my httpd.conf file to force all of the above URLs into a standard format, say:
mysite/x1/y2/z3
i.e. mysite/y2/z3/x1 and mysite/z3/x1/y2 are now converted on the fly by httpd.conf into the one standard format: mysite/x1/y2/z3
<Sigh> So, my question is: is that good enough? With all this talk about 304s, permanent redirects, etc., I am now worried about whether the way I used RewriteCond is the way to fix these things once and for all.
An example of one of my RewriteRules is as follows:
RewriteRule (.*)/variable([0-9]+)/(.*) $1/$3/variable$2 [R=301]
I've got the [R=301] in there. Is that it? Does that make them permanent? Am I missing anything else?
I think I am fine, but I work in a vacuum here, like so many others, with no one to bounce my work off of, so I would dearly like the opinions of those who know better than I. I am a professional coder, but when it comes to Apache, I am an amateur hack.<grin>
I also hope that, by posing this question in such detail, the responses of this community may help others with the same question confirm that their own efforts are on the right track.
Please accept my thanks in advance. I often just lurk here, and the advice I have found here has been over-the-top (I really mean that.) Bravo to everyone who takes the time to respond to queries like this from people like me.<grin>
I think your example code is too simplistic.
I think it will match several positions.
I think you need much more code.
If you need the order to be ABC, then you will need to test for, and then correct, each of: ACB, BAC, BCA, CAB and CBA.
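As a rough sketch of what that could look like, assume each variable is a letter followed by digits, as in the mysite/x1/y2/z3 examples, and that the standard order is x, y, z. The segment names and patterns here are placeholders, not the poster's real ones. One anchored rule per wrong ordering sends each straight to the standard form:

```apache
RewriteEngine On

# Assumption: segments look like x<digits>, y<digits>, z<digits>, and the
# rules live in httpd.conf (server context), so patterns see a leading slash.
# Standard order is x/y/z; each of the five wrong orders gets its own rule.
RewriteRule ^/x([0-9]+)/z([0-9]+)/y([0-9]+)$ /x$1/y$3/z$2 [R=301,L]
RewriteRule ^/y([0-9]+)/x([0-9]+)/z([0-9]+)$ /x$2/y$1/z$3 [R=301,L]
RewriteRule ^/y([0-9]+)/z([0-9]+)/x([0-9]+)$ /x$3/y$1/z$2 [R=301,L]
RewriteRule ^/z([0-9]+)/x([0-9]+)/y([0-9]+)$ /x$2/y$3/z$1 [R=301,L]
RewriteRule ^/z([0-9]+)/y([0-9]+)/x([0-9]+)$ /x$3/y$2/z$1 [R=301,L]
```

Because every pattern is anchored with ^ and $, a URL already in the standard order matches none of them and is left alone.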
The truth is, my final httpd.conf file is 127K. It is full of all of my RewriteConds and RewriteRules... I just extracted one line of it - sorry if that was just too simplistic!<grin>
All that to say, yes, I did test it, and my whole flotilla of rules and conditions does test out the way I need them to work. Together, they solve a bunch of duplicate content and bad URL issues.
Actually, my RewriteRule example above just removes an old, useless variable from the URL that is no longer used. My original post was unintentionally ambiguous in that respect... <embarrassed look>
An example of one of the RewriteRules I used to actually _reorganize_ variables in the URL format is as follows:
RewriteRule (.*)/variable([0-9]+)/(.*) $1/$3/variable$2 [R=301]
With a few of these in a row for different variables, the resulting URLs always come out in the standard format I settled on...
And, as for the 301, what I have is fine then? I've got it right, that the R=301 will remove the duplicates in Google's eyes, so that all the duplicate URLs resolve to the one single standard URL, and eventually the errant duplicates will disappear?
(Thank you again!)
I can also see where Googlebot doesn't go to the correct page, but gets a 301, so I know it's checking back to make sure that URL is still gone.
I am also quite happy to see all the URLs listed at Webmaster Central that are forbidden by robots.txt. Lots of duplicated URLs in there, mostly from my forum which was passing out session IDs like there was no tomorrow! I'll be glad to see all of those pages go away.
And, I've already noticed some of my formerly supplemental pages are dropping the supplemental results badge of dishonor. Still not ranking well, but Googlebot has a lot of old cache dates to still update!
My point was that (.*)varB(.*) will match both ABC and CBA, so be careful.
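To illustrate, here is a sketch using hypothetical variables named varA, varB and varC in place of the real ones. The first (commented-out) rule is the loose form; the second names every segment and anchors both ends, so it can only fire on the one ordering it is meant to fix:

```apache
# Loose: (.*) is unanchored, so this pattern matches /varA1/varB2/varC3 and
# /varC3/varB2/varA1 alike, wherever /varB<digits>/ occurs in the path.
# RewriteRule (.*)/varB([0-9]+)/(.*) $1/$3/varB$2 [R=301,L]

# Tighter: ^ and $ anchor the whole path and each segment is spelled out,
# so only the C-B-A ordering matches; A-B-C is not touched by this rule.
RewriteRule ^/varC([0-9]+)/varB([0-9]+)/varA([0-9]+)$ /varA$3/varB$2/varC$1 [R=301,L]
```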
You also need to make sure that you get from the wrong URL to the correct URL in just one step, avoiding a redirection chain.
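One possible way to get there in a single hop, sketched here untested and under assumptions: the path consists of exactly three segments named x, y and z (placeholders), each followed by digits, and the rules sit in httpd.conf. The SEG_X, SEG_Y and SEG_Z environment variable names are made up for this example. The idea is to collect each value first, wherever it sits, and only then issue one redirect:

```apache
RewriteEngine On

# Pass 1: no rewriting ("-"), just remember each value in an env variable.
RewriteRule /x([0-9]+)(/|$) - [E=SEG_X:$1]
RewriteRule /y([0-9]+)(/|$) - [E=SEG_Y:$1]
RewriteRule /z([0-9]+)(/|$) - [E=SEG_Z:$1]

# Pass 2: if all three values were found and the path is not already in the
# standard order, send one 301 directly to the standard URL.
RewriteCond %{ENV:SEG_X} ^[0-9]+$
RewriteCond %{ENV:SEG_Y} ^[0-9]+$
RewriteCond %{ENV:SEG_Z} ^[0-9]+$
RewriteCond %{REQUEST_URI} !^/x[0-9]+/y[0-9]+/z[0-9]+$
RewriteRule ^/[xyz][0-9]+/[xyz][0-9]+/[xyz][0-9]+$ /x%{ENV:SEG_X}/y%{ENV:SEG_Y}/z%{ENV:SEG_Z} [R=301,L]
```

Whichever approach is used, it is worth requesting each wrong URL and checking that the first response is a 301 whose Location header is already the final standard URL.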
By the way, your example has a different number of left ( and right ) brackets - mismatched.