Is that code correct? Question About mod rewrite

Question about rewrite rules

         

Haruka

5:04 pm on Jul 18, 2011 (gmt 0)

10+ Year Member



Hello,

I'm having problems with a RewriteRule here.
I'm trying to do something like this:
I have URLs like
www.mydomain.com/anything.aspx?idA=123456&idb=1234 and
www.mydomain.com/anything.aspx?ida=123456&idB=1234

As you can see, it's the same URL except for the case of the parameter names. That's a problem for me, and I want to unify these URLs by rewriting everything to something like the second URL.

So I've come up with some code I ~guess~ will work, but I'm not sure about it and unfortunately I can't test it. Since I don't want my site to fall into doom or into a redirect loop, I'd like to show it to you and ask whether everything is okay with this code:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^idA(.*) http://www.mydomain.com/anything.aspx?ida$1 [R=301,L]
RewriteRule ^idb(.*) http://www.mydomain.com/anything.aspx?ida$1idB$2 [R=301,L]

Will it work properly? I just want all idA to be converted into ida and idb to turn into idB. The problem is I don't know very well how to do it...
I'll be very grateful if someone could help me, because I really need to solve this problem ):

PS: I'm very sorry about my English, I hope it's at least understandable.

lucy24

10:31 pm on Jul 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do all of your rewrites apply only to the query string? If so, an ordinary RewriteRule won't do, because its pattern only ever sees the URL path, never the query string. You have to first set up a

RewriteCond %{QUERY_STRING} [A-Z]

meaning, simply, that the query string contains at least one capital letter. Many RewriteCond statements carry an [NC] (no-case) flag. Here you don't want one, because case does matter. If the original query string contains no capital letters, don't mess with it.

The Rewrite Rule will then be in the form

RewriteRule (.+) $1?{something here} [L]

meaning that you keep the original input as-is, but replace the existing query string with a new one. You don't need a redirect unless the various urls are all floating around the internet and you need to get rid of the duplicates.
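
As a rough sketch of that shape (an illustration only, not something taken from your site): the pattern and the replacement query "fixed=1" are placeholders, and it assumes the .htaccess sits in the document root.

```apache
RewriteEngine On
# No [NC] on purpose: only queries containing an upper-case letter qualify
RewriteCond %{QUERY_STRING} [A-Z]
# The "?" in the target discards the old query string and sets the new one;
# "fixed=1" is a hypothetical stand-in for whatever the real query should be
RewriteRule ^(.+)$ /$1?fixed=1 [L]
```

Without a "?" in the target, the original query string would be passed through untouched.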

Now for the tricky part. Some RegEx flavors have a case-changing function. Apache doesn't seem to. [pcre.org] Do you have a limited number of specific queries to change-- or at least a limited number of specific letters, as in your example? Can you route the capitalized queries to something else, like a php script? This should happen before the request even reaches htaccess, if possible.

Haruka

3:44 pm on Jul 19, 2011 (gmt 0)

10+ Year Member



Well, my main problem here is that those duplicate URLs are already floating around the Internet, since Google appears to index both of them (the version with the capital in idA and the idB version too).
So I really need to use a redirect here... I just don't know how to do it. I've heard about case-changing functions, but it really looks like Apache doesn't have one ):
All of my rewrites apply only to the query string. I guess I understood what you meant, but can't I use it with a redirect?
And do I need to spell out exactly what the {QUERY_STRING} is? When I specify that the query string contains at least one capital letter and then set up a RewriteRule like:
RewriteRule (.+) $1?{idA} [L]
Won't it just replace every query string with idA? How do I make something more specific for each query string?
Thx for the help, I think I understood a bit more about RewriteCond and Rules now thanks to you (:

lucy24

7:40 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, if they're floating around the Internet you do need a redirect so that eventually g### will get the message that they're the same page. Matter of fact, they shouldn't even be getting to the variant pages, since everything after the domain name is supposed to be case sensitive. (Technically it depends on what platform the server runs on. Fortunately I didn't know this until after I'd developed the case-sensitive habit.)

Now, since Apache doesn't have a decapitalize function, you will have to do it by brute force. That's why I asked how many different strings are involved, or how many letters. Will you potentially need to decapitalize the whole alphabet? Is there some ceiling to how many times any given letter might occur?

It is probably worth the time investment to look through your raw logs and see just how many different configurations we're talking about. Open them in a text editor that does Regular Expressions, and search for anything crawled by g### that contains capital letters in the query.

We may be looking at the rare case where you have to append a [N] for "next" [httpd.apache.org] to the end of the rule.
The [Next] flag could be used, for example, if you wished to replace a certain string or letter repeatedly in a request. The example shown here will replace A with B everywhere in a request, and will continue doing so until there are no more As to be replaced.

RewriteRule (.*)A(.*) $1B$2 [N]

You can think of this as a while loop: While this pattern still matches (i.e., while the URI still contains an A), perform this substitution (i.e., replace the A with a B).

You would need this rule if you potentially have more than one capital A to change into a lower-case a. And then you do the same thing for all the others.

That's assuming it all has to be done in .htaccess. If you can lay your hands on someone who speaks php, there is probably a less painful way to do it.

g1smd

8:03 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When faced with this problem I REWRITE all requests with upper case to a special PHP script. The special PHP script builds the new URL then issues the HEADER to redirect the browser to the correct canonical URL.

There's sample code in a recent thread, perhaps a couple of months ago.
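
The rewrite half of that approach might look roughly like the sketch below. The script name /fix-case.php is hypothetical, and the sketch assumes the .htaccess sits in the document root. The script itself would read the originally requested URL, build the lower-case version and send the 301.

```apache
RewriteEngine On
# Any query string containing an upper-case letter goes to the fixer script
RewriteCond %{QUERY_STRING} [A-Z]
# Internal rewrite only (no R flag); with no "?" in the target, the original
# query string is handed to the script unchanged
RewriteRule \.aspx$ /fix-case.php [L]
```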

Haruka

8:05 pm on Jul 19, 2011 (gmt 0)

10+ Year Member



Oh, there are just two parameters. I guess there's no problem in naming them: they are idDept and idProduct. They usually show up in two ways:
idDept=1234&idproduct=123456
and
iddept=1234&idProduct=123456
Since I want to standardize that, I want all the URLs to be redirected to something like: iddept=1234&idproduct=123456 (all in lower case).
"G###" just appears to index some URLs like the first and some others like the second, so I'm not sure whether it has any criteria for that.
Since it's just two parameters I guess there's no need for the [N] flag.
Will it work if I just set up a condition about the letters "D" and "P" in a query string and two rules to treat that? I guess I'm talking about something like this:
RewriteCond %{QUERY_STRING} [D,P]
RewriteRule +D $1?{d} [R=301, L]
RewriteRule +P $1?{p} [R=301, L]

I'm not sure about that "+P" and "+D" part, but since I want anything that has one "P" or one "D" to be turned into lower case p or d, I guess that's the correct syntax. And I really have to do it in .htaccess, unfortunately.
Sorry about my newbieness, I've only recently started studying htaccess, so I'm not sure about its syntax or anything that's even a bit different from the examples in the official documentation.

lucy24

8:45 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{QUERY_STRING} [D,P]


Whoops! You're confusing regex character-class brackets with the brackets Apache uses for rule flags. In a pattern, [D,P] means "the query contains at least one D or P or comma". If your queries never contain commas it won't do any harm, but better not to confuse yourself. Since the only possibilities are rewriting "Dept" to "dept" and "Product" to "product", make two separate rules, each with its own condition.
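
To make the difference concrete, here is a small sketch. The rule at the end is only a do-nothing placeholder so the condition has something to attach to.

```apache
# [D,P] in a pattern: matches a query containing "D", "," or "P"
# [DP]  in a pattern: matches a query containing "D" or "P"
# [R=301,L] after a rule: a flag list, where the comma separates flags
#           (written without a space after the comma)
RewriteCond %{QUERY_STRING} [DP]
# "-" means "no substitution"; placeholder rule for illustration only
RewriteRule ^ - [L]
```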

The tricky part now becomes how to preserve everything else in the query.

RewriteCond %{QUERY_STRING} ^idDept(.+)
RewriteRule (.+) $1?iddept%1 [R=301]


(leaving off the L because you may not be done) and then

RewriteCond %{QUERY_STRING} ^(.+)idProduct(.+)
RewriteRule (.+) $1?%1idproduct%2 [R=301,L]


The $1 means "stuff captured earlier in the Rule" while the %1 and %2 mean "stuff captured in the last Condition". I would not recommend collapsing them into a single rule with two [OR] conditions because you're liable to make a big mess of the query-string captures. Well, I would, anyway.

g1smd

9:31 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You need to be very sure that no request can ever result in a multiple step redirection chain.

You will need multiple rules, one for each possible incorrect request format, but only one rule will run for any request that actually arrives.

lucy24

9:45 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You will need multiple rules, one for each possible incorrect request format, but only one rule will run for any request that actually arrives.

So, in this case, there should be a third rule for
^idDept(.+)idProduct(.+)

? By "third" I mean "first", since it would have to execute before the two other rules, the ones in which only one of the two words is capitalized. And then each one can have its own [L] ?

:: mystery solved: in Preview it shows "code" at a teeny size, but it gets nice and big when posted ;) ::

Haruka

4:35 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Hhmmmm. I've tried out the code you gave me above, but it looks like it picks up the full filesystem path of the directory where the .htaccess is placed.
Like, instead of redirecting me to
www.mydomain.com/anything.aspx?iddept=123456&idproduct=1234
it gives me the address:
www.mydomain.com/home/mydomain/public_html/anything.aspx?iddept=123456&idproduct=1234


I guess I'll have to specify the full path in the rules? I don't know why it's happening, but it has something to do with where the .htaccess file is located. At least the upper/lower-case part is really working, which is great!
I searched a little more about it and found this too:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]


However, I've tried with something like:


Options +FollowSymLinks
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{QUERY_STRING} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]


But it didn't work, and the entire domain gave me a 500 error. It was great to discover that htaccess is supposed to have something specific for lower case, but the result was worse than before... Does anyone have a clue about what's happening in both cases?

lucy24

7:11 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's fun getting a 500 the first time because at least you get to see your custom 500 screen in action ;) (Just don't waste time trying to figure out how it manages to display it when the whole domain has crashed!) But yes, it does get old pretty fast.

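First and most obvious question: where did you put the RewriteMap [httpd.apache.org] line? It isn't a free-standing command you can drop in anywhere; the directive is only allowed in the server or virtual-host configuration, so putting it in .htaccess is enough to cause the 500.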
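
For what it's worth, here is a sketch of how the built-in tolower map is normally set up. As far as I know RewriteMap is only accepted in the main server or virtual-host configuration, so this assumes you can edit that file. It also lower-cases the whole query string, values included, which is only safe when the values don't depend on case (numeric IDs don't).

```apache
# httpd.conf or <VirtualHost> context, not .htaccess
RewriteEngine On
# int:tolower is a function built into mod_rewrite, so no map file is needed;
# "lc" is just the name chosen for it
RewriteMap lc int:tolower
RewriteCond %{QUERY_STRING} [A-Z]
# In server context $1 already starts with "/"
RewriteRule ^(.*)$ $1?${lc:%{QUERY_STRING}} [R=301,L]
```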

Oh, and to keep the filesystem path out of the redirect, the RewriteRule has to give the full URL (in the "target" part):

http://www.example.com/$1?%1idproduct%2
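
Putting the pieces of this thread together, the full set might look like the sketch below. It assumes the .htaccess is in the document root, that www.example.com stands in for the real host, and that the dept parameter always comes first in the query, as in the examples posted.

```apache
RewriteEngine On

# Both parameters capitalized: fix them in one redirect so no chain occurs
RewriteCond %{QUERY_STRING} ^idDept(.+)idProduct(.+)$
RewriteRule ^(.+)$ http://www.example.com/$1?iddept%1idproduct%2 [R=301,L]

# Only idDept capitalized
RewriteCond %{QUERY_STRING} ^idDept(.+)$
RewriteRule ^(.+)$ http://www.example.com/$1?iddept%1 [R=301,L]

# Only idProduct capitalized
RewriteCond %{QUERY_STRING} ^(.+)idProduct(.+)$
RewriteRule ^(.+)$ http://www.example.com/$1?%1idproduct%2 [R=301,L]
```

Because every rule ends in [L] and the two-capital case is tested first, any single request should be answered by exactly one redirect.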

g1smd

7:22 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If internal paths are being exposed as URLs it is likely you have external redirects listed after internal rewrites. The rule order is very important.
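
A sketch of the ordering meant here. The second rule is a made-up internal rewrite included only to show where such rules belong, and www.example.com is a placeholder host.

```apache
RewriteEngine On

# 1. External redirects first, each with a full URL and [R=301,L]
RewriteCond %{QUERY_STRING} ^idDept(.+)$
RewriteRule ^(.+)$ http://www.example.com/$1?iddept%1 [R=301,L]

# 2. Internal rewrites afterwards (hypothetical example, no R flag)
RewriteRule ^products/([0-9]+)$ /anything.aspx?idproduct=$1 [L]
```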

I hinted at a PHP solution earlier in this thread. There's more information at [webmasterworld.com...] and [webmasterworld.com...]

Haruka

8:43 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Well, it's not the rule order; it's what Lucy said. I'll try it out and forget about the rewrite map. Thanks to everyone who helped, I guess I can fix it now (:
My only problem was with the path, but it seems like it was fixed, so everything is okay now (: