htaccess query string rewrite/redirect

         

atomMan

2:58 pm on Oct 16, 2011 (gmt 0)

10+ Year Member



hi all

i have a multilingual wordpress site and i set it up using query strings, which i now realize was a mistake, so i need to rewrite (redirect?)...

/some-page/?lang=ru


to...

/ru/some-page


so from askapache, i found this and, though i know a little about PCRE, i don't grasp what's going on here...

RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+)/(de|es|fr|it|ja|ru|en)/\ HTTP/ [NC]
RewriteRule ^(.*)$ - [env=lang:%2]


here's kind of what i'm contemplating, but i don't know how to complete it. i need to capture the query string language code (/some-post/?lang=de) and rewrite/redirect as /de/some-post and i am very weak on capturing with RegEx. matching is no problem...

RewriteCond /\?lang=[a-z]{2} [NC]
# or...
RewriteCond /\?lang=(es|de|fr) [NC]

this needs to work with requests like these...

/?lang=es # (site root)
/page/sub-page/?lang=es


so the first would be directed to the site root "/" and the latter to /es/page/sub-page

and i need to do this in a way that is the most SEO friendly

lucy24

10:08 pm on Oct 16, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's start with the boilerplate on query strings. Ignore the parts you already know.
Query Strings

The Query String, also known as a Parameter, is the part of a URL after the question mark. Question = query.

By default, rewrites simply ignore the query string. That is, mod_rewrite stashes the query in a safe place, does its stuff to the part before the question mark, and then reappends the original query.

Changing a Query

#1 To delete a query, add a ? to the end of your rewrite target.
#2 To replace a query—or create a new one—add ?blahblah to the rewrite target. The blahblah can be either literal text, or stuff you captured earlier. (#1 and #2 are really the same thing: you're just replacing the query with either something or nothing.)
#3 To add to an existing query, again put ?blahblah at the end of the target, but also add [QSA] to your flags (the bracketed items at the end of the Rule). It stands for "Query String Append", meaning that the blahblah is to be added to the existing query—if any—instead of replacing it.
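A rough sketch of the three cases, for illustration only. The paths (old-a, new-a and so on) and the ref=archive parameter are made up, and the rules assume an .htaccess file at the site root:

```apache
RewriteEngine On

# 1. Delete the query: /old-a?x=1 redirects to /new-a
RewriteRule ^old-a$ /new-a? [R=301,L]

# 2. Replace the query: /old-b?x=1 redirects to /new-b?ref=archive
RewriteRule ^old-b$ /new-b?ref=archive [R=301,L]

# 3. Add to the query: /old-c?x=1 redirects to /new-c?ref=archive&x=1
RewriteRule ^old-c$ /new-c?ref=archive [QSA,R=301,L]
```

With no ? in the target at all, the original query would simply be carried over unchanged.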

Getting the Query

You only need to retrieve the original query if
#1 you want the rewrite to behave differently depending on what the query was
or
#2 you need to change or delete the query

Add a Condition that says

RewriteCond %{QUERY_STRING} blahblah


using your ordinary Regular Expressions, anchors and ! as needed.

To test whether there was a query at all

RewriteCond %{QUERY_STRING} .


which simply means "If the query contains at least one character of any kind".

If you need to capture any of the query, use parentheses as usual. In the rewrite target, the captures will be %1, %2 etc instead of $1, $2 etc, because they are coming from a Condition instead of the Rule. Each set is separately numbered, so the first capture from the Rule will still be $1.
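As a small sketch of the two kinds of capture working together (the catalog path and the id parameter are hypothetical, and the rule assumes an .htaccess file at the site root):

```apache
RewriteEngine On

# %1 comes from the Condition: the digits after id=
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# $1 comes from the Rule: whatever follows catalog/ in the requested path
# /catalog/widgets?id=42 redirects to /catalog/widgets/42
RewriteRule ^catalog/(.+)$ /catalog/$1/%1? [R=301,L]
```

The trailing ? removes the old query, so the redirected request no longer matches the Condition and the rule does not fire a second time.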

g1smd

12:05 am on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule cannot "see" query string data.

You need a RewriteCond looking at %{QUERY_STRING} here.

You then redirect parameterised URL requests to the SEF version of the URL using a RewriteRule and the [R=301,L] flags.

atomMan

3:15 pm on Oct 17, 2011 (gmt 0)

10+ Year Member



i would need at least a generic sample to work with - i know mod_rewrite is a hugely complex subject and i need to use it only occasionally

i understand what you guys are saying, and i have a basic understanding of RegEx, but i have NO idea how to put it together

this is a production site and i just need to capture the /?lang=nn string and redirect/rewrite to /nn/$1

atomMan

3:48 pm on Oct 17, 2011 (gmt 0)

10+ Year Member



ok, this is what i found that appears close to what i need...

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{QUERY_STRING} ^q=(.*)$ [NC]
RewriteRule ^(.*)$ /$1/%1? [R=301,L,NE]

R=301 will redirect with HTTP status 301
L will make last rule
NE is for no escaping query string

%1 is capture group for query string q= (whatever comes after q=)
$1 is your REQUEST_URI


and this is my mod for the languages i am using...

RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule ^(.*)$ /$1 [R=301,L,NE]


alternatively i could do...

RewriteCond %{QUERY_STRING} ^lang=(\w){2}$ [NC]
RewriteRule ^(.*)$ /$1 [R=301,L,NE]


or is it supposed to be...

RewriteCond %{QUERY_STRING} ^lang=(\w){2}$ [NC]
RewriteRule ^(.*)/$1 [R=301,L,NE]


or...

RewriteCond %{QUERY_STRING} ^lang=(\w){2}$
RewriteRule ^(.*)/$1


i need to go from "/this-page/?lang=nn" to "/nn/this-page"

i don't know which is correct/better/more efficient

thoughts?

i don't know if the "NE" is needed in RewriteRule since there will be no query string in the rewritten URI

lucy24

4:49 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looks good. You've got each piece sorted out.

But to use the {number} construction it would have to be

^lang=(\w{2})$

otherwise you are only capturing one of the two letters (the group repeats, and only its last repetition is kept). And then it's just as fast to say \w\w.

It's your judgement call between that form and your alternative

lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)

depending on how likely you are to get queries for nonexistent languages. If it isn't possible, then the \w\w version is faster.

And unless the query never contains anything but "lang=\w\w" it may be safer to leave off the anchors.

You can drop the [NE]. It is a RewriteRule flag (RewriteCond doesn't accept it) that stops mod_rewrite escaping special characters in the target, and this redirect shouldn't need it.

You don't need anchors in the Rule; by default the regex will capture everything. And if the language is to come first, the Rule has to be

RewriteRule (.*) /%1/$1? [R=301,L]
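Put together, the pieces above might look something like this. It is only a sketch: it assumes the rules sit in the .htaccess at the site root and that every two-character code is acceptable (note that \w also matches digits and the underscore).

```apache
RewriteEngine On

# (\w{2}) captures both letters into %1; (\w){2} would keep only one of them.
RewriteCond %{QUERY_STRING} ^lang=(\w{2})$ [NC]

# %1 = language code from the Condition, $1 = requested path from the Rule.
# The trailing ? drops the old query string from the redirect target.
# /this-page/?lang=de redirects to /de/this-page/
RewriteRule (.*) /%1/$1? [R=301,L]
```

Because (.*) captures the path exactly as requested, a trailing slash on the old URL carries over to the new one.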

atomMan

6:32 pm on Oct 17, 2011 (gmt 0)

10+ Year Member



thanks lucy24

i will be getting queries from nonexistent langs. what i think i want to do there is simply redirect them to the home page, so then i'd want to use the OR version, yes?

lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)

now, how to redirect all other lang= to home, and one of them will be "ww-ww". the rest will be 2 word chars...

RewriteCond %{QUERY_STRING} ^lang=(\w\w|\w\w\-\w\w)$ [NC]
RewriteRule /


now i want these to be a permanent... redirect? - i want the address in the browser to be the actual URI, not the old URI, and i want to tell index bots the new URI also, so...

RewriteCond %{QUERY_STRING} ^lang=(\w\w|\w\w\-\w\w)$ [NC]
RewriteRule / [R=301,L]


so given this URI...

http://site.com/some-page/?lang=af


and this rule...

RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule ^(.*)$ /$1 [R=301,L]

will produce this result? ...

http://site.com/af/some-page


cause what i don't understand is where and how the /af/ is being inserted properly

if that's good, then putting it all together...

RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule ^(.*)$ /$1 [R=301,L]

RewriteCond %{QUERY_STRING} ^lang=(\w\w|\w\w\-\w\w)$ [NC]
RewriteRule / [R=301,L]


but is there supposed to be 2 "$" in the rule, or should it be...

RewriteRule ^(.*)/$1 [R=301,L]

g1smd

6:46 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The RewriteRule needs this syntax:

RewriteRule pattern target [flags]


You have at least one element missing for some of your examples.

For a redirect, the target must contain the protocol and domain name and should use the [R=301,L] flags.

atomMan

6:54 pm on Oct 17, 2011 (gmt 0)

10+ Year Member



i have the flags in the last example

lucy24

4:10 am on Oct 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you first redirect all the languages you do recognize, then you don't have to do anything fancy with the leftovers. A simple {QUERY_STRING} ^lang= will pick up everything else. If you don't put anything at all after the = sign, then you will even be picking up requests that somehow picked up a "lang" query string without putting any value into it. And you presumably want those to hit the home page too.
and this rule...

RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule ^(.*)$ /$1 [R=301,L]

will produce this result? ...

http://site.com/af/some-page


cause what i don't understand is where and how the /af/ is being inserted properly

if that's good, then putting it all together...

RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule ^(.*)$ /$1 [R=301,L]

The parentheses in your Condition are here doing two separate jobs. And you don't even have to pay them extra.

One job is to group the languages, so it means the Query begins with "lang=af" OR with "lang=da" OR with ... et cetera. If you didn't have the parentheses but kept everything else the same, the Condition would mean "Query begins with 'lang=af'" OR contains "da" OR contains "de" et cetera.

The second job is capturing. Anything inside parentheses automatically picks up a label: %1 %2 etc. in the Conditions; $1 $2 etc in the Rule.

But you seem to have misplaced a bit of your Rule. It has to be

RewriteRule ^(.*)$ %1/$1? [R=301,L] 


The %1 is your "af" or whatever. The final ? does not mean "optional" as it would mean in a Pattern; it means "get rid of the query string". Crucial, or else the rule will keep executing over and over again.

RewriteCond %{QUERY_STRING} ^lang=(\w\w|\w\w\-\w\w)$ [NC]
RewriteRule / [R=301,L]

but is there supposed to be 2 "$" in the rule, or should it be...

RewriteRule ^(.*)/$1 [R=301,L]

This is your rule for the leftovers? As noted above, you don't need to say anything beyond ^lang= in the Condition. The Rule becomes

RewriteRule (.*) /$1? [R=301,L]


You left out a space, which would be fatal. Contrarily, you don't need an anchor. In this context it won't do any harm; it just isn't needed. Again, final ? to strip the query string.

Oh, wait, you said Index page. Then it is still easier because all you need to say-- with no capturing at all-- is

RewriteRule .* http://www.example.com/? [R=301,L]


If your query string contains things other than language, things get messier.

Officially you need to put the whole http://www.example.com part before your leading slash in the Rewrite whenever it is turned into a Redirect. But mod_rewrite seems to use the same fallback as mod_alias, which is to recycle the existing host if you don't specify one.

g1smd

6:37 am on Oct 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It has to be
RewriteRule ^(.*)$ %1/$1? [R=301,L]


Almost! The redirect target should include the protocol and domain name.

RewriteRule (.*) http://www.example.com/%1/$1? [R=301,L]


Additionally, the question mark strips the parameters, and prevents an infinite redirect loop.

atomMan

1:02 pm on Oct 18, 2011 (gmt 0)

10+ Year Member



i think i have a better understanding of how back-referencing works here thanks to your very helpful posts guys

so then, adding what g1smd said, this is where i'm at...

RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule (.*) http://example.com/%1/$1? [R=301,L]


i don't understand this though -- if the protocol/domain is needed here, then it looks to me like a request for
http://example.com/?lang=af
would end up something like
http://example.com/af/http://example.com/
. (.*) already captured the request, and we spit it back out with $1, so why is the proto/domain needed here?

then i'm gonna dump all other "lang" query strings and i want a request for root "/" to rewrite to root and a request for any other page to rewrite to that page...

RewriteCond %{QUERY_STRING} ^lang= [NC]
RewriteRule (.*) /$1? [R=301,L]


now if you're correct g1smd in your last post, wouldn't the proto/domain be needed here too?

my brain is catching fire again :)

atomMan

2:23 pm on Oct 18, 2011 (gmt 0)

10+ Year Member



after testing for redirecting known langs, i see both seem to work...

RewriteRule (.*) http://test.12bytes.org/%1/$1? [R=301,L] 
RewriteRule (.*) /%1/$1? [R=301,L]


and both of these seem to work for redirecting unknown langs...

RewriteRule (.*) http://test.12bytes.org/$1? [R=301,L]
RewriteRule (.*) /$1? [R=301,L]


but the right way in both cases is the first?
and i don't need "^(.*)" instead? don't i want to match from beginning of string?

g1smd

7:19 pm on Oct 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



$1 contains ONLY the requested path, so no, the domain will not be added twice.

If you do not include the domain name in the rule target then
http://example.com/path?lang=af 
redirects to
http://example.com/af/path
and
http://www.example.com/path?lang=af 
redirects to
http://www.example.com/af/path

This creates either a duplicate content problem, or non-canonical requests result in an unwanted multiple step redirection chain.

Include the domain name in the rule target so that both requests redirect to
http://www.example.com/af/path
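Pulling the thread's pieces together, a minimal sketch might look like the following. It assumes that www.example.com (a placeholder) is the canonical hostname, that the rules live in the .htaccess at the site root, and that they sit above the WordPress rewrite block so they run first.

```apache
RewriteEngine On

# Known languages: move the code from the query string into the path.
# /path?lang=af redirects to http://www.example.com/af/path
RewriteCond %{QUERY_STRING} ^lang=(af|da|de|es|fr|it|nl|no|pt|ro|sv)$ [NC]
RewriteRule (.*) http://www.example.com/%1/$1? [R=301,L]

# Any other lang= value: keep the path, drop the query.
# /path?lang=zz redirects to http://www.example.com/path
RewriteCond %{QUERY_STRING} ^lang= [NC]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

The order matters: the known-language rule has to come first, otherwise the catch-all would strip the query before the language code could be used.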

atomMan

7:49 pm on Oct 18, 2011 (gmt 0)

10+ Year Member



gotchya - thanks!
thanks to you too lucy24 !

that's the way i stuck it in htaccess (with proto/domain) and it seems to be working fine

g1smd

8:01 pm on Oct 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Make sure you install the Live HTTP Headers extension for Firefox so you can examine the HTTP transaction in detail.

lucy24

8:54 pm on Oct 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you do not include the domain name in the rule target then...

I've only just realized why I don't have this problem. It's because, as noted elsewhere, the www-redirect is done by the host, so by the time anything reaches my individual htaccess it is already in canonical form. (Hence the regular pairing of 301 and 403 for the same request from certain slow-on-the-uptake robots.)

Basing your behavior on what works in specific circumstances may or may not be a good idea ;)

g1smd

9:00 pm on Oct 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you use Live HTTP Headers you will see that non-canonical requests are sometimes redirected twice. This is always a problem.
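One way to avoid the double hop, sketched under the assumption that www.example.com (a placeholder) is the canonical hostname and that no upstream layer already handles it: give the specific redirects a full canonical target, and put a general hostname rule after them.

```apache
# Specific redirects (such as the lang= rules) go above this point,
# each with http://www.example.com/ in its target.

# General catch-all: any other request on a non-canonical hostname.
# The query string is carried over unchanged because the target has no ?.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

With this ordering, a request such as http://example.com/path?lang=af should be sent to its final URL in a single redirect rather than two.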

atomMan

9:18 pm on Oct 18, 2011 (gmt 0)

10+ Year Member



my setup is the same as lucy24

i'll check it out with FireBug though

atomMan

12:58 pm on Oct 19, 2011 (gmt 0)

10+ Year Member



ok, now i have a small "problem" that i'm not sure is a problem - i have a photo gallery and i see google (no human visitors) is hitting the images and adding the language code to the URI which results in a 404...

should i even bother with this?

the image URI's in my sitemap do not contain the lang code, so i don't know why google is parsing these URI's. here's an example URI where the lang code is added...

http://ex.com/de/wp-content/uploads/image.gif


now if i was to deal with this, how does this look? ...

RewriteCond %{REQUEST_URI} \.(gif|jpg|png|bmp)$ [NC]
RewriteCond http://(www\.)?/(\w\w/) [NC]


as for the rewrite, i'm lost :)

i captured the lang code and extra trailing slash with "(\w\w/)", but how do i remove it in the rewrite rule?

lucy24

6:23 pm on Oct 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



should i even bother with this?

If you start taking the time to redirect spurious URLs, there will be no end to it. They don't exist, and you never claimed they existed, so it's google's problem.

!

Are the images called by your assorted pages using relative links? If so, both google and humans (that is, browsers with attached human) would be looking for

www.example.com/de/wp-content/uploads/image.gif
www.example.com/fr/wp-content/uploads/image.gif
www.example.com/sp/wp-content/uploads/image.gif

et cetera, and that is a problem. But the solution lies in fixing up the code, not in mass redirection.

atomMan

6:51 pm on Oct 19, 2011 (gmt 0)

10+ Year Member



roger that
i think some of the images are using relative paths, so i'll wash that up

thanks!

g1smd

7:40 pm on Oct 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Set up your redirect code to redirect only things that should be redirected.

When mangled URLs for other things are requested, ensure that you're not serving duplicate content: either redirect the request or serve a 404 error.