Language Redirection with URL Rewriting

         

seb2point0

10:39 pm on May 25, 2009 (gmt 0)

10+ Year Member



I'm trying to accomplish 3 things at once in htaccess and I'm having a hard time tying them together.

My site has one PHP page, index.php. It's a mini site with my contact info. You have to pass a lang variable to switch languages, e.g. /index.php?lang=en or /index.php?lang=fr

So here's what I need. When someone hits mysite.com, Apache should detect Accept-Language and redirect the user to mysite.com/xx/, where xx is the language code, with something like this:


RewriteEngine on
RewriteBase /
RewriteRule ^([a-zA-Z0-9_-]+)$ index.php?lang=$1
RewriteRule ^([a-zA-Z0-9_-]+)/$ index.php?lang=$1

In other words, you can't access / directly. You will always get redirected.

Finally, there should be a default lang, like this:


ErrorDocument 406 /en/

So that's what I need. Really I just need help tying the language part in with the rest. I read a few tutorials but they all used files like index.html.en and index.html.fr, which is not my case at all.

g1smd

11:33 pm on May 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



None of the code above is a redirect.

They are both rewrites. Each rule should be followed by [L] unless you know the exact reason why it should not be included (almost never).

A redirect would also include the domain name in the redirect target and have the [R=301] flag too.

.

Links on pages are what define URLs. A URL is seen to 'exist' as soon as you create a link on a page with something in the 'href' part of it. It is that entry that is stored in a SE links database.

Only when that link is clicked by a user, or when the SE bot requests that URL directly, and the request hits the server, do we actually get to find out whether it returns 200, 301, 302, 307, 403, 404, 503, or some other status code within the HTTP Header. If the URL is blocked by robots.txt the status will forever remain unknown to the bot. That's because it will not be able to send that URL request to the server.

Only a URL returning '200 OK' will be indexed with content. Other status codes will force the bot to do other things.

A redirect tells the browser requesting the old URL to make a new request for a new URL.

A rewrite connects an external URL request with an internal filepath/file; one that is different to that initially suggested by the path part of the original URL request, without revealing what the internal path and file actually are.
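
To make the distinction concrete, here is a minimal sketch of one rule of each kind. The hostname example.com and the paths are placeholders for illustration, not taken from the poster's site.

```apache
RewriteEngine on

# External redirect: the server answers with a 301 status and a Location
# header, so the browser makes a second request and the address bar changes.
RewriteRule ^old-page$ http://example.com/new-page [R=301,L]

# Internal rewrite: the server serves a different file for the requested URL.
# The browser still shows /fr/ and never learns about index.php.
RewriteRule ^fr/$ /index.php?lang=fr [L]
```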

.

So, do you really want a rewrite, or a redirect, here?

jdMorgan

3:28 am on May 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest that you add the capability to let the user select a language on your home page and set a cookie with their preference in it.

Then change your mod_rewrite code to check the cookie first, and only if the cookie is absent use the Accept-Language header, since the latter is a *browser configuration* preference, and not necessarily a personal preference.


RewriteEngine on
# If "lang" cookie is set, use it to get language preference
RewriteCond %{HTTP_COOKIE} ^lang=([a-z]{2,3})[-,;]? [OR]
# else use first language in HTTP Accept-Language request header
RewriteCond %{HTTP:Accept-Language} ^([a-z]{2,3})[-,;]?
# Rewrite home page requests to script, passing language preference in query string
RewriteRule ^$ /index.php?lang=%1 [L]

If this reasoning is not clear, think of a computer in a Swiss internet cafe -- Is it set for French, German, or Italian? And did the user just re-set it, or was it set by the cafe owner when he installed the machine two years ago, and never changed since? There is a good reason for the "row of little flags" or "clickable world map" you see on international sites; It is not the case that they don't know about the Accept-Language header, it's that they know it cannot be counted on to reflect their visitors' true preferences...
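
One caveat with the cookie condition in the code above: because of the ^ anchor, it only matches when "lang" is the first cookie in the Cookie header. A possible variant, assuming Apache 2.x (whose PCRE-based regex engine supports non-capturing groups), lets the cookie appear anywhere in the header:

```apache
RewriteEngine on
# Match "lang=xx" at the start of the Cookie header or after a ";" separator.
# (?: ... ) is non-capturing, so the language code is still available as %1.
RewriteCond %{HTTP_COOKIE} (?:^|;\s*)lang=([a-z]{2,3}) [OR]
# else use first language in HTTP Accept-Language request header
RewriteCond %{HTTP:Accept-Language} ^([a-z]{2,3})[-,;]?
RewriteRule ^$ /index.php?lang=%1 [L]
```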

Note that this code rewrites "home page" requests only: I assume that all other pages will be handled within your script using the saved cookie value or the passed "lang=" value. Otherwise, you'll likely want to match all pages in the rewriterule pattern, and pass the page URL-path to your script as an additional query string parameter. And do note that I said "pages" there, as you likely won't want to rewrite image requests or requests for 'special' files like robots.txt and sitemap.xml to your script...
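
A rough sketch of that broader variant follows. The "page" parameter name and the list of excluded file extensions are assumptions made up for illustration, and the script is assumed to read "page" itself.

```apache
RewriteEngine on
# Leave the script itself, robots.txt and sitemap.xml alone
RewriteCond %{REQUEST_URI} !^/(index\.php|robots\.txt|sitemap\.xml)$
# Leave common static files alone (example list, adjust to the site)
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|css|js|ico)$ [NC]
# Take the language from the first entry of Accept-Language, captured as %1
RewriteCond %{HTTP:Accept-Language} ^([a-z]{2,3})[-,;]?
# Pass the requested URL-path to the script as "page" (hypothetical name)
RewriteRule ^(.*)$ /index.php?lang=%1&page=$1 [L]
```

Requests that carry no Accept-Language header are not rewritten by this sketch, so the script would still need its own default.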

I also suggest that you add a rule for known search engine robot user-agents, pointing them to a pre-selected language of your choice without using cookies or the Accept-Language header.
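
A hedged sketch of such a robot rule: the user-agent substrings below are only examples, and English is assumed as the pre-selected language. It would go before the cookie/Accept-Language rule so that it is tried first.

```apache
# Known crawlers get a fixed language, regardless of cookies or headers
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Slurp) [NC]
RewriteRule ^$ /index.php?lang=en [L]
```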

Jim

seb2point0

5:50 am on May 26, 2009 (gmt 0)

10+ Year Member



Thanks for the replies.

@g1smd I'm sorry, I used the wrong terminology. I do know the difference between a rewrite and a redirect and their status codes. Stupid me.

@jdMorgan I do understand your reasoning, but the site is quite basic (there is only one page) and I offer a way to quickly change the language. Personally, I'm not a huge fan of the 'row of flags'. I prefer using the default language and then letting the user change it if need be. I can understand it for a large site but I don't think it's necessary here.

I also don't think I need to handle cookies since 90% of my visitors will come from 3 countries with very distinct languages (en or fr). For the other 10%, if they don't speak any of those languages or can't find the language menu, I don't want them there anyway lol.

Simply put: on a request for the root, htaccess should do a 301 redirect to /[ClientLang]/, which is then rewritten to /index.php?lang=[ClientLang]. And yes, robots and sitemap requests should NOT be rewritten or redirected. If the client does not use any of the supported languages (PHP will send a 404 with header('status:404')), send them to English.

Thanks

[edited by: jdMorgan at 2:08 pm (utc) on May 26, 2009]
[edit reason] No URLs, please. See TOS. [/edit]

seb2point0

11:13 am on May 26, 2009 (gmt 0)

10+ Year Member



I've managed to get what I want working with the following code, but I don't know if it's efficient. What are your thoughts? Ideally, I would like a 404 to do a 301 to /en/. Right now it works but still shows the bad URL.

Options +FollowSymLinks

RewriteEngine on

RewriteBase /
RewriteRule ^([a-zA-Z0-9_-]+)$ seb.php?lang=$1
RewriteRule ^([a-zA-Z0-9_-]+)/$ seb.php?lang=$1

RewriteCond %{HTTP:Accept-Language} ^fr [NC]
RewriteRule ^$ /fr/ [L,R=301]

RewriteCond %{HTTP:Accept-Language} ^en [NC]
RewriteRule ^$ /en/ [L,R=301]

#For every other language (including English :)) use English
RewriteRule ^$ /en/ [L,R=301]

ErrorDocument 404 /en/

jdMorgan

1:56 pm on May 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Simplify and combine, making use of the power of regular expressions:

RewriteEngine on
#
# Use accept-language to redirect to subdirectory, default to english if no match
RewriteCond %{HTTP:Accept-Language} ^(fr|en)[-,;]? [NC,OR]
# Always-true fallback: sets %1 to "en" when the header above did not match
RewriteCond en ^(en)$
RewriteRule ^$ http://example.com/%1/ [R=301,L]
#
RewriteRule ^([a-z0-9_-]+)/?$ seb.php?lang=$1 [NC,L]

Every change above is important and/or intentional.
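
The earlier wish to send unknown language codes to /en/ with a 301 is not covered by the rules above. One way it might be done, assuming only fr and en are supported and example.com stands in for the real hostname, is to narrow the rewrite pattern and follow it with a catch-all redirect:

```apache
# Supported languages are rewritten to the script
RewriteRule ^(fr|en)/?$ seb.php?lang=$1 [NC,L]
# Any other single path segment is treated as an unknown language code
# and redirected to the English version
RewriteRule ^[a-z0-9_-]+/?$ http://example.com/en/ [NC,R=301,L]
```

Neither pattern allows a dot, so requests for seb.php, robots.txt and sitemap.xml pass through untouched.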

Jim

[edited by: jdMorgan at 2:08 pm (utc) on May 26, 2009]

seb2point0

2:53 pm on May 26, 2009 (gmt 0)

10+ Year Member



Thanks. Works great... and I learned a lot too.

g1smd

7:26 pm on May 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Up above I said:
Each rule should be followed by [L] unless you know the exact reason why it should not be included (almost never).

A redirect would also include the domain name in the redirect target and have the [R=301,L] flag too.

which jd has included in his example.