Forum Moderators: phranque

php URL masking/shifting

         

t1000

2:28 am on Jan 21, 2009 (gmt 0)

10+ Year Member


Hi all:

Can't seem to figure this out…

My site, for example, www.site.com is configured to do multi-language with gettext and PoEdit (.po, .mo) files. So all multi-language links are something like:

www.site.com/?locale=de_DE
www.site.com/products/index.php?locale=de_DE
www.site.com/?locale=zh_CN
www.site.com/products/index.php?locale=zh_CN

Now, what I want is to make the above URLs LOOK LIKE the following (while the site is actually running the links above):

www.site.com/de_DE/
www.site.com/de_DE/products/
www.site.com/zh_CN/
www.site.com/zh_CN/products/

I emphasize LOOK LIKE because I want images with relative paths in the HTML to work. For example, <img src="images/title.png" /> should look for the image without de_DE added to the path.

I tried in .htaccess:

<FilesMatch "^(de_DE¦zh_CN)">
ForceType application/x-httpd-php
</FilesMatch>

and made a de_DE and a zh_CN file to include the target php files. It nearly worked, but it's actually running the de_DE (zh_CN) file, so the de_DE (zh_CN) is added to the path of the images with relative paths, causing it to not find the images (and the solution was messy too).
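As a minimal sketch of why the relative image paths break: a browser resolves a relative src against the URL of the page it requested, so a locale placed in the path becomes part of the base directory, while a locale in the query string does not. The helper name resolve_image below is hypothetical and only for illustration; urljoin from Python's standard library follows the same resolution rules.

```python
from urllib.parse import urljoin


def resolve_image(page_url: str, src: str = "images/title.png") -> str:
    """Resolve a relative src against the page URL, as a browser would."""
    return urljoin(page_url, src)


# Locale in the query string: it plays no part in path resolution.
print(resolve_image("http://www.site.com/products/index.php?locale=de_DE"))

# Locale in the path: the browser treats de_DE as a real directory level.
print(resolve_image("http://www.site.com/de_DE/products/"))
```

The second result keeps de_DE in the image path, so the server has to map that path back to the real file as well.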

How should I do this?

Any help would be appreciated, thanx in advance.

t1000

3:56 am on Jan 21, 2009 (gmt 0)

10+ Year Member



I realized I didn't make it clear enough.

Instead of making the URL LOOK LIKE what I mentioned above, I should say that I would like

'www.site.com/zh_TW/products/' to actually run 'www.site.com/products/?locale=zh_TW'

'www.site.com/zh_TW/products/index.php' to actually run 'www.site.com/products/index.php?locale=zh_TW'

'www.site.com/zh_TW/' to actually run 'www.site.com/?locale=zh_TW'

'?locale=zh_TW' can still work, no need to do redirect or anything.

jdMorgan

4:19 am on Jan 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You have only one choice as to what the URL looks like: It will look like the URL. The URL is defined by whatever you use in the links on your pages.

Once a URL in a link is clicked and is requested from your server, you can configure the server (in httpd.conf, conf.d, or .htaccess, etc.) to deliver the content *associated* with that URL from any place in the server's filesystem that you choose.

So links on pages define URLs, and the structure and names you choose define the files and directories inside the server. Once a URL is defined on a page you cannot "change it" and you cannot make it "look like" something else.

To further clarify, URLs exist on the Web, and are defined by Web pages. Files and filenames exist inside the server. The server's primary and fundamental purpose is to translate Web requests for URLs into server operating system requests for files (including scripts). So the server sits at the boundary between the Web-world and the operating system world, and translates between the two. And its primary function is to allow a "universal" and uniform resource-location (U.R.L.) system to exist that is independent of the server software, operating system, and filesystem architecture inside the server. This allows URLs to "work" regardless of whether the server is Apache, IIS, or something else, and whether the operating system is Unix, Linux, HP-UX, Solaris, Windows Server, or anything else.

As a result, changing your URL involves changing the source code of the page that defines that URL (or changing the code of the script that produces the page that defines that URL). Once the URL 'looks like' what you want it to look like, then you configure your server to locate the proper file or script to serve the proper content in response to requests for that URL.

Jim

jdMorgan

4:25 am on Jan 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



'?locale=zh_TW' can still work, no need to do redirect or anything.

Actually, there *is* a need to redirect that -- You should redirect such old URLs to the new 'friendly' ones. Otherwise, you will have a duplicate-content problem; two URLs returning the same content, and essentially competing with each other for ranking.

This thread [webmasterworld.com] from our forum library describes all three steps of switching to 'friendly' URLs: The link URL changes, the URL-to-filename rewrites, and the old-URL-to-new URL fixup redirects to speed search engine listing updates and prevent duplicate content.

Jim

t1000

11:53 am on Jan 21, 2009 (gmt 0)

10+ Year Member



jdMorgan:

Thank you for replying. I now have a better idea of the direction I should be heading. I suck at mod_rewrite though; here is my feeble failed attempt…

The line below seems to make 'www.example.com/zh_CN/index.php' and 'www.example.com/zh_CN/products/test/' work:
RewriteRule ^(tw¦cn¦de)/(.*) $2?locale=$1 [L]

however, I failed to redirect all '...?locale=zh_CN' to '/zh_CN/...'
tried it with:
RewriteRule ^(.*)\?locale=(.*) http://www.example.com/$2/$1/index\.php? [R=301,L]

but failed…

What did I do wrong?

Again, any help from anyone would be great. Thanx!

[edited by: jdMorgan at 1:54 pm (utc) on Jan. 21, 2009]
[edit reason] Please use example.com [/edit]

jdMorgan

1:53 pm on Jan 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need two rules, as described and shown in the thread I referenced: One to internally rewrite friendly URL requests to your script, and the other to redirect only direct client requests for old script URLs to the new friendly URLs.

Also it's not clear whether your first rule worked; I would guess it did not, because the pattern does not match the requested URL (the "_CN" is missing, for example).

I suggest that you get the first rule working completely before trying to add or debug the second rule.

Jim

t1000

2:27 pm on Jan 21, 2009 (gmt 0)

10+ Year Member



Oops! I copied an old rule, sorry about that.

RewriteRule ^(zh_TW¦zh_CN¦de_DE)/(.*) $2?locale=$1 [L]

The above is the one that's working (at least to my limited knowledge of the mod_rewrite).

Trying to read up on regular expression and mod_rewrite still.

jdMorgan

2:31 pm on Jan 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, now take a look at the cited thread, and note that you must use a rule with RewriteCond %{THE_REQUEST} to accomplish the script-URL-to-friendly URL redirect *only* when the script URL is directly-requested by a client (browser or robot). Without this check, the redirect would be applied to the output of your internal rewrite, and you'd get an 'infinite' rewrite/redirect loop.

Jim

t1000

3:43 pm on Jan 21, 2009 (gmt 0)

10+ Year Member



I did look at the thread, but couldn't figure it out yet.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?product=([^&]+)&color=([^&]+)&size=([^&]+)&texture=([^&]+)&maker=([^\ ]+)\ HTTP/

'RewriteCond %{THE_REQUEST}' I understood
'/index\.php\?product=([^&]+)&color=([^&]+)&size=([^&]+)&texture=([^&]+)&maker=([^\ ]+)' I understand too

These two are what I don't get:
'^[A-Z]{3,9}' - truly no idea what this is doing here… ^-start of line, [A-Z]-one char between capital A-Z, {3,9}- no idea…
'\ HTTP/' - is this to limit only from http request?

Tried to google some guidance, but the examples I saw were all similar to 'RewriteCond %{THE_REQUEST} /index.php\?' only. Only one part after the REQUEST.

jdMorgan

5:10 pm on Jan 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The %{THE_REQUEST} variable contains the HTTP request sent by the client, exactly as it appears in your raw server access logs:
GET /products/?locale=zh_TW HTTP/1.1

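So the [A-Z]{3,9} matches the HTTP method (GET, HEAD, POST, etc.), which is three to nine capital letters long; then we have the URL-path and any appended query string, and finally the HTTP version. The trailing "\ HTTP/" simply anchors the pattern to the space and protocol version at the end of the request line.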

So in your case, to match that request and create the %3 back-reference (the third parenthesized group in the RewriteCond pattern, which captures the locale value) for use in the RewriteRule substitution URL, you might use


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /products/(index\.php)?\?([^&]*&)*locale=([^&\ ]+)[^\ ]*\ HTTP/
RewriteRule ^products/(index\.php)?$ http://www.example.com/%3/products/? [R=301,L]

Combined with the other rule, we now rewrite requests for /<language>/products/ to /products/?locale=<language>, and if the client browser or robot requests /products/?locale=<language> directly, we redirect that back to /<language>/products/ to speed up search engines' updating to the new URL, recover the PageRank and link-popularity of the old URLs, and to preserve old incoming links and user bookmarks. However, since we require that the old URL be directly requested by the HTTP client, we avoid having the two rules interfere with each other, causing a loop.

This rule will work with appended query strings which contain "locale=". If additional name/value pairs precede or follow "locale=" the rule will still be invoked, but those name/value pairs will be dropped. This may be helpful if someone links to you with intentionally-bogus extra name-value pairs in the query string.

You may want to add additional rules to handle the case where "/products/index.php" or "/products/" is requested without the "locale=" name/value in the appended query string, or with a blank value for locale. Search engines and unfriendly competitors may try those invalid URLs, so it would be wise to handle them in some known way.

Note that in both the RewriteCond and RewriteRule above, the "index.php" is optional, so that both versions of the old URLs are detected and redirected. This will further reduce your duplicate-content problem.
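To see which back-reference holds the locale, here is a minimal sketch that runs the RewriteCond pattern above through Python's re module (which accepts the same syntax for this particular pattern). The function name captured_groups is hypothetical, and this only imitates the pattern match, not Apache's full request processing.

```python
import re

# Pattern copied from the RewriteCond above.
THE_REQUEST_PATTERN = re.compile(
    r"^[A-Z]{3,9}\ /products/(index\.php)?\?([^&]*&)*locale=([^&\ ]+)[^\ ]*\ HTTP/"
)


def captured_groups(request_line: str):
    """Return the three captured groups, or None when the request line does not match."""
    match = THE_REQUEST_PATTERN.match(request_line)
    return match.groups() if match else None


# Group 1 is the optional index.php, group 2 the last name/value pair
# preceding locale=, and group 3 the locale value itself.
print(captured_groups("GET /products/?locale=zh_TW HTTP/1.1"))
print(captured_groups("GET /products/index.php?a=1&locale=de_DE&x=2 HTTP/1.1"))
print(captured_groups("GET /about/?locale=zh_TW HTTP/1.1"))
```

In every matching case the locale value is the third group, and a request outside /products/ does not match at all.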

Jim

t1000

12:09 am on Jan 22, 2009 (gmt 0)

10+ Year Member



Jim:

So we're parsing against 'GET /products/?locale=zh_TW HTTP/1.1'. That makes a lot more sense now.

It's still not working though…

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /products/(index\.php)?\?([^&]*&)*locale=([^&\ ]+)[^\ ]*\ HTTP/
RewriteRule ^products/(index\.php)?$ http://www.example.com/%2/products/? [R=301,L]

I tried www.site.com/products/test/index.php?locale=zh_CN

and no redirection is happening

Also, I am trying to redirect basically all pages (all pages have translations), not just the ones under /products

so basically:

www.site.com/...........?locale=<language> becomes www.site.com/<language>/...........

so I tried this first with index.php:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?([^&]*&)*locale=([^&\ ]+)[^\ ]*\ HTTP/
RewriteRule ^/(index\.php)?$ [site.com...] [R=301,L]

um… not working… :(

And you're right, I shouldn't use the index.php. Assuming the previous lines worked, it should have been:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ \?([^&]*&)*locale=([^&\ ]+)[^\ ]*\ HTTP/
RewriteRule ^?$ [site.com...] [R=301,L]

Not sure…

Also, I don't quite get the RewriteRule part. I thought it works like this RewriteRule <a> <b> where <a> is replaced by <b>. From the example you gave me, it seems to be different when using with RewriteCond.

Steve

jdMorgan

1:44 am on Jan 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no "/test" after "products" in the RewriteCond and RewriteRule patterns, so I'm not sure why you'd expect that URL to be redirected...

The patterns must exactly match the requested URL-path, or the rule won't run.

Please nail down exactly what URLs you want to rewrite and redirect, and then I'll be glad to further assist.

Jim

t1000

12:33 pm on Jan 22, 2009 (gmt 0)

10+ Year Member



Jim:

Sorry, didn't make myself clear. Basically, the site is with regular php files, and each and every single page would have translations through gettext(). All pages are using the english version as a template, and with the ?locale=<language> (always at the end), php would just substitute all text into another language.

Instead of showing ?locale=<language> at the end, I want to put <language> right after www.site.com/ and then append the rest of the URL.

Like: www.site.com/<something>?locale=<language> => www.site.com/<language>/<something>

For example:
www.site.com/?locale=<language> => www.site.com/<language>/

www.site.com/about.php?locale=<language> => www.site.com/<language>/about.php

www.site.com/products/?locale=<language> => www.site.com/<language>/products/

www.site.com/about/manufacture.php?locale=<language> => www.site.com/<language>/about/manufacture.php

www.site.com/about/?locale=<language> => www.site.com/<language>/about/

www.site.com/products/index.php?locale=<language> => www.site.com/<language>/products/index.php

www.site.com/products/spec.php?locale=<language> => www.site.com/<language>/products/spec.php

www.site.com/products/test/index.php?locale=<language> => www.site.com/<language>/products/test/index.php

In other words, take the whole url, remove ‘locale=<language>’, add ‘<language>/’ to behind www.site.com/.

Thanx so much for your patience.

Steve

jdMorgan

3:38 pm on Jan 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> In other words, take the whole url, remove ‘locale=<language>’, add ‘<language>/’ to behind www.site.com/.

OK, perhaps something like this:


# Internally rewrite new friendly URLs to script with locale as query string
RewriteRule ^(zh_TW¦zh_CN¦de_DE)/(.*) $2?locale=$1 [L]
#
# Externally redirect direct client request for old query string URLs to new friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?locale=(zh_TW¦zh_CN¦de_DE)\ HTTP/
RewriteRule (.*) http://www.example.com/%1/$1? [R=301,L]

Replace the broken pipe "¦" characters with solid pipes before use.
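As a rough sketch of how the two rules interact, the Python below imitates the pattern matching only (with solid pipes); the function names are hypothetical, and Apache's real per-directory processing involves more steps than this. It assumes the URL-path is seen without a leading slash, as in .htaccess.

```python
import re

LOCALES = "zh_TW|zh_CN|de_DE"

# Rule 1: internal rewrite of a friendly URL-path to the query-string form.
REWRITE = re.compile(rf"^({LOCALES})/(.*)")

# Rule 2 condition: tested against the original client request line.
OLD_REQUEST = re.compile(rf"^[A-Z]{{3,9}}\ /[^?]*\?locale=({LOCALES})\ HTTP/")


def internal_rewrite(url_path: str):
    """Imitate: RewriteRule ^(locales)/(.*) $2?locale=$1 [L]"""
    match = REWRITE.match(url_path)
    return f"{match.group(2)}?locale={match.group(1)}" if match else None


def external_redirect(the_request: str, url_path: str):
    """Imitate: RewriteCond on THE_REQUEST plus RewriteRule (.*) http://www.example.com/%1/$1?"""
    match = OLD_REQUEST.match(the_request)
    return f"http://www.example.com/{match.group(1)}/{url_path}" if match else None


print(internal_rewrite("zh_CN/products/"))
print(external_redirect("GET /products/?locale=zh_CN HTTP/1.1", "products/"))
# After the internal rewrite, THE_REQUEST still holds the friendly URL,
# so the redirect condition does not match and no loop occurs.
print(external_redirect("GET /zh_CN/products/ HTTP/1.1", "products/"))
```

The last call returns None, which is the point of testing THE_REQUEST rather than the rewritten URL.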

Jim

t1000

1:57 pm on Jan 23, 2009 (gmt 0)

10+ Year Member



Jim:

Thanx for the help! It's working.

if you don't mind, I would like to ask a few more questions:

RewriteRule (.*) http://www.example.com/%1/$1? [R=301,L]

can we just totally remove http://www.example.com?

Why the www.example.com?

Also, what would you suggest for reading about RewriteCond and RewriteRule? I read the links you sent me. They lay out the big picture nicely. Do you have any more links that go into the nitty-gritty parts? The Apache ones are too cryptic for me. I'm looking for something with lots of examples.

Thanx again.

Steve

jdMorgan

5:12 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> can we just totally remove http://www.example.com?

I strongly recommend that you DO NOT remove it. If you do remove it, then Apache will "guess" about the correct domain name, and may choose the wrong one. It's not actually a guess, but Apache will use the canonical name of the server as defined by the server configuration file, instead of your preferred domain name. So, you may end up redirecting to the non-www, for example, when your links all point to "www".

So, I suggest putting the domain name that you use for the majority of your links in the RewriteRule substitution URL.

You did not say *why* you feel a need to remove it, so I can't answer in more detail.

Mod_rewrite *is* cryptic and complicated, so the documentation reflects this. There are several books about it available from places like Amazon.com, and of course, we have loads of threads here in the forum, and some tutorials in our Library.

Jim

t1000

9:38 am on Jan 24, 2009 (gmt 0)

10+ Year Member



The reason is that I run the same site on three computers:

1. dev.site.com on my MacBook Pro for development (through httpd.conf and host, that address works only on my computer)

2. ###.###.###.###:9000 on a test server at work for colleagues to test (sync with MacBook Pro through svn)

3. and the real www.site.com

Instead of remembering to modify .htaccess every time I sync or upload, I'm trying to find a setting that would work for all three computers.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?locale=(tw¦cn¦de)\ HTTP/
RewriteRule (.*) %{HTTP_HOST}/%1/$1? [R=301,L]

(Oh, I simplified the language extension to just 2 letters)

Would that work? I tried it. Had a hiccup after I saved the .htaccess (language switching stopped working for a few clicks until I switched to the root dev.site.com/tw/), but seems to work now (on my dev MacBook Pro, no access to machine 2 until the Chinese New Year is over, and really don't want to try on the real site yet). Is there something bad with this I haven't seen yet? That initial hiccup really got me nervous.

t1000

4:02 pm on Jan 24, 2009 (gmt 0)

10+ Year Member



Jim:

I further modified the command to:

RewriteRule ^([a-zA-Z]{2})/(.*) $2?locale=$1 [L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?locale=([a-zA-Z]{2})/\ HTTP/
RewriteRule (.*) %{HTTP_HOST}/%1/$1? [R=301,L]

This would also eliminate the need to add language extension every time. Again, tested on my MacBook Pro only, seems to work. What do you think? Looks good?
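One detail worth checking in the condition above is the "/" between the locale group and "\ HTTP/". The sketch below tests the pattern as posted, and a variant without that slash, against a sample request line using Python's re module; the sample request and the function name matches are assumptions for illustration, and this checks only the regular expression, not Apache's behavior.

```python
import re

# The RewriteCond pattern as posted above, with a "/" after the locale group.
AS_POSTED = re.compile(r"^[A-Z]{3,9}\ /[^?]*\?locale=([a-zA-Z]{2})/\ HTTP/")

# Hypothetical variant with that "/" removed.
WITHOUT_SLASH = re.compile(r"^[A-Z]{3,9}\ /[^?]*\?locale=([a-zA-Z]{2})\ HTTP/")


def matches(pattern: re.Pattern, request_line: str) -> bool:
    """Report whether the pattern matches the start of the request line."""
    return pattern.match(request_line) is not None


sample = "GET /products/?locale=tw HTTP/1.1"
print(matches(AS_POSTED, sample))
print(matches(WITHOUT_SLASH, sample))
```

If the posted pattern never matches a request like the sample, the redirect rule would simply not fire, which could look the same as "working" when only the friendly URLs are being tested.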

Thank you so much for your help. Through the conversation with you, I've learned so much. Reading related articles seems much easier now than before.