
Apache Web Server Forum

    
Czech characters and rewrite rule
Robo




msg:4600754
 8:45 pm on Aug 9, 2013 (gmt 0)

Probably asked before but could not find a satisfying answer.

I am working with a Czech website. I cannot get .htaccess to rewrite if I use Czech characters/diacritics, although it works with normal letters. However, would Google and actual servers still recognize the URL when a Czech person types the friendly URL in proper Czech (including diacritics) while all my rules are in normal letters?

 

lucy24




msg:4600801
 1:29 am on Aug 10, 2013 (gmt 0)

Where "normal" = ASCII? ;)

Ordinarily non-ASCII characters are escaped (converted to percent-encoding) before they're let loose on the internet. So when you think you've got a č * it may really be %C4%8D and that's what you need to code for.
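
To make "code for %C4%8D" concrete, here is a minimal sketch that matches the escaped form. It relies on THE_REQUEST, which the mod_rewrite documentation describes as the raw request line that has not been unescaped. The slug "kočka" and the target page escaped-form.html are made-up names for illustration only.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, e.g. "GET /ko%C4%8Dka HTTP/1.1",
# so the percent-encoded bytes of c-with-hacek (C4 8D) appear as plain text.
# [NC] also accepts lowercase hex digits such as %c4%8d.
# The slug and the target page are hypothetical.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/ko%C4%8Dka[?\s] [NC]
RewriteRule ^ http://www.example.com/escaped-form.html [R=301,L]
```

Whether other variables see the escaped or the unescaped form depends on the variable, so it is worth testing each one rather than assuming.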

Unless someone has made a serious blunder, search engines should recognize the URL as written. You can try searching for non-English words and see how some of your results lead to non-ASCII URLs.


* AHEM. That was meant to be a c with hacek (what Unicode calls a "caron").

Robo




msg:4602856
 8:01 pm on Aug 17, 2013 (gmt 0)

Had a busy week, so a late reply. I checked around and noticed that indeed all major Czech websites use friendly URLs without diacritics, that is, with no hacek or carka. So I guess it's OK for me to do the same rather than waste precious time figuring out how to code for diacritics without making errors. Since I am a Dutchman working in the Czech Republic and speaking lousy Czech, that is probably the best option anyway.
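
If the site stays on ASCII-only URLs, a visitor who types the diacritic spelling can still be sent to the right page with a redirect. A minimal sketch, assuming the .htaccess file sits in the document root and is saved as UTF-8, and using the made-up slugs "kočka" and "kocka". mod_rewrite matches a RewriteRule pattern against the %-decoded URL-path, so the pattern holds the character itself rather than %C4%8D:

```apache
RewriteEngine On

# Assumes this file is saved as UTF-8, so the "č" below is the bytes C4 8D.
# "kočka" and "kocka" are hypothetical slugs used only for illustration.
# The diacritic spelling is redirected to the ASCII URL the other rules handle.
RewriteRule ^kočka/?$ http://www.example.com/kocka [R=301,L]
```

One such rule would be needed per page, because mod_rewrite's built-in map functions (tolower, toupper, escape, unescape) do not include one for stripping diacritics.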

lucy24




msg:4602889
 11:59 pm on Aug 17, 2013 (gmt 0)

Follow-up:

When I saw this topic title in new posts, I immediately went over to my test site and added this group of experimental rules:

RewriteRule % http://www.example.com/escape.html [R=301,L]
RewriteRule [¡-ÿ] http://www.example.com/unescape.html [R=301,L]

RewriteCond %{QUERY_STRING} %
RewriteRule . http://www.example.com/escapequery.html? [R=301,L]
RewriteCond %{QUERY_STRING} [¡-ÿ]
RewriteRule . http://www.example.com/unescapequery.html? [R=301,L]

(If the Forums mangle the second rule in each pair, it's the character range from inverted exclamation mark to y-umlaut, i.e. the visible part of the Latin-1 range.)

I then tried requesting made-up URLs containing accented characters. With a plain-page request I ended up on "unescape.html", meaning that when the request reaches htaccess it is in as-written (unescaped) form. But with an added query string, I ended up on "escapequery.html".

A more worrying detail is that when I reopened the "live" htaccess to add the query-string rules, the text editor saw fit to go into "Chinese (GB18030)" encoding, making each of my non-ASCII characters (two bytes in UTF-8) a single Chinese character. If I spend too long trying to understand this, I will get a headache.
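
One way to sidestep the editor's encoding guesswork is to keep the .htaccess file pure ASCII and write the non-ASCII range as byte escapes. The following is a sketch of the same experiment under that assumption, reusing the marker pages named in the rules. The byte range 0x80-0xFF is a substitution for the Latin-1 character range (it is somewhat broader) and has not been tested on the setup described here:

```apache
RewriteEngine On

# Path: arrives unescaped, so look for any raw byte from 0x80 to 0xFF.
# Every byte of a multi-byte UTF-8 character falls in this range.
RewriteRule [\x80-\xFF] http://www.example.com/unescape.html [R=301,L]

# Query string: stays escaped, so look for the text of a percent-encoded
# byte in the same 0x80-0xFF range (%80 through %FF, either hex case).
RewriteCond %{QUERY_STRING} %[89A-F][0-9A-F] [NC]
RewriteRule . http://www.example.com/escapequery.html? [R=301,L]
```

Because the file then contains nothing but ASCII, it reads the same whichever encoding the editor decides to apply.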

Robo




msg:4603360
 6:32 pm on Aug 19, 2013 (gmt 0)

"If I spend too long trying to understand this, I will get a headache."


I get a headache reading your post and trying to follow it! Interesting though. I will experiment with it more later this month, when the site is up and running. However, for the moment we will stick with the non-diacritic URLs; I still have to get the site up and running in Czech, German, and English first. The Czech site is the main one; the DE and EN versions are sub-directories.

g1smd




msg:4603434
 11:03 pm on Aug 19, 2013 (gmt 0)

"I will experiment with it more later this month, when the site is up and running."

Noooo! The time to experiment is now, but only on your test domain.
