
Forum Moderators: Ocean10000 & incrediBILL & phranque


Czech characters and rewrite rule

   
8:45 pm on Aug 9, 2013 (gmt 0)

5+ Year Member



Probably asked before but could not find a satisfying answer.

I am working on a Czech website. I cannot get .htaccess to rewrite if I use Czech characters (diacritics) in the rules. I can get it to work using plain, unaccented letters. However, would Google and actual servers still recognize the URL when a Czech person types the friendly URL in proper Czech (including diacritics) while all my rules are in plain letters?
1:29 am on Aug 10, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



Where "normal" = ASCII? ;)

Ordinarily non-ASCII characters are escaped (converted to percent-encoding) before they're let loose on the internet. So when you think you've got a č * it may really be %C4%8D and that's what you need to code for.
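
To see where %C4%8D comes from, here is a minimal Python sketch using only the standard library. It assumes the URL is encoded as UTF-8, which is the usual case but not guaranteed for every client; the path "/kočka" is a made-up example.

```python
from urllib.parse import quote, unquote

raw = "č"                          # c with hacek (caron), U+010D
utf8_bytes = raw.encode("utf-8")   # two bytes in UTF-8: 0xC4 0x8D
escaped = quote(raw)               # percent-encode each UTF-8 byte

print(utf8_bytes)
print(escaped)

# A hypothetical friendly URL path containing the same letter.
escaped_path = quote("/kočka")     # "/" is left alone by default
print(escaped_path)

# Decoding the escaped form gives back the original character.
print(unquote(escaped) == raw)
```

So a rule written for the escaped form has to match the literal text %C4%8D, while a rule written for the unescaped form has to match the two raw bytes.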

Unless someone has made a serious blunder, search engines should recognize the URL as written. You can try searching for non-English words and see how some of your results lead to non-ASCII URLs.


* AHEM. That was meant to be a c with hacek (what Unicode calls a "caron").
8:01 pm on Aug 17, 2013 (gmt 0)

5+ Year Member



Had a busy week, so a late reply. I checked around and noticed that indeed all major Czech websites use friendly URLs without diacritics, that is, no hacek or carka either. So I guess it's OK for me to do the same and not waste precious time figuring out how to code for the diacritics without making errors. Since I am a Dutchman working in the Czech Republic and speaking lousy Czech, that is probably the best option anyway.
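
For anyone generating such diacritic-free URLs from Czech page titles, a rough Python sketch follows. The function name ascii_slug and the hyphen separator are illustrative assumptions, not something the sites mentioned above are known to use. It decomposes each accented letter into a base letter plus a combining mark and then drops the marks.

```python
import re
import unicodedata


def ascii_slug(title: str) -> str:
    """Return a lowercase, ASCII-only, hyphen-separated slug (hypothetical helper)."""
    # NFKD splits e.g. "č" into "c" followed by a combining caron.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything that still is not ASCII.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(ascii_slug("Příliš žluťoučký kůň"))
```

This only covers letters that decompose into a base letter plus a mark, which includes the Czech alphabet; letters such as the German sharp s would simply be dropped and need their own mapping.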
11:59 pm on Aug 17, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



Follow-up:

When I saw this topic title in new posts, I immediately went over to my test site and added this group of experimental rules:

RewriteRule % http://www.example.com/escape.html [R=301,L]
RewriteRule [¡-ÿ] http://www.example.com/unescape.html [R=301,L]

RewriteCond %{QUERY_STRING} %
RewriteRule . http://www.example.com/escapequery.html? [R=301,L]
RewriteCond %{QUERY_STRING} [¡-ÿ]
RewriteRule . http://www.example.com/unescapequery.html? [R=301,L]

(If the Forums mangle the second rule in each pair, it's the character range from inverted exclamation mark to y-umlaut, i.e. the visible part of the Latin-1 range.)

I then tried requesting made-up URLs containing accented characters. With a plain-page request I ended up on "unescape.html", meaning that when the request path reaches htaccess it is in as-written (unescaped) form. But with an added query string, I ended up on "escapequery.html", meaning the query string is still percent-encoded at that point.
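
The difference between the two results can be modelled in a few lines of Python. This is only a sketch of the observed behaviour, not of Apache's internals: the function name what_the_rules_see and the test URL are made up, and the sketch simply assumes the path is percent-decoded before matching while the query string is passed through untouched.

```python
from urllib.parse import unquote, urlsplit


def what_the_rules_see(url: str) -> tuple[str, str]:
    """Return (path, query) in the form the test rules appeared to receive them."""
    parts = urlsplit(url)
    # Path: decoded, so a RewriteRule pattern meets the raw character.
    # Query: left as sent, so %{QUERY_STRING} still contains the "%".
    return unquote(parts.path), parts.query


path, query = what_the_rules_see(
    "http://www.example.com/str%C3%A1nka.html?q=%C4%8D"
)
print(path)
print(query)
```

In practice this means a pattern in a RewriteRule should be written for the unescaped character, while a RewriteCond on the query string should be written for the %XX form.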

A more worrying detail is that when I reopened the "live" htaccess to add the query-string rules, the text editor saw fit to go into "Chinese (GB18030)" encoding, making each of my non-ASCII characters (two bytes in UTF-8) a single Chinese character. If I spend too long trying to understand this, I will get a headache.
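
The editor's behaviour is at least reproducible. A small Python sketch, assuming the file was saved as UTF-8 and then reopened as GB18030: each two-byte UTF-8 sequence for a Latin-1 letter happens to be a valid two-byte GB18030 code, so it shows up as one Chinese character. The sample letters are arbitrary picks from the range used in the rules.

```python
samples = ["¡", "é", "ÿ"]

misread = {}
for ch in samples:
    utf8_bytes = ch.encode("utf-8")             # two bytes per character
    misread[ch] = utf8_bytes.decode("gb18030")  # same bytes, wrong decoder
    print(ch, utf8_bytes, "->", misread[ch])

# Nothing is lost: re-encoding as GB18030 returns the original bytes,
# which can then be decoded correctly as UTF-8.
recovered = misread["é"].encode("gb18030").decode("utf-8")
print(recovered)
```

So the bytes on disk are unchanged as long as the file is not re-saved in the wrong encoding; telling the editor to reopen it as UTF-8 should bring the original characters back.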
6:32 pm on Aug 19, 2013 (gmt 0)

5+ Year Member



If I spend too long trying to understand this, I will get a headache.


I get a headache reading your post and trying to follow it! Interesting though. I will experiment with it more later this month, when the site is up and running. However, for the moment we will stick with the non-diacritic URLs; I still have to get the site up and running in Czech, German, and English first. The Czech site is the main one; the DE and EN versions are sub-directories.
11:03 pm on Aug 19, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member)



I will experiment with it more later this month, when the site is up and running.

Noooo! The time to experiment is now, but only on your test domain.