
Forum Moderators: Ocean10000 & incrediBILL & phranque


Czech characters and rewrite rule

     

Robo

8:45 pm on Aug 9, 2013 (gmt 0)

5+ Year Member



Probably asked before but could not find a satisfying answer.

I am working on a Czech website. I cannot get .htaccess to rewrite when the rules contain Czech characters (diacritics), although it works with normal letters. However, would Google and actual servers still recognize the URL when a Czech person types the friendly URL in proper Czech (including diacritics) while all my rules use normal letters?

lucy24

1:29 am on Aug 10, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Where "normal" = ASCII? ;)

Ordinarily non-ASCII characters are escaped (converted to percent-encoding) before they're let loose on the internet. So when you think you've got a č * it may really be %C4%8D and that's what you need to code for.

Unless someone has made a serious blunder, search engines should recognize the URL as written. You can try searching for non-English words and see how some of your results lead to non-ASCII URLs.


* AHEM. That was meant to be a c with hacek (what Unicode calls a "caron").

Robo

8:01 pm on Aug 17, 2013 (gmt 0)

5+ Year Member



Had a busy week, so a late reply. I checked around and noticed that indeed all major Czech websites use friendly URLs without diacritics, that is, no haceks or carkas. So I guess it's OK for me to do the same rather than waste precious time figuring out how to code for diacritics without making errors. Since I am a Dutchman working in the Czech Republic and speaking lousy Czech, that is probably the best option anyway.
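For anyone who still wants the diacritic spelling to land somewhere useful, here is a minimal sketch of redirecting one such URL to its ASCII version. It relies on the documented behaviour that a RewriteRule pattern is matched against the %-decoded URL-path, so the two UTF-8 bytes of "č" (0xC4 0x8D) can be written as hex escapes. The page names kontakt-česky and kontakt-cesky are made up for illustration, and the rule is assumed to sit in the .htaccess of the site root.

```apache
RewriteEngine On

# Hypothetical example: a visitor types /kontakt-česky, which the browser
# sends as /kontakt-%C4%8Desky. mod_rewrite matches against the decoded
# path, so the pattern contains the raw bytes of "č" as \xC4\x8D.
RewriteRule ^kontakt-\xC4\x8Desky$ /kontakt-cesky [R=301,L]
```

This only covers a single URL; a site with many pages would need one rule per spelling, or a script doing the transliteration, so treat it as a starting point to try on a test domain.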

lucy24

11:59 pm on Aug 17, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Follow-up:

When I saw this topic title in new posts, I immediately went over to my test site and added this group of experimental rules:

RewriteRule % http://www.example.com/escape.html [R=301,L]
RewriteRule [¡-ÿ] http://www.example.com/unescape.html [R=301,L]

RewriteCond %{QUERY_STRING} %
RewriteRule . http://www.example.com/escapequery.html? [R=301,L]
RewriteCond %{QUERY_STRING} [¡-ÿ]
RewriteRule . http://www.example.com/unescapequery.html? [R=301,L]

(If the Forums mangle the second rule in each pair, it's the character range from inverted exclamation mark to y-umlaut, i.e. the visible part of the Latin-1 range.)

I then tried requesting made-up URLs containing accented characters. With a plain-page request I ended up on "unescape.html", meaning that when the request reaches htaccess the path is in as-written (unescaped) form. But with an added query string, I ended up on "escapequery.html", meaning the query string is still percent-encoded.
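In practice that means a condition on the query string has to be written in percent-encoded form. A rough sketch, assuming a hypothetical search page "hledat" with a parameter "q" and a hypothetical target "caj.html" (none of these come from a real site):

```apache
RewriteEngine On

# %{QUERY_STRING} is tested as received, still percent-encoded, so "č"
# is written as %C4%8D. [NC] makes the whole match case-insensitive,
# which also accepts lowercase hex digits (%c4%8d).
RewriteCond %{QUERY_STRING} (^|&)q=%C4%8Daj(&|$) [NC]
# Hypothetical target; the trailing "?" drops the original query string.
RewriteRule ^hledat$ /caj.html? [R=301,L]
```

The (^|&) and (&|$) groups are only there so the parameter is matched whole, wherever it appears in the query string.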

A more worrying detail is that when I reopened the "live" htaccess to add the query-string rules, the text editor saw fit to go into "Chinese (GB18030)" encoding, making each of my non-ASCII characters (two bytes in UTF-8) a single Chinese character. If I spend too long trying to understand this, I will get a headache.
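One way to sidestep the editor's guesswork might be to keep the htaccess file pure ASCII and write the non-ASCII bytes as hex escapes. The following is a sketch of the unescaped half of the experiment under that assumption; note that [\x80-\xFF] matches any non-ASCII byte, which is broader than the Latin-1 character range used in the original rules.

```apache
# Path test: any non-ASCII byte in the decoded URL-path.
RewriteRule [\x80-\xFF] http://www.example.com/unescape.html [R=301,L]

# Query-string test: the same byte class against the raw query string.
RewriteCond %{QUERY_STRING} [\x80-\xFF]
RewriteRule . http://www.example.com/unescapequery.html? [R=301,L]
```

Because the file then contains nothing but ASCII, it should look the same whatever encoding the editor decides to open it in.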

Robo

6:32 pm on Aug 19, 2013 (gmt 0)

5+ Year Member



If I spend too long trying to understand this, I will get a headache.


I get a headache reading your post and trying to follow it! Interesting though. I will experiment with it more later this month, when the site is up and running. However, for the moment we will stick with the non-diacritic urls; I still have to get the site up and running in Czech, German, and English first. The Czech site is the main one, the DE and EN versions are sub-directories.

g1smd

11:03 pm on Aug 19, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I will experiment with it more later this month, when the site is up and running.

Noooo! The time to experiment is now, but only on your test domain.
 
