

German Umlauts in urls

xcomm

11:07 am on Jul 26, 2004 (gmt 0)

10+ Year Member



Hi All,

I made the mistake of using German umlauts in the URLs of some pages about German song lyrics. This works fine on Unix and with, e.g., Googlebot, but the dumber bots from Yahoo and MSN produce a lot of 404s.

Why do they fall back to UTF-8 encoding when the web server sends them Content-Type: text/html; charset=ISO-8859-1?

Is there any solution other than avoiding umlauts? For example, has anyone used URL rewriting to solve this?

Thank you in advance! Jan

jdMorgan

1:10 pm on Jul 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jan,

If necessary, you could use mod_rewrite's RewriteMap directive to provide a reverse map for these requests. Note that RewriteMap can only be declared in httpd.conf (server or virtual-host context), not in .htaccess. I'm not a character-set expert, but it should be possible to identify the encoding the bots are using and to reverse its effects whenever you receive a request from a dumb bot.
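
A minimal sketch of such a reverse map, assuming the failing requests carry UTF-8 bytes while the files on disk are named in ISO-8859-1, is a RewriteMap "prg" program. mod_rewrite writes one key per line to the program's stdin and expects exactly one answer line back, with NULL meaning "no mapping". The script name, its path and the rule in the docstring are hypothetical and untested; check them against the mod_rewrite docs for your Apache version.

```python
#!/usr/bin/env python3
"""Hypothetical RewriteMap "prg" helper, e.g. /usr/local/bin/utf8_to_latin1.py.

Assumed (untested) httpd.conf wiring, server or virtual-host context:
    RewriteEngine On
    RewriteMap latin1 prg:/usr/local/bin/utf8_to_latin1.py
    RewriteCond ${latin1:$1} !^$
    RewriteRule ^(/.*)$ ${latin1:$1} [R=301,L]
Apache versions before 2.4 also want a RewriteLock directive for prg maps.
"""
import sys
from typing import BinaryIO, Optional


def utf8_path_to_latin1(raw: bytes) -> Optional[bytes]:
    """Re-encode a UTF-8 URL path as ISO-8859-1, or return None to leave it alone."""
    if raw.isascii():
        return None  # ASCII paths are identical in both encodings
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not UTF-8, e.g. a lone 0xF6 byte that is already Latin-1
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None  # characters (such as the euro sign) that Latin-1 lacks


def serve(stdin: BinaryIO, stdout: BinaryIO) -> None:
    """Answer one lookup per input line, as mod_rewrite's prg protocol expects."""
    for line in iter(stdin.readline, b""):
        result = utf8_path_to_latin1(line.rstrip(b"\n"))
        stdout.write((b"NULL" if result is None else result) + b"\n")
        stdout.flush()  # Apache waits for each reply, so never buffer answers


if __name__ == "__main__":
    serve(sys.stdin.buffer, sys.stdout.buffer)
```

A NULL answer makes the lookup expand to an empty string, so the RewriteCond in the sketch only lets the rule fire when there is something to convert; the redirected Latin-1 request then decodes as invalid UTF-8 and is left alone, which should prevent a loop. Redirecting with R=301 rather than rewriting internally is a judgment call: it nudges the bots toward the URL form your own links use instead of leaving two URLs serving the same page.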

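It appears that the links on your pages are encoded in ISO-8859-1 (matching the charset your server declares), while the requests that fail are encoded in UTF-8, the usual byte encoding of ISO/IEC 10646 (Unicode). In ISO-8859-1 a lowercase "o" with umlaut is the single byte 0xF6 ("%F6" in a URL); in UTF-8 it is the two bytes 0xC3 0xB6 ("%C3%B6"). These are two different codings for the same character.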
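
To see the two codings of a lowercase "o" with umlaut side by side, Python's urllib.parse can percent-encode the same word both ways. "schön" is only a hypothetical page name; the last line shows what the UTF-8 bytes look like if they are misread as ISO-8859-1.

```python
from urllib.parse import quote, unquote_to_bytes

word = "schön"  # hypothetical page name

# Same character, two byte encodings, two different URLs.
latin1_url = quote(word, encoding="latin-1")  # 'sch%F6n'
utf8_url = quote(word)                        # 'sch%C3%B6n' (UTF-8 is the default)
print(latin1_url, utf8_url)

# Raw bytes the server sees in the request path after %-decoding.
print(unquote_to_bytes(latin1_url))  # b'sch\xf6n'
print(unquote_to_bytes(utf8_url))    # b'sch\xc3\xb6n'

# UTF-8 bytes displayed as ISO-8859-1 turn into the familiar "Ã¶" garbage.
print(unquote_to_bytes(utf8_url).decode("latin-1"))  # 'schÃ¶n'
```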

Internationalization summary [uni-giessen.de]

Jim