I just added some new search options to my site and made some names searchable. It looks like Google is not handling them the way I want. Let's say the name is Katinka Faragó. I am seeing this in my logs:
174.52.xx.xx - - [11/May/2010:03:06:53 -0600] "GET /Name/Katinka_Farag%C3%83%C6%92%C3%82%C2%B3.html HTTP/1.1" 200 28650 "-" "GSiteCrawler/v1.23 rev. 286 (http://gsitecrawler.com/)"
174.52.xx.xx - - [11/May/2010:03:31:27 -0600] "GET /Name/Katinka_Farag%C3%83%C6%92%C3%82%C2%B3.html HTTP/1.1" 200 28657 "-" "GSiteCrawler/v1.23 rev. 286 (http://gsitecrawler.com/)"
174.52.xx.xx - - [11/May/2010:05:52:11 -0600] "GET /Name/Katinka_Farag%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%9A%C3%82%C2%B3.html HTTP/1.1" 200 28657 "-" "GSiteCrawler/v1.23 rev. 286 (http://gsitecrawler.com/)"
174.52.xx.xx - - [11/May/2010:07:50:41 -0600] "GET /Name/Katinka_Farag%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%B3.html HTTP/1.1" 200 28657 "-" "GSiteCrawler/v1.23 rev. 286 (http://gsitecrawler.com/)"
So that "ó" seems to be driving Google crazy.
Right now, I am trying to add a quick substitution routine for my URLs, along these lines:
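For what it's worth, the growing URLs look consistent with the UTF-8 bytes of "ó" (C3 B3) being re-read as Windows-1252 and written out as UTF-8 again on every pass. This is only a rough sketch to reproduce the pattern: the helper names misread_round and percent_encode are made up for illustration, and I am assuming Windows-1252 is the charset involved rather than knowing where in the chain it happens.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: one round of the suspected mix-up. The bytes are
# wrongly decoded as Windows-1252, then encoded as UTF-8 again.
sub misread_round {
    my ($bytes) = @_;
    return encode('UTF-8', decode('cp1252', $bytes));
}

# Hypothetical helper: percent-encode every byte, the way it shows up in the log.
sub percent_encode {
    my ($bytes) = @_;
    return join '', map { sprintf '%%%02X', ord } split //, $bytes;
}

my $bytes = encode('UTF-8', "\x{f3}");    # "ó" as UTF-8: C3 B3
for my $round (1 .. 4) {
    $bytes = misread_round($bytes);
    print "round $round: ", percent_encode($bytes), "\n";
}
```

Rounds 2, 3 and 4 print the same sequences as the 03:06/03:31, 05:52 and 07:50 requests above, so the name seems to pick up one more layer of encoding each time the page is fetched and its links are followed.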
$short_1s =~ s/à|á|â|ã|ä|å/a/g;
$short_1s =~ s/æ/ae/g;
etc.
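Instead of listing every accented letter by hand, something like the following might cover more cases. It is only a sketch: ascii_slug is a made-up name, it assumes the name is already a decoded Perl character string (not raw bytes), and the explicit table only covers a few letters that do not decompose on their own.

```perl
use strict;
use warnings;
use utf8;                                  # the literals below are UTF-8
use Unicode::Normalize qw(NFD);

# Hypothetical helper: turn a display name into an ASCII-only URL piece.
sub ascii_slug {
    my ($name) = @_;

    # Ligatures and letters without a canonical decomposition need explicit mappings.
    my %special = ('æ' => 'ae', 'Æ' => 'AE', 'ø' => 'o', 'Ø' => 'O', 'ß' => 'ss');
    $name =~ s/([æÆøØß])/$special{$1}/g;

    $name = NFD($name);                    # "ó" becomes "o" plus a combining acute
    $name =~ s/\p{Mn}+//g;                 # drop the combining marks
    $name =~ s/\s+/_/g;                    # spaces to underscores, as in my /Name/ URLs
    $name =~ s/[^A-Za-z0-9_.-]//g;         # discard anything still outside plain ASCII
    return $name;
}

print ascii_slug('Katinka Faragó'), "\n";  # prints Katinka_Farago
```

The catch is that two different names can end up with the same URL, so the lookup behind /Name/ would have to cope with that.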
Is this the best approach? Any ideas on how to fix this better?
Thanks!
[edited by: lammert at 5:08 pm (utc) on May 11, 2010]
[edit reason] obscured IP address [/edit]