Google Directory search fails with diacritics encoding issue
mfagan

msg:148223 | 2:06 am on Aug 11, 2003 (gmt 0) | Go to the Google Directory World category [directory.google.com...]. Go to the German category [directory.google.com...] and search within that category. Then try the same in the French category [directory.google.com...]. The French one doesn't work at all.
g1smd

msg:148224 | 10:40 pm on Aug 11, 2003 (gmt 0) | The ODP switched over to UTF-8 in the RDF some time back, but there have been other minor encoding issues from time to time in the directory database itself. Current plans are to convert everything to UTF-8. That work has been ongoing alongside the server upgrades and was started quite some time ago for some categories. There were a few mismatches in the accepted character sets for some forms, and some bad data may have found its way in somewhere. Additionally, it isn't known how Google actually uses the data it gets from the ODP. Maybe they don't start afresh with each new RDF but simply spider the data file for changes and apply them to their existing data. That could also account for things getting out of step. I don't know any of the technical details, but you might find some hints in the notes linked from [rodan.ncc.com...] etc.
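To illustrate why a charset mismatch between a form and the backend can break searches on accented terms, here is a minimal Python sketch. It is not a description of how Google or the ODP actually process queries. The search term "café" is a hypothetical example. The sketch shows that the same word percent-encodes differently under UTF-8 and ISO-8859-1, and that decoding bytes with the wrong charset either garbles the text or fails outright.

```python
from urllib.parse import quote

# Hypothetical search term containing a diacritic (common in French category names).
term = "café"

# The same term percent-encodes differently depending on the charset the form submits in.
utf8_query = quote(term, encoding="utf-8")         # UTF-8 bytes for "é" are C3 A9
latin1_query = quote(term, encoding="iso-8859-1")  # ISO-8859-1 byte for "é" is E9

# If UTF-8 bytes are later decoded as ISO-8859-1, the text comes out garbled (mojibake),
# so it no longer matches the stored category data.
mojibake = term.encode("utf-8").decode("iso-8859-1")

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(utf8_query, latin1_query, mojibake)
# ISO-8859-1 bytes for "é" are not valid UTF-8, so a strict UTF-8 backend rejects them.
print(decodes_as_utf8(term.encode("iso-8859-1")))
```

A query string of caf%E9 sent to a system that expects UTF-8, or stored data converted twice, would fail to match in this way. That is consistent with search working in some language categories and not in others during a partial UTF-8 conversion.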