The Google server seems to choke on the "Pokémon Series" category in /Games/Video_Games/Roleplaying/P/
example search phrase: wizardry "import files"
located: [directory.google.com...]
(Cool, I have a page with no page rank. Just imagine my mortification. ;) )
-- Rich
[edited by: RFranzen at 6:12 pm (utc) on Mar. 17, 2004]
In the SERPs the category shows up, but clicking it results in an HTTP 500 Internal Server Error. What could be causing it? Can Google selectively remove categories, especially ones that are against 'bad' businesses?
<edit>Specific error code inserted</edit>
You checked that the category still exists in DMOZ and has not moved elsewhere or been deleted?
The category is
[dmoz.org...]
and it's still there in DMOZ.
Corresponding parent directory in Google is
[directory.google.com...]
and clicking on Allegedly Unethical Firms results in error.
For example, [directory.google.com...] shows the same 500 error, because there's a 'Pokémon' sub-cat there.
[directory.google.com...]
displays with no problem, even with the existence of a San_José subcategory.
Note that the entire ODP is converted to and delivered as UTF-8. The Google Directory is apparently using UTF-8 for its World hierarchy, but it still attempts to deliver all other categories as ISO-8859-1. We did have some kinks in the conversion process, and these are ironed out as they are found. Google is likely reflecting our glitches as of early March plus having some of their own.
Hopefully they will update soon from a more recent RDF, and transition to 100% UTF-8.
-- Rich
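To make the UTF-8 versus ISO-8859-1 mismatch concrete, here is a minimal Python sketch of what happens to a name like "Pokémon" when the two encodings are mixed up. It only illustrates the general failure mode; whether this is what actually triggers the 500 error on Google's side is a guess, and the variable names are purely illustrative.

```python
# The same category name, encoded two different ways.
name = "Pokémon"
utf8_bytes = name.encode("utf-8")          # 'é' becomes two bytes: C3 A9
latin1_bytes = name.encode("iso-8859-1")   # 'é' becomes one byte:  E9

# UTF-8 bytes read as ISO-8859-1 do not fail; they silently turn into mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8 are rejected outright, because E9 announces
# a multi-byte sequence that never arrives.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(utf8_bytes, latin1_bytes)
print(mojibake, latin1_is_valid_utf8)
```

Either direction yields a name that no longer matches the one stored in the RDF, so a lookup keyed on the category name could plausibly miss.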
These are being worked on both by scripts and by manual editing. I am guessing that the next couple of RDF files might also be a bit out of whack, but after that things should improve. Google took their update from a file that contained some encoding errors, and they might also be misinterpreting some of the data in that file.
The conversion of all data and all editing interfaces to UTF-8 is a big project, and has gone very well, but was bound to have a few glitches here and there. Just remember that the data has been entered by tens of thousands of different people over the course of many years, and some may have had their browser set to some "weird" encoding when the data was entered, or may have used some non-standard characters.
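As a rough idea of what a cleanup script for mixed legacy data might do, here is a small Python sketch. It is not the ODP's actual conversion code; the function name `decode_legacy` and the fallback order (UTF-8, then Windows-1252, then ISO-8859-1) are assumptions made for the example.

```python
from typing import Tuple


def decode_legacy(raw: bytes) -> Tuple[str, str]:
    """Decode bytes of unknown origin; return (text, encoding that worked).

    Hypothetical helper: tries strict UTF-8 first, then Windows-1252
    (common for browser-entered Western text), and finally ISO-8859-1,
    which accepts every byte value and therefore never fails.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("iso-8859-1"), "iso-8859-1"


# Entries that were already UTF-8 pass through unchanged.
print(decode_legacy("San José".encode("utf-8")))
# Entries typed in a single-byte Western encoding fall through to cp1252.
print(decode_legacy(b"San Jos\xe9"))
```

A fallback chain like this can only guess, which is why some entries would still need manual editing: bytes that happen to be valid in more than one encoding decode without error but may come out as the wrong characters.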