Google Directory

How Old?

         

GodLikeLotus

2:59 pm on Mar 9, 2004 (gmt 0)

10+ Year Member



When was the Google Directory last updated?

tschild

3:09 pm on Mar 9, 2004 (gmt 0)

10+ Year Member



Category pages: in November of last year.

ODP descriptions/category links: more recently. Last week I saw a directory category link for a site that has only been in the ODP since mid-December.

Moondog

4:57 am on Mar 17, 2004 (gmt 0)

10+ Year Member



It looks like it has recently been updated.

windharp

8:47 am on Mar 17, 2004 (gmt 0)

10+ Year Member



Yes, it has. A category I created a few weeks ago is now visible in the Google Directory.

bull

9:11 am on Mar 17, 2004 (gmt 0)

10+ Year Member



The green PR bars, however, are not current.
There are problems with international characters like ö, ä, ü etc. These are replaced by a '?' in the current, updated Google Directory, at least in the categories I watch. Anything to do with the Unicode migration?
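
A '?' in place of an accented letter is the usual symptom of text being forced into a character set that cannot represent it. The Python sketch below only reproduces that symptom; the helper name `to_charset` is made up for illustration, and nothing here is known about how Google actually processes the ODP data.

```python
# Illustration only: encoding text into a charset that lacks a character,
# with the "replace" error policy, substitutes '?' for that character.
def to_charset(text: str, charset: str) -> str:
    """Round-trip text through charset, replacing unencodable characters."""
    return text.encode(charset, errors="replace").decode(charset)


sample = "Möbel, Bäder, Türen"
print(to_charset(sample, "ascii"))       # M?bel, B?der, T?ren
print(to_charset(sample, "iso-8859-1"))  # Möbel, Bäder, Türen
```

The umlauts survive ISO-8859-1 because that charset contains them, so a '?' suggests a step somewhere that falls back to plain ASCII or to a mismatched charset.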

tschild

11:08 am on Mar 17, 2004 (gmt 0)

10+ Year Member



The data seems to be no older than the RDF dump published on 25 February. I have looked at some sites that are new in the directory, and they do have a nonzero PR bar. Non-ASCII chars in the World cats are OK, but some of the non-ASCII chars in the English-language cats are not (a temporary effect related to the change of the whole directory to UTF-8).

The Google server seems to choke on the "Pokémon Series" category in /Games/Video_Games/Roleplaying/P/

Eljaybe

5:19 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



Yes, finally! I'm seeing my updated listing in Google's directory! It's been updated on DMOZ since January! Maybe it's St. Patty's Day luck?

RFranzen

6:05 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



The Google Directory update seems to reflect ODP data as it was during the first week of March 2004. Note that neither the normal Google search nor the Directory search has been fully updated to reflect the new Directory contents.

example search phrase: wizardry "import files"
located: [directory.google.com...]

(Cool, I have a page with no page rank. Just imagine my mortification. ;) )

-- Rich

[edited by: RFranzen at 6:12 pm (utc) on Mar. 17, 2004]

IITian

6:07 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



It seems to have updated this week; however, one category that I track has vanished completely, while all the others seem intact.

In the SERPs the category shows up, but clicking it results in an HTTP 500 Internal Server Error. What could be causing it? Can Google selectively remove categories, especially ones that are against 'bad' businesses?

<edit>Specific error code inserted</edit>

windharp

6:37 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



In theory they can, but they haven't done that before. You checked that the category still exists in DMOZ and has not moved elsewhere or been deleted? Does the category perhaps contain special characters (non-English ones like öäüß) that Google may not be interpreting correctly? Any idea what could prevent Google from showing the category on the technical side?

IITian

6:52 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



You checked that the category still exists in DMOZ and has not moved elsewhere or been deleted?

The category is
[dmoz.org...]

and it's still there in DMOZ.

Corresponding parent directory in Google is
[directory.google.com...]

and clicking on Allegedly Unethical Firms results in error.

yapuka

9:55 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



Probably because there's a subcat called 'Nestlé' there. Google has not managed the transition to special characters.

For example, [directory.google.com...] shows the same 500 error, because there's a 'Pokémon' sub-cat there.
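
One way a single accented letter could break a category lookup is sketched below in Python. It assumes, without any inside knowledge of Google's setup, that the category path is built in one encoding and resolved in another, so the byte sequences no longer match. The path segment and the helper `decodes_as_utf8` are hypothetical.

```python
from urllib.parse import quote, unquote

name = "Pokémon_Series"  # hypothetical path segment, for illustration only

utf8_path = quote(name, encoding="utf-8")          # 'Pok%C3%A9mon_Series'
latin1_path = quote(name, encoding="iso-8859-1")   # 'Pok%E9mon_Series'

# The same name yields two different URLs depending on the encoding.
print(utf8_path, latin1_path, utf8_path == latin1_path)


def decodes_as_utf8(path: str) -> bool:
    """Return True if the percent-encoded path is valid UTF-8."""
    try:
        unquote(path, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(utf8_path))    # True
print(decodes_as_utf8(latin1_path))  # False: a lone 0xE9 byte is not valid UTF-8
```

A server that treats such a decoding failure as an unhandled exception would answer with a 500 rather than a 404, which would fit the symptom, but that part is speculation.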

IITian

11:21 pm on Mar 17, 2004 (gmt 0)

10+ Year Member



Thanks yapuka, I was thinking the same.

RFranzen

2:35 pm on Mar 18, 2004 (gmt 0)

10+ Year Member



The non-ASCII-7 problem only occurs sometimes. For example:

[directory.google.com...]

displays with no problem, even with the existence of a San_José subcategory.

Note that the entire ODP is converted to and delivered as UTF-8. The Google Directory is apparently using UTF-8 for its World hierarchy, but it still attempts to deliver all other categories as ISO-8859-1. We did have some kinks in the conversion process, and these are being ironed out as they are found. Google is likely reflecting our glitches as of early March plus having some of their own.
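
The effect of mixing the two encodings can be shown with a short Python sketch. It demonstrates only how UTF-8 and ISO-8859-1 interact, using the San_José name from above; it does not describe Google's actual code.

```python
name = "San_José"

utf8_bytes = name.encode("utf-8")          # b'San_Jos\xc3\xa9' (é is two bytes)
latin1_bytes = name.encode("iso-8859-1")   # b'San_Jos\xe9'     (é is one byte)

# UTF-8 data read as ISO-8859-1: no error is raised, but 'é' becomes
# two unrelated characters.
print(utf8_bytes.decode("iso-8859-1"))     # San_JosÃ©

# ISO-8859-1 data read as UTF-8: the lone 0xE9 byte is invalid, so a
# lenient decoder substitutes U+FFFD and a strict one raises an error.
print(latin1_bytes.decode("utf-8", errors="replace"))
```

Whether a given page shows garbage, a replacement character, or an outright error therefore depends on which direction the mismatch runs and how strictly each step decodes.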

Hopefully they will update soon from a more recent RDF, and transition to 100% UTF-8.

-- Rich

bull

5:48 pm on Mar 18, 2004 (gmt 0)

10+ Year Member



open.thumbshots.org does not display the nonstandard chars correctly either.

g1smd

10:35 pm on Mar 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the conversion of over 4 000 000 database entries, some 20 000 have some sort of oddity in the title or description. Some are the "wrong type of apostrophe", others are some sort of missed accented character, and some are just some odd [tab]s in an entry.
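
As a rough idea of how a script might flag those three kinds of oddity, here is a minimal Python sketch. The patterns and the function name `find_oddities` are assumptions made for illustration; the actual ODP cleanup scripts are not shown anywhere in this thread and may work quite differently.

```python
import re
from typing import List

# Hypothetical heuristics, one per kind of oddity described above.
ODD_APOSTROPHES = re.compile("[\u2018\u2019\u0060\u00b4]")  # curly quotes, backtick, acute
MOJIBAKE = re.compile("[\u00c2\u00c3][\u0080-\u00bf]")      # UTF-8 bytes read as Latin-1
CONTROL_CHARS = re.compile(r"[\t\x00-\x08\x0b-\x1f]")       # tabs and other control chars


def find_oddities(text: str) -> List[str]:
    """Return labels for the suspicious patterns found in a title or description."""
    found = []
    if ODD_APOSTROPHES.search(text):
        found.append("apostrophe")
    if MOJIBAKE.search(text) or "\ufffd" in text:
        found.append("accent")
    if CONTROL_CHARS.search(text):
        found.append("control")
    return found


print(find_oddities("Rich\u2019s Wizardry page"))  # ['apostrophe']
print(find_oddities("Nestl\u00c3\u00a9"))          # ['accent']
print(find_oddities("odd\ttab"))                   # ['control']
```

Checks like these only flag candidates, so a person would still need to review each hit.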

These are being worked on both by scripts and by manual editing. I am guessing that the next couple of RDF files might be a bit out of whack too, but after that things should improve. Google took their update from a file with some encoding errors in it, and they might also be misinterpreting some of the data in that file.
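
For the common case where UTF-8 text was read as ISO-8859-1 at some point, a scripted repair could look roughly like the Python sketch below. The function `repair_mojibake` is hypothetical and handles only that one kind of damage; it is not the ODP's actual tooling.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 damage; return text unchanged if that doesn't apply."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8, so it was probably not damaged in this way.
        return text


print(repair_mojibake("Nestl\u00c3\u00a9"))  # Nestlé
print(repair_mojibake("Nestl\u00e9"))        # Nestlé (already correct, left alone)
```

A string that legitimately contains a sequence like 'Ã©' would be changed by mistake, which is one reason manual editing is still needed alongside scripts.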

The conversion of all data and all editing interfaces to UTF-8 is a big project, and has gone very well, but was bound to have a few glitches here and there. Just remember that the data has been entered by tens of thousands of different people over the course of many years, and some may have had their browser set to some "weird" encoding when the data was entered, or may have used some non-standard characters.