Google - Searching 2,469,940,685 web pages

         

Brett_Tabke

6:13 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



New numbers on home page:

Google - Searching 2,469,940,685 web pages

jeremy goodrich

6:50 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I remember reading before that they used the links, and the text in those links, to make the number bigger: they had only fully indexed the contents of most of the pages in the db, but knew the location of more pages, thus giving the bigger number.

I just tried poking around their site, to see if they said how many they had fully indexed of these 2.5 billion (wow that's a lot), but there's no mention I could find.

Anyone?

rubble88

9:01 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



You can find a bit of analysis on Google's number counting here:
[searchengineshowdown.com]

Also, AllTheWeb currently shows a total size of 2,112,188,990 pages. This is up a bit from its June 17th announcement of 2,095,568,809. The announcement focused on how ATW had "dethroned" Google as the "world's largest search engine." Today, Google goes back to number one with the 2,469,940,685 number. Will AllTheWeb up its number soon? Will another engine (Inktomi) join in on the fun?

jeremy goodrich

9:26 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, this is what I was looking for:

[google.com...]

See #4 there?

The Google index contains two types of pages--fully indexed and partially indexed pages. Your page is currently partially indexed, which means that although we know about your site, our robots have not read all the content on your page(s) in past crawls.

Took me a minute to find that! There was a better description I read before, but basically, they include in their count pages that they have only found links to but have never actually indexed.

This probably makes it a bit easier to 'build' a bigger index, because they haven't actually gone and parsed a few hundred million of those pages :)

brotherhood of LAN

7:58 am on Aug 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>#4

I bet that also goes for all the robots "noindex" pages and whatever else the bot finds.

Actually, remember a few posts a while back about G not obeying robots.txt? Hmmm, if that was true then you have to wonder if their bigger index really has any substance. Anyway, it was exactly 2.1 billion something or other for months!

vitaplease

9:54 am on Aug 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>#4

Would that explain why a Google search for "the" reports 2,790,000,000 results, which is more than the 2,469,940,685 pages on the home page?

whats up skip

10:05 am on Aug 9, 2002 (gmt 0)

10+ Year Member



There is a big problem with using the "the" test.

It might work for English-language pages, but what about the rest?

That is going to account for the missing millions of pages.

The bottom line is:

How useful the search results are - Google kills ATW, IMHO.

How many people use the search engine - Again Google is king!

vitaplease

11:11 am on Aug 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"That is going to account for the missing millions of pages"

whats_up_skip,

The point I was trying to make is that a search on "the" alone already returns over 300 million more pages than Google claims to index (2,790,000,000 > 2,469,940,685). It is a surplus, not the other way around, so nothing is "missing".

BTW, an Alltheweb search for "the" returns 1,046,066,940.

And yes, the Spanish, German, Chinese, etc. equivalents of frequently occurring words such as "the" are not even included, although the numbers Google shows for these words are very limited.

"y" in Spanish only sites for Google: 6,090,000.
"y" in Spanish only sites for Alltheweb: 29,092,521
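
For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted in this thread. The constant and function names are made up for illustration, and the result counts are just what the engines reported at the time, so treat the output as rough.

```python
# Figures as quoted in this thread (Aug 2002). Names are illustrative only.
GOOGLE_CLAIMED_INDEX = 2_469_940_685  # number shown on Google's home page
GOOGLE_THE_RESULTS = 2_790_000_000    # reported results for a search on "the"
GOOGLE_Y_SPANISH = 6_090_000          # "y" in Spanish-only sites, Google
ATW_Y_SPANISH = 29_092_521            # "y" in Spanish-only sites, Alltheweb


def surplus(results: int, claimed: int) -> int:
    """How many more results a query reports than the claimed index size."""
    return results - claimed


def ratio(a: int, b: int) -> float:
    """How many times larger a is than b."""
    return a / b


if __name__ == "__main__":
    extra = surplus(GOOGLE_THE_RESULTS, GOOGLE_CLAIMED_INDEX)
    print(f'"the" results exceed the claimed index by {extra:,} pages')
    print(f'Alltheweb/Google ratio for Spanish "y": '
          f'{ratio(ATW_Y_SPANISH, GOOGLE_Y_SPANISH):.2f}')
```

The surplus is a plain subtraction of the home-page number from the result count for "the", and the ratio compares the two engines' counts for the Spanish "y" query.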

Abrexa_UK

3:05 pm on Aug 9, 2002 (gmt 0)

10+ Year Member



Rubble88
I thought that Openfind still claimed the largest index, with 3.5 billion?
Or is it that they have cleverly worded it to sound like that but in fact have a much smaller database?

jamesf4218

3:30 pm on Aug 9, 2002 (gmt 0)

10+ Year Member



Google may have 2,469,940,685 pages in its database but only about 0.00001% of them are useful or worthwhile.

It has been said that if you kept a certain number of monkeys in a room for an infinite amount of time and gave them all a typewriter, eventually they would reproduce the entire works of Shakespeare.

Now, thanks to the Internet, we know that's not true...

Beachboy

8:44 pm on Aug 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would like to wisecrack that the problem with the monkeys on the typewriters is: Finding enough working typewriters and a sufficiently large supply of ribbons. Old, abandoned technology. But my attempts at humor are frequently deleted, so.... Expect this to vanish any moment....

Hemsell

4:19 am on Aug 10, 2002 (gmt 0)

10+ Year Member



It has been said that if you kept a certain number of monkeys in a room for an infinite amount of time and gave them all a typewriter, eventually they would reproduce the entire works of Shakespeare.

Now, thanks to the Internet, we know that's not true...

I just want you to know that I am stealing that line. It is the funniest thing I have read in a long, long time.

Todd the Plagiarist