Google Increases Index Size Figure

Searching 3,307,998,701 web pages

         

takagi

6:38 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Searching 3,083,324,652 web pages

changed into

Searching 3,307,998,701 web pages

So again it is bigger than the recently updated AlltheWeb (Currently searching 3,151,743,117 web pages).
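For anyone who wants the size of the jump spelled out, here is a minimal sketch of the arithmetic. It uses only the three figures quoted in the post above; the variable names are made up for illustration.

```python
# Page counts as quoted in the post above (homepage figures, Aug 26, 2003).
google_before = 3_083_324_652
google_after = 3_307_998_701
alltheweb = 3_151_743_117

# Absolute and relative growth of the Google figure.
growth = google_after - google_before
growth_pct = 100 * growth / google_before

# How far the new Google figure sits above the AlltheWeb figure.
lead_over_atw = google_after - alltheweb

print(f"Google figure grew by {growth:,} pages ({growth_pct:.1f}%)")
print(f"Google figure now exceeds AlltheWeb by {lead_over_atw:,} pages")
```

These are the numbers the engines advertise, so the result says nothing about how many pages are actually searchable.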

ciml

9:43 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had a bet on Google updating that figure within a week of ATW passing them. :-)

I think it's good to see a little competitive spirit between engines.

xcandyman

9:45 am on Aug 26, 2003 (gmt 0)

10+ Year Member



This was as predictable as me not getting the number 1 position in the Google SERPs for the keyword "Google".

lazerzubb

9:48 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Always feels good to know you have control of what they show on Google.com :)

bwelford

10:04 am on Aug 26, 2003 (gmt 0)

10+ Year Member



So why do I get a count of 3.48 billion web pages when I do a search for "the"?

Barry Welford

Getafix

9:35 am on Aug 31, 2003 (gmt 0)

10+ Year Member



So why do I get a count of 3.48 billion web pages when I do a search for "the"?

I get 5.2 billion.

Jakpot

1:54 pm on Aug 31, 2003 (gmt 0)

10+ Year Member



Lots of crap in those 3+ billion pages

MonkeeSage

2:03 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What with all the to-do about dropping index pages and such lately, maybe Google should change it to be a bit more accurate:

"Searching 3,307,998,701 web pages, losing 1,253,628,931 of them"

;)

Jordan

mil2k

2:06 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Searching 3,307,998,701 web pages, losing 1,253,628,931 of them"

ROFLMAO

g1smd

3:01 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What about all the Thai, Chinese, Japanese, Arabic, Cyrillic, Sanskrit, and other foreign-language pages that do not contain the word "the" at all?

There must be hundreds of millions of those too.

[edited by: g1smd at 5:42 pm (utc) on Aug. 31, 2003]

markus007

3:14 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This query returns 3.2 billion....

allinurl:a

vitaplease

3:25 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Part of the explanation for the high number of results for the "the" search could be the "unindexed" pages, for example pages protected by robots.txt or a noindex meta tag.

Such as a search for:

site:www.nyt.com +the

shows only the URLs (even without "the" in the URL; that match probably came from the anchor text pointing to those pages)

[searchengineshowdown.com...]

2_much

4:29 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps that is why they added the "Supplemental Results"?

GranPops

6:06 pm on Aug 31, 2003 (gmt 0)

10+ Year Member



Not sure about the total, but...

For most of my keywords, the result total has increased by approx. 50%,

e.g.
500,000 has become 750,000

6 million has become 9 million

although one has gone from 240,000 to 946,000
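A quick way to check those jumps is to compute the percentage increase for each before/after pair. This is only a sketch using the numbers from the post above; `pct_increase` is a hypothetical helper name.

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return 100 * (new - old) / old

# (before, after) result counts as quoted in the post above.
examples = [
    (500_000, 750_000),
    (6_000_000, 9_000_000),
    (240_000, 946_000),
]

for old, new in examples:
    print(f"{old:,} -> {new:,}: +{pct_increase(old, new):.1f}%")
```

The first two pairs come out at 50% each, while the last one is close to a fourfold total, so it is well outside the "approx. 50%" pattern.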

Chndru

7:49 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think I read somewhere last year that Larry Page (or was it Sergey Brin?) was saying their goal was to index 10 billion pages by the end of 2003. Maybe it is coming true?

takagi

3:38 am on Sep 1, 2003 (gmt 0)

killroy

8:17 am on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, since on average it overestimates my page counts by a factor of ten, the 5 billion pages for "the" might mean it's got only 500 million indexed ;)))

SN

percentages

9:18 am on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



20+ years ago when I was a rookie programmer I would have been fired if I couldn't tell my boss how many records were in the database index.

Can anyone please explain how Google can't know the answer to such a simple question today?

killroy

10:03 am on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ditto, I've been wondering that. The only thing I can think of is that it doesn't really query the entire DB live. Perhaps it has each PR level on separate servers, starts with the top ones, and works down until it has enough results to answer the query, never finishing going through the whole dataset.

SN
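To make the guess in the post above concrete, here is a toy sketch of a tiered scan that stops early and extrapolates a total. Everything in it is an assumption for illustration: the tier layout, the `search_with_estimate` name, and the extrapolation rule are invented, and nothing here is known to reflect how Google actually counts results.

```python
def search_with_estimate(tiers, term, wanted=10):
    """Scan tiers from the top down, stop once enough hits are found,
    then extrapolate a total from the part that was actually scanned.

    tiers: list of tiers, highest-ranked first; each tier is a list of
           documents, and each document is a set of words.
    """
    total_docs = sum(len(tier) for tier in tiers)
    hits = []
    scanned = 0
    for tier in tiers:
        for doc in tier:
            scanned += 1
            if term in doc:
                hits.append(doc)
        if len(hits) >= wanted:
            break  # enough results; lower tiers are never read
    # Hypothetical estimate: assume the unscanned tiers match at the same rate.
    estimate = round(len(hits) * total_docs / scanned) if scanned else 0
    return hits[:wanted], estimate

# Toy index: 10 top-tier pages that all contain "the",
# and 90 lower-tier pages that do not.
top_tier = [{"the", "news"} for _ in range(10)]
low_tier = [{"page", str(i)} for i in range(90)]
tiers = [top_tier, low_tier]

hits, estimate = search_with_estimate(tiers, "the", wanted=10)
true_count = sum("the" in doc for tier in tiers for doc in tier)
print(f"estimated matches: {estimate}, true matches: {true_count}")
```

In this toy setup the scan stops after the top tier, so the extrapolated count is far above the real one. That is one way an engine could report a total it never actually counted, if it worked anything like this.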

claus

10:10 am on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think there are a few reasons. The first is that it's perhaps not one database in the strict sense. The second is that there are daily fluctuations. The third is that it's not good PR to publish a decreasing figure by accident due to common fluctuations; you'd rather wait until you can publish a large jump in size, although these things happen gradually.

/claus

4eyes

11:13 am on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google is saying that 3 of my sites have 40-50% more pages in the index than actually exist (all static pages)

This is inflating the figures - whether by accident or design is anyone's guess.

fLaMiN

12:55 pm on Sep 1, 2003 (gmt 0)




i busted the firewall and hacked into the Google mainframe this morning, executed a few queries on their live database ..

SELECT * FROM *;

im still waiting for the query to finish ..

:)

karmov

6:34 pm on Sep 2, 2003 (gmt 0)

10+ Year Member



Ugh....

My PR took a small hit with the increase in index size. I've been told that this is natural and I can figure out why it makes sense, but still a bit of a bummer.

The march towards 10 billion scares me though. The index increased by 10% and I got taken down a notch in PR. My PR will be -3 if the index size triples :)

jilla

6:40 pm on Sep 2, 2003 (gmt 0)

10+ Year Member



I realize this is a basic question but here goes:

When I go to Google and type www.mydomain.com in the search box, it will say something like 2 out of 149 results. Then I go down and click the "omitted results" link and I get a totally different number, i.e. results 1-10 of 370. This happens when I check any of my subdirectories too, e.g. abc.com/whatever/

Which is the number of pages of my domain ACTUALLY in Google? Why the discrepancy between the 2 numbers? Thanks.