3+ billion pages indexed- why can't we see them all?



10:28 pm on Jan 11, 2003 (gmt 0)

3+ billion pages indexed - why do we only get to see a small percent of those?

When you do a search on Google for any term that returns over 1,000 results, why does Google only let you see seven or eight hundred results?

It does not matter whether the term you searched for returns 1,000 results or 10,000,000 results: Google only lets you see a few hundred. Why?


10:39 pm on Jan 11, 2003 (gmt 0)

10+ Year Member

There used to be an option at the bottom of the search results to view the page with omitted results included.



10:43 pm on Jan 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

For one thing, in any ordered dataset - if you want to get the N-th element, the system has to go through N-1 elements in order to even figure out that the N-th element is in fact the N-th.

In other words, Google's software would have to go through 10M records every time you click refresh on the page that displays results 10,000,000 through 10,000,010.

That's just one "final" step that is required after the whole index has been searched and the pages have been ordered (by relevance or anything else for that matter). And this step alone would consume all of Google's resources.
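The cost described above can be sketched in a few lines of Python. This is only an illustration of the argument, not Google's actual code: `page_of_results`, the `(score, doc_id)` pairs and the sample data are all made up for the example. The point is that picking out a deep page means first identifying everything ranked above it.

```python
import heapq

def page_of_results(scored_docs, offset, page_size=10):
    """Return one page of (score, doc_id) pairs, best first.

    scored_docs is an iterable of (score, doc_id) pairs in arbitrary order.
    To know which items sit at positions offset..offset+page_size-1 we must
    first pick out everything ranked above them, so the work grows with
    offset + page_size, not just with page_size.
    """
    top = heapq.nlargest(offset + page_size, scored_docs)
    return top[offset:]

# Hypothetical data: 100 documents whose score equals their number.
docs = [(score, f"doc{score}") for score in range(100)]

first_page = page_of_results(docs, offset=0, page_size=3)
deep_page = page_of_results(docs, offset=90, page_size=3)
print(first_page)  # [(99, 'doc99'), (98, 'doc98'), (97, 'doc97')]
print(deep_page)   # [(9, 'doc9'), (8, 'doc8'), (7, 'doc7')]
```

Both calls return three results, but the second one has to select 93 items before it can throw 90 of them away.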

[edited by: bcc1234 at 10:45 pm (utc) on Jan. 11, 2003]


10:45 pm on Jan 11, 2003 (gmt 0)

That button is there for some specialized searches (never for normal searches), such as links:yourdomain.com - and it usually only returns pages in the same domain that have the search term on them.

No, I am talking about normal searches - why can't we see all the results? Why do we only get to see a very small percentage of the sites returned?

<added>Bcc1234, if what you said were accurate, how would we get any search results at all?</added>


10:52 pm on Jan 11, 2003 (gmt 0)

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member

I believe the cut off is ~1000 results.

Why? Server load? I've never handled a database of 3 billion records, so no idea really.


11:03 pm on Jan 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

I have no way of knowing Google's internals.
If I did - I would be rich :)

But I assume they generate SERPs by sending the query to many parallel boxes and then combining the results.
Let's say there are 5 parallel boxes that together contain the whole index (the 3B pages), and each box has 1/5 of the index.

The query goes to all 5 of them and each returns the 1,000 (or any other preset number) most relevant results from ITS OWN index.
So we get 5 lists from 5 different indexes.
After that, all 5 lists are combined and the final 1k results are picked out of those records.

Why is there a limit? Well, it's easier to allocate memory for a list with an "at most" set limit of records.

That way, if some of the 5 boxes' indexes do not have a single relevant page - it's just 4x1,000 or 3x1,000 etc.

On really specific terms it might be:
box 1 - 25 results
box 2 - 0 results
box 3 - 150 results
box 4 - 2 results
box 5 - 0 results

And the final list has 177 results.

But if the combined list is larger than the limit, it's truncated, with the least relevant entries being left out.

I can't even imagine an efficient architecture that would allow retrieving it all. After all, you would have to store it somewhere while it's being merged and served.
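Here is a minimal Python sketch of the scheme guessed at above, assuming each box can hand back its matches as `(score, doc_id)` pairs. The names `box_top_results` and `merge_boxes`, the scores and the `LIMIT` value are all invented for illustration. Nothing here is known about how Google really does it.

```python
import heapq
from itertools import islice

LIMIT = 1000  # assumed cap, per box and for the final list

def box_top_results(scored_docs, limit=LIMIT):
    """One box: return at most `limit` of its best (score, doc_id) pairs, best first."""
    return heapq.nlargest(limit, scored_docs)

def merge_boxes(per_box_lists, limit=LIMIT):
    """Combine the per-box lists (each already sorted best first) and keep
    only the best `limit` entries overall; the rest are dropped."""
    merged = heapq.merge(*per_box_lists, reverse=True)
    return list(islice(merged, limit))

# The "really specific term" case from the post: 25, 0, 150, 2 and 0 matches.
sizes = [25, 0, 150, 2, 0]
boxes = [
    box_top_results([(j * 10 + i, f"box{i + 1}-doc{j}") for j in range(n)])
    for i, n in enumerate(sizes)
]
final = merge_boxes(boxes)
print(len(final))  # 177
```

Because every list is capped, the merge step never has to hold more than 5 x 1,000 entries no matter how many pages matched, which is the memory argument made above.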


11:20 pm on Jan 11, 2003 (gmt 0)

Thanks bcc1234
That makes it a little more understandable :-)


11:27 pm on Jan 11, 2003 (gmt 0)

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member

AV cuts off at 1K too, ATW cut off is at 4K.


11:30 pm on Jan 11, 2003 (gmt 0)

Yup, I was just checking that out myself.
You're right, the cutoff is 1,000 for Google and AV - I hadn't gotten to the end on FAST yet :)
