nativenewyorker

msg:51267 | 10:39 pm on Jan 11, 2003 (gmt 0) |
There used to be an option at the bottom of the search results to view the page with omitted results included. Ted
|
bcc1234

msg:51268 | 10:43 pm on Jan 11, 2003 (gmt 0) |
For one thing, in any ordered dataset, if you want to get the N-th element, the system has to go through N-1 elements just to figure out that the N-th element is in fact the N-th. In other words, Google's software would have to go through 10M records every time you click refresh on the page that displays results 10,000,000 through 10,000,010. That's just one "final" step, required after the whole index has been searched and the pages have been ordered (by relevance or anything else, for that matter). And this step alone would consume all of Google's resources. [edited by: bcc1234 at 10:45 pm (utc) on Jan. 11, 2003]
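To make the cost argument concrete, here is a minimal sketch in Python. It is only an illustration and not how Google actually serves pages; `results_page`, `scored_docs`, `offset` and `page_size` are hypothetical names. It shows that the work needed to serve one page grows with how deep the page is, not with how many results it displays.

```python
import heapq

def results_page(scored_docs, offset, page_size=10):
    """Return the results ranked offset .. offset+page_size-1 (0-based), best first.

    scored_docs is an iterable of (score, doc_id) pairs in arbitrary order.
    """
    # Everything ranked above `offset` has to be identified first, so the
    # working set is offset + page_size entries, even though only
    # page_size of them are shown to the user.
    top = heapq.nlargest(offset + page_size, scored_docs)
    return top[offset:]
```

With `offset=10_000_000`, the call above keeps roughly ten million entries in memory to return ten of them, which is the kind of overhead the post describes.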
|
Lots0

msg:51269 | 10:45 pm on Jan 11, 2003 (gmt 0) |
That button is there for some specialized searches (never for normal searches), such as links:yourdomain.com, and it usually only returns pages in the same domain that have the search term on them. No, I am talking about normal searches. Why can't we see all the results? Why do we only get to see a very small percentage of the sites returned? <added>Bcc1234, if what you said were accurate, how would we get any search results at all?</added>
|
heini

msg:51270 | 10:52 pm on Jan 11, 2003 (gmt 0) |
I believe the cutoff is ~1,000 results. Why? Server load? I've never handled databases with 3 billion records, so no idea really.
|
bcc1234

msg:51271 | 11:03 pm on Jan 11, 2003 (gmt 0) |
I have no way of knowing Google's internals. If I did - I would be rich :)

But I assume they generate SERPs by sending the query to many parallel boxes and then combining the results. Let's say there are 5 parallel boxes that together hold the whole index (the 3B pages), so each box has 1/5 of the index. The query goes to all 5 of them, and each returns the 1,000 (or any other preset number) most relevant results from ITS OWN index. So we get 5 lists from 5 different indexes. After that, all 5 lists are combined and the final 1k results are picked out of those records.

Why is there a limit? Well, it's easier to allocate memory for a list with an "at most" limit on the number of records. That way, if some of the 5 boxes' indexes do not have a single relevant page, it's just 4x1,000 or 3x1,000, etc. On really specific terms it might be:

box 1 - 25 results
box 2 - 0 results
box 3 - 150 results
box 4 - 2 results
box 5 - 0 results

And the final list has 177 results. But if the combined list is larger than the limit, it's truncated, with the least relevant entries being left out.

I can't even imagine an efficient architecture that would allow retrieving it all. After all, you would have to store it somewhere while it's being merged and served.
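The scheme guessed at above can be sketched in a few lines of Python. This is only an illustration of the assumed design, not Google's actual implementation; `LIMIT`, `top_k_from_box` and `merge_boxes` are hypothetical names, and the cap of 1,000 is the figure assumed in the post.

```python
import heapq
from itertools import islice

LIMIT = 1000  # assumed preset cap, used both per box and for the final list

def top_k_from_box(scored_docs, k=LIMIT):
    """Each box returns at most k (score, doc_id) pairs from its own index, best first."""
    return heapq.nlargest(k, scored_docs)

def merge_boxes(per_box_results, k=LIMIT):
    """Merge the per-box lists (each already sorted best-first) and keep the k best overall."""
    # heapq.merge is lazy, so the merge stops as soon as k results are taken;
    # memory is bounded by (number of boxes) x k no matter how big the index is.
    merged = heapq.merge(*per_box_results, reverse=True)
    return list(islice(merged, k))
```

With boxes returning 25, 0, 150, 2 and 0 results, `merge_boxes` yields a list of 177 entries. If every box returns 1,000, the merged list is still truncated to the 1,000 most relevant.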
|
Lots0

msg:51272 | 11:20 pm on Jan 11, 2003 (gmt 0) |
Thanks, bcc1234. That makes it a little more understandable :-)
|
heini

msg:51273 | 11:27 pm on Jan 11, 2003 (gmt 0) |
AV cuts off at 1K too; ATW's cutoff is at 4K.
|
Lots0

msg:51274 | 11:30 pm on Jan 11, 2003 (gmt 0) |
Yup, I was just checking that out myself. You're right, the cutoff is 1,000 for Google and AV - I hadn't gotten to the end on FAST yet :)
|