
Why tease with millions of results and only show 1000?


puppetmaster

8:27 pm on Dec 3, 2003 (gmt 0)

10+ Year Member



This might be a stupid question, but why does Google show "1-10 of about 3,733,300,000" if they only list the top 800 to 1000 sites? Why even show they have 3 million+ sites if they don't list them? Kind of misleading.

Maybe someone has an answer to this.

Thanks,

jeremy goodrich

8:53 pm on Dec 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>3 million

They've got 3 billion+ pages in their index.

Reason being, not all 3 billion pages cover the same material ;)

jpavery

9:18 pm on Dec 3, 2003 (gmt 0)

10+ Year Member



I'm with you on this. For my KW it says 1 of about 170,000... so I tried to go deep - I wanted to see site 169,999. I could only get to listing 763.

I do not have an explanation.
JP

pageoneresults

9:38 pm on Dec 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google knows that very few people go beyond page 3 of the SERPs (30 results). No need to waste resources. I believe the numbers are there just to show size, you know, it's a subconscious thing.

P.S. The total results returned has steadily increased over the past few months. I feel this is in anticipation of the IPO. One of those "our index is bigger than theirs" type things.

2oddSox

9:52 pm on Dec 3, 2003 (gmt 0)

10+ Year Member



Well as we're seeing lately with crappy search results, size isn't that important. It's what you do with what you have that matters.

2odd...

Chndru

9:54 pm on Dec 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I could only get to listing 763.

Did you try to remove the filter? You can do that either manually, by adding &filter=0 to the search query, or by clicking the link "repeat the search with the omitted results included" at the end of the results shown.
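If you want to script that check instead of clicking through pages, here's a minimal sketch of building such a URL. The helper name is made up; the q, start, and filter=0 parameters are the ones discussed in this thread:

```python
from urllib.parse import urlencode

def google_search_url(query, start=0, include_omitted=False):
    """Build a Google web-search URL. Passing filter=0 asks Google to
    include the near-duplicate results it would normally omit."""
    params = {"q": query, "start": start}
    if include_omitted:
        # same effect as clicking "repeat the search with the omitted results included"
        params["filter"] = 0
    return "http://www.google.com/search?" + urlencode(params)

print(google_search_url("blue widgets", start=750, include_omitted=True))
```

Note you'll still hit the hard cap: asking for start=1000 or beyond returns nothing, filter or no filter.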

jim_w

9:58 pm on Dec 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is a web site that lets you enter KWs and it will look at the first 1000 hits and tell you where you rank. I have nothing to do with this site. I used it once. (Feel like I'm disclaiming stocks on CNBC.)

<snip>

[edited by: WebGuerrilla at 1:25 am (utc) on Dec. 4, 2003]
[edit reason] Sorry, no tools [/edit]

nakulgoyal

12:48 am on Dec 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Even if you turn off the filter, it still shows fewer results than the number it claims?

Any explanation or reasoning for this anywhere?

oodlum

12:59 am on Dec 4, 2003 (gmt 0)

10+ Year Member



I recall reading an article somewhere that stated that the "of about YYY results" was to cover the disparity caused by sites that were removed manually for legal reasons etc.

I'll see if I can find it...

nakulgoyal

1:15 am on Dec 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was looking for some info for a client. He has 1500 indexed pages. When I try to find them, using the filter as well, I only find 400. Do you mean 1100 were removed for legal issues? Unbelievable. And this is for every website? Even for Microsoft? :-)

oodlum

1:46 am on Dec 4, 2003 (gmt 0)

10+ Year Member



Just telling you what I read. Two separate issues, I think.

1. Not saying exactly how many results - covers the removed sites

2. Displaying nowhere near the number of results they estimated (i.e. stopping at 800 of 50,000). Possibly what pageoneresults said.

Chndru

2:43 pm on Dec 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> for legal issues

If they were removed for DMCA issues, there will be a link at the end of the results mentioning that some results were removed to comply with the DMCA, or something similar to that.

killroy

2:55 pm on Dec 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, it always bugged me, especially for sites with fewer than 100 pages...

My site with 820 pages shows (without the filter) 540 out of about 760-1200, depending on the datacenter.

SN

jpavery

3:03 pm on Dec 4, 2003 (gmt 0)

10+ Year Member



Chndru,
I did remove the filter and click on show omitted results. Still only 763.
JP

BigDave

5:23 pm on Dec 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They do it because of the resources that would be required to actually rank all 170k pages on every search, just to satisfy the curiosity of a few dozen people a year.

shady

5:27 pm on Dec 4, 2003 (gmt 0)

10+ Year Member



If you can't find what you need within the top 1000 sites, either:

1) You are searching for the wrong term
2) Google is inadequate

I guess this is a matter of opinion :)

(edited due to appalling grammer by shady!)

Herenvardo

12:00 pm on Dec 6, 2003 (gmt 0)

10+ Year Member



Truly, it's a matter of resources... and response time.
If you look at the right, you'll see something like "Search took 0.09 seconds".
I'll tell you how an engine like Google works:
1st: The system gets a search query from the user and sends it to the database.
2nd: The database processes the query and returns all matches and the total number of results. The matches are ordered by some criteria that only G knows...
3rd: The system gets the first results (around 3k~5k) and sends them as a query to the PR database.
4th: Once it has all the PR values for those files, many are discarded as spam, or as pages from the same domain, etc.
5th: The system takes the PR score and calculates a relevancy score for each file, based on the search terms.
6th: The final score for that search is calculated for each file, based on both scores. How they are combined is another mystery that only G knows...
7th: If more than 1000 results survive, the system discards the lowest scored.
8th: The results are returned to the user, ordered by their final score. The user will never get more than 1000 results, but s/he will have them in a fraction of a second.
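A toy sketch of those steps in Python. The word-overlap relevancy score, the 0.7/0.3 blend, and all the names here are my own illustrative guesses, not Google's actual algorithm:

```python
import heapq

def rank(documents, query_terms, pagerank, k=1000):
    """Steps 3-8 in miniature: score each candidate by query relevance,
    blend in a static PR score, and keep only the top k results."""
    scored = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        # relevancy score: fraction of query terms found in the document
        relevance = sum(term in words for term in query_terms) / len(query_terms)
        if relevance == 0:
            continue  # step 4: discard non-matching pages
        # step 6: combine relevancy and PR (the real blend is G's mystery)
        final = 0.7 * relevance + 0.3 * pagerank.get(doc_id, 0.0)
        scored.append((final, doc_id))
    # steps 7-8: never return more than k results, highest score first
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

docs = {"a": "cheap blue widgets", "b": "red cars", "c": "widgets for sale"}
print(rank(docs, ["widgets"], {"a": 0.9, "c": 0.1}, k=2))  # ['a', 'c']
```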

Scoring and ordering more results wouldn't increase the time linearly, but much faster than that. If ordering 1000 results takes 10 ms, ordering 10000 would take 220 ms. Working with more realistic numbers: a search may take 0.20 seconds when it orders 1000 results. If it had to order 170k results, it would take almost two hours. Ordering 2 million results would take 25 years of continuous work...
For this reason, G gives you enough results, but not an overwhelming number. Nobody who searches wants to wait years to get the results...

Greetings,
Herenvardö

Herenvardo

12:31 pm on Dec 6, 2003 (gmt 0)

10+ Year Member



Whoops! I calculated those times using a basic algorithm for ordering sequences... there are faster algorithms that would take less time:
170k results: 59s
2 million results: 840000s (9 days)

The problem is that the faster one is a recursive algorithm, and it needs a huge amount of memory to work. For 170k results, it would need 27 GB of memory. For the 2m results, it would need 4 terabytes (4000 GB)! For a thousand results, it only takes 0.2s and uses 1 MB of memory.

Does anybody want to know how much memory G would need if they had to rank every page in their index for a search? ;)
The task would need more than 9 exabytes (1 exabyte = 1024^3 GB ~ a billion gigabytes) and it would take 66 years.
Using the slow algorithm, it would need less memory (only 3 GB), but it would take more than 300 billion years!
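For what it's worth, those estimates assume fully sorting the whole candidate set. A top-k selection with a bounded heap needs only O(n log k) time and O(k) memory, which would make a 1000-result cap cheap no matter how many pages match. A quick sketch, using made-up random numbers as stand-in scores:

```python
import heapq
import random

random.seed(7)
# stand-in for the scores of 200,000 matching results
scores = [random.random() for _ in range(200_000)]

# full sort: O(n log n) time, plus a copy of all n scores in memory
top_by_sort = sorted(scores, reverse=True)[:1000]

# bounded-heap selection: O(n log k) time, only k=1000 scores held at once
top_by_heap = heapq.nlargest(1000, scores)

print(top_by_sort == top_by_heap)  # True: same top 1000, far less work
```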

Greetings,
Herenvardö

[edit]Note for non-English readers: a billion in English means a thousand million. In other languages it means a million millions. I used the term in English, so take the English meaning ;)[/edit]

wellzy

1:36 pm on Dec 6, 2003 (gmt 0)

10+ Year Member



I believe only webmasters would try to go this deep for a search term. Since most users never get past the first few pages, there would be no need to list them all. IMO, if you can't find what you are looking for in the first few pages, then you need to refine your search.

Duniyadnd

4:08 am on Dec 8, 2003 (gmt 0)

10+ Year Member



"I'll tell you how does an engine like google work" - Herenvardö

Don't forget that Google openly admits it saves the queries we have previously made, courtesy of the cookie stored on our hard drive. Each time we do a search, it can check that table first to get the top listing sites faster. It could be a "cached" page as well. Never know.

rbarker

5:47 am on Dec 8, 2003 (gmt 0)

10+ Year Member



I think G's page return limit is partly designed to avoid allowing people to harvest their data.

If you query "mailto:info@" at G, it says they have over 400K pages returned. Spammers would have a field day using a simple Perl mod designed to harvest email addresses. You'll find these limits at other engines as well.

JasonHamilton

4:04 pm on Dec 8, 2003 (gmt 0)

10+ Year Member



Because:

1) No reason for google to offer their database to every user who wants it.

2) Search results are based on a given input. Google ranks it as best it can and gives you the best listings. The rest of the listings don't count for that search, or don't count enough to matter as far as Google is concerned.

3) Google claims 3 billion indexed pages, and they do indeed have that many to search through. When you do a search, it isn't searching through only 1000 listings; it's searching through pretty much all of them in order to get your 1000. There is no misleading claim; in fact, pretty much all search engines have a hard limit on the number of results you can actually see.