
Google News Archive Forum

8 Billion items, what about relevance?
Does the size of Google's index increase or reduce relevancy?

 3:30 pm on Nov 12, 2004 (gmt 0)

I've been concerned about the quality of results delivered by Google for some months, and now it seems that they are happy to add 4 billion pages to an index that was already stuffed with irrelevant pages.

Do you agree that the number of pages is irrelevant and, if so, would you welcome a site which trumps the 8 billion by several orders of magnitude simply to stop the "we've got the biggest index" claims?

It's just a thought from yesterday that has been bugging me. If enough people think that trumping G's 8 billion is worthwhile, I will spend the weekend building a couple of sites that can legitimately claim to have more pages listed than Google. It will all be irrelevant pages from one site, though, listed on a second site that does a full-text search on those pages. (Please don't question the technical stuff on it; it will work.)



 2:41 pm on Nov 20, 2004 (gmt 0)

Relevancy has not changed for the big queries in my opinion. However, for those 6-10 word specific queries, the bigger index is certainly welcome relief and a good increase in relevancy.

In the past, those queries would return zero results - now many of them are returning a few results. Those hard-to-find gems are now coming up!


 5:33 pm on Nov 20, 2004 (gmt 0)

Searching 8,058,044,651 web pages

I don't understand why anyone believes that number.

When I do a site:www.mydomain.com search on Google, it tells me my site has 194 pages. In fact, there are only 144 pages. Where did the extra 50 pages come from?

If we do the math (and my math skills are pretty poor) ... that's about 35% more pages than those which actually exist!

If we follow the logic and assume that all sites have been attributed with 35% more pages than really exist, then the figure shown above has been falsely inflated by at least 2.8 billion pages ... has it not?
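The extrapolation above can be checked with a few lines of arithmetic. This is only a sketch under the post's own assumption that every site is over-counted in the same proportion as this one site; the variable names are illustrative. Note that the 35% figure is measured against the real page count (50 out of 144), so applying it to the reported total means scaling by 144/194 rather than taking 35% of 8 billion. On that reading the surplus comes out nearer 2.1 billion than 2.8 billion.

```python
# Figures quoted in the post above.
reported_site = 194              # pages Google reports for the site
actual_site = 144                # pages the site owner counts
reported_index = 8_058_044_651   # "Searching 8,058,044,651 web pages"

extra = reported_site - actual_site        # 50 extra pages
over_actual = extra / actual_site          # about 0.347, the "35% more" figure
over_reported = extra / reported_site      # about 0.258 of the reported count

# Assumption (from the post): every site is inflated by the same ratio.
estimated_real = reported_index * actual_site / reported_site
estimated_surplus = reported_index - estimated_real

print(f"over-count relative to real pages:     {over_actual:.1%}")
print(f"over-count relative to reported pages: {over_reported:.1%}")
print(f"estimated surplus in the index:        {estimated_surplus:,.0f}")
```

A single site is a very small sample, so the result says more about how the percentage should be applied than about the real size of the index.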


 6:09 pm on Nov 20, 2004 (gmt 0)

I'm with Liane (well, not really ;)).
I 301'd a subdirectory of a mature site back in June, and since the change to 8 billion pages many of the 301'd pages are showing up on one site and as a supplemental result on the other.
Besides, Google doesn't care if the larger index increases relevancy, only if it increases revenue.


 6:09 pm on Nov 20, 2004 (gmt 0)


I don't think the issue is cut and dried. Consider two URLs:

* www.bestwidgets.com
* bestwidgets.com

On some servers, those are exactly the same page; on other servers, they might be entirely different content, or one might not exist at all. It is tricky to know what is a unique page.

Another example would be database-driven URLs vs human-readable URLs. Some sites offer both, at least for key pages. For example:

* bestwidgets.com/cda/0,3254,10584,00.html
* bestwidgets.com/large/blue.html

They could be exactly the same page. How is Google or any search engine supposed to know the difference? Can it be assumed that because they were the same at one point, that the data will be the same later on? Should SEs even be expected to compare every page in a domain with each other to identify equalities?
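One way to picture the comparison problem raised above is content fingerprinting: hash each fetched page and group URLs whose hashes match. The following is a minimal sketch, not a description of what Google or any other search engine actually does. The function names, the normalization step, and the sample page bodies are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

def fingerprint(body: str) -> str:
    """Hash a page body after collapsing whitespace and case."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_duplicates(pages: dict) -> list:
    """Return lists of URLs whose bodies share the same fingerprint."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical crawl results: URL -> fetched body (made-up content).
pages = {
    "bestwidgets.com/cda/0,3254,10584,00.html": "<h1>Large Blue Widget</h1>",
    "bestwidgets.com/large/blue.html": "<h1>Large  Blue Widget</h1>\n",
    "bestwidgets.com/small/red.html": "<h1>Small Red Widget</h1>",
}

duplicates = group_duplicates(pages)
print(duplicates)
```

Hashing needs one pass over the pages rather than a comparison of every page with every other page, which speaks to the cost concern. It does not answer the other question in the post: two URLs that match today may serve different content tomorrow, so the grouping is only valid for the crawl it was computed from.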


 6:20 pm on Nov 20, 2004 (gmt 0)

>Should SEs even be expected to compare every page in a domain with each other to identify equalities?

Good point. Why should a search engine concern itself with the quality of its index when its revenue is inversely proportional to that quality?

 6:50 pm on Nov 20, 2004 (gmt 0)

>Why should a search engine concern itself with the quality of its index when its revenue is inversely proportional to that quality?

Simple: Because declining quality would lead to a drop in both traffic and revenue.


 7:12 pm on Nov 20, 2004 (gmt 0)

But time and time again we hear that Joe Surfer does not know when he is getting inferior results, and that he never notices the sites that are missing from the results.


 9:28 pm on Nov 20, 2004 (gmt 0)

>Where did the extra 50 pages

First, it isn't necessarily "pages" but rather "URLs".

I think RFranzen is right on in suggesting that www.domain.com/foo, domain.com/foo, and even aaa.bbb.ccc.ddd IP addresses are unique URLs. It is quite easy to see how Google could index 4x the number of URLs.
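To make the URL-versus-page distinction concrete, here is a small sketch that counts the same crawl both ways. The alias table, the IP address (taken from the reserved documentation range), and the `canonical` helper are hypothetical; a real crawler cannot simply be handed such a table, which is the difficulty being discussed.

```python
from urllib.parse import urlsplit

# Hypothetical alias table: host names and an IP that serve the same site.
ALIASES = {
    "www.domain.com": "domain.com",
    "192.0.2.10": "domain.com",
}

def canonical(url: str) -> str:
    """Map a URL to a host/path key, folding known aliases together."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    host = ALIASES.get(host, host)
    path = parts.path or "/"
    return f"{host}{path}"

crawled = [
    "http://www.domain.com/foo",
    "http://domain.com/foo",
    "http://192.0.2.10/foo",
    "http://domain.com/bar",
]

unique_urls = len(set(crawled))                      # counted as distinct strings
unique_pages = len({canonical(u) for u in crawled})  # counted after folding aliases

print(unique_urls, unique_pages)
```

In this made-up crawl, four URLs collapse to two pages once the aliases are known, so a count of URLs can exceed a count of pages without any page being invented.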

Personally, I don't think G's quality has ever been as good across the board as it is right now. Many of the really spammy sectors are slowly getting cleaned up (e.g. travel, etc.).

I really think a bigger index pays off for everyone.

aka2: let's start targeting more 6-10 keyword phrases ;)


 10:24 am on Nov 22, 2004 (gmt 0)

>I really think a bigger index pays off for everyone.

Sure it does and I agree that the 4, 5 & 6 keyword phrases are working very well these days. I also agree that Google is less spammy than ever before, though there are spammy sites still plaguing the index.

However, I still don't understand the number of URLs reported for my site! I searched for "www.mysite.com", not ".mysite.com". Isn't a search specific to "www.mysite.com" supposed to return only results for "www.mysite.com" and not include ".mysite.com"?

No matter how you cut it, there are only 144 unique URLs for my site. It is reporting 194 URLs. The number has been falsely inflated.
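The difference between matching one host name and matching a whole domain can be sketched as below. This illustrates the distinction the question draws; it does not describe how Google's site: operator actually behaves, and the helper names and sample URLs are made up.

```python
from urllib.parse import urlsplit

def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL (empty if none)."""
    return (urlsplit(url).hostname or "").lower()

def matches_exact_host(url: str, host: str) -> bool:
    """True only when the URL's host is exactly the given host."""
    return host_of(url) == host.lower()

def matches_domain(url: str, domain: str) -> bool:
    """True for the bare domain and for any subdomain of it."""
    host = host_of(url)
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)

# Hypothetical URLs for illustration.
urls = [
    "http://www.mysite.com/a.html",
    "http://mysite.com/a.html",
    "http://shop.mysite.com/b.html",
]

exact = [u for u in urls if matches_exact_host(u, "www.mysite.com")]
broad = [u for u in urls if matches_domain(u, "mysite.com")]

print(len(exact), len(broad))
```

If a reported count is higher than expected, comparing the two kinds of match on a known list of URLs is one way to see whether extra host names account for the difference.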

ERGO ...
Searching 8,058,044,651 web pages
is questionable at best.

>First, it isn't necessarily "pages" but rather "URLs"

Perhaps we should tell Google that their semantics are incorrect. They should change this statement to read: Searching 8,058,044,651 URLs ;)


 3:18 pm on Nov 22, 2004 (gmt 0)


Could you add your homepage to your WebmasterWorld profile? Then some of us could take a look and maybe figure out what Google is seeing that you don't.

-- Rich


 4:42 pm on Nov 22, 2004 (gmt 0)

No ... but I sent you a sticky. :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved