Do you agree that the number of pages is irrelevant, and if so, would you welcome a site that trumps the 8 billion by several orders of magnitude simply to put a stop to the "we've got the biggest index" boasting?
It's just a thought from yesterday that has been bugging me. If enough people think that trumping G's 8 billion is worthwhile, I will spend the weekend building a couple of sites that can legitimately claim to have more pages listed than Google. It will all be irrelevant pages from one site, though, listed on a second site that does a full-text search on those pages. (Please don't question the technical details; it will work.)
In the past, those queries would return zero results - now many of them are returning a few results. Those hard-to-find gems are now coming up!
Searching 8,058,044,651 web pages
I don't understand why anyone believes that number?
When I do a site:www.mydomain.com search on Google, it tells me my site has 194 pages. In fact, there are only 144 pages. Where did the extra 50 pages come from?
If we do the math (and my math skills are pretty poor) ... that's about 35% more pages than those which actually exist!
If we follow the logic and assume that all sites have been attributed with 35% more pages than really exist, then the true total would be about 8.058 billion divided by 1.35, or roughly 6 billion, and the figure shown above has been falsely inflated by around 2 billion pages ... has it not?
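A quick back-of-the-envelope check of that reasoning (note that the inflated portion comes from dividing the reported total by 1 plus the inflation rate, not from multiplying it; and extrapolating one site's 35% to the whole index is of course a big assumption):

```python
# Back-of-the-envelope check: if every reported count is ~35% higher than
# the real page count, the real total is reported / (1 + rate).
reported_total = 8_058_044_651   # Google's claimed index size
my_real_pages = 144              # pages that actually exist on the example site
my_reported_pages = 194          # pages Google reports for site:www.mydomain.com

inflation_rate = (my_reported_pages - my_real_pages) / my_real_pages
print(f"inflation rate: {inflation_rate:.1%}")          # -> 34.7%

estimated_real_total = reported_total / (1 + inflation_rate)
inflated_by = reported_total - estimated_real_total
print(f"estimated real total: {estimated_real_total:,.0f}")
print(f"inflated by:          {inflated_by:,.0f} pages")
```

This puts the over-count at roughly two billion URLs, under the stated (and shaky) assumption that the 35% figure generalizes.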
I don't think the issue is cut and dried. Consider two URLs: www.domain.com/foo and domain.com/foo.
On some servers those are exactly the same page; on other servers they might serve entirely different content, or one might not exist at all. It is tricky to know what counts as a unique page.
Another example would be database-driven URLs vs. human-readable URLs. Some sites offer both, at least for key pages - say, something like /article?id=123 alongside /article/123.
They could be exactly the same page. How is Google or any search engine supposed to know the difference? Can it be assumed that because they were the same at one point, that the data will be the same later on? Should SEs even be expected to compare every page in a domain with each other to identify equalities?
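One way to see why this is hard: a crawler can apply simple canonicalization rules (lowercase the host, strip a "www." prefix, drop a trailing slash) to collapse the obvious aliases, but nothing in the URL itself guarantees that two addresses serve the same content, or that two canonically distinct URLs serve different content. A minimal sketch, using made-up URLs (query strings omitted for brevity):

```python
from urllib.parse import urlsplit

def canonicalize(url: str) -> str:
    """Collapse common aliases of the same page: scheme case, a 'www.'
    prefix, and a trailing slash on the path. This is a heuristic --
    two URLs that canonicalize differently can still serve identical
    content, and vice versa."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{host}{path}"

# Four spellings of (possibly) one page -- hypothetical URLs:
urls = [
    "http://www.domain.com/foo",
    "http://domain.com/foo",
    "http://domain.com/foo/",
    "HTTP://WWW.domain.com/foo",
]
print({canonicalize(u) for u in urls})   # all four collapse to one entry
```

The heuristic collapses four reported URLs into one page here, but a server is free to return different content for domain.com/foo and www.domain.com/foo, which is exactly why a search engine cannot safely assume they are duplicates.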
Should SEs even be expected to compare every page in a domain with each other to identify equalities?
Good point. Why should a search engine concern itself with the quality of its index when its revenue is inversely proportional to that quality?
Why should a search engine concern itself with the quality of its index when its revenue is inversely proportional to that quality?
Simple: Because declining quality would lead to a drop in both traffic and revenue.
First, it isn't necessarily "pages" but rather "URLs".
I think RFranzen is right on in suggesting that www.domain.com/foo, domain.com/foo, and even aaa.bbb.ccc.ddd IP-address addresses are all unique URLs. It is quite easy to see how Google could index 4x the number of URLs.
Personally, I don't think G's quality has ever been as good across the board as it is right now. Many of the really spammy sectors are slowly getting cleaned up (aka: things like travel...etc)
I really think a bigger index pays off for everyone.
aka2: let's start targeting more 6-10 keyword phrases ;)
I really think a bigger index pays off for everyone.
Sure it does and I agree that the 4, 5 & 6 keyword phrases are working very well these days. I also agree that Google is less spammy than ever before, though there are spammy sites still plaguing the index.
However, I still don't understand the number of URLs reported for my site! I searched for "www.mysite.com", not ".mysite.com". Isn't a search specific to "www.mysite.com" supposed to return only results for "www.mysite.com" and not include ".mysite.com"?
No matter how you cut it, there are only 144 unique URLs for my site. Google is reporting 194. The number has been falsely inflated.
Searching 8,058,044,651 web pages
That claim is questionable at best.
First, it isn't necessarily "pages" but rather "URLs"
Perhaps we should tell Google that their semantics are incorrect. They should change the statement to read: Searching 8,058,044,651 URLs ;)