Previous estimates stated that between 90% and 97% of the pages on the web are not indexed. Can anyone cite a reputable article or source for this figure?
Thanks
A speaker at the Search Engine Strategies conference in New York last March said that a detailed study of his logs revealed that almost one-third of his main site's traffic came from links on pages that the search engines didn't seem to know about.
That's not 90%, but it's significant.
In this thread (April 2004) I estimated that the Google index of 4,285,199,774 pages was equal to no more than 10-20% of published pages (and probably less). The calculation is in post (msg #:12):
How many pages estimated on the Internet? [webmasterworld.com]
>> a reputable article or source
You'll have to judge that for yourself, but I usually admit when I'm proven wrong ;)
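For anyone who wants to check the numbers, here is a rough sketch of what the 10-20% figure implies for the total page count. It is not the calculation from the linked post, only the arithmetic that follows from the quoted index size; the function name and the coverage values passed in are my own assumptions for illustration.

```python
# Index size quoted in the post above.
GOOGLE_INDEX_PAGES = 4_285_199_774


def implied_total_pages(indexed: int, coverage: float) -> float:
    """Total published pages implied if `indexed` pages make up `coverage` of the web.

    `coverage` is a fraction (0.10 means 10%). Hypothetical helper, for illustration only.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in the range (0, 1]")
    return indexed / coverage


# The two ends of the 10-20% estimate.
for coverage in (0.20, 0.10):
    total = implied_total_pages(GOOGLE_INDEX_PAGES, coverage)
    unindexed_share = 1 - coverage
    print(
        f"{coverage:.0%} coverage -> about {total / 1e9:.1f} billion pages, "
        f"{unindexed_share:.0%} not indexed"
    )
```

So if the 10-20% estimate holds, roughly 80-90% of pages were not indexed, which sits just under the 90-97% range asked about in the first post.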
I know that if these pages were indexed, even with a PageRank of exactly zero so that they sat at the very bottom of the search results, I'd still get traffic to them, because their topics are unique and might not be covered anywhere else on the internet.
And I'm pretty sure my site is not the only one like that.
So, bottom line: even before we talk about form-filling bots, there is some room for improvement in current bots.