The Invisible Web

Anyone have the latest stats on the number of hidden pages?

         

xbase234

4:36 pm on Jul 9, 2004 (gmt 0)

10+ Year Member



By the "invisible web", I mean the estimated number of pages that search engines do not index because they are hidden behind search boxes.

Previous estimates stated that between 90% and 97% of the pages on the web are not indexed. Can anyone cite a reputable article or source for these figures?

Thanks

xbase234

11:06 pm on Jul 12, 2004 (gmt 0)

10+ Year Member



Since no one else has responded, I'll throw a wild number out there: 90% not indexed.

Anyone want to challenge this number?

buckworks

11:17 pm on Jul 12, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This doesn't answer the question, but it relates:

A speaker at the Search Engine Strategies conference in New York last March said detailed study of his logs revealed that almost one-third of his main site's traffic came from links on pages that the search engines didn't seem to know about.

That's not 90%, but it's significant.

encyclo

11:23 pm on Jul 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The figures are just guesswork, based on estimates of the volume of content on company intranets, password-protected resources and the like. To really know how many unindexed pages there are, you'd have to index them... which doesn't quite work!

Don't forget that 76% of statistics are invented ;)

claus

11:35 pm on Jul 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> the number of estimated pages not indexed by search engines

In this thread (April 2004) I estimated that the Google index of 4,285,199,774 pages was equal to no more than 10-20% of published pages (and probably less). The calculation is in post (msg #:12):

How many pages estimated on the Internet? [webmasterworld.com]

>> a reputable article or source

You'll have to judge that for yourself, but I usually admit when I'm proven wrong ;)
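
For what it's worth, here is the arithmetic behind a figure like that, as a minimal Python sketch. The only number taken from the post is the index size of 4,285,199,774 pages; the function name and the whole-percent convention are illustrative assumptions, and the output is only as good as the coverage guess fed in.

```python
# Index size quoted in the post above.
GOOGLE_INDEX_PAGES = 4_285_199_774


def implied_total_pages(indexed_pages: int, coverage_percent: int) -> int:
    """Total pages implied if `indexed_pages` is `coverage_percent`% of the web.

    Hypothetical helper: coverage is a whole-number percentage so the
    calculation stays in exact integer arithmetic.
    """
    if not 0 < coverage_percent <= 100:
        raise ValueError("coverage_percent must be between 1 and 100")
    return indexed_pages * 100 // coverage_percent


if __name__ == "__main__":
    for coverage in (10, 20):
        total = implied_total_pages(GOOGLE_INDEX_PAGES, coverage)
        unindexed = total - GOOGLE_INDEX_PAGES
        print(
            f"{coverage}% coverage -> {total:,} pages in total, "
            f"{unindexed:,} not indexed ({100 - coverage}%)"
        )
```

So a 10-20% coverage guess puts the total at roughly 21 to 43 billion pages, with 80-90% of them not indexed.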

andrew_m

11:57 pm on Jul 12, 2004 (gmt 0)

10+ Year Member



Well, I can say that on one of my sites all pages are accessible by crawling: no passwords, no forms to fill out. Yet only about 7% of them are indexed, simply because some of them are too deep -- Googlebot exhausts entry-point PageRank before reaching them.

I know that if these pages were indexed, even with a PageRank of exactly zero that put them at the very bottom of the search results, I'd still get traffic to them, because the topics are unique and might not be covered anywhere else on the internet.

And I'm pretty sure my site is not the only one like that.

So, bottom line, even before we talk about form-filling bots there is some room for improvement in current bots.
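
To illustrate the "too deep" effect, here is a toy Python sketch of a breadth-first crawl that gives up after a fixed number of link hops. This is not how Googlebot actually works: the fixed depth budget is a crude stand-in for a crawler running out of PageRank, and the site graph, page names and function name are all made up for the example.

```python
from collections import deque


def pages_reached(links: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Return the pages a breadth-first crawl reaches within `max_depth` hops of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth budget spent: do not follow links from this page
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical site: ten pages linked in a single chain, p0 -> p1 -> ... -> p9.
# Every page is crawlable, but most sit far from the entry point p0.
site = {f"p{i}": [f"p{i + 1}"] for i in range(9)}

if __name__ == "__main__":
    all_pages = set(site) | {t for targets in site.values() for t in targets}
    reached = pages_reached(site, "p0", max_depth=2)
    print(f"{len(reached)} of {len(all_pages)} pages reached")
```

With a budget of two hops only p0, p1 and p2 are reached, even though nothing on the site is hidden behind a form or a password.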

JonR28

5:01 pm on Jul 22, 2004 (gmt 0)

10+ Year Member



>> Don't forget that 76% of statistics are invented ;)

"Actually 60% of all statistics are made up and 30% of those aren't even true!"
-Homer Simpson

Chndru

5:11 pm on Jul 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since it's the invisible web (not just published pages), the theoretical answer would be infinite. Especially if you include dynamic pages.