Forum Moderators: phranque
(This is a story from the New York Times and all credits are given to them. WebMasterWorld takes no credit for this story. -G)
Jan 31 2001: The New York Times reports that the limits of current search engines’ indexing ability mean that they have access to less than one percent of all the pages on the Web.
Up to 500 billion pieces of content are hidden from the search engines, according to search specialist BrightPlanet.com. This un-indexed region of the Web is being dubbed the “deep Web” and BrightPlanet.com estimates that it may be 500 times larger than the surface Web that search engines try to cover.
The story has also been picked up recently by NUA, but it is old news, I am afraid.
That being said, I believe they are right about the amount of information available on the net that we never see. Major SEs have no hope of indexing it all. I believe the future is in smaller, very focused search engines set up in a cross referenced network. Others from various parts of the world seem to have the same belief. We will see who is the first to surface.
Smaller specialist engines suit the architecture of the Net better.
What the Net does better than anything else is provide information to small interest groups that are otherwise physically disparate.
While some see it as a mass marketing medium, and many sites are designed on this premise, it was never going to last long. If you look at the Dot Com Morgue the great majority of these failed ventures were based on the "mass marketing" premise. As a result they were poorly positioned, badly focused and targeted. Sites targeting specific focused groups, with targeted focused advertising and revenue models may still be doing OK - of course the dedication and low overheads of these small webs helps too.
Remember the Web is just an interconnected set of nodes, and indexing the whole web may have been almost possible 6 years ago, but now the principle and architecture of the Net is asserting itself.
I call it the Disaggregated Web and hail its comeback!
Specialist search engines, funded by various models such as PPC, advertising, subscription, volunteerism, and government funding may well be the next trend on the Net. (And I use "the Net" nomenclature deliberately rather than "the Web")
Going further, small sites may thrive while big ones die. Yahoo, which gets over its bigness by determined efforts to target different groups, will probably survive, but they have to keep on positioning and targeting even better...
I think it was the economist E. F. Schumacher who said "Small is beautiful".
..Yep, AV did get it wrong with their old slogan!
Incomplete Indexing of Surface Web Sites
Clearly, the engines themselves are imposing decision rules with respect to either depth or breadth of surface pages indexed for a given site. There was also broad variability in the timeliness of results from these engines. Specialized surface sources or engines should therefore be considered when truly deep searching is desired.
First, I've been trying to formulate a query on Google to get a handle on how many ODP pages they have indexed. Searching on: dmoz site:dmoz.org returns 637,000 pages. Since ODP reports 248,706 total category pages, something seems off.
Second, this underlines for me the importance of submitting directory pages to the spiders at search engines. With the possible exception of Google, you cannot simply assume that they will find a given directory entry, even in the ODP.
Third, the deep web seems to present a very real need and opportunity to develop a different kind of search resource, but my guess is that it will need to start in academia -- the way Google did -- and not through commercial concerns.
Results 1 - 10 of about 545,000. Search took 0.26 seconds
Quite a range, interesting, no?
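The counts quoted in this thread don't line up, and a quick back-of-the-envelope calculation shows the scale of the discrepancy. This sketch just re-runs the arithmetic on the figures posted above; treat the numbers as snapshots from early 2001, not current data.

```python
# Figures quoted in the thread (Jan 2001); purely illustrative.
google_count = 637_000     # results reported for "dmoz site:dmoz.org"
odp_category_pages = 248_706  # total category pages reported by ODP itself

# If Google's count exceeds ODP's own page count by this much, the index
# likely contains duplicates, session URLs, or stale pages rather than
# genuinely missed content.
ratio = google_count / odp_category_pages
print(f"Google reports {ratio:.2f}x as many dmoz.org pages as ODP says exist")
```

Either way you read it, the raw `site:` count is a rough instrument: it overshoots ODP's own page tally here, while plenty of other directory pages clearly go unindexed.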
I've seen, when browsing through Google's version of dmoz, some sites that still show 0 PageRank. I think it's because they were booted from the Google db, so even the link from dmoz doesn't count.
The other aspect of this shrinkage is linkrot.
Have to say, I love this thread! When I've had more coffee, I'll probably dive in.
The deep web is fascinating; you could use this site as a perfect example. Where else could you find this many SEO/webmaster experts, all chatting and growing the size of the knowledge base? And how hard is it to find this URL in the engines?
At the bottom of the ODP home page ( [dmoz.org] ) is their more or less real-time stat on sites, editors, and categories. Currently it's:
2,347,914 sites - 34,017 editors - 339,288 categories
I'm sure that "Sites" means listings, not the number of unique websites.