

SEM Research Topics Forum

    
New search engines go deep
grnidone




msg:818711
 3:19 pm on Feb 1, 2001 (gmt 0)

[nua.ie]

(This is a story from the New York Times and all credits are given to them. WebMasterWorld takes no credit for this story. -G)


Jan 31 2001: The New York Times reports that the limits of current search engines’ indexing ability mean that they have access to less than one percent of all the pages on the Web.

Up to 500 billion pieces of content are hidden from the search engines, according to search specialist BrightPlanet.com. This un-indexed region of the Web is being dubbed the “deep Web” and BrightPlanet.com estimates that it may be 500 times larger than the surface Web that search engines try to cover.


 

rencke




msg:818712
 3:52 pm on Feb 1, 2001 (gmt 0)

Bright Planet has published a very interesting white paper about this. Worth reading:
[completeplanet.com...]

grnidone




msg:818713
 4:03 am on Feb 2, 2001 (gmt 0)

I glanced at the summary and it looks fascinating. I think I might print it out and curl up with a highlighter this weekend.

-G

grnidone




msg:818714
 4:06 am on Feb 2, 2001 (gmt 0)

This was once in Breaking News, but I moved this to Research Topics. It seems like it is better placed in this forum.

-G

Woz




msg:818715
 12:02 pm on Feb 2, 2001 (gmt 0)

There was some kafuffle some time ago triggered by a press release from Bright Planet on the very same subject. Mmmm, I thought, maybe these people are onto something. So I visited their site, and downloaded their brand new search program, which turned out to be a spruced up version of Mata Hari. <sigh> Oh well.

The story has also been picked up recently by NUA, but it is old news I am afraid.

That being said, I believe they are right about the amount of information available on the net that we never see. Major SEs have no hope of indexing it all. I believe the future is in smaller, very focused search engines set up in a cross referenced network. Others from various parts of the world seem to have the same belief. We will see who is the first to surface.

Onya
Woz

chiyo




msg:818716
 2:06 pm on Feb 2, 2001 (gmt 0)

Woz, I second your motion. I've surmised for about a year now that the future of Web search will be in smaller focused databases designed for a specific audience. There is no real revenue model now for databases that search the whole net, though maybe a few with limited overheads or some whizzbang strategy may make it. Google comes to mind.

Smaller specialist engines suit the architecture of the Net better.

What the Net does better than anything else is provide information to small interest groups that are otherwise physically disparate.

While some see it as a mass marketing medium, and many sites are designed on this premise, it was never going to last long. If you look at the Dot Com Morgue the great majority of these failed ventures were based on the "mass marketing" premise. As a result they were poorly positioned, badly focused and targeted. Sites targeting specific focused groups, with targeted focused advertising and revenue models may still be doing OK - of course the dedication and low overheads of these small webs helps too.

Remember the Web is just an interconnected set of nodes, and indexing the whole web may have been almost possible 6 years ago but now the principle and architecture of the Net is asserting itself.

I call it the Disaggregated Web and hail its comeback!

Specialist search engines, funded by various models such as PPC, advertising, subscription, volunteerism, and government funding may well be the next trend on the Net. (And I use "the Net" nomenclature deliberately rather than "the Web")

Going further, small sites may thrive while big ones die. Perhaps Yahoo, which gets over its bigness by determined efforts to target different groups, will survive, but they have to keep on positioning and targeting even better...

I think it was the economist Schumacher (excuse the spelling) who said "Small is beautiful".

..Yep, AV did get it wrong with their old slogan!

bigjohnt




msg:818717
 3:21 pm on Feb 2, 2001 (gmt 0)

vortals

tedster




msg:818718
 5:39 pm on Feb 2, 2001 (gmt 0)

Here is one interesting section of the pdf document to me.

Incomplete Indexing of Surface Web Sites


Engine---------ODP Pages
Open Directory--248,706
AltaVista--------17,833
Fast-------------12,199
Northern Light---11,120
Go (Infoseek)-----1,970

Clearly, the engines themselves are imposing decision rules with respect to either depth or breadth of surface pages indexed for a given site. There was also broad variability in the timeliness of results from these engines. Specialized surface sources or engines should therefore be considered when truly deep searching is desired.
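To put the table in proportion, here is a rough sketch that turns the page counts into coverage percentages. The numbers are copied straight from the table; the names (ODP_TOTAL, indexed, coverage) are just my own labels for illustration, and I am assuming the Open Directory row is the full count of pages on the site.

```python
# Page counts copied from the table above (ODP pages found in each engine).
# Assumption: the "Open Directory" row is the total number of pages on the site.
ODP_TOTAL = 248_706

indexed = {
    "AltaVista": 17_833,
    "Fast": 12_199,
    "Northern Light": 11_120,
    "Go (Infoseek)": 1_970,
}


def coverage(pages: int, total: int = ODP_TOTAL) -> float:
    """Return the share of the site's pages an engine has indexed, in percent."""
    return 100.0 * pages / total


# Coverage per engine, keyed by engine name.
coverage_pct = {engine: coverage(pages) for engine, pages in indexed.items()}

for engine, pct in coverage_pct.items():
    print(f"{engine:<15} {pct:5.2f}%")
```

On these figures even the best engine in the table has indexed well under a tenth of the ODP's own pages.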

First, I've been trying to formulate a query on Google to get a handle on how many ODP pages they have indexed. Searching on: dmoz site:dmoz.org returns 637,000 pages. Since ODP reports 248,706 total category pages, something seems off.

Second, this underlines for me the importance of submitting directory pages to the spiders at search engines. With the possible exception of Google, you cannot simply assume that they will find a given directory entry, even in the ODP.

Third, the deep web seems to present a very real need and opportunity to develop a different kind of search resource, but my guess is that it will need to start in academia -- the way Google did -- and not through commercial concerns.

han solo




msg:818719
 6:05 pm on Feb 2, 2001 (gmt 0)

The size of dmoz is interesting, since you brought it up, tedster. I can't seem to get the number you got, but rather someplace in the

Results 1 - 10 of about 545,000. Search took 0.26 seconds

range, interesting, no?

I've seen, when browsing through google's version of dmoz, some sites that still work with 0 page rank. I think it's because of being booted from the google db, therefore, even the link from dmoz doesn't count.

The other aspect of this shrinkage is linkrot.

Have to say, I love this thread! When I've had more coffee, I'll probably dive in.

The deep web is fascinating, you could use this site as a perfect example. Where else could you find this many seo/webmaster experts, all chatting and growing the size of the knowledge database? And how hard is it to find this url in the engines?

Cheers,

Han Solo

tedster




msg:818720
 6:22 pm on Feb 2, 2001 (gmt 0)

Just to clarify, the ODP number is from the pdf version of the actual paper from Bright Planet (rencke's link). It represents the number of actual category and sub-category pages in the ODP when BP did their research.

tedres




msg:818721
 6:30 pm on Feb 2, 2001 (gmt 0)

FWIW,

at the bottom of the ODP home page ( [dmoz.org] ) is their more or less real time stat on sites, editors, and categories. Currently it's:
2,347,914 sites - 34,017 editors - 339,288 categories

I'm sure that "Sites" means listings, not the number of unique websites.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved