Fast announces world's largest search engine

Brett_Tabke

8:29 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Fast is now announcing that it has built the world's largest search engine [biz.yahoo.com] (575 million docs).

NFFC

8:35 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"...with over 575 million full-text web documents....
built by crawling and examining over 1.5 billion web documents"

Is that good PR or good dupe checking?

Kamikaze

8:51 pm on Oct 12, 2000 (gmt 0)

10+ Year Member



I thought Google was over a billion docs...

Brett_Tabke

9:04 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



But Google has only 500 million in the db; the other 500 million are nowhere to be found. Part of them are claimed to be 'indexed links' only, and the rest? Where are they?

I bet if push came to shove, you'd have a real hard time finding over 100 million in EITHER of their databases.

NFFC

9:05 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"560 million full-text indexed web pages and 500 million partially indexed URLs."

That's PR.

JamesR

9:21 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I bet if push came to shove, you'd have a real hard time finding over 100 million in EITHER of their databases.

Yeah, Fast is not as good at detecting dupes as they say. Maybe exact dupes, character for character, but at 90% similarity and below they don't do too well, and neither does Google. Inktomi seems to be the king on that one, but who can tell these days. Still, 100 million is a stinkin' lot of pages.
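
To make the exact-dupe versus 90%-dupe distinction concrete, here is a minimal sketch of one common way to score near-duplicates: compare overlapping word "shingles" with Jaccard similarity. Nothing in this thread says Fast, Google, or Inktomi work this way; the function names, the shingle size `k`, and the 0.9 threshold are all assumptions chosen for illustration.

```python
# Hypothetical sketch: near-duplicate scoring with word shingles and
# Jaccard similarity. Not any engine's actual dupe filter.

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word windows (as tuples) found in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str,
                      threshold: float = 0.9, k: int = 3) -> bool:
    """True when the shingle overlap reaches the (assumed) threshold."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

A character-for-character check (`doc_a == doc_b`) misses a doorway page with one word swapped, while the shingle score for that pair stays high. The catch at the scale discussed here is that comparing every pair of pages is far too slow, so a real system would need some form of fingerprinting or sampling on top of this.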

Brett_Tabke

11:05 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm saying that I doubt they have that kind of db live on the web. I don't doubt they have that many docs on disk, but I doubt that they have that many URLs indexed in the live db. E.g., you could spider them for a year and I doubt you'd get more than 100 million unique URLs. I don't think they are lying; they just aren't distinguishing between the docs sitting dead in the offline vault and the live db.

The difference between 100 million documents and 500 million is monumental. The level of complexity is thousands of times more difficult.

I split the other topic [webmasterworld.com] on Fast off over to the Fast board.

JamesR

11:17 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My question is whether these numbers on how many documents are indexed count duplicates. With the number of doorways submitted to an engine, you could easily see the numbers skyrocket. But if 90% are spam dupes, who cares about 1.5 billion? Just trying to figure out what is hype, what are practically "who cares" numbers, and what are legit, useful pages (understanding the gray areas involved).

WebSpinner

2:16 am on Oct 13, 2000 (gmt 0)

10+ Year Member



The SE's are getting like the car manufacturers. Ever notice how "Every SE" has the "World's Largest DB"? LOL.. Someone stop the madness...

:-)

Air

2:48 am on Oct 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All PR aside, I like Fast. I thought it would rise to greater prominence than it has, but it looks poised to do so.

rcjordan

3:08 am on Oct 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>but it looks poised to do so.
I think that's a direct quote from threads in May.

rencke

3:02 pm on Oct 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I thought it would rise to greater prominence than it has

I wonder if the Norwegian scientists from the University of Trondheim who are behind Fast really intend to be in the search engine business at all. That requires a lot more capital than the search engine software business. If you read the technological description at www.fast.no, it seems to me that they are arguing their case to search engine companies hard pressed by the need to upgrade their hardware. And they have been successful in selling software licenses, as we all know. Any thoughts, anyone?

rcjordan

3:20 pm on Oct 13, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>That requires a lot more capital than the search engine software business.
Added to that is the annoying little problem that most of the SE's have not been able to generate enough revenues to cover their costs, i.e., the business model looks very, very anemic now.

littleman

8:53 pm on Oct 13, 2000 (gmt 0)



>The difference between 100 million documents and 500 million is monumental. The level of complexity is thousands of times more difficult.

OK, this may sound really lame, but why couldn't an SE maintain multiple databases, access them the way MetaCrawler does, and then eliminate the dupes?

Brett_Tabke

9:55 pm on Oct 13, 2000 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That is how they do it, little, but every time you double the database it makes the process more than twice as difficult. You have to deal with all those indexes and sort stuff out across the network.

It was Inktomi that first developed the 'parallel' search engine using just that technique.
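
The multiple-database idea discussed above (query several indexes at once, merge the hits, drop the dupes) can be sketched in a few lines. This is only an illustration under stated assumptions: each "shard" is a plain in-memory dict mapping a term to `(url, score)` pairs, and `search_shard` and `search_all` are made-up names, not how Inktomi or Fast actually built it.

```python
# Hypothetical sketch: fan a query out to several index shards in
# parallel, then merge the partial results and remove duplicate URLs.
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard: dict, term: str) -> list:
    """Look up one term in one shard; returns a list of (url, score)."""
    return shard.get(term, [])


def search_all(shards: list, term: str, top_n: int = 10) -> list:
    """Query every shard concurrently and return the top_n unique hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term), shards))

    # Eliminate dupes: keep only the best score seen for each URL.
    best = {}
    for hits in partials:
        for url, score in hits:
            if url not in best or score > best[url]:
                best[url] = score

    # Merge step: highest-scoring URLs first.
    return heapq.nlargest(top_n, best.items(), key=lambda kv: kv[1])
```

Even in this toy version the merge step has to see every shard's results before it can rank anything, which hints at why sorting things out across the network gets harder as the number of indexes grows.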