Forum Moderators: bakedjake
"Although the technology is still at a beta stage, it can crawl 20 million URLs in 24 hours so it could have huge commercial potential when it's fully launched." (Matt Ellis, chief technology officer for Looksmart International)
If Looksmart/Grub can crawl 20 million URLs per day, what speed can the other crawlers manage? Are these stats available anywhere?
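To put the quoted figure in perspective, here is a rough back-of-the-envelope conversion of "20 million URLs in 24 hours" into a sustained per-second rate. The function name `urls_per_second` is just an illustrative helper, and the sketch assumes the crawl is spread evenly over the whole day, which a real crawler almost certainly would not do.

```python
# Rough conversion of a daily crawl volume into a sustained rate.
# Assumes an evenly spread crawl over 24 hours (a simplification).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def urls_per_second(urls_per_day):
    """Return the average number of URLs fetched per second."""
    return urls_per_day / SECONDS_PER_DAY


rate = urls_per_second(20_000_000)
print(f"{rate:.1f} URLs/second")
```

So the quoted figure works out to a little over 230 URLs every second, averaged across the day.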
I suppose that now the "I've got the biggest index" battle has more or less stabilised, we could have the "I can crawl the fastest" race:
Google Vs. Ask Vs. FAST Vs. Inktomi etc etc
what do you guys think?
maku
I don't know the official numbers of URLs crawled per month, but I'm guessing that the order from most to least is:
FAST, Google, Inktomi, Wisenut (and now Grub) and Ask
I think I heard somewhere that when Wisenut actually does a crawl it can do around 75-80 million. You'll have to fill me in on the others.
Rob
I recall Google or another major SE saying they could do about 50 million in a day. Perhaps it was FAST..?
If the speed of their crawling is the only advantage of the distributed crawling paradigm, imho, they need to re-evaluate.
Relevance, freshness - and a value proposition to draw in the surfers - that is what they need.
Were you aware that WiseNut seems to have been pieced together (initially) by meta-searching some other engines...? :) Just getting web pages isn't hard, so why would that mean this newfangled thing will suddenly be "better"?
The freshness of the Google index is only a small part of the reason they are 'the guard', as I guess I would infer from the language in your post.
Aside from that, the article is talking about 'huge commercial potential...', and you do realize too that many, many, many webmasters know how to configure their sites to ban bad bots, yes?
So... aside from getting a bunch of webmasters irritated, what else would crawling so much and so fast do for a search engine's ability to answer a user's query in the best way?
Nothing. End of story, imho. It's easy to get data; making it useful is much harder.
Then, to get people to use that which you've built - harder still.
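On the point above about webmasters banning bots: the usual mechanism is a robots.txt file. As a minimal sketch, this is how a well-behaved crawler could check such a file with Python's standard `urllib.robotparser`. The bot names `ExampleFastBot` and `SomeOtherBot`, the example.com URLs, and the helper `is_allowed` are all made up for illustration, and robots.txt only stops bots that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is banned outright,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleFastBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The banned bot is refused everywhere.
print(is_allowed(ROBOTS_TXT, "ExampleFastBot", "http://example.com/index.html"))
# Other bots may fetch public pages but not /private/.
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/private/x.html"))
```

A crawler that ignores the file would need to be blocked at the server instead, which is outside what this sketch shows.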
I think that the big boys can be broken into 2 categories:
Relevant:
Google
FAST
Inktomi
Irrelevant:
AltaVista
Teoma
Wisenut
The way I decide is by having (like most people probably have) a favourite query for which I know with 100% certainty the 4 or 5 most relevant sites on the net. Then I see whether they appear in at least the top 10 results on each SE.
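The check described above could be sketched roughly like this. Everything here is illustrative: the engine names, the example.com/example.org/example.net URLs, and the helpers `host` and `known_sites_in_top` are assumptions, and the result lists would have to be collected by hand or by some other means for each SE.

```python
from urllib.parse import urlparse


def host(url):
    """Return the lower-cased host name of url, without a leading 'www.'."""
    h = urlparse(url).netloc.lower()
    return h[4:] if h.startswith("www.") else h


def known_sites_in_top(results, known_sites, n=10):
    """Return the known relevant sites whose host appears in the top n results."""
    top_hosts = {host(u) for u in results[:n]}
    return [s for s in known_sites if host(s) in top_hosts]


# Hypothetical data: sites I already know are the best for my favourite query.
known = ["http://www.example.com/", "http://example.org/", "http://example.net/"]

# Hypothetical ranked results per engine for that query.
results_by_engine = {
    "EngineA": ["http://example.org/page", "http://www.example.com/", "http://other.test/"],
    "EngineB": ["http://other.test/", "http://another.test/"],
}

for engine, results in results_by_engine.items():
    found = known_sites_in_top(results, known)
    print(f"{engine}: {len(found)} of {len(known)} known sites in the top 10")
```

Matching on host name rather than exact URL is a design choice of this sketch; it counts a site as present even when the SE lists a deeper page from it.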
I know that FAST does do regular crawls, but they seem to be on a smaller scale. For a couple of the sites that I watch, it took 9 months for them to be fully crawled, and FAST is still including results from one of my sites that I shut down 8 months ago.
Inktomi seems to be improving by leaps and bounds, based on the small sample that I look at. Results are much fresher than they ever used to be. I would say that it is currently "fresher" than FAST.
Teoma and AltaVista... not much to say about them really. How often does Wisenut actually crawl? There was one relatively recently, but wasn't the one before that 6 months earlier?