Forum Moderators: bakedjake

speed of the crawlers

         

papamaku

7:48 pm on Apr 30, 2003 (gmt 0)

10+ Year Member



I just read an article in PC Pro (UK mag) about the Looksmart-Grub acquisition, and it had a quote I hadn't seen before:

"Although the technology is still at a beta stage, it can crawl 20 million URLs in 24 hours so it could have huge commercial potential when it's fully launched." (Matt Ellis, chief technology officer for Looksmart International)

If Looksmart/Grub can crawl 20 million per day, what speed can the other crawlers go at? Are these stats available anywhere?
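For a sense of scale, the quoted daily figure can be turned into a per-second rate. This is only a rough sketch of the arithmetic; the function name is made up for illustration and the only input is the 20 million figure from the quote.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in 24 hours


def urls_per_second(urls_per_day: float) -> float:
    """Convert a daily crawl volume into an average per-second rate."""
    return urls_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    quoted = 20_000_000  # "20 million URLs in 24 hours" from the PC Pro quote
    print(f"{urls_per_second(quoted):.1f} URLs per second on average")
```

That is an average over the whole day, so it says nothing about peak load on any one site.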

I suppose that now the "I've got the biggest index" battle has kind of stabilised, we could have the "I can crawl the fastest" race:

Google Vs. Ask Vs. FAST Vs. Inktomi etc etc

What do you guys think?

maku

jrobbio

9:00 pm on Apr 30, 2003 (gmt 0)

10+ Year Member



Last week the Grub client crawled 150 million URLs in a day before they limited its progress because of robots.txt issues.

I don't know the official numbers of URLs crawled per month, but I'm guessing the order from most to least is:
FAST, Google, Inktomi, Wisenut (and now grub) and Ask

I think I heard somewhere that when Wisenut actually does a crawl it can do around 75-80 million. You'll have to fill me in on the others.

Rob

jeremy goodrich

9:03 pm on Apr 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



20 million per day is NOT impressive for a distributed crawler.

I recall Google or another major SE saying they could do about 50 million in a day. Perhaps it was FAST..?

If the speed of their crawling is the only advantage of the distributed crawling paradigm, imho, they need to re-evaluate.

Relevance, freshness - and a value proposition to draw in the surfers - that is what they need.

charge

5:51 am on May 1, 2003 (gmt 0)



Jeremy, Google can crawl 150 million URLs a day. Grub can crawl that amount with 2,000 clients, and at present they have over 11,000 clients downloaded and ready to go. I realize that crawling is only the first stage, but it's a good start. I read that Grub is hoping to refresh the web every day. If they pull it off, and it's a big ask, it could be the changing of the guard. Some people are looking forward to Grub giving Google a run for their money.
cheers,
charge!
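The client numbers above imply a per-client rate, which can be worked out as below. This is a back-of-the-envelope sketch only: the function names are illustrative, and the projection to 11,000 clients assumes every client crawls at the same rate with no coordination or bandwidth limits, which nothing in this thread confirms.

```python
def per_client_rate(total_urls_per_day: float, clients: int) -> float:
    """Average URLs crawled per client per day."""
    return total_urls_per_day / clients


def projected_total(rate_per_client: float, clients: int) -> float:
    """Naive linear projection: assumes every client sustains the same rate."""
    return rate_per_client * clients


if __name__ == "__main__":
    rate = per_client_rate(150_000_000, 2_000)   # figures from the post above
    print(f"{rate:,.0f} URLs per client per day")        # 75,000
    total = projected_total(rate, 11_000)                # hypothetical scaling
    print(f"{total:,.0f} URLs per day with 11,000 clients")  # 825,000,000
```

Whether the real system scales anywhere near linearly is an open question.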

jeremy goodrich

5:57 am on May 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Charge - people couldn't care less about the number of pages in the index.

Were you aware that WiseNut seems to have been pieced together (initially) via meta searching some other engines...? :) Just getting web pages isn't hard - why would that mean this newfangled thing will suddenly be "better"?

The freshness of the Google index is only a small part of the reason they are 'the guard', which is what I infer you meant from the language in your post.

Aside from that, the article is talking about 'huge commercial potential...', and you do realize that many, many, many webmasters know how to configure their sites to ban bad bots, yes?
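As an illustration of that kind of ban, here is a minimal sketch that checks a robots.txt rule set with Python's standard `urllib.robotparser`. The `grub-client` user-agent token and the example.com URL are assumptions made for the example, not details taken from this thread, and `is_allowed` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler entirely, allow everyone else.
# The "grub-client" token is an assumption used for illustration.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"
    for agent in ("grub-client", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

This only works against crawlers that actually honour robots.txt, of course.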

So... aside from getting a bunch of webmasters irritated, what else would crawling so much and so fast do for the search engine's ability to answer a user's query in the best way?

Nothing. End of story, imho. It's easy to get data; making it useful is much harder.

Then, to get people to use that which you've built - harder still.

charge

6:44 am on May 1, 2003 (gmt 0)



Jeremy, interesting that you mention Wisenut. I read that it is now the most relevant search engine just behind Inktomi and Google. And I give little credence to Inktomi's result, since they paid for the test.
cheers
charge!

jeremy goodrich

7:08 am on May 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wait, it sounds like you just said the test was worthless - so why cite it as saying Wisenut is relevant?

And what's that got to do with their crawler going faster? You're losing me here. :)

charge

7:34 am on May 1, 2003 (gmt 0)



Jeremy, I doubt Inktomi would be more relevant than Google. And you're losing me. Grub feeds Wisenut, which recently was shown to be a relevant search engine ("draw in the surfers"). Read what you said.
cheers.
charge!

papamaku

9:31 am on May 1, 2003 (gmt 0)

10+ Year Member



Sorry charge, but IMHO Wisenut is sooo irrelevant:

I think that the big boys can be broken into 2 categories:

Relevant:

Google
FAST
Inktomi

Irrelevant:

Altavista
Teoma
Wisenut

The way I decide is by having (like most people probably do) a favourite query for which I know with 100% certainty the 4 or 5 most relevant sites on the net. Then I see whether they appear in at least the top 10 results on each SE.
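That test can be sketched in a few lines. This is just one rough way to do it: the function name, the host-matching rule, and all the example.* domains are hypothetical, and in practice the result URLs would be copied by hand from each engine's results page.

```python
from urllib.parse import urlparse


def relevance_hits(known_sites, result_urls, top_n=10):
    """Return the known-relevant sites that show up in the top_n results.

    A result counts as a hit if its hostname equals the site's domain
    or is a subdomain of it (e.g. www.site.example for site.example).
    """
    hosts = [(urlparse(url).hostname or "").lower() for url in result_urls[:top_n]]
    hits = []
    for site in known_sites:
        site = site.lower()
        if any(h == site or h.endswith("." + site) for h in hosts):
            hits.append(site)
    return hits


if __name__ == "__main__":
    # Hypothetical data: the sites I "know" are best, and one engine's results.
    known = ["alpha.example", "beta.example", "gamma.example", "delta.example"]
    results = [
        "http://www.alpha.example/guide.html",
        "http://spam.example/",
        "http://gamma.example/index.html",
    ]
    found = relevance_hits(known, results)
    print(f"{len(found)} of {len(known)} known sites in the top 10: {found}")
```

One favourite query is a tiny sample, so this is a gut check rather than a benchmark.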

charge

11:02 am on May 1, 2003 (gmt 0)



Papa, FAST came last.
c.c.c

charge

11:08 am on May 1, 2003 (gmt 0)



Papa, those test results are on searchenginewatch.com. Who knows, they could be hopeless, but I quoted what was reported.
cheers,
charge!

poluf1

11:30 am on May 4, 2003 (gmt 0)

10+ Year Member



Google usually visits my sites once in about 1.5 to 2 months. Taking that as an average for their 3 billion pages, that would give 50-70 million pages visited a day.
But I guess there are sites visited more often so it could be more.
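The estimate is just index size divided by the revisit interval. A quick sketch of the arithmetic, assuming 30-day months (so 1.5 to 2 months is 45 to 60 days) and that every page is revisited equally often; the function name is illustrative.

```python
def pages_per_day(index_size: float, revisit_days: float) -> float:
    """Average pages crawled per day if every page is revisited once per interval."""
    return index_size / revisit_days


if __name__ == "__main__":
    index_size = 3_000_000_000  # the 3 billion figure from the post
    for days in (45, 60):       # 1.5 and 2 months, assuming 30-day months
        rate = pages_per_day(index_size, days)
        print(f"revisit every {days} days: about {rate / 1e6:.0f} million pages/day")
```

Under those assumptions the range comes out at roughly 50 to 67 million pages a day, in line with the figure above.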

Bobby_Davro

11:55 am on May 4, 2003 (gmt 0)

10+ Year Member



For me, Google still retains the freshness award. Every month they crawl and update everything, plus they use Freshbot most days. Can we actually say that of any of the others?

I know that Fast does do regular crawls, but they seem to be on a smaller scale. For a couple of the sites that I watch, it took 9 months for them to be fully crawled. And Fast is still including results from one of my sites that I shut down 8 months ago.

Inktomi seems to be improving by leaps and bounds, based on the small sample that I look at. Results are much fresher than they ever used to be. I would say that it is currently "fresher" than Fast.

Teoma and AltaVista.... not much to say about them really. How often does Wisenut actually crawl? There was one relatively recently, but wasn't the one before that 6 months previously?