Is Freshbot now Deepbot?

The line is getting drawn ever thinner

         

trillianjedi

4:18 pm on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've seen several postings about this now in the last few days, although this is my first actual experience of it.

I'm being hit very hard by Google's freshbot at the moment, and it's going deep too. At first glance at what the little guys are currently doing, I had to check and double-check that the IPs were 64.... (they are).

Its behaviour, in terms of hard hitting and depth of crawl (it's going through the entire site), is more like that of the old deepbot.

In fact, it's identical behaviour to deepbot the last time it crawled this site back in April.

I'm interested in hearing from others who are seeing the same.

TJ

nancyb

4:35 pm on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This past month has been very different, so freshie and deepie may be used differently than in the past. Freshbot has always gone deep into my site though - after the deep crawl was over.

teeceo

4:40 pm on May 22, 2003 (gmt 0)

10+ Year Member



One of my brand new sites is getting hit hard by freshbot so, maybe.....

teeceo.

Googleguy?

webwoman

4:45 pm on May 22, 2003 (gmt 0)

10+ Year Member



Freshbot has been as thorough and deep as deep bot at my sites for several months now.

parabola

4:47 pm on May 22, 2003 (gmt 0)

10+ Year Member



I'm a bit confused as to what you mean as Freshbot has always "acted" like deepbot on my pages - crawling them all?

Is it because I have only a few dozen pages?

Critter

4:51 pm on May 22, 2003 (gmt 0)

10+ Year Member



Heya TJ:

So far fresh only got my home page...but that was only around 45 minutes ago, so it may be back...I'll let you know.

For the record, my site has only been online since February. This is the first time I've ever been visited by fresh, even though deep's visited me for two months now.

Peter

eflouret

4:59 pm on May 22, 2003 (gmt 0)

10+ Year Member



Well, I launched three new web sites about 10 days ago and they now show about 24 pages of each one on www and other google indexes. I thought that was only possible with deepcrawl, but I don't know that much.

Enrique

hetzeld

5:04 pm on May 22, 2003 (gmt 0)

10+ Year Member



Hi,

Freshbot came today on a brand new site (online since last week) and grabbed every single page :)

Dan

BigDave

5:14 pm on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



While there are certainly some big changes underway, I think this might be the way that Google chooses to update the current index.

If you think about it, freshbot is designed to insert changes into the current index, whereas deepbot builds a changeset that gets merged all at once.

In the past few months I have seen many examples of fresh pages that seem to be "sticky", staying in the index, even after their fresh date disappears.

It might make sense for them to set up freshbot to crawl deep and set all the pages to sticky. Then shut it off for a few days while backlinks and PR are calculated.

I suspect that this method will take a little longer than doing a normal deep update cycle, but it will bring back in the missing sites and pages quicker. An additional disadvantage is that freshbot will be busy crawling deep instead of keeping the normal fresh pages fresh.

They might be writing off ever doing deep crawls again, but it is too early to make that call, while everything is still in flux.

MHes

5:19 pm on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BigDave

Spot on.

We found new sites were sticking last month if linked to by high pr sites. I suspect this will become common for all new sites from now on.

nancyb

5:21 pm on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



eflouret,

freshie has always added new pages to the index, but these had a date formatted in green (hence the name fresh) and did not necessarily "stick" until after deepbot found them. I haven't been watching closely, but I haven't noticed those green fresh dates recently on the sites I normally watch.

I think this last month has changed a lot of "rules" and we will just have to wait awhile to find out the new ones, that is if the new ones are ever discernible. My gut says that it will be more difficult now and, hopefully, harder for the unethical to skew the index.

Critter

5:25 pm on May 22, 2003 (gmt 0)

10+ Year Member



I don't know about not keeping things fresh BigDave. Why couldn't fresh come by and get all the pages for a site every two/three days?

Peter

BigDave

5:43 pm on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Critter,

Fresh could cover the entire web every couple of days, but they would need a lot more machines to handle that. I have never had fresh go deeper than PR4 for established pages.

If fresh is only picking up PR4+ pages, this is a very small minority of pages.

Right now deepbot does the portion of the web that it covers in about a week. That just isn't that easy to compress into every 3 days. Google also has a stated goal of increasing the number of pages in their index this year from 3G to 10G. That is a major increase in the processing power required to refresh all those pages when a relatively small percentage of them change that often.

Oaf357

5:44 pm on May 22, 2003 (gmt 0)

10+ Year Member



From my site (~50 pages) freshbot hasn't acted like deepbot at all.

It came two days ago and grabbed a few pages, came last night and grabbed a few more. Typically, deepbot comes and grabs them all within a day.

Critter

5:52 pm on May 22, 2003 (gmt 0)

10+ Year Member



Heya BigDave:

With three billion pages to crawl every three days, that comes to approximately 11,000 pages per second, with bandwidth usage around 113 MB/s (or approx. 1 Gb/s), assuming 10K of text per page (probably a little high). That is well within reach of a distributed solution. *My* server in the basement can serve up over 3,000 pages *per second*.

Given 10,000 machines for storage and distribution, each machine only has to fetch about one page per second on average to crawl the entire web.

This is, of course, simplified; but crawling the entire web in three days is entirely doable from a bandwidth/retrieval standpoint with a distributed solution.

If they're gonna do 10 billion pages then multiply the above by about 3.3 :)
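
For anyone who wants to check the back-of-envelope numbers, here is a minimal sketch of the arithmetic. The inputs (3 or 10 billion pages, a three-day cycle, 10K per page, 10,000 machines) are the assumptions from this post, not known figures for Google's crawl, and the function name `crawl_rate` is made up for illustration.

```python
# Back-of-envelope crawl throughput. All inputs are assumptions taken from
# the post, not measured or published figures.
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_rate(pages, days, bytes_per_page=10 * 1024, machines=10_000):
    """Return (pages/s, MiB/s, Gbit/s, pages/s per machine) for one full crawl."""
    seconds = days * SECONDS_PER_DAY
    pages_per_sec = pages / seconds
    bytes_per_sec = pages_per_sec * bytes_per_page
    mib_per_sec = bytes_per_sec / (1024 * 1024)   # mebibytes per second
    gbit_per_sec = bytes_per_sec * 8 / 1e9        # gigabits per second
    per_machine = pages_per_sec / machines        # load if spread evenly
    return pages_per_sec, mib_per_sec, gbit_per_sec, per_machine


if __name__ == "__main__":
    for pages in (3_000_000_000, 10_000_000_000):
        pps, mib, gbit, per_machine = crawl_rate(pages, days=3)
        print(f"{pages:,} pages: {pps:,.0f} pages/s, {mib:.0f} MiB/s, "
              f"{gbit:.2f} Gbit/s, {per_machine:.2f} pages/s per machine")
```

For three billion pages this works out to roughly 11,600 pages per second, about 113 MiB/s (a little under 1 Gbit/s), and a bit over one page per second per machine. It ignores protocol overhead, retries, and per-site politeness delays, so treat it as a lower bound on what a real crawl would need.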

Peter

This 211-message thread spans 15 pages.