
Google News Archive Forum

This 211 message thread spans 8 pages: < < 211 ( 1 2 3 4 [5] 6 7 8 > >     
Is Freshbot now Deepbot?
The line is getting drawn ever thinner
trillianjedi




msg:104752
 4:18 pm on May 22, 2003 (gmt 0)

I've seen several postings about this now in the last few days, although this is my first actual experience of it.

I'm being hit very hard by Google's freshbot at the moment, and it's going deep too. At first glance at what is currently going on with the little guys, I had to check and double check that the IPs were 64.... (they are).

Its behaviour, in terms of hard hitting and depth of crawl (it's going through the entire site), is more like the character of the old deepbot.

In fact, it's identical behaviour to deepbot the last time it crawled this site back in April.

I'm interested in hearing from others who are seeing the same.

TJ

 

BigDave




msg:104872
 2:17 am on May 26, 2003 (gmt 0)

Both crawl and crawler are now coming from freshbot IPs. You will drive yourself crazy trying to figure out what is going on now based on past experience.

dvduval




msg:104873
 10:21 pm on May 26, 2003 (gmt 0)

Because this last update was kind of strange, is that why this page hasn't been updated?
[webmasterworld.com ]
I had bookmarked it and was trying to plot the Dominic date. Is there an agreed upon date for Dominic yet?

trillianjedi




msg:104874
 11:05 pm on May 26, 2003 (gmt 0)

I would have thought it's pretty easy to store deepbot's crawl results, get freshbot to crawl those, grabbing the PR and backlinks info along the way and shoving into the index on the fly as it already does.

That way deepbot and freshbot can just be left running and there's no "google-dance" required - the PR iterations can be done on separate machines on deepbot data, then the whole lot is drawn in by freshbot alongside its normal rounds.

That way, minty fresh and a more regular cycle. Freshbot also knows better than deepbot which pages have actually changed.

A merging of the algorithms if you like.

TJ

g1smd




msg:104875
 11:13 pm on May 26, 2003 (gmt 0)

>> I would have thought it's pretty easy... <<

Three Billion web pages.

shrirch




msg:104876
 3:17 am on May 27, 2003 (gmt 0)

Freshbot indexes normally do not update the cache. Or do they?

parabola




msg:104877
 3:39 am on May 27, 2003 (gmt 0)

Yes, they have for a while. I wouldn't pay much attention to the cache Google shows.

trillianjedi




msg:104878
 9:55 am on May 27, 2003 (gmt 0)

"Three Billion web pages."

I know, it's quite amazing but they do manage it!

Having over 200,000 (or whatever it is) PCs in a distributed network does help of course...

I wasn't trying to take anything away from the achievement google have made, but what I perhaps should have said is ".... it would not require much in the way of additional resources to do this...."

TJ

dididudu




msg:104879
 7:18 pm on May 27, 2003 (gmt 0)

Can anyone confirm that the fresh bots still only like >=PR4 pages? Cause if this is true, then there is no hope for the new sites to ever get into the index... assuming there is no more deep crawl...

Well, for my site (over 100 pages), only 2 pages get crawled once every few days... googlebot never crawled more than 7 pages / day... And I don't see it that often... what would count towards whether googlebot "likes" the site or not?

onionrep




msg:104880
 7:26 pm on May 27, 2003 (gmt 0)

Hi dididudu

I noticed crawler10 in my logs today crawling pages that I know are not in the index (pages were created yesterday), so they therefore have no settled PR.

IP address was 64******* which, as has been pointed out, is supposedly the freshbot.

To me this means that either deepbot is out and about masquerading as freshbot, or that deep and fresh are now one and the same.

WebGuerrilla




msg:104881
 10:22 pm on May 27, 2003 (gmt 0)


I would agree. I put up a link to a fictitious page in order to see if FB would grab it. It did, which would indicate that this FB crawl from 64.68* is actually the deep crawl.

parabola




msg:104882
 10:33 pm on May 27, 2003 (gmt 0)

That's an interesting find. I think GG mentioned that he was "pleased" to see that people are noticing the freshbot acting like deepbot.

My best guess is we are in the middle of the deepcrawl and (almost) no one seems to be taking notice!

steveb




msg:104883
 11:25 pm on May 27, 2003 (gmt 0)

Prior to two weeks ago, freshbot hit me everyday this year (usually for 20% to 40% of my site) except two. No freshbot three times this week, and only one day did it take over 10% of my site.

If freshbot is supposed to be taking over for deepbot then it is utterly lame. I had been assuming that Google was crawling less because it realized it didn't have the resources to do what it was trying to do. I'd rather see freshbot disappear and deepbot actually do the thing right than have freshbot mucking everything up.

At this point, freshbot has shown no ability to do what deepbot was able to do circa December/January.

Dolemite




msg:104884
 11:35 pm on May 27, 2003 (gmt 0)

"I would agree. I put up a link to a fictitious page in order to see if FB would grab it. It did, which would indicate that this FB crawl from 64.68* is actually the deep crawl."

I don't see what this indicates...freshbot has always been about finding new links and new pages. Please elaborate.

hetzeld




msg:104885
 9:39 am on May 28, 2003 (gmt 0)

I've had close to 2500 visits from freshbots (64.68.*) in one hour last night.
I've never seen a freshbot THAT hungry (and fast)! :)
It definitely looks like a deepcrawl.

Dan
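
For anyone wanting to check a burst like this in their own logs, here is a minimal sketch that counts requests per hour from a given IP prefix. It assumes the Apache common/combined log format, and the function name `hits_per_hour` and the sample lines are made up for illustration, not taken from anyone's actual logs.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format; adjust the pattern for other servers.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):')

def hits_per_hour(lines, ip_prefix="64.68."):
    """Count requests per (date, hour) from clients whose IP starts with ip_prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1).startswith(ip_prefix):
            counts[(match.group(2), match.group(3))] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    '64.68.0.1 - - [28/May/2003:02:15:01 +0000] "GET /a.html HTTP/1.0" 200 512',
    '64.68.0.2 - - [28/May/2003:02:47:30 +0000] "GET /b.html HTTP/1.0" 200 734',
    '10.0.0.5 - - [28/May/2003:02:50:00 +0000] "GET /a.html HTTP/1.0" 200 512',
    '64.68.0.1 - - [28/May/2003:03:01:12 +0000] "GET /c.html HTTP/1.0" 200 120',
]
result = hits_per_hour(sample)
for (day, hour), n in sorted(result.items()):
    print(day, hour + ":00", n)
```

Matching on an IP prefix alone says nothing about which bot is behind the address, so a high hourly count only shows crawl intensity, not whether it is a "deep" or "fresh" crawl.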

Adam_C




msg:104886
 10:25 am on May 28, 2003 (gmt 0)

This thread might be of interest, where the Fresh/deep convergence was discussed back in March:

[webmasterworld.com...]

ncsuk




msg:104887
 10:29 am on May 28, 2003 (gmt 0)

My site gets crawled about 3/4 times a day. I have also noticed that FB seems to do some sort of directory browsing because it picked up a ton of pages I have in a test directory called "test1" on my server, to which there are 0 links.

Kind of a problem really.

vincevincevince




msg:104888
 10:38 am on May 28, 2003 (gmt 0)

ncsuk....
just a guess, but maybe you followed a link from one of your test pages which ended up in someone else's referrer logs, which google crawled?

x_m




msg:104889
 10:53 am on May 28, 2003 (gmt 0)

"I don't see what this indicates...freshbot has always been about finding new links and new pages."

I agree. I have been playing with freshbot for a few months and it seems it picks up everything that is linked from a page the bot considers important, regardless of whether the linked page is old or brand new. I can't see any evidence of deepbot-like behaviour (i.e. spidering low PR pages on new domains).

XM

vik_c




msg:104890
 7:17 am on May 29, 2003 (gmt 0)

Maybe the guys at G-plex just reassigned IPs. So the IP addresses (64..) we've attributed to Freshbot now belong to Deepbot.

trillianjedi




msg:104891
 12:09 pm on May 29, 2003 (gmt 0)

Remember guys that GG only stated that he was "glad that people noticed freshie behaving like deepbot" and not that this was going to be the new way from now on.

What was concluded in this thread (from evidence left in site logs) is that, for this particular update, freshbot was crawling the April deepbot data and not the sites direct.

I think it was generally agreed that this was a means of getting the rolled-back index (possibly March or Feb.) that was the basis for Dominic reasonably up to date. In other words, they freshbotted in the April deepcrawl data.

We noticed, GG responded to say "glad you noticed".

There is nothing to indicate at the moment that this is the new way for the future. Only speculation (by myself included).

Fun though speculation is, it's also quite dangerous.

TJ

g1smd




msg:104892
 12:24 am on May 30, 2003 (gmt 0)

Hmm, with the 23 May 2003 fresh tags, my #1 and #2 sites suddenly dropped to #64 and #6. Both are index.html pages that have been top of the rankings for nearly 2 months (occasionally dropping 1 or 2 places with another fresh site temporarily placed higher), and were new sites first published late March. I had just changed the #2 page to have two outgoing links (the new one to another new site) instead of only one to the, as was, #1 page. At the same time the #1 page dropped from PR6 to PR4. I removed the link and yesterday these pages bounced back to #14 and #1 with 27 May 2003 fresh tags. I don't know if the link had anything to do with it or whether it was a Google burp; but this was the first time in ages the sites changed position by more than 2 places.

However, I notice that the date format for the fresh tags on -sj and -fi and others is now the old US format of May 27, 2003 instead. I'm in the UK. Another subtle change. These always had 27 May 2003 format dates before.

bether2




msg:104893
 12:36 am on May 30, 2003 (gmt 0)

"So, we are now saying that we are not going to see deepbot anymore i.e. We will only see freshbots. which means the monthly update is not going to take place again..."

darkroom,

Googleguy has said in the following thread that we should expect "at least another update of the form where the crawl/index cycle finishes and then data centers are updated in the traditional dance."

[webmasterworld.com...]
(search this thread for the two references to "traditional update")

Beth

mfishy




msg:104894
 1:50 am on May 30, 2003 (gmt 0)

bether2,

I am glad you pointed this out again.

It seems many keep wondering if we are now in some sort of perpetual update process while GoogleGuy plainly stated that we should expect at least one more traditional update (made it sound like many more as the system is new).
It's almost as though every other post should have this dropped in.

IMHO, we will see a huge change in SERPS after the next update when we see some recent backlink data.

Critter




msg:104895
 11:38 pm on Jun 1, 2003 (gmt 0)

Deepbot (from freshbot IPs) is definitely in my site now, because it's getting pages that have never been gotten by Google before.

Before this time the Googlebot (fresh I guess) was simply retrieving pages that had been crawled in April.

So...this leads me to believe that the deepcrawl has started (at least for me), although it may have started some time ago with higher PR sites.

TJ, have any comments on your logs?

Peter
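
One way to check this kind of claim is to compare the paths the bot requested in an earlier log with those in the current one. The sketch below is only an illustration under assumptions: it expects Apache common/combined log lines, the name `newly_crawled` is hypothetical, and the sample log lines are invented.

```python
import re

# Assumes Apache common/combined log format: client IP first, request line in quotes.
REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)')

def crawled_paths(lines, ip_prefix="64.68."):
    """Return the set of paths requested by clients whose IP starts with ip_prefix."""
    paths = set()
    for line in lines:
        match = REQUEST.match(line)
        if match and match.group(1).startswith(ip_prefix):
            paths.add(match.group(2))
    return paths

def newly_crawled(previous_lines, current_lines, ip_prefix="64.68."):
    """Paths fetched in the current log that were never fetched in the previous one."""
    before = crawled_paths(previous_lines, ip_prefix)
    now = crawled_paths(current_lines, ip_prefix)
    return sorted(now - before)

# Invented sample data, for illustration only.
april = ['64.68.0.1 - - [12/Apr/2003:04:00:00 +0000] "GET /old.html HTTP/1.0" 200 300']
june = [
    '64.68.0.1 - - [01/Jun/2003:23:00:00 +0000] "GET /old.html HTTP/1.0" 200 300',
    '64.68.0.3 - - [01/Jun/2003:23:05:00 +0000] "GET /new.html HTTP/1.0" 200 410',
]
print(newly_crawled(april, june))
```

A path showing up as new only means it was not requested from that IP range in the earlier log; it does not by itself tell a deepcrawl apart from freshbot finding new links.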

Stefan




msg:104896
 11:53 pm on Jun 1, 2003 (gmt 0)

"it's getting pages that have never been gotten by Google before"

I've been seeing this from freshie too, the last couple of days, but it still seems pretty slack, i.e. not finding all of the new pages and not anywhere near as busy as a deepcrawl. Maybe it will be a slow fresh deepcrawl and find them all soon, I hope :-)

Clark




msg:104897
 12:03 am on Jun 2, 2003 (gmt 0)

Some new pages, but that doesn't mean it's deepcrawl-style. To my memory, freshbot typically used to catch some new pages too. So maybe it's just freshbot's functionality being brought back.

hotice_2002




msg:104898
 1:50 am on Jun 2, 2003 (gmt 0)

Yes, the freshbot (64.68.XX.XX) has grabbed hundreds of my new webpages. I believe this kind of behaviour should be called "deep crawler"!
lol

catch2948




msg:104899
 3:35 am on Jun 2, 2003 (gmt 0)

Not only is freshbot following deeper than most times before, it is also following too ...

Strange though ... On websites where freshbot is picking up some new links that I started acquiring, the websites that the links are on are showing up under keyword searches that my links are targeted toward, rather than my site which the links are pointing to ... This should mean that the deep crawl has yet to happen?

I'm wondering if there is now going to be some sort of "double crawl" per index cycle ...

trillianjedi




msg:104900
 10:31 am on Jun 2, 2003 (gmt 0)

Hiya Critter.

Yes, this actually started for us on Friday, although we didn't notice that she was picking up stuff that was not in April deepcrawl data until Saturday.

I didn't post because, to be completely honest, I'm a little bored with Google now and will just wait until it's settled.

I can confirm your findings though, although whether or not it's anything out of the usual for freshbot is a little less clear. She's been known to go very deep in the past.

TJ

Stefan




msg:104901
 10:57 am on Jun 2, 2003 (gmt 0)

steveb, we're just desperate for a deepcrawl and a new index... anything looks good right now. You're right, though, freshie was the first to pick my site up a couple of weeks after it went online last year.

mfishy




msg:104902
 11:31 am on Jun 2, 2003 (gmt 0)

Picking up new pages? So what? Freshbot has always picked up any new page I have put up.
