I'm being hit very hard by Google's freshbot at the moment, and it's going deep too. Seeing what the little guys were up to, I had to check and double-check that the IPs were 64.... (they are).
Its behaviour, in terms of hard hitting and depth of crawl (it's going through the entire site), is more like that of the old deepbot.
In fact, it's identical behaviour to deepbot the last time it crawled this site back in April.
I'm interested in hearing from others who are seeing the same.
TJ
That way deepbot and freshbot can just be left running and there's no "Google dance" required: the PR iterations can be done on separate machines on deepbot data, then the whole lot is drawn in by freshbot alongside its normal rounds.
That way we get minty fresh results and a more regular cycle. Freshbot also knows better than deepbot which pages have actually changed.
A merging of the algorithms if you like.
TJ
Three Billion web pages.
I know, it's quite amazing but they do manage it!
Having over 200,000 (or whatever it is) PCs in a distributed network does help of course...
I wasn't trying to take anything away from what Google have achieved, but what I perhaps should have said is ".... it would not require much in the way of additional resources to do this...."
TJ
Well, for my site (over 100 pages), only 2 pages get crawled every few days... Googlebot has never crawled more than 7 pages a day... And I don't see it that often... What counts toward whether Googlebot "likes" a site or not?
I noticed crawler10 in my logs today crawling pages that I know are not in the index (the pages were created yesterday), so they have no settled PR.
The IP address was 64******* which, as has been pointed out, is supposedly the freshbot.
To me this means that either deepbot is out and about masquerading as freshbot, or that deep and fresh are now one and the same.
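For anyone who wants to check their own logs the same way, here is a minimal sketch that counts Googlebot requests per day, grouped by the first octet of the requesting IP. It assumes Apache "combined" log format and a user agent containing "Googlebot"; the function name and the regular expression are illustrative only, and the IPs in any test data you feed it are up to you. It says nothing about which bot is which beyond the 64.... range mentioned in this thread.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests per (day, first IP octet).

    Hypothetical helper: it only matches on the user-agent string,
    so it cannot tell a genuine Googlebot from an impostor.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            first_octet = m.group("ip").split(".")[0]
            counts[(m.group("day"), first_octet)] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle on your access log, and print the resulting counter sorted by day. A sudden jump in the daily count for the 64 group would match the kind of deep crawling described above.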
If freshbot is supposed to be taking over for deepbot then it is utterly lame. I had been assuming that Google was crawling less because it realized it didn't have the resources to do what it was trying to do. I'd rather see freshbot disappear and deepbot actually do the thing right than have freshbot mucking everything up.
At this point, freshbot has shown no ability to do what deepbot was able to do circa December/January.
[webmasterworld.com...]
I don't see what this indicates...freshbot has always been about finding new links and new pages.
I agree. I have been playing with freshbot for a few months and it seems to pick up everything that is linked from a page the bot considers important, regardless of whether the linked page is old or brand new. I can't see any evidence of deepbot-like behaviour (i.e. spidering low PR pages on new domains).
XM
What was concluded in this thread (from evidence left in site logs) is that, for this particular update, freshbot was crawling the April deepbot data and not the sites direct.
I think it was generally agreed that this was a means of getting the rolled-back index (possibly March or Feb.) that was the basis for Dominic reasonably up to date. In other words, they freshbotted in the April deepcrawl data.
We noticed, and GG responded to say "glad you noticed".
There is nothing to indicate at the moment that this is the new way for the future. Only speculation (by myself included).
Fun though speculation is, it's also quite dangerous.
TJ
However, I notice that the date format for the fresh tags on -sj and -fi and others is now the old US format of May 27, 2003 instead. I'm in the UK. Another subtle change. These always had 27 May 2003 format dates before.
So, we are now saying that we are not going to see deepbot anymore, i.e. we will only see freshbots, which means the monthly update is not going to take place again...
darkroom,
GoogleGuy has said in the following thread that we should expect "at least another update of the form where the crawl/index cycle finishes and then data centers are updated in the traditional dance."
[webmasterworld.com...]
(search this thread for the two references to "traditional update")
Beth
I am glad you pointed this out again.
It seems many keep wondering if we are now in some sort of perpetual update process while GoogleGuy plainly stated that we should expect at least one more traditional update (made it sound like many more as the system is new).
It's almost as though every other post should have this dropped in.
IMHO, we will see a huge change in the SERPs after the next update, when we see some recent backlink data.
Before this time the Googlebot (fresh I guess) was simply retrieving pages that had been crawled in April.
So...this leads me to believe that the deepcrawl has started (at least for me), although it may have started some time ago with higher PR sites.
TJ, have any comments on your logs?
Peter
it's getting pages that have never been gotten by Google before
I've been seeing this from freshie too, the last couple of days, but it still seems pretty slack, i.e. not finding all of the new pages and not anywhere near as busy as a deepcrawl. Maybe it will be a slow fresh deepcrawl and find them all soon, I hope :-)
Strange though ... On websites where freshbot is picking up some new links that I started acquiring, the websites that the links are on are showing up under the keyword searches my links are targeted toward, rather than my site, which the links point to ... Should this mean that the deep crawl has yet to happen?
I'm wondering if there is now going to be some sort of "double crawl" per index cycle ...
Yes, this actually started for us on Friday, although we didn't notice that she was picking up stuff that was not in April deepcrawl data until Saturday.
I didn't post because, to be completely honest, I'm a little bored with Google now and will just wait until it's settled.
I can confirm your findings though, although whether or not it's anything out of the usual for freshbot is a little less clear. She's been known to go very deep in the past.
TJ