Both crawl and crawler are now coming from freshbot IPs. You will drive yourself crazy trying to figure out what is going on now based on past experience.
Because this last update was kind of strange, is that why this page hasn't been updated?
I had bookmarked it and was trying to plot the Dominic date. Is there an agreed upon date for Dominic yet?
I would have thought it's pretty easy to store deepbot's crawl results, have freshbot crawl those, grabbing the PR and backlink info along the way and shoving it into the index on the fly as it already does.
That way deepbot and freshbot can just be left running and there's no "google-dance" required - the PR iterations can be done on separate machines on deepbot data, then the whole lot is drawn in by freshbot alongside its normal rounds.
That way, minty fresh and a more regular cycle. Freshbot also knows better than deepbot which pages have actually changed.
A merging of the algorithms if you like.
>> I would have thought it's pretty easy... <<
Three Billion web pages.
Freshbot indexes normally do not update the cache. Or do they?
Yes, they have for a while. I wouldn't pay much attention to the cache Google shows.
I know, it's quite amazing but they do manage it!
Having over 200,000 (or whatever it is) PCs in a distributed network does help of course...
I wasn't trying to take anything away from the achievement google have made, but what I perhaps should have said is ".... it would not require much in the way of additional resources to do this...."
Can anyone confirm that the freshbots still only like >=PR4 pages? Because if this is true, then there is no hope for new sites ever getting into the index... assuming there is no more deep crawl...
Well, for my site (over 100 pages), only 2 pages get crawled once every few days... googlebot has never crawled more than 7 pages a day... and I don't see it that often... what would that count toward in deciding whether googlebot "likes" the site or not?
I noticed crawler10 in my logs today crawling pages that I know are not in the index (pages were created yesterday) and which therefore have no settled PR.
IP address was 64******* which, as has been pointed out is supposedly the freshbot.
To me this means that either deepbot is out and about masquerading as freshbot, or that deep and fresh are now one and the same.
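For anyone wanting to check their own logs the same way, here is a minimal sketch that groups Googlebot requests by IP prefix so you can see which crawler fetched which URLs. The 64.68.* prefix is the one posters in this thread attribute to freshbot; the log path, sample lines, and Apache Common Log Format layout are assumptions for illustration, not anything Google has documented.

```python
# Sketch: map each watched crawler IP prefix to the URLs it requested.
# Assumes Apache Common Log Format lines; 64.68.* is the prefix this
# thread attributes to freshbot (an observation, not an official range).
from collections import defaultdict

def crawler_hits_by_prefix(log_lines, prefixes=("64.68.",)):
    """Map each watched IP prefix to the set of URLs it requested."""
    hits = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) < 7 or "Googlebot" not in line:
            continue  # skip non-Googlebot or malformed lines
        ip, url = parts[0], parts[6]  # CLF: ip ... "METHOD url HTTP/1.x"
        for prefix in prefixes:
            if ip.startswith(prefix):
                hits[prefix].add(url)
    return hits

# Hypothetical sample lines in Common Log Format:
log = [
    '64.68.82.10 - - [27/May/2003:10:00:01 +0000] "GET /new-page.html HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [27/May/2003:10:00:02 +0000] "GET /old-page.html HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]
print(dict(crawler_hits_by_prefix(log)))
```

Cross-referencing the URLs in the 64.68.* bucket against what you know is already indexed is how you would tell a freshbot-style revisit from a deepbot-style first fetch.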
I would agree. I put up a link to a fictitious page in order to see if FB would grab it. It did, which would indicate that this FB crawl from 64.68* is actually the deep crawl.
That's an interesting find. I think GG mentioned that he was "pleased" to see that people are noticing the freshbot acting like deepbot.
My best guess is we are in the middle of the deepcrawl and (almost) no one seems to be taking notice!
Prior to two weeks ago, freshbot hit me everyday this year (usually for 20% to 40% of my site) except two. No freshbot three times this week, and only one day did it take over 10% of my site.
If freshbot is supposed to be taking over for deepbot then it is utterly lame. I had been assuming that Google was crawling less because it realized it didn't have the resources to do what it was trying to do. I'd rather see freshbot disappear and deepbot actually do the thing right than have freshbot mucking everything up.
At this point, freshbot has shown no ability to do what deepbot was able to do circa December/January.
|I would agree. I put up a link to a fictitious page in order to see if FB would grab it. It did, which would indicate that this FB crawl from 64.68* is actually the deep crawl. |
I don't see what this indicates...freshbot has always been about finding new links and new pages. Please elaborate.
I've had close to 2500 visits from freshbots (64.68.*) in one hour time last night.
I've never seen a freshbot THAT hungry (and fast)! :)
It definitely looks like a deepcrawl.
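A burst like "2500 visits in one hour" is easy to quantify from the raw log. This is a sketch under the same assumptions as above (Apache Common Log Format, 64.68.* as the freshbot prefix reported in this thread); the sample lines are invented for illustration.

```python
# Sketch: count requests per hour from a given crawler IP prefix,
# to measure a crawl burst. Assumes Apache Common Log Format, where
# the timestamp field looks like '[27/May/2003:23:14:05'.
from collections import Counter

def hits_per_hour(log_lines, ip_prefix="64.68."):
    """Return a Counter keyed by 'DD/Mon/YYYY:HH' for matching IPs."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4 or not parts[0].startswith(ip_prefix):
            continue
        # Strip the '[' and drop minutes/seconds, keeping date + hour.
        hour = parts[3].lstrip("[").rsplit(":", 2)[0]
        counts[hour] += 1
    return counts
```

Run over a day's log, any hour whose count jumps by an order of magnitude over the usual freshbot visit is the kind of spike described above.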
This thread might be of interest, where the Fresh/deep convergence was discussed back in March:
My site gets crawled about 3-4 times a day. I have also noticed that FB seems to do some sort of directory browsing, because it picked up a ton of pages I have in a test directory called "test1" on my server, to which there are 0 links.
Kind of a problem really.
Just a guess, but maybe you followed a link from one of your test pages, which ended up in someone else's referrer logs, which Google crawled?
|I don't see what this indicates...freshbot has always been about finding new links and new pages. |
I agree. I have been playing with freshbot for a few months, and it seems to pick up everything that is linked from a page the bot considers important, regardless of whether the linked page is old or brand new. I can't see any evidence of deepbot-like behaviour (i.e. spidering low-PR pages on new domains).
Maybe the guys at G-plex just reassigned IPs. So the IP addresses (64..) we've attributed to Freshbot now belong to Deepbot.
Remember guys that GG only stated that he was "glad that people noticed freshie behaving like deepbot" and not that this was going to be the new way from now on.
What was concluded in this thread (from evidence left in site logs) is that, for this particular update, freshbot was crawling the April deepbot data and not the sites direct.
I think it was generally agreed that this was a means of getting the rolled-back index (possibly March or Feb.) that was the basis for Dominic reasonably up to date. In other words, they freshbotted in the April deepcrawl data.
We noticed, GG responded to say "glad you noticed".
There is nothing to indicate at the moment that this is the new way for the future. Only speculation (by myself included).
Fun though speculation is, it's also quite dangerous.
Hmm, with the 23 May 2003 fresh tags, my #1 and #2 sites suddenly dropped to #64 and #6. Both are index.html pages that have been top of the rankings for nearly 2 months (occasionally dropping 1 or 2 places with another fresh site temporarily placed higher), and were new sites first published late March.

I had just changed the #2 page to have two outgoing links (the new one to another new site) instead of only one to the, as was, #1 page. At the same time the #1 page dropped from PR6 to PR4. I removed the link, and yesterday these pages bounced back to #14 and #1 with 27 May 2003 fresh tags.

I don't know if the link had anything to do with it or whether it was a Google burp; but this was the first time in ages the sites changed position by more than 2 places.
However, I notice that the date format for the fresh tags on -sj and -fi and others is now the old US format of May 27, 2003 instead. I'm in the UK. Another subtle change. These always had 27 May 2003 format dates before.
|So, we are now saying that we are not going to see deepbot anymore i.e. We will only see freshbots. which means the monthly update is not going to take place again... |
Googleguy has said in the following thread that we should expect "at least another update of the form where the crawl/index cycle finishes and then data centers are updated in the traditional dance."
(search this thread for the two references to "traditional update")
I am glad you pointed this out again.
It seems many keep wondering if we are now in some sort of perpetual update process while GoogleGuy plainly stated that we should expect at least one more traditional update (made it sound like many more as the system is new).
It's almost as though every other post should have this dropped in.
IMHO, we will see a huge change in SERPS after the next update when we see some recent backlink data.
Deepbot (from freshbot IPs) is definitely in my site now, because it's getting pages that have never been gotten by Google before.
Before this time the Googlebot (fresh I guess) was simply retrieving pages that had been crawled in April.
So...this leads me to believe that the deepcrawl has started (at least for me), although it may have started some time ago with higher PR sites.
TJ, have any comments on your logs?
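The test described in this post, "getting pages that have never been gotten before", can be sketched as a simple set difference between the current crawl and all earlier ones. The file contents here are hypothetical; in practice each set would come from the URLs extracted from your logs for each crawl period.

```python
# Sketch: find URLs fetched in the current crawl that were absent from
# every previous crawl -- the evidence used above to argue that a
# deepcrawl (not just a fresh revisit) has started.
def never_seen_before(current_crawl, past_crawls):
    """Return sorted URLs in current_crawl absent from all past crawls."""
    seen = set()
    for crawl in past_crawls:
        seen.update(crawl)
    return sorted(set(current_crawl) - seen)

# Hypothetical crawl snapshots (e.g. April deepcrawl vs. this week):
april = {"/index.html", "/about.html"}
this_week = {"/index.html", "/about.html", "/brand-new.html"}
print(never_seen_before(this_week, [april]))  # → ['/brand-new.html']
```

If that difference is non-empty and large, the bot is doing more than re-fetching the April deepcrawl data.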
|it's getting pages that have never been gotten by Google before |
I've been seeing this from freshie too, the last couple of days, but it still seems pretty slack, i.e. not finding all of the new pages and not anywhere near as busy as a deepcrawl. Maybe it will be a slow fresh deepcrawl and find them all soon, I hope :-)
Some new pages, but that doesn't mean it's deepcrawl-style. To my memory, freshbot typically used to catch some new pages too. So maybe it's just freshbot's functionality being brought back.
Yes, the freshbot (64.68.XX.XX) has grabbed hundreds of my new webpages. I believe this kind of behaviour must be the "deep crawler"!
Not only is freshbot following deeper than most times before, it is also following too ...
Strange though ... On websites where freshbot is picking up some new links I started acquiring, the sites the links are on are showing up under the keyword searches my links are targeted toward, rather than my site, which the links point to ... Doesn't this mean that the deep crawl hasn't happened yet?
I'm wondering if there is now going to be some sort of "double crawl" per index cycle ...
Yes, this actually started for us on Friday, although we didn't notice that she was picking up stuff that was not in April deepcrawl data until Saturday.
I didn't post because, to be completely honest, I'm a little bored with google now and will just wait until it's settled.
I can confirm your findings though, although whether or not it's anything out of the usual for freshbot is a little less clear. She's been known to go very deep in the past.
steveb, we're just desperate for a deepcrawl and a new index... anything looks good right now. You're right, though, freshie was the first to pick my site up a couple of weeks after it went online last year.
Picking up new pages? So what? Freshbot has always picked up any new page I have put up.