I'm being hit very hard by google's freshbot at the moment, and it's going deep too. At first glance at what the little guys are currently up to, I had to check and double-check that the IPs really were 64.... (they are).
Its behaviour, in terms of hard hitting and depth of crawl (it's going through the entire site), is more like the character of the old deepbot.
In fact, it's identical behaviour to deepbot the last time it crawled this site back in April.
I'm interested in hearing from others who are seeing the same.
Sorry, I'm not quite with you - can you explain a bit more?
I think you're saying that freshie is crawling pages on a site that are not linked from other pages on that site?
I don't quite follow how deepbot would have found them in April?
<EDIT: re-read your post and understand now. It hasn't followed either old-index links or other fresh links to get there; the only place those URLs could have come from is the April deep crawl. OK, that's interesting.....>
No, I mean that the pages fresh is getting right now can be found by crawling my site--but fresh has only got 100 or so pages, starting today. I've never seen the freshbot before.
When fresh first started crawling today she was getting pages that are 'deep' in the site; she didn't start with my home page (well, she did, but the subsequent pages can't be reached from the home page).
So fresh must be crawling pages that deep got in April.
What you said made me revisit the logs with a slightly different view and you're absolutely right.
There are some pages for which my site is in the same boat as yours - freshie simply could not have got there by following existing fresh links.
So, this raises another "what is going on here then" type question.
Is it possible that google is using the freshbot "method" as a means of getting the April deepcrawl data into the "new" index, I wonder?
I'm looking at a log that contains April data, and FB is requesting the pages in exactly the same order as last month.
But what isn't clear is whether or not it really is FB. It could simply be DP running from IP's that have been FB in the past.
I think things will be much clearer tomorrow. If this crawl is really part of the fresh system, we should all end up pretty happy by this time tomorrow.
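If anyone wants to run the same order check on their own logs, this is roughly how I lined the two crawls up. It's only a sketch: it assumes combined-format Apache logs, the file names are made up, and the 64. prefix plus user-agent test is the same rough check mentioned above.

# Pull the sequence of URLs requested by the 64.x Googlebot out of two
# Apache access logs and see how much of the crawl order matches.
def crawl_sequence(log_path):
    urls = []
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 7:
                continue
            ip, url = parts[0], parts[6]
            if ip.startswith('64.') and 'Googlebot' in line:
                urls.append(url)
    return urls

april = crawl_sequence('access_log.april')      # hypothetical file names
current = crawl_sequence('access_log.current')
matches = sum(1 for a, b in zip(april, current) if a == b)
print('%d of %d requests in identical order' % (matches, min(len(april), len(current))))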
It could simply be DP running from IP's that have been FB in the past.
Wouldn't that mean a lot of PCs having their software switched from freshbot to deepbot? It would explain the timeframe, I guess, but it seems like a lot of work and I don't really see why they'd do it that way around.
As you say, we'll perhaps know more tomorrow.
My instinct is that google is now using freshie as a means of introducing the April deepcrawl pages into the index, rather than having to update all of the indexes to a single new build.
But heck, it's all speculation until there's a fat lady somewhere singing her heart out!
Wouldn't that mean a lot of PCs having their software switched from freshbot to deepbot?
These machines would most likely do network boots: change an entry in your DHCP server, send a remote reboot, and the bot machine comes up as a totally different beast.
I know for a fact that google uses PXE to network boot many of their systems. I cannot say for certain that they do it with their Googlebot machines, but if they don't, they should.
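To make it concrete, that kind of switch is a few lines of scripting. This is purely a sketch of the mechanism, not anything I know Google actually runs: the config path, host name, boot image name and restart command are all made up.

# Hypothetical: point a PXE-booted machine at a different boot image by
# rewriting its dhcpd.conf host entry, reload the DHCP server, then reboot
# the box remotely so it comes back up as "a totally different beast".
import re
import subprocess

def switch_boot_image(conf_path, host, new_image):
    with open(conf_path) as f:
        conf = f.read()
    # Swap the filename "..." line inside this host's block.
    pattern = re.compile(r'(host\s+%s\s*\{[^}]*?filename\s+")[^"]*(";)'
                         % re.escape(host), re.DOTALL)
    conf = pattern.sub(lambda m: m.group(1) + new_image + m.group(2), conf)
    with open(conf_path, 'w') as f:
        f.write(conf)

switch_boot_image('/etc/dhcpd.conf', 'crawler-042', 'deepbot.img')
subprocess.call(['/etc/init.d/dhcpd', 'restart'])   # reload the new config
subprocess.call(['ssh', 'crawler-042', 'reboot'])   # remote reboot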
As I mentioned in another thread, I have seen several of my April deep crawl pages in the index. I had an error in my navigation bar that was only there for a very short time during the deep crawl. I have found four of those pages "cached" in google. One of these was in the index before freshbot hit it yesterday.
It has not yet changed over to the freshbotted version from yesterday.
About 1/3 of the pages I have looked at are versions that were freshbotted since the last deep crawl, and those have been moved in permanently. I made a change in late April that I have looked for by viewing source of the cache.
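For anyone trying to date their cached copies the same way, the check is easy to script. Again just a sketch: it assumes the cache: query still hands back the cached page directly, and the marker string and URL are placeholders; in practice I just view source by hand.

# Fetch Google's cached copy of a page and look for a marker that only
# existed during the April deep crawl. Marker and URL are hypothetical.
import urllib.parse
import urllib.request

MARKER = '<!-- nav bar error -->'

def cached_copy_has_marker(page_url):
    query = urllib.parse.urlencode({'q': 'cache:' + page_url})
    req = urllib.request.Request('http://www.google.com/search?' + query,
                                 headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(req).read().decode('latin-1', 'replace')
    return MARKER in html

print(cached_copy_has_marker('http://www.example.com/some/deep/page.html'))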
That's the moot point... the missing data is probably the cause of the diminished quality of the Google dbase.
Just slapping the missing pages back in the index anywhere is not the issue. It's getting the quality pages ranked properly again, with a 'proper' PR and relevancy calculation.
Of course, as speculated, FB could now have the facility to achieve all that. In other words, re-ranking on the fly, without an entire (monthly) re-calculation for the whole database.
Sadly, that's all it is... speculation. However, something has to give soon with this index in terms of the backlink data, to align with GG's comments (weeks, not months: and we are weeks into this thing already).