I'm being hit very hard by Google's freshbot at the moment, and it's going deep too. When I first saw what was going on with the little guys, I had to check and double-check that the IPs were 64.... (they are).
Its behaviour, in terms of how hard it hits and how deep it crawls (it's going through the entire site), is more like that of the old deepbot.
In fact, it's identical to deepbot's behaviour the last time it crawled this site, back in April.
I'm interested in hearing from others who are seeing the same.
Sorry, I'm not quite with you - can you explain a bit more?
I think you're saying that freshie is crawling pages on a site that are not linked from other pages on that site?
I don't quite follow how deepbot would have found them in April?
<EDIT: re-read your post and understand now. It hasn't followed either old-index links or other fresh links to get there, the only place could be from April deep crawl links. OK, that's interesting.....>
No, I mean that the pages fresh is getting right now can be found by crawling my site--but fresh has only got 100 or so pages, starting today. I've never seen the freshbot before.
When fresh first started crawling today she was getting pages that are 'deep' in the site, she didn't start with my home page (well she did, but the subsequent pages can't be gotten to from the home page).
So fresh must be crawling pages that deep got in April.
What you said made me revisit the logs with a slightly different view and you're absolutely right.
There are some pages for which my site is in the same boat as yours - freshie simply could not have got there by following existing fresh links.
So, this raises another "what is going on here then" type question.
I wonder whether Google is using the freshbot "method" as a means of getting the April deepcrawl data into the "new" index?
I'm looking at a log that contains April data, and FB is requesting the pages in the same order as last month.
But what isn't clear is whether or not it really is FB. It could simply be DP running from IPs that have been FB in the past.
I think things will be much clearer tomorrow. If this crawl is really part of the fresh system, we should all end up pretty happy by this time tomorrow.
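For anyone who wants to check the "identical order" observation against their own logs, here is a minimal sketch. It assumes Apache "combined" format access logs, filters on the 64. IP prefix mentioned above and on "Googlebot" appearing in the user agent, and uses a hypothetical similarity score (Python's `difflib`) to compare the April and current request sequences. The function names are made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def crawl_order(lines, ip_prefix="64.", agent="Googlebot"):
    """Return requested paths, in log order, for hits matching the IP prefix and agent."""
    paths = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        ip, path, user_agent = m.groups()
        if ip.startswith(ip_prefix) and agent in user_agent:
            paths.append(path)
    return paths

def order_similarity(old_paths, new_paths):
    """Score from 0.0 to 1.0; 1.0 means the two request sequences are identical."""
    return SequenceMatcher(None, old_paths, new_paths).ratio()
```

Feed `crawl_order` the lines of the April log and of the current log, then pass both results to `order_similarity`. A score near 1.0 would back up the impression that the same URL list is being replayed. It would not say which bot is doing the replaying.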
It could simply be DP running from IPs that have been FB in the past.
Wouldn't that mean a lot of PCs having software switched from freshbot to deepbot? It would explain the timeframe, I guess, but it seems like a lot of work and I don't really see why they'd do it that way around.
As you say, we'll perhaps know more tomorrow.
My instinct is that Google is now using freshie as a means of introducing the April deepcrawl pages into the index, rather than having to update all of the indexes to a single new build.
But heck, it's all speculation until there's a fat lady somewhere singing her heart out!
Wouldn't that mean a lot of PCs having software switched from freshbot to deepbot?
These machines would most likely do network boots. Change an entry in your DHCP server, send a remote reboot, and the bot machine comes up as a totally different beast.
I know for a fact that Google uses PXE to network boot many of their systems. I cannot say for certain that they do it with their Googlebot machines, but if they don't, they should.
As I mentioned in another thread, I have seen several of my April deep crawl pages in the index. I had an error in my navigation bar that was only there for a very short time during the deep crawl. I have found four of those pages "cached" in Google. One of these was in before freshbot hit it yesterday.
It has not yet changed over to the freshbotted version from yesterday.
About 1/3 of the pages I have looked at are versions that were freshbotted since the last deep crawl and have been moved in permanently. I checked this by viewing the source of the cache and looking for a change I made in late April.
joined:Nov 20, 2000
That's the moot point... the missing data is probably the cause of the diminished quality of the Google dbase.
Just slapping the missing pages back in the index anywhere is not the issue. It's getting the quality pages ranked properly again, with a 'proper' PR and relevancy calculation.
Of course, as speculated, FB could now have the facility to achieve all that. In other words, re-ranking on the fly, without an entire (monthly) re-calculation for the whole database.
Sadly, that's all it is... speculation. However, something has to give soon with this index in terms of the backlink data, to align with GG's comments (weeks not months, and we are weeks into this thing already).
But I'm glad that people have noticed that freshie has been crawling deeply in addition to normal freshie duties.
I'm not talking about the fresh stuff here, which is (or at least was) relatively superficial and short term in the ranking system (come and go). I'm talking about stable ranking.
It's this data that has surely caused most of those more recent, but high quality, sites to go/fall. I assume that this will NOT be tomorrow, but "more than weeks, less than months".
That timescale actually sounds like the next mid-June cycle to me (assuming there is one anymore), with the new algo implementation replacing the mid-May cycle.
P.S. About the questions you raised about country language/tld/redirection, I think we've got some changes scheduled soon (next week) that should bring it in line with what many users expect again. Just wanted to let you know that that was coming too.
GoogleGuy says that the index is in all servers.
What does that mean?
I've been continuously checking Google by searching for 'homeandoutdoors', which is the name of one of my sites. Since that search only returns results from that site, it is an easy way for me to check the number of pages indexed.
When I perform that search in different datacenters I get different results:
www -> 2440 pages
www2 -> 272 pages
www3 -> 272 pages
www-cw -> 2360 pages
www-fi -> 2200 pages
www-sj -> 272 pages
www-ex -> 316 pages
So it seems that "the index is the same in all servers" does not mean what I understood it to mean. There is something I'm missing. Perhaps these are not the servers GoogleGuy means, or perhaps 'index' means something different from what I supposed.
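To make the split in those numbers easier to see, here is a rough sketch that groups the datacenters by page count. The counts are copied from the list above. The 20% tolerance and the function name are my own assumptions, picked only to separate "roughly the same" from "clearly different".

```python
# Page counts per datacenter, copied from the list above.
counts = {
    "www": 2440, "www2": 272, "www3": 272, "www-cw": 2360,
    "www-fi": 2200, "www-sj": 272, "www-ex": 316,
}

def cluster_by_count(counts, tolerance=0.20):
    """Group hosts whose counts lie within `tolerance` of the largest count in their group."""
    groups = []
    # Walk hosts from the largest count to the smallest.
    for host, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        if groups and n >= groups[-1][0][1] * (1 - tolerance):
            groups[-1].append((host, n))   # close enough to this group's largest count
        else:
            groups.append([(host, n)])     # start a new group
    return groups

for group in cluster_by_count(counts):
    print([host for host, _ in group])
```

With these figures the hosts fall into two groups: www, www-cw and www-fi on one side, and www-ex, www2, www3 and www-sj on the other. That only restates the discrepancy. It does not explain which set GoogleGuy was referring to.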
I think someone mentioned the other day that predictability is always helpful for business. This at least gives us some general direction.
You said before that backlinks would be factored in "gradually" and "overtime". That implied continual updating of backlinks over a long period of time. Now it sounds like you are saying that the new data (backlinks) will be factored in all at once, in a few weeks (more than weeks, less than months). Thus, we are stuck with the current backlink count (and missing anchor text) until around mid-June. Is this a correct interpretation?
Again, just my opinion. Right or wrong, a pizza place isn't gonna swap me a 12 inch pepperoni for it.
Not quite sure where everyone got the idea that we were going to have some sort of continual update process. Seems the one they have in place makes more sense, technology-wise.
No one knows what Google is doing. Everyone has had different interpretations and opinions. Hence, 9 update threads with thousands of posts. Regardless, Googleguy's use of the words "gradually" and "overtime" implied to a number of people a slow, continual update of backlinks. Not just one dump of new backlinks 6 weeks later (i.e., a regular monthly update). Besides, if Inktomi and Alltheweb are able to do continual updates, who is to say Google can't (technologically-speaking)?
Reminds me of the old school saw: "The day you decide to turn over a new leaf is the day the teacher decides to make an example out of you."
You are right that Google is surely capable of doing many things.
IMO, if anything, they have shown how difficult it is to bring in backlinks rather than a shift towards more fluidity. If not, why else would we be seeing Feb data?
Thanks GoogleGuy for clearing everything up and letting us know to expect a regular update at some point with recent backlinks. I think this is the info we have been waiting for.