The index growing on 64.233.179.104 does seem to be largely generated by Mozilla Googlebot, and this new index is being built for the future. So can we say that Mozilla Googlebot is now taking over from normal Googlebot?
OK, ignore supplementals etc. for a moment, as all DCs have that problem, and have a look at the cache dates for the pages that are indexed... Some of these pages have only been fetched by Mozilla Googlebot (even on the same day that normal Googlebot visited).
E.g. on the test DC I have a homepage cached on 30th November at 5:40, fetched by Mozilla Googlebot, while on the other DCs it is cached on 30th November at 3:40, fetched by normal Googlebot.
So in many ways this does look like a whole new index being built in parallel to the existing one, largely from Mozilla Googlebot crawl data.
Some pages appear very old. E.g. another page is cached on the test DC on 6th November, but on the other DCs it has a December cache. Checking the logs, 6th November was the last time Mozilla Googlebot visited this page.
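For anyone wanting to repeat this log check, here is a minimal Python sketch that reports, per URL, the last fetch by a given Googlebot variant. It assumes the Apache "combined" log format and tells the bots apart by user agent: Mozilla Googlebot identifies as `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)`, normal Googlebot as `Googlebot/2.1 (+http://www.google.com/bot.html)`. The regex, the function names and the sample lines are illustrative only, not anyone's real logs.

```python
import re
from datetime import datetime

# Apache/NCSA "combined" format (an assumption; adjust for your server's format).
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify_bot(agent):
    """Return "mozilla", "normal" or None for a user-agent string."""
    if "Googlebot" not in agent:
        return None
    # Mozilla Googlebot's UA starts with "Mozilla/5.0 (compatible; Googlebot/2.1; ...".
    return "mozilla" if agent.startswith("Mozilla/") else "normal"


def last_fetch(lines, bot="mozilla"):
    """Map each path to the time of its most recent fetch by the given bot."""
    latest = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m or classify_bot(m.group("agent")) != bot:
            continue
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        path = m.group("path")
        if path not in latest or when > latest[path]:
            latest[path] = when
    return latest


# Made-up sample lines: one fetch by each bot for the same page.
SAMPLE_LOG = [
    '66.249.66.1 - - [06/Nov/2005:05:40:00 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [02/Dec/2005:03:40:00 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

for path, when in last_fetch(SAMPLE_LOG).items():
    print(path, "last fetched by Mozilla Googlebot on", when.date())
```

Comparing that date with the cache date shown on the test DC is then a quick way to see whether the cached copy came from the Mozilla crawl.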
OK, there are pages in the test DC that were only visited by normal Googlebot; however, pages crawled by Mozilla Googlebot do not appear on the other DCs.
The newest pages on the DC crawled by Mozilla Googlebot seem to be from November, i.e. no pages crawled by Mozilla Googlebot in December have made it into the index yet.
Some pages crawled by Mozilla Googlebot in November have not made it into the index, so I don't know if G is working with a sample data set...
For confirmation that this is a whole new build of the index, MC said on his blog:
"the test data center certainly has some different crawling and indexing characteristics."
OK folks, remember also that MC said this index will roll out over months and is in a test state, so I guess there's no need for early panic stations and slagging off Google in this thread.
Now 301s, 302s and canonicals: for me, Google has crawled and indexed a lot more 301s correctly. 302s: still lots in the index (mainly supplementals), but I'm not seeing any new 302s that show the URL of the linking site with the content of the destination site (the newest I see date from about August 2005). No doubt others may find some.
What other observations have people made about the new crawling and indexing on this test DC?
Well, I just happened to be looking through my logs and I saw Mozilla Googlebot looking for this URL on my website:
[foo.com...]
GOOGLE404probe... Hmm, good thing I returned a 404 this time. That bot does nothing but look for trouble!
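A plausible reading of a probe like this is that the bot is checking whether a nonexistent URL gets a real 404 status or a "soft 404" (an error page served with 200). Here is a minimal Python sketch of the safe behaviour using the standard-library `http.server`; `KNOWN_PAGES` and `status_for` are hypothetical stand-ins for however your site decides a page exists.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical set of pages that really exist on the site.
KNOWN_PAGES = {"/": b"<h1>Home</h1>", "/widgets.html": b"<h1>Widgets</h1>"}


def status_for(path):
    """Return (status, body): 200 for known pages, a genuine 404 for anything else."""
    body = KNOWN_PAGES.get(path)
    if body is None:
        # Never answer an unknown URL with 200; that is what makes a soft 404.
        return 404, b"<h1>Not found</h1>"
    return 200, body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when looking the page up.
        status, body = status_for(self.path.split("?", 1)[0])
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The point is only that the status code, not the page text, tells the bot the URL does not exist; a friendly error page is fine as long as it is sent with 404.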
Top 40 for the full site name :) although an internal page outranks the homepage. I guess that is progress :)
Also, a site crawled only by Mozilla Googlebot makes it into the SERPs (250 pages).
Plenty of pages crawled by Mozilla Googlebot have not made it into the SERPs, so I still wonder if G is using a sample data set.
I think it is very much still in development.
Not sure why so much Mozilla Googlebot crawling is necessary for the number of pages added.
Some people have said that pages need to be crawled three times before being added (I can't verify that for the pages on my site that were not added), but that does not sound too efficient.
I have definitely got more pages added to the test DC; however, Mozilla Googlebot has crawled about 10-20 times more pages that have not been added.
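To put a number on that ratio, and to test the "crawled three times" rumour against your own site, a small Python sketch like the one below could help. It assumes you already have one path per Mozilla Googlebot request (e.g. pulled from the access log) and a hand-collected list of paths seen in the test DC's results; `crawl_report` and its field names are hypothetical.

```python
from collections import Counter


def crawl_report(fetched_paths, indexed_paths, min_crawls=3):
    """Compare Mozilla Googlebot crawling with what shows up on the test DC.

    fetched_paths: one entry per Mozilla Googlebot request.
    indexed_paths: paths seen in the test DC's results (collected by hand).
    """
    counts = Counter(fetched_paths)
    crawled = set(counts)
    added = crawled & set(indexed_paths)
    not_added = crawled - added
    # Pages fetched at least min_crawls times that are still missing:
    # each one is a counter-example to the "three crawls" rumour.
    stuck = sorted(p for p in not_added if counts[p] >= min_crawls)
    ratio = len(not_added) / len(added) if added else float("inf")
    return {
        "crawled": len(crawled),
        "added": len(added),
        "not_added_per_added": ratio,
        "stuck": stuck,
    }


print(crawl_report(["/a", "/a", "/a", "/b", "/c", "/c", "/d"], ["/b"]))
```

In the toy input four distinct pages were crawled and one was added, so three pages were crawled for every one added, and `/a` was fetched three times without being added. A `not_added_per_added` of 10 to 20 would match the figures above.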
But at least now I will welcome Mozilla Googlebot with open arms, rather than shake my head in desperation at the bot's activity.
Well, maybe. There have been so many rumours about the bot.
I guess that it will be a while before we see the full impact of the test DC anyway - eg MC talked about progress on 301s, canonicals etc, and that the DC will roll out in a matter of months.
So all the hoped-for improvements the test DC may bring will not happen overnight, I guess.
>>I guess that it will be a while before we see the full impact of the test DC anyway - eg MC talked about progress on 301s, canonicals etc, and that the DC will roll out in a matter of months. <<
I'm an optimistic reseller, you know :-)
I see Google resolving most of those matters once we reach March 2006.
Meanwhile, I'm gonna write an e-Book:
Breaking Google's Duplicates Code in 30 Minutes
Of course, Inigo would get a free copy of my e-Book for review on his blog :-)
Also, some sites which were stopped from being crawled by Googlebot and went supplemental are now back in the Mozilla Googlebot index.
I don't know how it will work out or what Google's plans are.
One thing I would like to know from Google is what the problem has been and why it has taken over a year to fix (if it is fixed).
[edited by: zeus at 11:10 am (utc) on Dec. 21, 2005]
Although some sites might be back, others are only just starting to come back, so maybe that is what will delay the full rollout of the index, i.e. it is still being built.
MC has talked about it taking months to roll out :/
I agree, Zeus, it does not look like the sites that have come back have regained their full power yet.
Not sure what you mean - the "Big Daddy" update is only just beginning. :)
PS. I agree, I don't think Jagger was ever really over, but "Big Daddy" looks like it will replace it anyway, so Jagger may just have been groundwork.
1. I am seeing competitor sites that were previously banned (removed from the index) back in and ranking.
2. It looks like they haven't switched on the "don't let sitewide bought links work" button yet; this type of link buying seems to be working a treat on this test DC.
I'm sure there is a long way to go yet on this DC. Is GoogleGuy asking for any feedback on it yet?
When searching for a three-keyword phrase, a result from my page is showing up with:
- an old, abandoned domain address (redirected from www.widget.com to www.widdget.com since Allegra)
- a filename instead of the rewritten URL (the filename has been 301-redirected since Allegra and is disallowed in robots.txt)
This result is not marked as supplemental.
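For reference, here is a minimal Python sketch of the redirect mapping described in that list: old host to new host, and raw filename to rewritten URL, resolved in a single 301 so the crawler never has to follow a chain. The filename `/product.php` and its target `/widgets/` are hypothetical; only the two host names come from the post. Also worth keeping in mind: a crawler that obeys robots.txt will not fetch a disallowed URL, so it never sees that URL's 301.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.widget.com"
NEW_HOST = "www.widdget.com"
# Hypothetical map from raw filenames to their rewritten URLs.
REWRITES = {"/product.php": "/widgets/"}


def redirect_target(url):
    """Return the final URL a 301 should point to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = NEW_HOST if parts.netloc == OLD_HOST else parts.netloc
    path = REWRITES.get(parts.path, parts.path)
    # Fix host and path in one step so a single 301 lands on the final address.
    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target


print(redirect_target("http://www.widget.com/product.php"))
```

Under these assumptions the old filename on the old domain goes straight to `http://www.widdget.com/widgets/`, and URLs already on the new domain and path get no redirect at all.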
When doing a search for site:www.widget.com (the old domain), more than 8,000 results still show up. None of them is marked as supplemental. However, these are old results from last November. No cache dates are shown.
Interesting - but up to now not very encouraging.
They show a crawl date like:
X-Google-Crawl-Date: Thu, 25 Nov 2004 04:38:32 GMT
That date is different from the cache date shown in the first line.
[edit]Oops, that date is not shown with any cached page. I can see that date only when searching for site:www.olddomainname.xy[/edit]
[edit 2]Sticky me if you like to see a screenshot[/edit]
[edited by: taps at 12:16 pm (utc) on Dec. 21, 2005]
If only they would use all the flippin' data crawled by Mozilla Googlebot, not just a tiny sample. Loads of hits by the Moz bot again last night, and I will probably see one page make this test index :(
E.g.:
- Lots of Mozilla Googlebot crawling; I was hoping that this would be added.
- MC confirmed on his blog that a site ordering had occurred during Jagger; sites which have not been ordered are still not correctly ordered on the test DC.
- Homepages that have been re-found by Googlebot still don't rank.
Would be great to get some feedback from GG on some of these points.
Four other 64.* DCs show the Jagger-updated BL and PR.
The rest show Jagger PR but NEW BL figures, and these are the majority of DCs.
In conclusion, a very mixed bag, with no certain direction :-)
But it looks like we might be moving in the right direction - when did your site go missing?
steve