Forum Moderators: Robert Charlton & goodroi
The index growing on 64.233.179.104 does seem to be largely a Mozilla Googlebot generated index - and this new index is being built for the future - so can we say Mozilla Googlebot is now taking over from normal Googlebot?
OK, ignore supplementals etc. for a moment - all DCs have that problem - and have a look at the cache dates for pages that are indexed... some of these pages have only been fetched by Mozilla Googlebot (even on the same day as normal Googlebot visited).
E.g. on the test DC I have a homepage cached 30th November at 5:40 - fetched by Mozilla Googlebot - while on the other DCs it is cached 30th November at 3:40 - fetched by normal Googlebot.
So in many ways this does look like building a whole new index parallel to the existing index - with largely Mozilla Googlebot crawl data.
Some pages appear very old - e.g. another page is cached on the test DC on 6th November, but on the other DCs its cache is from December. Checking the logs, 6th November was the last time Mozilla Googlebot visited this page.
OK - there are pages in the test DC only visited by normal Googlebot - however, pages crawled by Mozilla Googlebot do not appear on other DCs.
The newest pages on the DC crawled by Mozilla Googlebot seem to be from November - e.g. no pages crawled by Mozilla Googlebot in December have made it into the index yet.
Some pages crawled by Mozilla Googlebot in November have not made it into the index either - so I don't know if G are working with a sample of the data...
For confirmation that this is a whole new build of the index, MC said on his blog:
"the test data center certainly has some different crawling and indexing characteristics."
OK folks - remember also that MC said this index will roll out over months and is in a test state, so I guess there's no need for early panic stations or slagging off Google in this thread.
Now 301s, 302s, canonicals - from what I can see, Google has crawled and indexed a lot more 301s correctly. 302s - still lots in the index (mainly supplementals), but I'm not seeing any new 302s that show the URL of the linking site with the content of the destination site (the newest I'm seeing are from about August 2005) - no doubt others may find some.
What other observations have people made about the new crawling and indexing on this test DC?
Hmmmzz.
Yes, Mozilla Googlebot does tend to share the same IP as Mediapartners-Google.
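One quick way to check this against your own access logs is to group user-agents by IP and see which IPs present more than one. A rough sketch - the log lines and IP below are invented, so swap in real entries from your log:

```python
import re
from collections import defaultdict

# Hypothetical combined-log excerpt (invented IP, trimmed fields) -
# replace with lines read from your real access log.
LINES = [
    '66.249.65.10 - - [19/Dec/2005:04:12:01 +0000] "GET / HTTP/1.1" 200 10940 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.65.10 - - [19/Dec/2005:04:15:33 +0000] "GET / HTTP/1.1" 200 10940 "-" '
    '"Mediapartners-Google/2.1"',
]

uas_by_ip = defaultdict(set)
for line in LINES:
    ip = line.split()[0]                          # first field is the client IP
    ua = re.findall(r'"([^"]*)"', line)[-1]       # last quoted field is the user-agent
    uas_by_ip[ip].add(ua)

# Print any IP that sent requests under more than one user-agent
for ip, uas in sorted(uas_by_ip.items()):
    if len(uas) > 1:
        print(ip, sorted(uas))
```

With real logs you would read the lines from the file instead of a list; the shared-IP pattern should show up the same way.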
Please come back test DC! We(I) miss you and the hope you may bring.
It sure will be nice to see Google traffic back at the website that Google lost for 30 days in Bourbon and again in Jagger1.
Mmmm.... six thousand missing visitors a day returning...
Andrea
[edited by: lawman at 11:47 am (utc) on Dec. 19, 2005]
[edit reason] No Links To User's Website [/edit]
Back to Mozilla Googlebot and the new index (which is not on the DC at the moment).
Mozilla Googlebot active again last night after a couple of days off - shame the test data is not visible at the moment.
The test results have not been available for about 36-48 hours.
Google could complete these crawls if more web hosts supported GZIP compression, saving Google (and your server) 3 to 4 times the bandwidth. Perhaps even with the few hosts that do support GZIP compression, Google, with the new Mozilla bot, actually completes a crawl before building a new index - and yippee, all our pages are properly indexed in the "Test" data center. Unfortunately they still have to crawl with the old bot to check for "cloakers" - webmasters that serve different content compressed versus uncompressed.
If your web host supported GZIP, perhaps all our web pages would make it into the next index update!
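To get a feel for the claimed 3-to-4-times saving, here is a minimal Python sketch using the standard gzip module. The sample HTML is made up, and repetitive markup compresses much better than typical real pages, so treat the exact ratio as illustrative rather than representative:

```python
import gzip

# Invented sample page - repetitive, so it compresses unusually well;
# real HTML typically lands closer to the 3-4x range.
html = ("<html><body>"
        + "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>" * 200
        + "</body></html>").encode("utf-8")

# compresslevel=1 is the cheapest setting, the one suggested above for Apache
compressed = gzip.compress(html, compresslevel=1)

print("uncompressed:", len(html), "bytes")
print("compressed:  ", len(compressed), "bytes")
print("ratio:", round(len(html) / len(compressed), 1))
```

Even at the lowest compression level the saving on text content is substantial, which is the whole argument for letting the bots fetch compressed pages.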
Check your website logs and compare the file size loaded by the new Mozilla Googlebot versus the old Googlebot for one of your larger pages. You may find the file size is 3 to 4 times smaller with the Mozilla Googlebot. If so, GREAT - you support GZIP compression! Most browsers request GZIP compressed pages (unless you have Norton Internet Security); your server may optionally comply and serve GZIP compressed content, or spew out 4 times as many bytes serving it uncompressed.
There are complications with this, but probably only for a few major players - Brett can tell you! If Brett could serve compressed content (perhaps with a new version of Apache set to level 1 (lowest) compression), he might not have had to shut off all the search engines in his robots.txt (YET).
www.anothersite.tk.mysite.com/ is listed in a site:mysite.com command search yet there are no subdomains whatsoever on that mysite.com domain.
Note: the cache date on these types of pages is from July and they are listed as supplementals - however, they only showed up a few days ago and do NOT show for a site:www.mysite.com command.
I did report it via a spam report so that Google can see the issue, but figured I might as well get it out in the open for discussion.
1. There are no supplemental results for my domain.
2. The number of links has been updated...
For the last three days, those DCs are showing the same old mixed-up crud that I've been viewing for months. I hope those good results I was seeing are residing somewhere and will make an appearance again soon! At least then, I would get some hope back. :-(
Unfortunately 64.233.179.104 isn't doing it for me either.
You want a 4 on the end of that, my friend:
[64.233.179.104...]
However, the DC does not show test data at the moment as far as I can see - I must have missed it :(
Good morning reseller - I certainly agree, and look forward to seeing a great set of results out of this testing phase.
With so many millionaires in one company, the Googleplexers might have already started their Christmas break!
Do you guys get long breaks over Christmas in the US?
In Aus we practically shut down in the week between Christmas and New Year...
I am pretty sure that it is not showing test data at the moment.
Although there are some oddities in the DC.