Forum Moderators: Robert Charlton & goodroi

Mozilla Googlebot and the New Index at 64.233.179.104

Moved on from Jagger

         

Dayo_UK

9:58 am on Dec 13, 2005 (gmt 0)



OK - Jagger is over - long live "Big Daddy" - as named by MC for the test DC.

The index growing on 64.233.179.104 does seem to be largely a Mozilla Googlebot generated index - and this new index is being built for the future - so can we say Mozilla Googlebot is now taking over from normal Googlebot?

OK - ignore supplementals etc. for a moment, as all DCs have this problem, and have a look at the cache dates for pages that are indexed... some of these pages have only been fetched by Mozilla Googlebot (even on the same day as normal Googlebot visited).

Eg. On the test DC I have a homepage cached 30th November at 5:40 - fetched by Mozilla Googlebot - while on the other DCs it is cached on 30th November at 3:40 - fetched by normal Googlebot.

So in many ways this does look like building a whole new index parallel to the existing index - with largely Mozilla Googlebot crawl data.

Some pages appear very old - eg another page is cached on the test dc on 6th November - but on the other dcs it has cache in December - checking the logs - 6th November was the last time Mozilla Googlebot visited this page.

OK - there are pages in the test DC only visited by normal Googlebot - however, pages crawled by Mozilla Googlebot do not appear on other DCs.

The newest pages on the DC crawled by Mozilla Googlebot seem to be in November - eg no pages crawled by Mozilla Googlebot in December have made it to the index yet.

Some pages crawled by Mozilla Googlebot in November have not made it to the index - so I don't know if G are working with a sample data size...

For confirmation that this is a whole new build of the index MC said on his blog:-

"the test data center certainly has some different crawling and indexing characteristics."

OK - folks remember also that MC said that this index will roll out in months and is in a test state so I guess no need for early panic stations and slagging of Google in this thread.

Now 301s, 302s, canonicals - for me, Google has crawled and indexed a lot more 301s correctly. 302s - still lots in the index (mainly supplementals), but I am not seeing any new 302s that show the URL of the linking site with the content of the destination site (the newest I see are from about August 2005) - no doubt others may find some.

What other observations have people made about the new crawling and indexing on this test DC?

garyr_h

7:26 am on Dec 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry but, seeing a few www.red-widgets-are-cool-and-blue-widgets-are-too-but-cool-is-a-cool-cool.com sites. (yes, that long and that repeated)

Dayo_UK

9:02 am on Dec 15, 2005 (gmt 0)



Well, to me the test DC is still not showing test results. Perhaps the longest it has been away - let's hope they are adding more data.

Hmmmzz.

phantombookman

9:09 am on Dec 15, 2005 (gmt 0)

10+ Year Member



I have been thinking that something was afoot for at least a fortnight now.
Normally I build a new page and within 3-4 days it goes top 5 or even #1 in the index.

Recently, just the odd page makes it after a week or so.

alvinfic

11:21 am on Dec 15, 2005 (gmt 0)

10+ Year Member



In my log, I am seeing that it is having the same IP address as Adsense's Mediapartners-Google/2.1

Can that be possible? Or am I seeing the wrong one?

Dayo_UK

11:24 am on Dec 15, 2005 (gmt 0)



>>>In my log, I am seeing that it is having the same IP address as Adsense's Mediapartners-Google/2.1

Yes Mozilla Googlebot does tend to share the same ip as Mediapartners-Google.

Please come back test DC! We(I) miss you and the hope you may bring.

BillyS

3:42 pm on Dec 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>In my log, I am seeing that it is having the same IP address as Adsense's Mediapartners-Google/2.1

Sorry, this is pretty common with me too. For some reason they run it from the same IP address.
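For anyone who wants to check their own logs for this, here is a minimal sketch. It assumes an Apache-style combined log format where the user agent is the last quoted field; the function names and the sample IPs in the usage note are made up for illustration.

```python
import re
from collections import defaultdict

# Assumes combined log format: client IP first, user agent as the last quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def agents_by_ip(lines):
    """Map each client IP to the set of user-agent strings seen from it."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            seen[match.group("ip")].add(match.group("agent"))
    return dict(seen)


def shared_ips(lines, markers=("Googlebot", "Mediapartners-Google")):
    """Return the IPs from which every marker shows up in some user agent."""
    result = []
    for ip, agents in agents_by_ip(lines).items():
        if all(any(marker in agent for agent in agents) for marker in markers):
            result.append(ip)
    return sorted(result)
```

Feed it the lines of an access log, e.g. `shared_ips(open("access.log"))` (the file name is hypothetical), and it lists the addresses that sent both kinds of request.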

Will Spencer

3:40 am on Dec 16, 2005 (gmt 0)

10+ Year Member



Perhaps this is why Google is ignoring my repeated reinclusion requests -- because they are attempting to fix all of the 30x series breaks at once.

It sure will be nice to see Google traffic back at the website that Google lost for 30 days in Bourbon and again in Jagger1.

Mmmm.... six thousand missing visitors a day returning...

OvERMiND

9:07 am on Dec 16, 2005 (gmt 0)

10+ Year Member



Hi, my website <snip> has gone down for some keywords, like "stemma livorno", from #1 to about #650. Even if I search for an exact phrase from inside a page, my result is always in last place. Is it a temporary problem or should I do something? Thanks!

Andrea

[edited by: lawman at 11:47 am (utc) on Dec. 19, 2005]
[edit reason] No Links To User's Website [/edit]

Gimp

9:09 am on Dec 16, 2005 (gmt 0)

10+ Year Member



Andrea,

Do like I do. Panic.

OvERMiND

9:12 am on Dec 16, 2005 (gmt 0)

10+ Year Member



I don't like panic :P

Dayo_UK

9:21 am on Dec 16, 2005 (gmt 0)



OvERMiND - you are not supposed to put URLs in posts here. However, if I were you I would think about making the pages more unique from each other etc.

Back to Mozilla Googlebot and the New Index (that is not at the DC at the moment)

Mozilla Googlebot active again last night after a couple of days off - shame the test data is not visible at the moment.

OvERMiND

9:51 am on Dec 16, 2005 (gmt 0)

10+ Year Member



sorry for the url :-)
I see "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" crawling pages very fast, about 2 pages per second.
I see "Googlebot-Image/1.0" too

zeus

10:38 am on Dec 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



it looks like they have turned back to old failure results again

Dayo_UK

10:39 am on Dec 16, 2005 (gmt 0)



zeus

The test results have not been available for about 36-48 hours.

bumpski

2:09 pm on Dec 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think since Google introduced "Everflux" they really don't "complete" crawls anymore. They still "build" new indices (snapshots) but some pages have not been crawled since the last index was built so they show up "partially indexed" or URL only (Or now a new state Title Only). The URL only pages tend to be those lowest in page rank, or recently changed in minor ways (size), etc. I know there is no evidence of bad or missed crawls of numerous pages which are randomly URL only for my site(s). (If Google has DNS performance problems there would be no evidence of bad crawls though.)

Google could complete these crawls if web hosts supported GZIP compression, cutting the bandwidth Google (and your server) uses by a factor of 3 to 4. Perhaps, with the few hosts that do support GZIP compression, Google with the new Mozilla bot actually does complete a crawl before building a new index, and yippee, all our pages are properly indexed in the "Test" data center. Unfortunately they still have to crawl with the old bot to check for "cloakers": webmasters that serve different GZIP compressed content versus uncompressed content.
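To get a feel for the kind of saving being described, here is a small sketch that measures the ratio for a given page body. The sample page is invented and very repetitive, so real pages will compress differently; the 3 to 4 times figure above is an estimate, not something this code proves.

```python
import gzip


def compression_ratio(body: bytes, level: int = 6) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    compressed = gzip.compress(body, compresslevel=level)
    return len(body) / len(compressed)


# Hypothetical page body, for illustration only.
sample = (
    "<html><body>"
    + "".join(
        f"<p class='post'>Widget number {i} is in stock.</p>\n" for i in range(500)
    )
    + "</body></html>"
).encode("utf-8")

if __name__ == "__main__":
    # Level 1 is the fastest, lightest setting; level 9 compresses hardest.
    for level in (1, 6, 9):
        print(level, round(compression_ratio(sample, level), 1))
```

Swap `sample` for the bytes of one of your own larger pages to see what ratio you would actually get.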

If your webhost supported GZIP perhaps all our webpages would make it into the next index update!

Check your website logs and look at the file size loaded by the new Mozilla Googlebot versus the old Googlebot for one of your larger pages. You may find the file size is 3 to 4 times smaller with the Mozilla Googlebot. If so, GREAT, you support GZIP compression! Most browsers request GZIP compressed pages (unless you have Norton Internet Security); your server may optionally comply and serve GZIP compressed content, or spew out 4 times as many bytes serving uncompressed content.
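The log check can be scripted. This is only a sketch, assuming an Apache-style combined log format and assuming the old bot can be told apart because its user agent does not start with "Mozilla/5.0"; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumes combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$'
)


def bot_kind(agent):
    """Classify a user agent as 'mozilla' or 'classic' Googlebot, else None."""
    if "Googlebot" not in agent or "Googlebot-Image" in agent:
        return None
    # Assumption: only the new bot identifies itself with a Mozilla/5.0 prefix.
    return "mozilla" if agent.startswith("Mozilla/5.0") else "classic"


def sizes_by_bot(lines):
    """For each path, record the latest 200-response byte count per bot kind."""
    sizes = defaultdict(dict)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or match.group("status") != "200" or match.group("size") == "-":
            continue
        kind = bot_kind(match.group("agent"))
        if kind:
            sizes[match.group("path")][kind] = int(match.group("size"))
    return dict(sizes)


def size_ratios(lines):
    """Return classic/mozilla byte ratio for paths fetched by both bots."""
    ratios = {}
    for path, by_kind in sizes_by_bot(lines).items():
        if "mozilla" in by_kind and "classic" in by_kind and by_kind["mozilla"] > 0:
            ratios[path] = by_kind["classic"] / by_kind["mozilla"]
    return ratios
```

A ratio well above 1 for a page suggests the Mozilla bot received it compressed. A ratio near 1 suggests both bots got the same uncompressed bytes, or that the page changed between visits.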

There are complications with this, but probably only for a few major players, Brett can tell you! If Brett could serve compressed content, (perhaps with a new version of APACHE set to level 1 (lowest) compression) he might not have had to shut off all the search engines in his robots.txt (YET).

FromRocky

8:01 pm on Dec 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Test DC 64.233.179.104 is back with new caches as late as December 13.

followgreg

9:34 pm on Dec 16, 2005 (gmt 0)

10+ Year Member



This DC is my default Google.com right now.
So far looks good and I see updated cache.
I am not talking about the cache date, cause I haven't checked them all, but at least the page title on some sites I watch was updated 3 days ago and this DC shows the changes...

vdoyl

10:32 pm on Dec 16, 2005 (gmt 0)

10+ Year Member



Hi,

I do not know if it is a change, but I noticed that on the test DC 64.233.179.104

site: example.com

returns pages from the subdomains of example.com

As far as I can remember subdomains were treated as separate websites before?

Correct me if I am wrong.

Kangol

10:51 pm on Dec 16, 2005 (gmt 0)

10+ Year Member



I did a site:www.domain.com and almost all of my PHPBB forum pages came up as supplementary. This can't be good.

Marval

10:57 pm on Dec 16, 2005 (gmt 0)

10+ Year Member



I'm seeing a new canonical issue with this new datacenter that the algorithm doesn't seem to be able to address (of course, it seems the regular data centers have picked up on doing it as well). It is completely different from the old canonical issues we had. Here's the new version:

www.anothersite.tk.mysite.com/ is listed in a site:mysite.com command search, yet there are no subdomains whatsoever on that mysite.com domain.
Note that the cache date on these types of pages is from July and they are listed as supplementals - however, they just showed up a few days ago and do NOT show for a site:www.mysite.com command.

I did report it using a spamreport so that Google can see the issue, but figured I might as well get it out in the open for discussion.
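One way to read the difference between the two searches, assuming site: matches on a hostname suffix (that is an assumption for illustration, not confirmed behaviour), is sketched here. The function name is made up.

```python
def host_matches_site(host: str, site: str) -> bool:
    """True if host equals site or is a subdomain of it (suffix match on labels)."""
    host = host.lower().rstrip(".")
    site = site.lower().rstrip(".")
    return host == site or host.endswith("." + site)


if __name__ == "__main__":
    stray = "www.anothersite.tk.mysite.com"
    print(host_matches_site(stray, "mysite.com"))      # suffix matches
    print(host_matches_site(stray, "www.mysite.com"))  # suffix does not match
```

Under that assumption the stray hostname counts as part of mysite.com but not of www.mysite.com, which is consistent with it showing for one command and not the other.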

Newman

12:03 am on Dec 17, 2005 (gmt 0)

10+ Year Member



[64.233.179.10...]

1. There are no supplemental results for my domain,

2. Number of links updated...

Nikke

1:15 am on Dec 17, 2005 (gmt 0)

10+ Year Member



>>Test DC 64.233.179.104 is back with new caches as late as December 13.

Hmmmm. I see the fresh date from December 15 (Yay!) but the cache is from December 5...

Rainie

4:34 am on Dec 17, 2005 (gmt 0)

10+ Year Member



My optimism is gone for now. Late last week and early this week, the test DC and a couple of others showed my canonical problem from Allegra improving. (Rankings would be the next thing to worry about, but I think that's putting the cart before the horse.)

For the last three days, those DCs are showing the same old mixed-up crud that I've been viewing for months. I hope those good results I was seeing are residing somewhere and will make an appearance again soon! At least then, I would get some hope back. :-(

Unfortunately 64.233.179.104 isn't doing it for me either.

reseller

8:23 am on Dec 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good morning Marval

>>Im seeing a new Canonical issue with this new datacenter<<

I guess we should expect to see many strange things on that TEST DC. It might just indicate that the folks at the plex are still testing.. and testing.. and testing :-)

Powdork

8:38 am on Dec 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Hmmmm. I see the fresh date from December 15 (Yay!) but the cache is from December 5...

Is it possible it was last crawled on the 5th, but it has been requested since then and a 304 was returned, most recently on the 15th?
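For reference, this is roughly how a server decides between a full response and a 304 on a conditional request. It is a minimal sketch of standard If-Modified-Since handling, not a description of what Googlebot or any particular server actually does; the function name is made up.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the date in the header, else 200."""
    if not if_modified_since:
        return 200  # no conditional header: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    modified = datetime(2005, 12, 5, 10, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print(header, conditional_status(modified, header))
```

If the page has not changed since the 5th, every later conditional request gets a 304 with no body, so the stored copy would stay at the 5th even though the page was checked more recently.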

taps

9:55 am on Dec 17, 2005 (gmt 0)

10+ Year Member



Cannot reach [64.233.179.10...]

Anyone else with the same problem?

Dayo_UK

10:20 am on Dec 17, 2005 (gmt 0)



Taps

You want a 4 on the end of that my friend:-

[64.233.179.104...]

However, the DC does not show test data at the moment as far as I can see - I must have missed it :(

Marval

10:28 am on Dec 17, 2005 (gmt 0)

10+ Year Member



Dayo_UK - I'm seeing some sort of intermediate version of test data - it's actually a new data set that seems to have some updated caches but at the same time a much higher page count for some popular single keywords than the standard SERPs collection - but not as high as that original test set.

good morning reseller - I certainly agree and look forward to seeing a great set of results out of this testing phase

otech

10:28 am on Dec 17, 2005 (gmt 0)

10+ Year Member



same, those test results have been gone for days now...

having so many millionaires in one company the googleplexers might have already started their christmas break!

do you guys get long breaks over christmas in the US?

In Aus we practically shut down in the week between christmas and the new year...

Dayo_UK

10:32 am on Dec 17, 2005 (gmt 0)



Marval

I am pretty sure that it is not showing test data at the moment.

Although there are some oddities in the DC.
