Mozilla Googlebot and the New Index at 64.233.179.104

Moved on from Jagger

     
9:58 am on Dec 13, 2005 (gmt 0)

10+ Year Member



OK - Jagger is over - long live "Big Daddy" - as named by MC for the test DC.

The index growing on 64.233.179.104 does seem to be largely a Mozilla Googlebot generated index - and this new index is being built for the future - so can we say Mozilla Googlebot is now taking over from normal Googlebot?

OK, ignore supplementals etc. for a moment, as all DCs have this problem, and have a look at the cache dates for pages that are indexed. Some of these pages have only been fetched by Mozilla Googlebot (even on the same day as normal Googlebot visited).

E.g. on the test DC I have a homepage cached 30th November at 5:40 - fetched by Mozilla Googlebot - while on the other DCs it is cached on 30th November at 3:40 - fetched by normal Googlebot.

So in many ways this does look like building a whole new index parallel to the existing index - with largely Mozilla Googlebot crawl data.

Some pages appear very old - e.g. another page is cached on the test DC on 6th November, but on the other DCs it has a cache date in December. Checking the logs, 6th November was the last time Mozilla Googlebot visited this page.
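For anyone who wants to repeat the log check described above, here is a rough sketch of one way to do it. It assumes the Apache "combined" log format and tells the two bots apart by the user-agent strings quoted later in this thread; the function name `last_visits` and the sample paths are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: Apache/NCSA "combined" log format, e.g.
# 192.0.2.10 - - [06/Nov/2005:04:12:00 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" "<user agent>"
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def last_visits(lines, path):
    """Return the latest fetch time of `path` for each Googlebot variant.

    The result is a dict with the keys "mozilla" and/or "classic",
    depending on which bots actually requested the page.
    """
    latest = {}
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or m.group("path") != path:
            continue
        ua = m.group("ua")
        if "Googlebot/2.1" not in ua:
            continue  # not a Googlebot request
        # The Mozilla variant starts with "Mozilla/5.0", the classic one with "Googlebot/2.1".
        bot = "mozilla" if ua.startswith("Mozilla/5.0") else "classic"
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if bot not in latest or ts > latest[bot]:
            latest[bot] = ts
    return latest
```

Comparing the "mozilla" date with the cache date on the test DC, and the "classic" date with the cache date on the other DCs, is the comparison made in the post above. User agents can be spoofed, so treat this as a rough guide only.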

OK - there are pages in the test DC only visited by normal Googlebot - however, pages crawled by Mozilla Googlebot do not appear on other DCs.

The newest pages on the DC crawled by Mozilla Googlebot seem to be from November - i.e. no pages crawled by Mozilla Googlebot in December have made it to the index yet.

Some pages crawled by Mozilla Googlebot in November have not made it to the index - so I don't know if Google is working with a sample of the data.

For confirmation that this is a whole new build of the index MC said on his blog:-

"the test data center certainly has some different crawling and indexing characteristics."

OK - folks remember also that MC said that this index will roll out in months and is in a test state so I guess no need for early panic stations and slagging of Google in this thread.

Now 301s, 302s, canonicals - for me, Google has crawled and indexed a lot more 301s correctly. 302s - still lots in the index (mainly supplementals). I'm not seeing any new 302s that show the URL of the linking site but the content of the destination site (the newest I see are from about August 2005) - no doubt others may find some.

What other observations have people made about the new crawling and indexing on this test DC?

5:18 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've noticed on the mentioned test DC:
1. Some caches dated December 2nd and the rest in November.
2. All of the URL-only pages have converted to supplementals. Thus, there are no URL-only listings left on this test DC. Does this indicate you either have a full or a supplemental listing on this test DC? Are there only two types of listing? Does anyone know what this supplemental listing means?
5:32 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



I added over 3000 pages to my site in one go last month. Probably unwise as I only have 400 pages indexed! Mozilla has picked them all up. Googlebot hasn't got any of them. They seem to rank OK on the test DC.
5:38 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



FromRocky

Yes, lots of URL-only listings have the title and description back - these really need to be recrawled, but the required recrawling of supplementals is pretty much standard across the DCs.

The caches in December - the ones I have seen tend to be normal Googlebot rather than Mozilla Googlebot. Normal Googlebot still adds pages to the test dc - but Mozilla Gbot does not add pages to the other dcs or so it seems.

fatpeter

What sort of cache date are you showing for those pages picked up by Mozilla Googlebot - and are pretty much all the pages crawled by the bot indexed? E.g. 3000 crawled and it appears 3000 indexed?

6:37 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



"What sort of cache date are you showing for those pages picked up by Mozilla Googlebot - and are pretty much all the pages crawled by the bot indexed? E.g. 3000 crawled and it appears 3000 indexed?"

Can't go past 1000 results, but it looks like they were all crawled and added. Cache dates are around the middle of November. Only odd thing... a site: search always gave an accurate number of about 400. I added 3000 and now the site: search gives 11000 results.

7:15 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



Wow I am excited! Mozilla bot is the main bot that has been crawling me for months. My pages actually show up in this index, hurray!

Strange how it shows the page count as being 10x what it actually should be.

Please update to this version!

7:26 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



ddogg

What sort of cache dates are you showing?

Are you seeing a pretty much full crawl to index return - or just a sample?

7:36 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



My cache dates are earlier than what current Google is showing. End of November it appears.

Seems my whole site is indexed. In current Google 99% of my pages are URL-only. In this version they are actually indexed and ranking like normal (no sandbox or any weirdness; it's a 3 1/2 year old site though, so it shouldn't be sandboxed anyway).

I had been getting deep crawls by Mozilla bot for months but very very few pages would actually end up indexed. This is more like it!

7:51 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member



In my case I see the non-www version is gone and only the www version is indexed, plus no supplemental results. OK, still no ranking, but that's because it's not an update.

I also see an established site has gone from 50,000 indexed pages to 900, but that could have something to do with the Mozilla bot theory.

8:57 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



Geez .. no sup results ... no 10 year old caches ...

Does this mean my being a troll is at an end?

I think not he he he!

9:21 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



On a site without a 301 redirect.
-Do site:name.com
....returns name.com/index.html
....followed by all of the pages as www.name.com/page.html
-Do site:www.name.com
....returns www.name.com/index.html
....followed by all of the pages as www.name.com/page.html
The only difference is the www on the home page. Number of pages is the same (home page is showing cached at the end of November)

On a site with the 301 redirect, results are identical with and without the www in the site command.

This looks like an improvement for the canonical issues.

I've also noticed several of my pages returning from supplemental and now ranking again.
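For readers unsure what "a site with the 301 redirect" means in practice, here is a bare-bones sketch of the decision such a redirect makes. It is only an illustration: `canonical_redirect` is a made-up helper, `www.name.com` is the placeholder host from the post above, and on a real server this would normally be done in the web server configuration rather than in application code.

```python
def canonical_redirect(host, path, canonical_host="www.name.com"):
    """Decide whether a request should be 301-redirected to the canonical host.

    Returns None when the request already uses the canonical host,
    otherwise a (status, location) tuple describing the redirect.
    """
    # Host headers are case-insensitive and may carry a port; normalise both.
    host = host.lower().split(":")[0]
    if host == canonical_host:
        return None
    # Hypothetical choice: plain http, matching the era of the thread.
    return 301, "http://" + canonical_host + path
```

With this in place a request for name.com/page.html is answered with a permanent redirect to www.name.com/page.html, so only one version of each URL is offered for indexing.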

9:51 pm on Dec 13, 2005 (gmt 0)

5+ Year Member



Hey all,

My experience:

What I'm seeing here is that Google basically reincluded my whole site. The site was completely wiped out from google.com in the early days of Bourbon.

Since May we completely rewrote our site, fixed the canonical issue, deleted thousands of pages, etc...

Our site and all of its pages (200) are now back in the index and showing on the 4th or 5th page for competitive keywords.

SO:

-This index is quite fresh AND for us it's either:

1) penalties or more aggressive filters are not yet applied

2) complete reinclusion and the lifting of the multiple site penalties that were on this domain.

OK, I hope it's option 2.. but...

Could they still add filters and penalties to this index, or are we dealing with soon-to-be-stable results with all site penalties applied?

10:12 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



the results on the test dc now look the same as the others - this keeps happening
10:48 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



OT - but can't access the other thread.
Spamming b's that have plagued my sector have gone from the Serps finally.

To whoever is watching who might have had something to do with it - thanks.

11:27 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



64.233.179.104

This is google.ru, isn't it?

12:49 am on Dec 14, 2005 (gmt 0)

10+ Year Member



64.233.179.104 has found all but 10 of the pages listed on google.com for my main site. (9,590).

The latest cache is 4 days older than any I can find on google.com, but that cache hasn't been updated for 7 days now.

Googlebot, whatever I said, I didn't mean it. Please come back!

1:02 am on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



64.233.179.104 has by far the best results for me as far as indexing my site. On a site: search I have all pages indexed, and the ones that had dropped into supplemental are now back to normal - but I also have old pages that I long ago let go 404 just to get them out of the index. One has a cache date of September 2, 2004. But otherwise I can live with this.
1:21 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Results have not changed for my sites. Could be those changes are yet to come. Matt Cutts says it's an update so I will continue to monitor that DC. Supplemental results are gone but I still have tens of thousands of strange extra pages in the index.
7:28 am on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good morning Folks

For the benefit of further discussion, I'm recalling what Matt wrote regarding the test DC.

"Broker Boy, I do expect that data center to eventually go live, but it will take a few months, in all likelihood. That data center (64.233.179.104) recently moved into regular rotation recently, and I wouldn't be surprised if one more data center joined it in the next week or so. After that, I'd expect those two data centers to stay in the rotation (but not spread) until after the holidays. Not sure about that, but that's my best guess."

"Joe, I believe we've instituted some more intuitive results for site: queries within the last few weeks. The test data center will be where most of the progress on 301s/canonicalization takes place."

Wish you all a great day!

7:35 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Howdy Reseller

Interesting, I missed the comment to Joe on MC's blog.

(I kind of think that Big Daddy might really be the real Jagger3 - i.e. GG talked about canonical and 301 things and the base for a new index as Jagger3, and this did not happen - but hey, whatever. I still think MC/GG get excited sometimes about a change and say it is coming before it actually happens or is ready to happen. We do know that they have pride in working for Google, so I guess they get excited too when big things happen - let's hope it comes through.)

The test DC is not showing test results for me at the moment.

8:36 am on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good morning Dayo_UK

>>I kind of think that Big Daddy might really be the real Jagger3 - ie GG talked about Canonical, 301 things and the base for a new index as Jagger3 and this did not happen - but hey whatever - I still think MC/GG get excited sometimes about a change and say it is coming before it actually happens or is ready to happen <<

What I like most about GG and Matt is that they are positive and optimistic fellow members. Great.. the two gentlemen get excited sometimes and tell us things before they happen :-)

For example:

GG & Matt! Now I have 50% of my pre-Allegra Google referrals. When do I get the rest of my pre-Allegra traffic back? Thanks a bunch :-)

9:52 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Good morning,

Can somebody please confirm that the test data center 64.233.179.104 currently shows the same results as www.google.com? I don't see test results anymore.

The results are the same across the board. Same cache dates, same number of pages returned. Bad as ever since Jagger for our site.

Am I the only one seeing this?

10:07 am on Dec 14, 2005 (gmt 0)

10+ Year Member



recar - you're not the only one - this has been happening every now and again for the past couple of weeks
10:11 am on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dayo, if indeed they are building a new index from scratch - to take over from the current mess - that's really big news, and kudos on your observations.
11:34 am on Dec 14, 2005 (gmt 0)

10+ Year Member



oddsod

The test DC is not showing test data at the moment - when it next goes live I would be interested in your observations on your sites which may have non-www, canonical problems etc.

Cheers

2:04 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



My site is ranking pretty poorly on all DC's but the .uk one where we are doing a lot better.
I can't see if Jagger has got to the .uk site yet or not - I hope so!
1:25 am on Dec 15, 2005 (gmt 0)

5+ Year Member



I am not seeing this bot in my logs.

Can we get an idea of how many people have been crawled by this IP?

1:40 am on Dec 15, 2005 (gmt 0)

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"Can we get an idea of how many people have been crawled by this IP?"

The difference should be:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

versus

Googlebot/2.1 (+http://www.google.com/bot.html)

In the user agent field.

I've got around 1,200 pages crawled by Mozilla/5.0 and another 700 by Googlebot 2.1 so far this month. This is for a website with approximately 1,000 pages.
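If you want to produce the same kind of tally from your own logs, something along these lines should get you close. It is a minimal sketch that assumes the Apache "combined" log format, where the user agent is the last quoted field on each line; the function names are made up, and since anyone can send these user-agent strings the counts are only indicative.

```python
import re
from collections import Counter

# Assumption: the user agent is the last double-quoted field on the line,
# as in the Apache/NCSA "combined" log format.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def classify(line):
    """Return "mozilla", "classic", or None for a single access-log line."""
    m = UA_AT_END.search(line)
    if not m:
        return None
    ua = m.group(1)
    # Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    if ua.startswith("Mozilla/5.0") and "Googlebot/2.1" in ua:
        return "mozilla"
    # Googlebot/2.1 (+http://www.google.com/bot.html)
    if ua.startswith("Googlebot/2.1"):
        return "classic"
    return None


def count_googlebot_hits(lines):
    """Count requests per Googlebot variant across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind:
            counts[kind] += 1
    return counts
```

To run it over a whole file, pass the open file object straight in, e.g. `count_googlebot_hits(open("access.log"))`, where `access.log` stands for wherever your server writes its log. Note this counts requests, not unique pages, so it will overstate things if the same page is fetched repeatedly.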

5:23 am on Dec 15, 2005 (gmt 0)

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I am seeing the test results now matching the rest of the dc's except that the test dc has removed some js redirect doorways from the serps.

edited to make more sense.

7:17 am on Dec 15, 2005 (gmt 0)

10+ Year Member



"I am seeing the test results now matching the rest of the dc's"

I'm seeing the same.. I told my colleagues a few days ago, I think it was before the weekend, that the datacenter on this IP shows very old results, with cache dates in early November.

On top of that, I wasn't seeing any URL-only results then, but I do see them again now. After doing some testing, 'Big Daddy' now seems to be the same as most of the datacenters again.

Too bad, maybe they know that we know and switched IPs ;)

This 126 message thread spans 5 pages.
 
