Forum Moderators: Robert Charlton & goodroi


Mozilla Googlebot and the New Index at 64.233.179.104

Moved on from Jagger

         

Dayo_UK

9:58 am on Dec 13, 2005 (gmt 0)



OK - Jagger is over - long live "Big Daddy" - as named by MC for the test DC.

The index growing on 64.233.179.104 does seem to be largely a Mozilla Googlebot generated index - and this new index is being built for the future - so can we say Mozilla Googlebot is now taking over from normal Googlebot?

OK - ignore supplementals etc for a moment - as all DCs have this problem - and have a look at the cache dates for pages that are indexed... some of these pages have only been fetched by Mozilla Googlebot (even on the same day as normal Googlebot visited).

Eg. On the test DC I have a homepage cached 30th November at 5:40 - fetched by Mozilla Googlebot - while on the other DCs it is cached on 30th November at 3:40 - fetched by normal Googlebot.

So in many ways this does look like building a whole new index parallel to the existing index - with largely Mozilla Googlebot crawl data.

Some pages appear very old - eg another page is cached on the test DC on 6th November - but on the other DCs it has a cache date in December - checking the logs - 6th November was the last time Mozilla Googlebot visited this page.
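For anyone who wants to repeat this kind of log check, here is a minimal sketch in Python. It assumes Apache combined-format access logs and tells the two crawlers apart by user-agent string only (which can be spoofed). The function names, the IP address and the sample log lines are made up for illustration.

```python
import re
from collections import defaultdict

# Combined log format: the user agent is the last quoted field on the line.
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def classify_agent(agent):
    """Return 'mozilla', 'classic', or None for non-Googlebot agents."""
    if "Googlebot" not in agent:
        return None
    # The newer crawler announces itself as "Mozilla/5.0 (compatible; Googlebot/2.1; ...)".
    return "mozilla" if agent.startswith("Mozilla/") else "classic"

def last_visits(lines):
    """Map each path to the last timestamp seen per crawler type.

    Assumes the lines are in chronological order, as in a normal log file.
    """
    seen = defaultdict(dict)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        kind = classify_agent(match.group("agent"))
        if kind is None:
            continue
        seen[match.group("path")][kind] = match.group("when")
    return dict(seen)

# Hypothetical sample lines, not taken from a real log.
SAMPLE = [
    '66.249.66.1 - - [06/Nov/2005:05:40:12 +0000] "GET /page.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [02/Dec/2005:03:40:00 +0000] "GET /page.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for path, visits in last_visits(SAMPLE).items():
        print(path, visits)
```

Comparing the "mozilla" timestamp for a page with the cache date shown on the test DC is then a manual step.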

OK - there are pages in the test DC only visited by normal Googlebot - however, pages crawled by Mozilla Googlebot do not appear on other DCs.

The newest pages on the DC crawled by Mozilla Googlebot seem to be in November - eg no pages crawled by Mozilla Googlebot in December have made it to the index yet.

Some pages crawled by Mozilla Googlebot in November have not made it to the index - so I don't know if G are working with a sample data set...

For confirmation that this is a whole new build of the index MC said on his blog:-

"the test data center certainly has some different crawling and indexing characteristics."

OK - folks remember also that MC said that this index will roll out in months and is in a test state so I guess no need for early panic stations and slagging of Google in this thread.

Now 301s, 302s, canonicals - for me, Google has crawled and indexed a lot more 301s correctly. 302s - still lots in the index (mainly supplementals) - not seeing any new 302s that show the URL of the linking site but the content of the destination site (the newest I see are from about August 2005) - no doubt others may find some.

What other observations have people made about the new crawling and indexing on this test DC?

BillyS

12:37 pm on Dec 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The test DC was active yesterday as reported. I noticed an increase in pages indexed.

Google.com - 10,300

64.233.176.104 - 685 (last time the center was active it was in the 500s).

Marval

11:04 pm on Dec 17, 2005 (gmt 0)

10+ Year Member



Dayo_UK - actually it seems to be showing a new index with cache dates as fresh as today the 17th. One keyword I use as an indicator here has in the past been around 70 mill results - then up to 200 mill. That test set had 750 mill results and this new index on that IP has around 325 mill so some new pages evidently

BillyS

2:02 am on Dec 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My content management system does not handle 301 / 404s very well (inconsistent, and I'm still trying to figure out why).

Well, I just happened to be looking through my logs and I saw Mozilla Googlebot looking for this URL on my website:

[foo.com...]

GOOGLE404probe... Hmm, good thing I returned a 404 this time. That bot does nothing but look for trouble!
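The thread does not say what the bot does with the answer, but checking your own server's response to a URL that cannot exist is easy to script. A rough sketch in Python, assuming the point is simply that a missing page should return 404 (or 410) rather than 200 or a redirect to the homepage. `probe_path`, `fetch_status` and `handles_missing_pages` are illustrative names, and this is not a description of Google's actual probe.

```python
import secrets
import urllib.error
import urllib.request

def probe_path(prefix="/probe-"):
    """Build a random path that almost certainly does not exist on the site."""
    return prefix + secrets.token_hex(8) + ".html"

def fetch_status(url):
    """Return the final HTTP status for url (urllib follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is on the exception.
        return err.code

def handles_missing_pages(base_url, fetch=fetch_status):
    """True if a nonexistent URL yields 404 or 410 instead of a 'soft 404'."""
    status = fetch(base_url.rstrip("/") + probe_path())
    return status in (404, 410)
```

Calling `handles_missing_pages("http://www.example.com")` a few times is a quick way to spot a CMS that answers inconsistently. The `fetch` parameter exists only so the check can be tried without touching the network.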

Dayo_UK

9:11 am on Dec 21, 2005 (gmt 0)



Test results are back ;)

Top 40 for full site name :) - although internal page outranks the homepage - I guess that is progress :)

Also a site only crawled by Mozilla Googlebot makes it into the serps (250 pages)

Plenty of pages crawled by Mozilla Googlebot have not made into the serps - so I still wonder if G are using a sample data set.

reseller

9:33 am on Dec 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dayo_UK

>>Test results are back ;)<<

Yes. And this time with updated cache as far as my sites are concerned.
Soon will be the BigDaddy day :-)

Dayo_UK

9:40 am on Dec 21, 2005 (gmt 0)



Reseller

I think it is very much still in development.

Not sure why so much Mozilla Googlebot crawling is necessary for the amount of pages added.

Some people have said that pages need to be crawled 3 times before being added (I can't verify that for my site for the pages not added) - but that does not sound too efficient.

I have deffo got more pages added to the test dc - however, Mozilla Googlebot has crawled about 10-20 times more that have not been added.

But at least now I will welcome Mozilla Googlebot with open arms - rather than shake my head in desperation at the activity of the bot.

reseller

9:48 am on Dec 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dayo_UK

>>Not sure why so much Mozilla Googlebot crawling is necessary for the amount of pages added.<<

Rumor has it that Mozilla Googlebot is an anti-spam anti-duplicates detective bot ;-)

Dayo_UK

9:53 am on Dec 21, 2005 (gmt 0)



Reseller

Well maybe - there have been so many rumours about the bot.

I guess that it will be a while before we see the full impact of the test DC anyway - eg MC talked about progress on 301s, canonicals etc, and that the DC will roll out in a matter of months.

So all the hoped for improvements the test DC may bring will not happen overnight I guess.

reseller

10:02 am on Dec 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dayo_UK

>>I guess that it will be a while before we see the full impact of the test DC anyway - eg MC talked about progress on 301s, canonicals etc, and that the DC will roll out in a matter of months. <<

I'm an optimistic reseller, you know :-)

I see Google resolving most of those matters once we reach March 2006.

Meanwhile, I'm gonna write an e-Book;

Breaking Google's Duplicates Code in 30 minutes

Of course, Inigo would get a free example of my e-Book for review on his blog :-)

Marval

10:56 am on Dec 21, 2005 (gmt 0)

10+ Year Member



Dayo_UK - I'm seeing the test results database live on google.com - without the fresh cache - also tried hitting the test DC and it was down when I tried - but there have been some filters applied to the test data since the last time I saw this new index. Although the number of pages is huge, there are also some definite improvements in many areas with less scraped content - although I'm noticing a trend towards listing the "authority labelled" sites at the top, then a bunch of forums and fake/syndicated blogs as the rest of the top 10 for many of the two-word terms I checked.
Best guess here is that the scraper filter is being tested, along with a slight twist of the dial for on-page content.

powerofeyes

11:04 am on Dec 21, 2005 (gmt 0)

10+ Year Member



I think the Big Daddy results will be live right after the New Year celebrations. I don't see anything wrong with the new results compared to the current Jagger 3 results. In fact, in most of the areas I check I see updated caches and good results.

Also, some sites which Googlebot had stopped crawling and which went supplemental are now back in the Mozilla bot index.

I don't know how it will work out or what Google's plans are.

zeus

11:06 am on Dec 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I also see fine site: results again, no non-www. in the search and I don't see any supplemental results. Of course there is still no ranking because there has not been an update with these results, but it is nice to see that Google still knows how to index a site.

One thing I would like to know from Google is what the problem has been and why it has taken over a year to fix, if it is fixed.

[edited by: zeus at 11:10 am (utc) on Dec. 21, 2005]

Dayo_UK

11:07 am on Dec 21, 2005 (gmt 0)



powerofeyes

Although some sites might be back - others are only just starting to come back - so maybe that is what will delay the full roll out of the index - eg - it is still being built.

MC has talked about it taking months to roll out :/

I agree Zeus, it does not look like the sites that have come back have regained their full power yet.

baron13

11:19 am on Dec 21, 2005 (gmt 0)

10+ Year Member



Why are so many people saying that the update is over!? The update is not over and the serps are still very bad. If the update were over, Google would never be the best search engine again.....

Dayo_UK

11:22 am on Dec 21, 2005 (gmt 0)



baron13

Not sure what you mean - the "Big Daddy" update is only just beginning. :)

PS. I agree, I don't think Jagger was really ever over - but "Big Daddy" looks like it will replace it anyway - so Jagger may just have been groundwork.

baron13

11:33 am on Dec 21, 2005 (gmt 0)

10+ Year Member



Oops....does that mean that Jagger was only a "Pre-update" and the big one is starting right now!?
:-)

zeus

11:38 am on Dec 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



baron, maybe it starts after New Year or Christmas

UK_Web_Guy

11:41 am on Dec 21, 2005 (gmt 0)

10+ Year Member



2 observations about this test DC

1. Am seeing competitor sites that have been previously banned (removed from the index) back in and ranking

2. Looks like they haven't switched on the "don't let sitewide bought links work" button yet - this type of link buying seems to be working a treat on this test DC

I'm sure there is a long way to go yet on this DC - is GoogleGuy asking for any feedback on it yet?

taps

12:01 pm on Dec 21, 2005 (gmt 0)

10+ Year Member



I see some strange things happen on [64.233.179.104...]

When looking for a three keyword phrase a result from my page is showing up with
- an old abandoned domain address (redirected from www.widget.com to www.widdget.com since Allegra)
- a filename instead of the rewritten url (filename is 301-redirected since Allegra and forbidden by robots.txt)

This result is not marked as supplemental

When doing a search for site:www.widget.com (old) still > 8.000 results show up. None of them is marked as supplemental. However these are old results from last November. No cache dates are shown.
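One thing worth ruling out when a URL is both 301-redirected and disallowed in robots.txt: a crawler that obeys robots.txt never requests a disallowed URL, so it cannot see a redirect served there. Whether that explains the stale listing here is only a guess. The sketch below uses Python's standard `urllib.robotparser` to test a URL against a rule set; the robots.txt content and the file names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the old domain, not the poster's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-file.php
"""

def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("http://www.widget.com/old-file.php",
                "http://www.widget.com/rewritten-url/"):
        allowed = crawler_may_fetch(ROBOTS_TXT, "Googlebot", url)
        print(url, "allowed" if allowed else "blocked: any 301 here stays unseen")
```

If the old file name comes back as blocked, lifting the disallow for a while would at least let the crawler request the URL and receive the redirect.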

Interesting - but up to now not very encouraging.

taps

12:03 pm on Dec 21, 2005 (gmt 0)

10+ Year Member



Another piece of information from Google can be seen when opening the cached version of a page.

They show a crawl date like:
X-Google-Crawl-Date: Thu, 25 Nov 2004 04:38:32 GMT

That date is different from the cache date shown in the first line.
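The value is a standard HTTP-style date, so it can be parsed and compared with another date directly. A small Python sketch, assuming the header line is copied by hand from the cached page; `crawl_age_days` is an illustrative helper, and the reference date in the usage lines is just the date of this post.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def crawl_age_days(header_value, reference):
    """Whole days between the crawl date in the header and a reference time."""
    crawled = parsedate_to_datetime(header_value)  # timezone-aware for "GMT"
    return (reference - crawled).days

if __name__ == "__main__":
    header = "Thu, 25 Nov 2004 04:38:32 GMT"
    today = datetime(2005, 12, 21, 12, 3, tzinfo=timezone.utc)
    print(parsedate_to_datetime(header).isoformat())
    print(crawl_age_days(header, today), "days old")
```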

[edit]oops - that date is not shown with just any cached page. I can see that date only when searching for site:www.olddomainname.xy[/edit]

[edit 2]Sticky me if you like to see a screenshot[/edit]

[edited by: taps at 12:16 pm (utc) on Dec. 21, 2005]

Dayo_UK

12:04 pm on Dec 21, 2005 (gmt 0)



Yes, it does look like a lot of work needs to be done.

If only they would use all the flippin data crawled by Mozilla Googlebot - not just a tiny sample. Loads of hits again by Moz bot last night and I will probably see 1 page make this test index :(

Dayo_UK

12:32 pm on Dec 21, 2005 (gmt 0)



The most disappointing thing from my POV is that it has not progressed at all from the last time we saw the DC.

EG:-

- Lots of Mozilla Bot crawling - I was hoping that this would be added.

- MC confirmed in his blog that a site ordering had occurred during Jagger - sites which have not been ordered are still not correctly ordered on the test DC.

- Homepages that have been re-found by Googlebot still don't rank.

Would be great to get some feedback from GG on some of these points.

Eazygoin

1:08 pm on Dec 21, 2005 (gmt 0)

10+ Year Member



On the following DC's, I have OLD PR and OLD Backlinks stats:
216.239.57.105
64.233.171.99
64.233.171.104
64.233.171.105
64.233.171.147

Four other 64.* DC's show the Jagger updated BL, and PR.

The rest show Jagger PR, but NEW BL figures, and this is the majority of DC's.

In conclusion, a very mixed bag, with no certain direction :-)

phantombookman

2:01 pm on Dec 21, 2005 (gmt 0)

10+ Year Member



I am seeing the number of results/returns for some terms in my area very much higher than the other day.
One has gone from 4 million to over 6 million!
Another has more than doubled

BillyS

2:58 pm on Dec 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've now progressed from ~550 to ~680, right now showing 777... I need to play the lottery!

reseller

3:13 pm on Dec 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BillyS

>>I've now progressed from ~550 to ~680, right now showing 777... I need to play the lottery!<<

who knows. Maybe your lucky number is 707 :-)

NoLimits

3:41 pm on Dec 22, 2005 (gmt 0)

10+ Year Member



Looking at the Test DC has me giddier than I've been in a long time. My mother ship site is coming back in the Test DC stronger than ever.

... hey G - you guys can cut the Moz Index loose any time now. Just flip the switch, pull the hatch, cut her loose - but whatever you do, don't turn the dial!

Dayo_UK

3:46 pm on Dec 22, 2005 (gmt 0)



Lol - Nolimits - I hope for a few more dials to be turned yet.

But it looks like we might be moving in the right direction - when did your site go missing?

steve40

4:24 pm on Dec 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think I would get too excited yet; there are a few filters not applied to this data set yet.
From prior major changes: data is added first with a few filters applied, then as more data is added additional filters are put in place, and just before rollout new filters are applied.
I can see one site in the results that I think has a duplicate filter on it, due to multiple domains with similar data, and that filter has not been applied yet (maybe they lost that filter but I doubt it). So there is still some cooking and brewing and skimming of sites off the top, but I think we will see this start to roll out from 31st December through early January with ongoing tweaking.
I will be interested to see if the core algo is left untouched with the new data.
Just watching, waiting and guessing like many others here.

steve

cws3di

5:05 pm on Dec 22, 2005 (gmt 0)

10+ Year Member




I totally agree steve40

With the understanding that this is test data, I believe that a lot of what we see does not have late-stage filters applied, specifically the dup content filter.
