Google Datacenters Watch: 2006-01-30

Forum Moderators: Robert Charlton & goodroi



Observations, Analysis and Remarks

johnwards

3:55 pm on Jan 30, 2006 (gmt 0)

< continued from [webmasterworld.com...] >

This is just odd.

The 64.* DC's return about 300 pages from my site.

The 216.* DC's return about 46,000 pages from my site.

And the 66.* return 69,000 pages from my site.

Currently I have about 65,000 pages.

If I go to google.co.uk I get 46,000 pages. If I go to google.com from my US based server I get the same 46,000 results.

It is all very odd and confusing.

[edited by: tedster at 9:56 pm (utc) on Jan. 30, 2006]

nohllywd

3:43 am on Feb 2, 2006 (gmt 0)

I just checked BD, mentioned a few messages back, and my main page is listed twice. Both are the same URL. Does this make sense to anyone?

Steph_R

4:07 am on Feb 2, 2006 (gmt 0)

Maybe one is with www and one is without www?
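One way to test that guess is to compare the two listed URLs after stripping a leading "www." from the host. The following is only a minimal sketch: the function names and the example.com URLs are hypothetical, and it says nothing about how Google itself decides which version is canonical.

```python
from urllib.parse import urlsplit


def canonical_key(url):
    """Return (host without a leading 'www.', path) for comparison purposes."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat an empty path the same as the root path.
    path = parts.path or "/"
    return host, path


def www_duplicate(url_a, url_b):
    """True when two URLs differ as strings but match once 'www.' is ignored."""
    return url_a != url_b and canonical_key(url_a) == canonical_key(url_b)


if __name__ == "__main__":
    # Hypothetical URLs; substitute the two listings copied from the results page.
    print(www_duplicate("http://www.example.com/", "http://example.com/"))
```

If this reports a match, the two listings are the www and non-www versions of the same page, which is the usual case a 301 redirect to one preferred hostname is meant to resolve.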

reseller

6:40 am on Feb 2, 2006 (gmt 0)

Good morning Folks

I wish to thank Matt Cutts for keeping us informed by posting his latest BigDaddy weather report on his blog.

[mattcutts.com...]

Here is an informative part of what Matt posted:

"In case you don't want to download a 70 megabyte audio file, here's the latest on Bigdaddy. Bigdaddy continues to roll out and is now available at three data centers. In addition to 66.249.93.104 and 64.233.179.104, Bigdaddy is now up at 216.239.51.104. We've been going through the spam feedback and acting on it, and reading through the general search feedback as well."

Wish you all a great day.

Blade3

7:52 am on Feb 2, 2006 (gmt 0)

Ellio
>>I see no changes from recent results for our keywords in top ten on both BD and non BD.

As Reseller said there is a change in number of SERP results. For widgets query, the results on the three DC's are as follows:

[64.233.179.104...] 50,300,000

[64.233.187.104...] 22,200,000

[64.233.187.99...] 22,200,000

Very interesting.

Ellio

8:11 am on Feb 2, 2006 (gmt 0)

>>>>>As Reseller said there is a change in number of SERP results. For widgets query, the results on the three DC's are as follows:

[64.233.179.104...] 50,300,000

[64.233.187.104...] 22,200,000

[64.233.187.99...] 22,200,000

Very interesting. <<<<<

But in my experience the Big Daddy DC's have always returned a much higher total number of results.

What's new?

Dayo_UK

10:14 am on Feb 2, 2006 (gmt 0)

In MC Blog he was asked about PR and he said:-

"PR updates are completely orthogonal."

I have not really come across the term orthogonal, as I don't have a maths background.

However, looking at a definition, I suppose it could mean that the PR update happens at a right angle to (i.e. is unrelated to?) what we are seeing on BD.

But whatever, it seems to indicate that PR is calculated separately from what is going on at BD.

However, if the main purpose of BD is to follow 301,302s etc then the recalculation of PR I would have thought is vital to any progress we might see on BD.

Apparently MC also said in the chat room for the radio station:-

"You did say in chat when i asked, that there will be some refinement to the results."

So I guess it is still a case of what we are seeing on BD may not be what we get....?

dakman

10:23 am on Feb 2, 2006 (gmt 0)

All I know is that when the new DC's are fully propagated with the new results, just like most updates but even more so in this one, some will love it, some will hate it...

So far, for me and dozens of other spam-free sites I monitor daily, it looks like it's going to do more harm than good...

Indexed page counts are way low... I've read on other forums where some completely spam-free sites have gone from 17,000 pages to 100 appearing on the Big Daddy DC's...

I'm experiencing similar results, and I've seen so many other sites that are too with the new results...

On the bright side, you get Matt and others reporting more indexed pages as a result...

You know, here's the truth, and I'm sure many of you can relate:

Does it sometimes frustrate you as a webmaster how unpredictable Google or others can be? It's like the locus of control... Google has it...

I'm sure other webmasters like myself feel the same way about it... we can control our content... our link development... etc.
But at the end of the day, when one algo shift or new update comes, we can't control it... can't control sandbox or aging...

Don't get me wrong, the benefits can be huge if you are able to leverage Google SERP's... and I'm going to continue to fight the SEO battle and win as I've done many times in the past... but also, instead of trying to predict the latest Google update all day, I try to maintain some sanity and stick to PPC, which I can control pretty much 99.9% of the time.

johnwards

10:26 am on Feb 2, 2006 (gmt 0)

The confirmed DCs on Matt's blog have differing cache dates.

1st of Feb
[66.249.93.104...]

21st Jan
[64.233.179.104...]
[216.239.51.104...]

What is strange is the 2 DCs with the 21st of Jan cache dates return differing amounts of results...the 216 DC returns the same amount of results as the 66 DC.

I would love to know which cache set a fully live BigDaddy DC is going to use.

colin_h

10:52 am on Feb 2, 2006 (gmt 0)

Hi DayoUK,

I found this really interesting definition of orthogonal - [atis.org...]

If you read further into the associated definitions it gives real insight into Google's thought processes.

I think that they felt that conflicting parameters and data influences were rendering their search results inaccurate. It might be that the new infrastructure that MC talks about is the process by which relevancy is judged. Almost as if they are compiling a group of results and then subjecting these results to a series of intersecting parameters (held totally separately from the initial results) ... eventually leading to cleaner and more accurate results. A bit like sculpture ... chipping away at the raw material until your vision of perfection manifests.

This might be the reason why I can't find my incoming links on Google at the moment. It might be that it's to be another orthogonal parameter to be added later.

All the best

Col :-)

Dayo_UK

11:04 am on Feb 2, 2006 (gmt 0)

colin_h

Yes, that is a nice definition.

With BD I have always been saying that I think the identification of the Canonical/301/302 has been a lot lot better but pages are not ranking.

I hope that when/if these two signals are combined/updated as in the case of PR internally - then we will see the sites with the Canonical/301/302 problems return.

>>> It might be that the new infrastructure that MC talks about is the process by which relevancy is judged.

Probably not relevancy at this stage but a process of identification with regards to 301/2/Canonicals while avoiding interference from ranking changes at this stage.

From MC's comment I just can't tell whether he is basically saying this parameter has not been applied. He has sort of said that it is a separate parameter, but did not answer the question of whether a PR update (internally) has been applied or will be applied (obviously one day it will, if it has not already).

colin_h

11:54 am on Feb 2, 2006 (gmt 0)

DayoUK,

I can see why they want to get this new infrastructure up and running. Imagine, they will not have to make changes to just one algorithm anymore. They will have several "orthogonal" controls with which to make minor adjustments to their results. I think, if they manage to pull it off, that times will be far less turbulent ahead for webmasters.

All the Best

Col :-)

Eazygoin

12:22 pm on Feb 2, 2006 (gmt 0)

Sorry guys, but I think you have this word 'orthogonal' completely out of context.

As a singular word, its meaning is 'Not pertinent to the matter under consideration', and that is what I think was meant, i.e. BD has no bearing on any upcoming PR update, and is concentrating on other issues.

dataguy

12:33 pm on Feb 2, 2006 (gmt 0)

I think, if they manage to pull it off, that times will be far less turbulent ahead for webmasters.

Or we could all be screwed!

As posh as it sounds, the only way I can maintain sanity living at the whims of a search giant like Google is to focus on visitor experience, not SEO.

If Google stopped sending me traffic I would have to lay off a few employees and the thought of that keeps me up at night. Still, the only real control I have is over visitor experience on MY sites.

300m

12:38 pm on Feb 2, 2006 (gmt 0)

Dayo:

"You did say in chat when i asked, that there will be some refinement to the results."

He did say that, but he also said just prior to that in the audio that it would be more transparent.

If it is going to be transparent, then I can only assume that there will be nothing significant.

The one thing that gives me hope is that at the end of the interview he was given the chance to rant about stuff. He says that he is not happy about sub domain spam. He kind of made it sound like he was going to be going after that next.

I hope he does it soon because zilch in changes to the results that I see means that BH wins and Google is going to allow it. I am the person that asked the question in chat and also the person that posted that on his blog.

Dayo_UK

12:45 pm on Feb 2, 2006 (gmt 0)

Eazygoin

Yes, I sort of agree with that and sort of covered it in what I said - perhaps not explained it well.

If PR is not 'Not pertinent to the matter under consideration' then is it (the updated values as found by Google xml queries) being ignored etc at this stage in BD?

If Google is starting to correctly follow redirects and resolving canonical issues PR/Ranking is very very pertinent into the overall matter under consideration - eg a Ranking fix for Canonical urls - whether it is pertinent to the matter of identification which BD is focusing on is a different issue.

EG. Is this why we are seeing a fix for identification in some cases but not the follow on fix which would be for the ranking of the page?

Edited - Don't want to show the query as it might be against TOS

[edited by: Dayo_UK at 12:59 pm (utc) on Feb. 2, 2006]

300m

12:49 pm on Feb 2, 2006 (gmt 0)

I think the only thing that is going to change the index results at this point would be what Matt refers to as a "data refresh". Because for me, that is when this all started.

Eazygoin

12:57 pm on Feb 2, 2006 (gmt 0)

Dayo_UK

You are far more proficient in canonical issues than I will ever be, as I have never studied them coherently. I watch the comments as an observer, but as it hasn't affected my sites, I keep distant from it from a purely selfish point of view.

What I see as relevant in the current realm of things, is that a PR update will come quite soon, but it will not have any credence from BD integration. To explain further, I believe that the next PR update [bearing in mind that PR values are constantly updated, but not made public prior to a TBPR update] will not take into account any changes caused by BD, which apparently is only showing on 3 DC's currently anyway.

Dayo_UK

1:04 pm on Feb 2, 2006 (gmt 0)

>>>>>>I believe that the next PR update [bearing in mind that PR values are constantly updated, but not made public prior to a TBPR update]will not take into account any changes caused by BD

Yes, maybe - but until we know for sure Google are using the PR calculations as obtained by BD infrastructure then the Canonical/Hijack issue will not be fixed. IMO.

EG - If the BD crawl followed a redirect and correctly indexed the destination but this was not happening in the normal crawl then the destination may not show PR until the BD infrastructure is used to calculate internal and displayed PR.

I find it hard to digest that MC says it is not pertinent when the evidence is that BD is made up of different crawled pages/caches/backlinks and redirect infrastructure.

I can understand if the matter under consideration is purely the identification of redirects etc - but for the bigger picture then surely it is.

[edited by: Dayo_UK at 1:10 pm (utc) on Feb. 2, 2006]

Eazygoin

1:10 pm on Feb 2, 2006 (gmt 0)

Yes, that could also be the case, assuming that the BD data is not considered 'stand alone' prior to being fully integrated into the 'common' SERP's.

Dayo_UK

1:13 pm on Feb 2, 2006 (gmt 0)

Yes, which takes me back to wondering if this calculation of BL, PR, Ranking and other indicators (as there must be one and it must be separate from the main crawl) has been factored into Big Daddy or if the DC is using the ranking etc calculations as based on the normal/classic Google crawl.

Eazygoin

1:28 pm on Feb 2, 2006 (gmt 0)

Dayo, do you think that there is a single crawl for BL and PR updates?...or did I misunderstand?

I thought that BL and PR were under constant review, as sites are crawled, much as new pages are indexed on a daily basis.

Dayo_UK

1:32 pm on Feb 2, 2006 (gmt 0)

There are two different types of data out there at the moment using different crawled material.

Big Daddy - Own Serps, Own Cache with Mozilla Googlebot crawl.

Normal - Own Serps, Own Cache with Normal Googlebot crawl.

As there are two crawls there should be 2 sets of BL, PR and Ranking infrastructures (at least internally) for the two scenarios. Whether BD is using this new BL, PR and Ranking infrastructure yet I don't know. They should both be constantly updated/reviewed as each of the separate crawls occurs.

Eazygoin

2:03 pm on Feb 2, 2006 (gmt 0)

So, if BD and classic SERP's have their own BL and PR values, then BD's must be hidden/internal as TBPR values are the same across the board.

IF that's the case, then they will probably remain hidden until they become spread throughout the SERP's. Otherwise, any BL and PR values would not have any bearing on the bulk or classic SERP's as viewed by everyone but would rather be values for a few DC's that aren't on display much of the time.

Added: What I meant to mention is that seeing as we can't expect BD to be across the board until Mid March, IF there is a BL and PR update prior to that, then it would probably feed off the classic SERP's

[edited by: Eazygoin at 2:08 pm (utc) on Feb. 2, 2006]

g1smd

2:04 pm on Feb 2, 2006 (gmt 0)

I keep seeing people mention that this isn't a data refresh. However, when I look at all the pages that show a modern cache and modern snippet for some queries, and an ancient snippet for other queries (when searching for words that are no longer on the real page, that is) I can see lots of problems.

The ancient snippet is sometimes served with a modern cache, and for other sites it is served with an ancient cache for that page. I still haven't worked out what triggers Google to keep an old cache, or to show a modern cache against an ancient snippet; but it is abundantly clear that the supplemental database is filling up with poisoned data.

However, seeing all the old cache data, and old snippet data out there now, the most important thing I can think of to clean this up would be exactly that: a data refresh.

Matt Cutts mentioned a "Supplemental Googlebot" in passing, a few weeks ago. He said that this would help clear up some of the problems with supplemental results. Has anyone seen any evidence of such a bot, yet? or, as I suspect, is Google still at the planning or building stages of this right now? It is funny that Supplemental Googlebot had never come up in previous postings either here or elsewhere.

Dayo_UK

2:28 pm on Feb 2, 2006 (gmt 0)

Eazygoin

Try to ignore displayed TBPR.

Normal crawl may have assigned a page a PR of 1, while Mozilla Googlebot crawl may assign the same page a PR of 4.

On the normal DCs we would be happy enough saying it is using the PR of 1 - on the BD DCs we would think that it would be using the PR of 4.....

However, who is to know that BD is using the PR that it has recalculated at this stage - instead of using its local DC's calculated PR there may be a central place and the PR of 1 may take precedence at this stage while BD is still in development.

In recent experience when internal PR is recalculated (or so it appears) it hits all the DCs at pretty much the same time so maybe there is a central place where DCs query the PR for the serp calculation. Probably unlikely as it would slow the process down - but normal googlebot PR might locally be over-riding Mozilla Googlebot calculated PR.

This is all guesswork though.

But as BD has different crawl and indexing characteristics (e.g. largely Mozilla Googlebot), a PR recalculation will happen or is happening... I am trying to figure out if internally it has been applied to the BD DCs.

Judging by the serps and MC's comment that this is not about ranking changes, I would guess internally it has not been applied. With MC also saying it is not pertinent at this stage, is that more evidence that it has not been applied?

OK, lots of people will say PR is pointless. But I don't just mean PR, I mean PR, BL, Ranking and other external factors that affect a site's ranking.

[edited by: Dayo_UK at 2:34 pm (utc) on Feb. 2, 2006]

300m

2:31 pm on Feb 2, 2006 (gmt 0)

g1smd
Thank you for posting that. More attention needs to go to that subject imho.

In my observations the "data refresh" that occurred on December 27th is unrelated to Big Daddy. At least that is what Matt said in his blog. Interestingly enough, in his earliest post about Bigdaddy (before the initial feedback Friday posts), he gave an IP to watch. I was looking good with the large index result increase, but then on the 27th something happened that changed all of that. I went from having content rich web pages that were on target in the top 3 for the majority of the keywords I go after, to seeing subdomain spam, link spam and big directories and newspaper organizations.

The really weird thing about all of it is that a good bit of the pages that are ranking well on Google after that data refresh have not been touched in a long time. Some pages even say in great big bold text (This page is obsolete).

Some are most certainly supp results.

At this point, BD is not what it is cracked up to be, unless they do another data refresh that addresses what occurred on December 27th.

frakilk

2:57 pm on Feb 2, 2006 (gmt 0)

I have an important question, how likely is it that we will see an algo update / data refresh on the BD datacenters while they are rolled out? Will the Google team only be concentrating on the rollout or will they be able to 'multitask'?

Judging by the rough guess that Matt made on his blog that a new datacenter will become part of BD every 10 days and if there is no algo update until the rollout is complete, could we be facing another 2 or so months of terrible SERPs?

300m - the phrases that you are watching, do they return a large number of results or are they low competition phrases? The large majority of my traffic has been lost due to poor rankings for low competition phrases.

needinfo

3:05 pm on Feb 2, 2006 (gmt 0)

frakilk,
I've been wondering about exactly the same thing.

300m

3:22 pm on Feb 2, 2006 (gmt 0)

Well, results and competition are always debated from what I have read. From what I have noticed, yes, they are very competitive keywords in my area.

Example:
71,300,000 on a non bd dc

and

133,000,000 on bd

before the data refresh, it was steady at 49,000,000

Are the keywords competitive? For the above example, in my area yes very. I have always been in the top 3 for the above example and it was a big money maker.

However, I do target less competitive keywords and those dropped as well.

g1smd

3:42 pm on Feb 2, 2006 (gmt 0)

Anyone know why the "HTML Cache" of PDF files does NOT include the crawl date?

It is there for HTML content, but not for PDF files.

This 275 message thread spans 10 pages.