Forum Moderators: Robert Charlton & goodroi
The Art of Google Datacenters Watch
Good morning Folks
It takes not only passion but also discipline to observe, analyze and post remarks about the DCs in general or specific DCs in particular. Patience and focus are the name of the game.
Mostly we are seeking predictions about what tomorrow's SERPs might look like.
And as you might have noticed, watching Google datacenters is a very educational process. Take a look at some posts on this thread and you will see important topics such as canonical issues, supplemental issues, 301 redirects etc. explained in detail.
In fact, this thread reflects the wealth of high-quality resources this great WebmasterWorld community has.
Keep those great observations, analysis and remarks coming ;-)
Thanks!
[edited by: tedster at 6:55 am (utc) on Jan. 23, 2006]
I'm not doing any datacenter watching, but one of my sites fluctuates (mostly on the high side lately) from under 1000 pages to over 20,000 pages. I'm assuming that's Big Daddy, and if it is, he has it wrong by about 20K pages.
For example, on a big daddy search, my site ranks #3 for its main keyword, but my site currently has a Page Rank of 0 due to a redirect to a different /dir. Once the TBPR update happens, the new /dir should have a Page Rank of either 4 or 5. At the time of the TBPR update, wouldn't the rankings improve for this site as well?
For example, now the TB is showing webmasterworld.com's PR as a 7. But if you pull the xml results from google 66.249.93.99 right now, you'll see webmasterworld.com has a PR of 8 (PR is marked with the tag <RK> in the xml file shown below):
[66.249.93.99...]
Is that right? I was told that the <RK> tag does represent the PR value for the url, but I am not sure if that is true or not.
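For anyone who wants to compare <RK> values across results, here is a minimal Python sketch. Only the <RK> tag comes from the post above; the surrounding element names (GSP, RES, R, U) and the sample values are assumptions made up for illustration, and whether <RK> really is PR remains the open question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <RK> tag is taken from the thread.
# The other element names and all values are illustrative assumptions.
SAMPLE_XML = (
    "<GSP><RES>"
    '<R N="1"><U>http://www.example.com/</U><RK>8</RK></R>'
    '<R N="2"><U>http://www.example.com/about</U><RK>5</RK></R>'
    "</RES></GSP>"
)

def extract_rk(xml_text):
    """Return (url, rk) pairs for every result that carries an <RK> value."""
    root = ET.fromstring(xml_text)
    pairs = []
    for result in root.iter("R"):
        url = result.findtext("U")
        rk = result.findtext("RK")
        if url is not None and rk is not None:
            pairs.append((url, int(rk)))
    return pairs

for url, rk in extract_rk(SAMPLE_XML):
    print(url, rk)
```

Running the same extraction against the output of several datacenters would at least show whether the value differs from DC to DC.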
I will use a real-life example to illustrate this, because it may not be possible to explain without one.
Assume I am searching for 'General Motors Widgets'. Google automatically assumes that General Motors is also called GM, so it maps the query to results for 'G M Widgets'.
On BD, search positions 2, 3, 4 and 5 are taken by results for 'Genetically Modified Widgets', which is also abbreviated to GM Widgets.
Note that the original search was for the term 'General Motors Widgets' and not 'G M Widgets', but the results shooting towards the top slot are for 'Genetically Modified Widgets'.
This seems to defeat the purpose of intelligent name mapping, if I may call it that.
I did not notice this in any of the earlier updates, where the results were OK, but I am noticing it in the BD DCs now. I have tried a few acronyms and the results are far from satisfactory.
I have some data centres showing:
The same PR, backlinks, ranking and pages in index as usual.
Some with everything the same, but a different PR.
Some with nothing at all for my site (it's 4 months old, so I figure those data centres are showing older results).
Some showing no backlinks but the same PR.
It's a mess, no pattern.
Google has dozens of datacenters. The results differ on them. Nothing has changed about that basic fact.
It's a mess, no pattern.
That seems to be a recurring theme in many of the posts here. Is it possible that Google's site: command may be going the same direction as the link: command and Google Page Rank?
In other words, it is possible that what we have been confusedly witnessing over the past several months (since pre-Jagger) is the transition into the site: command becoming a useless tool (or at least an unreliable tool).
a. Page Rank still exists, but it is no longer transparent, and we can no longer keep up with our "real" score, since it includes hidden data and parameters we can no longer check.
b. All of our IBLs (in-bound links or backlinks) still exist, but we can no longer check them, and we are only served an apparently random, ever changing sample, which doesn't seem to be anything more than a lottery selection.
c. The site: command was the last useful thing we had available to check if our domain is "healthy" in the G index. i.e. which pages are actually "there". However, for the past year, we have all been panicked by the Supplemental Results, and now we are maybe seeing some of them go away randomly. Hmm. It is not likely G will EVER do away with Supplemental Results! They need them as part of the "historical" record of the domain, part of the G patent, right? However, the Supps have caused great furor (so did Page Rank and Backlinks once upon a time), so why should we think that G will continue to show us our own underwear? It will always be there, but they don't have to show it to us.
Many people here have been posting that the number of indexed pages for their site: command has been fluctuating all over the place. (Since pre-Jagger and now it seems to be getting even more "random" in Big Daddy). While some people say theirs is great, others are shocked at what is missing, or how G has thousands when there should be hundreds, etc.
I am afraid we may be witnessing the demise of our dependence on the site: tool.
Huh, that's hilarious; I was just thinking the same thing: they've broken the site: command too. However, I'm only seeing it on one 'dynamic' (mod_rewrite) site. Doesn't appear to be my fault: MSN & Y! and -BD apparently have it right.
added: I've been checking Google site: on my other mod_rewrite sites after noticing this craziness and they show realistic page counts, and I haven't noticed any flux on site: queries for these other sites.
I just get the feeling that Google treats SEO techniques a bit like parasites ... feeding from a resource that they wish to control completely themselves. Let's face it, if Google become the only ones who can sell top listings ... there's a lot of money to be made.
Sorry for another paranoid rant ;-)
All the Best
Col {:-o
Good morning Folks
Life on WebmasterWorld has always been exciting. Something always happens to keep your blood pressure high and fluctuating. And you might have noticed that watching Google datacenters is a serious business, not designed for weak souls :-)
BigDaddy is spreading, but its quality status remains the same. We should be looking soon for improvements or lack of the same on the new infrastructure of Google index, not only on how and when BigDaddy is spreading.
Our friend at the plex, Kentuckian Matt, had an 8 a.m. meeting and a busy day yesterday. That might be the reason why he hadn't much time to post a BigDaddy weather report.
As you know, we are approaching very critical days when many things are expected to happen: a PR update, a backlink update and, most important, the algo update Allegra-II sometime next month (probably the first week of February).
Therefore we should focus on algo and rank changes too, not only BigDaddy spreading.
Unfortunately, it seems BigDaddy hasn't yet brought good news to our fellow members whose sites have been suffering from canonical and supplemental issues. Sorry folks.
Wish you all a great Google Datacenters Watch day.
God bless WebmasterWorld community.
BD is on any DC where a search for sf giants has giants.mlb.com as result 1.
If that is true, then this BD DC lists 41,000 pages for my site, pages which were removed from my site last year.
The present search results show between 700 and 900.
So - is BD bringing dead (no longer existing) pages into the index again? I would become a top ranker again, but this time without those pages actually being on my site.
I doubt it.
So - is BD bringing dead (no longer existing) pages into the index again?
Not necessarily. The BD results are simply a set with a specific canonical issue fix. Specifically, they represent the results where the search for 'sf giants' (no quotes) returns the mlb.com result rather than the sfgiants.com result; the sfgiants.com URL redirects to the mlb.com one.
added- What I meant to also say is that there have been changes in the indexes that contain bigdaddy results, so the canonical fix that represents bigdaddy is independent of some other changes that may cause your problems. It could also be a related and unintended result of the canonical fix.
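The 'sf giants' test described above can be sketched in a few lines of Python. This assumes you have already collected the ordered result URLs for that query from each datacenter by whatever means you normally use; the IP addresses and result lists below are made-up placeholders, and the function names are my own.

```python
from urllib.parse import urlparse

def is_big_daddy(result_urls):
    """True if the first result for 'sf giants' is on giants.mlb.com."""
    if not result_urls:
        return False
    return urlparse(result_urls[0]).hostname == "giants.mlb.com"

def big_daddy_share(results_by_dc):
    """Percentage of datacenters whose top result passes the test."""
    if not results_by_dc:
        return 0.0
    hits = sum(1 for urls in results_by_dc.values() if is_big_daddy(urls))
    return 100.0 * hits / len(results_by_dc)

# Placeholder data: the addresses and result lists are invented for illustration.
observed = {
    "192.0.2.1": ["http://giants.mlb.com/", "http://www.sfgiants.com/"],
    "192.0.2.2": ["http://www.sfgiants.com/", "http://giants.mlb.com/"],
    "192.0.2.3": ["http://giants.mlb.com/index.jsp"],
    "192.0.2.4": [],
}
print(big_daddy_share(observed))
```

Note that this only detects the canonical fix itself; as said above, other index changes on the same DCs are independent of it.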
The XML page seems to show the top 10 links or so for that site - with normally the actual domain page being top.
For sites with Canonical/Hijack problems the domain does not seem to appear anywhere in the top 10 - let alone top.
Soooooo - either internal PR has not been recalculated for these hijacked/canonical problem pages (I hope so) - or this is a penalty/problem that won't go away :(.
Starting to wonder if there is really any way back for sites that have been hit hard and long term by Canonical/Hijack issues.
The underlying message for Googleguy and MC - you can take Googlebot to the homepage again (after redirect, canonical problems etc) but can you sort out the ranking penalty that hit them?
Does anybody care to guess what percentage of DCs are showing Big Daddy results now?
For example, when I search for:
something site:mysite.tld
I find that URLs without a trailing slash, which have since been 301'd to the correct version, are outranking the correct page. The 301'd version is showing as supplemental, but it is outranking the correct page.
I also removed pdf versions of my pages and have since 410'd those away, yet those old pdf pages (showing as supplemental) are outranking the html versions.
Anyone see this before?
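Before blaming the index, it may be worth confirming that the server really answers 301 and 410 for those URLs. A rough Python sketch follows; the function names and the example URLs are my own placeholders, and head() needs network access, so only classify() is exercised in the usage lines.

```python
import http.client
from urllib.parse import urlsplit

def classify(status, location, expected_target=None):
    """Describe a response in terms of the 301/410 setup discussed above."""
    if status == 410:
        return "gone"
    if status == 301:
        if expected_target is None or location == expected_target:
            return "permanent redirect"
        return "permanent redirect to unexpected target"
    if status == 200:
        return "live"
    return "other (%d)" % status

def head(url):
    """Send one HEAD request without following redirects (needs network)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

# Usage with hand-entered values; mysite.tld paths are placeholders.
print(classify(301, "http://mysite.tld/dir/", "http://mysite.tld/dir/"))
print(classify(410, None))
```

If the old no-slash URLs come back as anything other than a 301 to the slash version, or the removed PDFs as anything other than 410, the problem is on the server side rather than in the index.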