
Forum Moderators: Robert Charlton & andy langton & goodroi


Google Data Quality Down - Is this now a systemic issue?

     
11:55 am on Mar 18, 2016 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts: 3610
votes: 37


This is the first in a weekly series of Moderator's opinion articles, and as such I am going to open with a subject that is affecting my day-to-day work.

Just how accurate is the data we are getting from Google these days? From what I am seeing in Google Analytics, Adsense and the SERPs - not very accurate at all would be the answer. We need some examples to back up this hypothesis, so let's start with Google Adsense.

For over a month now we have been seeing reports of Inflated Adsense Page Views [webmasterworld.com]. The fake views disappeared for a few days - just when everyone thought the problem had gone away - then came back for me the day before yesterday and yesterday morning.

I'm only seeing relatively small numbers in comparison to my overall page view numbers, but others are seeing significant multiples of their normal daily page views in fake views. This is effectively rendering all RPM and CPM figures in Adsense useless.

It's fine if you are aware of these fake page views: you only have to maintain your own spreadsheet (yes, it is nugatory work) to find out what your actual figures are. However, for those who only look at their Adsense overview for figures and don't realise the problem exists, this could be a bigger issue. Why, you ask?
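As a rough sketch of that bookkeeping - assuming you keep your own daily page-view counts (the numbers below are made up for illustration) - flagging days that look inflated against a trailing baseline might look like this in Python:

```python
import statistics

# Hypothetical daily page-view counts kept in your own records/spreadsheet.
daily_views = [10200, 9800, 10500, 10100, 9900, 43000, 10300]

def flag_inflated(views, window=5, threshold=2.0):
    """Return indices of days whose count exceeds `threshold` x the
    median of the preceding `window` days."""
    flagged = []
    for i, v in enumerate(views):
        baseline = views[max(0, i - window):i]
        if baseline and v > threshold * statistics.median(baseline):
            flagged.append(i)
    return flagged

print(flag_inflated(daily_views))  # [5] - the 43,000 day stands out
```

Once the suspect days are flagged, RPM/CPM can be recomputed with those days excluded, which is essentially the manual spreadsheet exercise described above.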

When you look at your RPM/CPM figures, you are generally comparing one advertiser with another. So when Google's RPM/CPM goes down, you look at BingAds, or maybe an affiliate program provider, or possibly one of the many new ad marketplaces that have popped up over the last couple of years - and when you run a test and see an improved RPM/CPM from one of these sources, Adsense is dropped and the new advertiser is given full coverage.

So you would think it is in Google's interest to solve this problem as fast as possible - or at least identify and remove the fake page views as fast as possible to give clean data before they lose too many publishers! This has been going on since at least 11th February, so no solution or data cleansing in over a month!

Maybe Google could give us a page view disavowal tool, so that we can clean up our own data while we wait for a complete solution.

Ahh, you might say this is just an isolated incident of poor data quality. Not so! I have a goodly few more for you to think about, while I wonder whether Google has some major systemic data quality issues.

Before we move on to Google Analytics, let's briefly stop by Adsense Experiments, which are at the moment reporting technical difficulties (probably due to the fake page views). Even before this, it was difficult to assess whether the data being reported was accurate: you would set up an Adsense experiment, say between a blue title and a purple title, then look at the page to check how both were displaying - and get a black title (see Adsense Titles [webmasterworld.com]). That doesn't give you a lot of confidence in the quality of the data you are getting from your experiment, does it?

Let's start to take a look at Google Analytics

I'll start with something I noticed yesterday. Here are a couple of chunks of a referrals report from GA - Analytics1 [s11.postimg.org], Analytics2 [s11.postimg.org] - showing data from 10th Feb to 8th Mar and 15th Feb to 16th Mar respectively. At some point Google seems to have randomly assigned values to some of my conversions (even though I haven't set any up), with different values for different referrers' conversions and the vast majority keeping a zero value. Okay, I'm baffled by this one, but if I were using this data to track PPC conversions or other advertising conversions I would be really concerned.

In this post I noticed something about precision in Analytics reports (Report Precision Issues [webmasterworld.com]), with really different visitor numbers being shown at different precisions. At the time I wasn't clear on what was going on, but further investigation shows that if you have selected some segments, such as Mobile, Tablet and Desktop, and then go to the report for all sessions, the precision setting will vary the number of visitors shown in the landing page report. This seems to be slightly better than when I first noticed it at the default precision (that was the really irritating part of the problem), but selecting faster processing still gives wildly inaccurate numbers on low- to medium-traffic sites.

Now we come on to Analytics experiments, probably the most irritating of the Google data quality issues as I use experiments all the time to test improvements to sites.

After a while you get used to some of Analytics Experiments' quirks, such as stopping an experiment after a couple of weeks with the message "Since your Google Analytics Experiment, <Experiment Title>, is unlikely to identify a winner, we have ended the experiment" - even when the experiment appeared to be running nicely and one variation was doing quite well. It will tell you an experiment has been stopped as unlikely to identify a winner when it has reached a "Probability of Outperforming Original" of 94.5% or 5.6%. Working around this is just part and parcel of dealing with Google Analytics these days.

Probably the most annoying feature of Analytics Experiments has to be the fact that adding segments to try and identify if a change has performed well on a particular platform (say Mobile or Tablet or Desktop) will change the results for All Sessions. Sometimes I have seen the results report for All Sessions change from green to red (or vice versa) by adding segments and sometimes the change is from 3% improvement to 0.5% improvement or 8% worse to 4% worse.

Or in other cases I have a report with just All Sessions showing a small improvement, so I have selected the Desktop, Mobile and Tablet segments to see whether it has performed well or badly on one particular platform, only to find that each of the segments shows a slight worsening of the selected objective. It has got to a state where I see a finished experiment, look at the results in different ways, make an educated guess as to what the outcome really was, and implement changes (or not) based on that guess. I think I am getting some benefit from experiments, but I don't have enough confidence in the data to feel really comfortable when making changes.

By the way it would be a great idea if we could set up experiments for a particular segment (say mobile or tablet) as this would allow us to get much better experiment results.

And all this without touching on the fact that Google Analytics has made the referrals-per-search-engine report nigh on impossible to find, and has left Bing out of the referrers report (it is almost impossible to find a reference to Bing in GA these days!). It also doesn't provide a breakdown of which Google domain your visitors are coming from - in some ways it is now so irritating that it is hardly analytics any more.

Is Google falling apart or imploding? No far from it, but if you look closely at the data - something many webmasters do - you will probably find a few frayed and unraveling edges.
2:32 pm on Mar 18, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:June 28, 2013
posts:2973
votes: 521


And all this without touching on the fact that Google Analytics has made the referrals-per-search-engine report nigh on impossible to find, and has left Bing out of the referrers report (it is almost impossible to find a reference to Bing in GA these days!)

Weird. We see Bing referrals in GA every day. Maybe this issue is a glitch on one server?
9:58 pm on Mar 18, 2016 (gmt 0)

Full Member

10+ Year Member

joined:May 3, 2003
posts:273
votes: 22


I think this is a symptom of Google's attitude to webmasters - which is that, by and large, they don't give a damn.

Here's a case in point: On 19th April last year, there was a huge spike in AdSense revenue experienced by many webmasters - for us, it was 10x the average day. Google quickly identified the problem - and very efficiently (and quite correctly) ensured that we didn't get paid for all the fake clicks. But they failed to correct the stats in the AdSense reporting - which means that any report that includes that date is completely useless, and graphs that include that date show no trends whatsoever.

It's not that this was too hard to fix (they managed to identify the rogue clicks easily enough when it came to not paying for them) - it's just that they didn't care enough to do it.
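One hedged workaround for uncorrected stats like this, assuming you keep your own copy of the daily figures (the dates and values below are purely illustrative), is simply to exclude the known rogue date before computing any trend or average:

```python
from datetime import date

# Hypothetical (date, revenue) pairs; 2015-04-19 is the known fake-click spike.
revenue = {
    date(2015, 4, 17): 120.0,
    date(2015, 4, 18): 118.0,
    date(2015, 4, 19): 1200.0,  # rogue day: ~10x normal, never paid out
    date(2015, 4, 20): 121.0,
    date(2015, 4, 21): 117.0,
}
BAD_DATES = {date(2015, 4, 19)}

# Drop the rogue date before averaging so the trend is readable again.
clean = {d: v for d, v in revenue.items() if d not in BAD_DATES}
avg = sum(clean.values()) / len(clean)
print(round(avg, 2))  # 119.0, instead of an average distorted by the spike
```

It is crude, but it restores usable trend lines while the reporting side remains uncorrected.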

I don't think any of the problems you identified are too difficult for the 30,000 super-smart Googlers to fix - it's just that they don't care enough to do it.
3:37 am on Mar 19, 2016 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2005
posts:1781
votes: 40


7_Driver, I agree. I hate seeing that day in my graphs because such a high point makes reading a trend for the other days very difficult - the rest of the graph looks flatlined. Weirdly enough, my Adsense earnings have greatly decreased since that day; I have always wondered if there was a link.
2:15 pm on Mar 19, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1889
votes: 56


Just imagine the amount of data they have to work with. Being greedy and trying to collect every possible bit of data on your and my browsing habits - it may be that all of that is starting to bite back.

With all those sites barely floating past page 3 in the SERPs, polluted with Analytics code and getting no more than 50 hits a month - where a smart webmaster opened up a GWMT account for the client a decade ago and the client hasn't logged into it for the past few years - I am pretty sure the dataset must be humongous.

And I am pretty sure the introduction of ad blockers (iOS 9.2.1 is the most-used OS on the iPhones and tablets visiting my sites) is also throwing a nice, heavy curveball at the algorithms responsible for crunching the data for the reports in question. Salespeople at mobile stores are actually bragging about this built-in feature when trying to sell the phone - works like a charm.

So I would say no, it's not that they are not interested, but they have a boatload of skewed data on their hands, and unfortunately the shareholders come first.
12:53 am on Mar 20, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:9287
votes: 449


Google Data Quality Down
First, the term "quality" may mean different things to different people, so that needs to be established before proceeding. Then the entire premise is comparative (e.g. "down" compared to what, or when). If it can be determined that data quality was in fact higher (using our established definition) at some point in the past, then we can look at the possible factors that may have contributed to this outcome.
Is this now a systemic issue?
This question assumes one possible cause. I would need to be skilled enough to interpret the huge amount of data needed to come to that conclusion either way. I think most here will only compare their own demographic to arrive at an opinion.
2:08 am on Mar 20, 2016 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1052
votes: 228


One point to always keep in mind with Google and data is that, with few exceptions, Google only reveals a partial bucket-sort result or some sample percentage of the whole - typically without an accompanying accuracy level. Plus, one can never be sure whether it is the same buckets, the same number of buckets, from what total number of buckets, or the same sample percentage with the same level of accuracy as before. Statistically, much of the data Google shares is not fit for purpose.
Note: as soon as one is sampling subsets such as goals, revenue, transactions, etc, the probability of sampling error can become significant.
Note: GA Premium offers non-sampled data. Because paying customers don't accept sampled results (especially sampled by others and particularly without associated accuracy).
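The sampling-error point can be illustrated with a small generic simulation (this is not Google's actual sampling method, just a sketch): repeated small samples of the same sessions give noticeably different conversion-rate estimates, and the rarer the event being sampled (goals, transactions), the worse the scatter gets.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# 100,000 simulated sessions with a true 2% conversion rate.
TRUE_RATE = 0.02
sessions = [random.random() < TRUE_RATE for _ in range(100_000)]

def sampled_rate(sessions, fraction):
    """Estimate the conversion rate from a random sample of sessions."""
    sample = random.sample(sessions, int(len(sessions) * fraction))
    return sum(sample) / len(sample)

# Repeated 1% samples of the same data scatter noticeably around 0.02,
# even though the underlying sessions never change.
estimates = [sampled_rate(sessions, 0.01) for _ in range(5)]
print([round(e, 4) for e in estimates])
```

Without an accompanying accuracy figure, any one of those sampled numbers could be reported as "the" conversion rate.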

I haven't seen the phantom AdSense views folks have been mentioning (knock my wooden head); however, it sounds like a data filter has a problem.
Note: as G tends to deliver reports dynamically, it makes sense that some folks see more data variance than others: for some calls the filters work as desired, for others not. And a partial, intermittent failure is always more difficult to trace, identify, and rectify.
Note: G seems to like to extend data to offer more 'insights'. Part of that data extension seems to be an historic tendency to expect (as examples) www and non-www and, more recently, http and https to exist and by default contain duplicate data unless explicitly told otherwise. And then, presumably, to apply filters to conform data to reality. If one or more filters fail, strange things are seen in reports - and occasionally in results.

Google query results are increasingly hit and miss as to quality in many niches. As a searcher I have nicknamed Google GIGO, and it is now, at best, my third SE of choice. But then I am a reasonably knowledgeable, experienced searcher; the ads on top, followed by Wikipedia, Amazon, and YouTube, plus the eye-catching Answer/Knowledge Boxes and Google Shopping, are probably sufficient for a majority of the general public. No need to look below the fold or leave the G filter bubble. AOL can only drool.
1:19 pm on Mar 20, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3179
votes: 221


the ads on top followed by Wikipedia, Amazon, and YouTube plus the eye catching Answer/Knowledge Boxes and Google Shopping are probably sufficient for a majority of the general public.

Maybe google should put a check box below the search field:

-- I'm part of the general public, so give me dumbed-down results.
or
-- I'm above the level of the general public, so give me intelligent results.
8:49 am on Mar 21, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:May 3, 2015
posts: 291
votes: 125


7_driver is absolutely right. I actually came to WebmasterWorld after Google had exhausted my love for them. I mean really, how hard is it to assign someone to check up on what each team is doing and keep webmasters updated regularly? It would be a trivial cost, but Google doesn't really care. John Mu is cool, but he doesn't know what he's talking about half the time - it's about the vaguest information you can give without annoying most webmasters. Over the last few years they have absolutely destroyed any goodwill many webmasters had for them.
11:03 am on Mar 21, 2016 (gmt 0)

Preferred Member from DE 

Top Contributors Of The Month

joined:Aug 11, 2014
posts:519
votes: 161


Well, I am one of the privileged people using Analytics Premium, and the data is 100% accurate. Even take-the-data-with-a-pile-of-salt tools that link GSC to the landing pages in an attempt to unmask the not-provided data are working reasonably well. As for normal GA, the reliability of the data is upwards of 90%, assuming you have cleaned your profile of the obvious referral and direct spam, bots and other fake traffic.
5:44 pm on Mar 21, 2016 (gmt 0)

Preferred Member

5+ Year Member

joined:Jan 6, 2011
posts:478
votes: 1


Everyone should really be using something else besides, or in addition to, GA for their site(s). There are several options, but one of my longtime favorites is StatCounter. I think it's best for small-to-medium-traffic sites, but it's been great for high-traffic sites as well. In my experience, it's far more accurate and reliable than GA, and using the two together has helped me better understand my traffic.
4:04 pm on Mar 22, 2016 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts:3610
votes: 37


@EditorialGuy - I eventually found Bing referrals by going to Acquisition -> All Channels -> Organic Search -> Sources rather than via referrals. However, this just raised the question of why r.search.yahoo.com and bing.com are included in this report while duckduckgo.com, uk.search.yahoo.com, in.search.yahoo.com and ca.search.yahoo.com are not.

Also, does anyone have a clue as to where to find the breakdown between google.com, google.co.uk, google.gg etc.?

@keyplyr - I was basing "quality being down" on my own observations: over the last couple of years the number of errors I have seen in Google data has gone up considerably. Yes, they have made their products significantly more complex, and thus have many more reports and data items to deal with, but when errors start creeping in at the level seen recently, it is probably time to rein in the new features and consolidate.

As to it being systemic - a single cause could be a lack of appropriate testing procedures before go-live, failures in training new staff on the products they are working on, lack of funding to departments, and so on. The debate here is to try to decide whether this is the case, not necessarily to determine the cause.

@Panthro - my main use of GA is for experiments - I can't find anything out there that provides similar levels of functionality in the A/B testing arena.
7:46 pm on Mar 23, 2016 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts: 3610
votes: 37


Just had one of those really strange experiment results - this is probably the biggest swing I have seen yet.

All Users selected (no other segments showing on report) Conversion rate +9.45%
All Users selected (with Desktop, Mobile and Tablet segments showing as options on the report) Conversion rate -7.19%

It is this kind of data that really dents my confidence in Google Analytics.
2:54 am on Mar 30, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2001
posts: 1265
votes: 13


I don't think any of the problems you identified are too difficult for the 30,000 super-smart Googlers to fix - it's just that they don't care enough to do it.


I don't think it's lack of caring, it's size and the bureaucracy that comes with it.

I can anecdotally attest to the effects of this on both the Adwords and Adsense sides. They seem to have lost agility with both products. And the people working there (or those I've spoken with) don't understand the systems they're working with as well as people there once did. Too many degrees of separation.

Hopefully the Adwords redesign will include some behind the scenes changes as well.
7:36 am on Mar 30, 2016 (gmt 0)

Senior Member

joined:July 29, 2007
posts:1780
votes: 100


I had a few reports in which a simple back-and-revisit, without changing any settings at all, yielded different results in Analytics.

In 2016 I'm making some changes, I'm removing tracking instead of adding it. I find critical or server issues via my log files and I am focusing on ad performance more than page performance. That data is already gathered by the ad server which contains some visitor information as well. It's a bit more difficult to reveal some avenues for improvement but in this post-SEO era where only content matters I don't focus on more than creating the best content I can anyway.
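As a minimal sketch of that log-file approach - assuming combined-format access logs; the sample lines and paths below are invented for illustration - counting server errors per URL needs only a few lines:

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format access log line.
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def server_errors(lines):
    """Count 5xx responses per path from access-log lines."""
    errors = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("status").startswith("5"):
            errors[m.group("path")] += 1
    return errors

sample = [
    '1.2.3.4 - - [30/Mar/2016:07:36:00 +0000] "GET /index.html HTTP/1.1" 200 5123',
    '1.2.3.4 - - [30/Mar/2016:07:36:02 +0000] "GET /buy.html HTTP/1.1" 500 312',
]
print(server_errors(sample))  # Counter({'/buy.html': 1})
```

In practice you would feed it the real log file (e.g. `open("access.log")`), but the point stands: the critical errors are already in data you control, with no third-party tracking required.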

Works for me. Less tracking = more speed too, bonus.