
Forum Moderators: Robert Charlton & andy langton & goodroi


Google Data Quality Down - Is this now a systemic issue?

     
11:55 am on Mar 18, 2016 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts: 3610
votes: 37


This is the first in a weekly series of Moderator's opinion articles, and as such I am going to open with a subject that is affecting my day-to-day work.

Just how accurate is the data we are getting from Google these days? From what I am seeing in Google Analytics, Adsense and the SERPs - not very accurate at all would be the answer. We need some examples to back up this hypothesis, so let's start with Google Adsense.

For over a month now we have been seeing reports of Inflated Adsense Page Views [webmasterworld.com]. The fake views disappeared for a few days - just when everyone thought the problem had gone away - then came back for me the day before yesterday and yesterday morning.

I'm only seeing relatively small numbers in comparison to my overall page view numbers, but others are seeing significant multiples of their normal daily page views in fake views. This is effectively rendering all RPM and CPM figures in Adsense useless.

It's fine if you are aware of these fake page views: you only have to maintain your own spreadsheet (yes, it is nugatory work) to find out what your actual figures are. However, for those who only look at their Adsense overview for figures and don't realise the problem exists, this could be a bigger issue. Why, you ask?
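As a rough sketch of that bookkeeping - assuming you keep your own daily page-view counts (the numbers below are made up for illustration) - flagging days that look inflated against a trailing baseline might look like this in Python:

```python
import statistics

# Hypothetical daily page-view counts kept in your own records/spreadsheet.
daily_views = [10200, 9800, 10500, 10100, 9900, 43000, 10300]

def flag_inflated(views, window=5, threshold=2.0):
    """Return indices of days whose count exceeds `threshold` x the
    median of the preceding `window` days."""
    flagged = []
    for i, v in enumerate(views):
        baseline = views[max(0, i - window):i]
        if baseline and v > threshold * statistics.median(baseline):
            flagged.append(i)
    return flagged

print(flag_inflated(daily_views))  # [5] - the 43,000 day stands out
```

Once the suspect days are flagged, RPM/CPM can be recomputed with those days excluded, which is essentially the manual spreadsheet exercise described above.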

When you look at your RPM/CPM figures, you are generally comparing one advertiser with another. So when Google's RPM/CPM goes down, you look at BingAds, or maybe an affiliate program provider, or possibly one of the many new ad marketplaces that have popped up over the last couple of years - and when you run a test and see an improved RPM/CPM from one of these sources, Adsense is dropped and the new advertiser is given full coverage.

So you would think it is in Google's interest to solve this problem as fast as possible - or at least identify and remove the fake page views as fast as possible to give clean data before they lose too many publishers! This has been going on since at least 11th February, so no solution or data cleansing in over a month!

Maybe Google could give us a page view disavowal tool, so that we can clean up our own data while we wait for a complete solution.

Ahh, you might say this is just an isolated incident of poor data quality. Not so! I have a goodly few more for you to think about, while I wonder whether Google has some major systemic data quality issues.

Before we move on to Google Analytics, let's briefly stop by Adsense Experiments, which are at the moment reporting technical difficulties (probably due to the fake page views). Even before this, it was difficult to assess whether the data being reported was accurate: you would set up an Adsense experiment, say between a blue title and a purple title, then look at the page to check how both were displaying - and get a black title (see Adsense Titles [webmasterworld.com]). That doesn't give you a lot of confidence in the quality of the data you are getting from your experiment, does it?

Let's start to take a look at Google Analytics

I'll start with something I noticed yesterday. Here are a couple of chunks of a referrals report from GA - Analytics1 [s11.postimg.org], Analytics2 [s11.postimg.org] - showing data from 10th Feb to 8th Mar and 15th Feb to 16th Mar respectively. At some point Google seems to have randomly assigned values to some of my conversions (even though I haven't set any up), with different values for different referrers' conversions and the vast majority keeping a zero value. Okay, I'm baffled by this one, but if I were using this data to track PPC conversions or other advertising conversions I would be really concerned.

In this post I noticed something about precision in Analytics reports (Report Precision Issues [webmasterworld.com]), with really different visitor numbers being shown at different precisions. At the time I wasn't clear on what was going on, but further investigation shows that if you have selected some segments, such as Mobile, Tablet and Desktop, and then go to the report for all sessions, the precision setting will vary the number of visitors shown in the landing page report. This seems to be slightly better than when I first noticed it at the default precision (that was the really irritating part of the problem), but selecting faster processing still gives wildly inaccurate numbers on low- to medium-traffic sites.

Now we come on to Analytics experiments, probably the most irritating of the Google data quality issues as I use experiments all the time to test improvements to sites.

After a while you get used to some of Analytics Experiments' quirks, such as stopping an experiment after a couple of weeks with the message "Since your Google Analytics Experiment, <Experiment Title>, is unlikely to identify a winner, we have ended the experiment" - even when the experiment appeared to be running nicely and one variation was doing quite well. It will tell you an experiment has been stopped as unlikely to identify a winner when it has reached a "Probability of Outperforming Original" of 94.5% or 5.6%. Working around this is just part and parcel of dealing with Google Analytics these days.

Probably the most annoying feature of Analytics Experiments has to be the fact that adding segments to try and identify if a change has performed well on a particular platform (say Mobile or Tablet or Desktop) will change the results for All Sessions. Sometimes I have seen the results report for All Sessions change from green to red (or vice versa) by adding segments and sometimes the change is from 3% improvement to 0.5% improvement or 8% worse to 4% worse.

Or in other cases I have a report with just All Sessions showing a small improvement, so I have selected the Desktop, Mobile and Tablet segments to see whether it has performed well or badly on one particular platform, only to find that each of the segments shows a slight worsening of the selected objective. It has got to a state where I see a finished experiment, look at the results in different ways, make an educated guess as to what the outcome really was, and implement changes (or not) based on that guess. I think I am getting some benefit from experiments, but I don't have enough confidence in the data to feel really comfortable when making changes.

By the way it would be a great idea if we could set up experiments for a particular segment (say mobile or tablet) as this would allow us to get much better experiment results.

And all this without touching on the fact that Google Analytics has made the referrals-per-search-engine report nigh on impossible to find, and has left Bing out of the referrers report (it is almost impossible to find a reference to Bing in GA these days!). It also doesn't provide a breakdown of which Google domain your visitors are coming from - in some ways it is now so irritating that it is hardly analytics any more.

Is Google falling apart or imploding? No far from it, but if you look closely at the data - something many webmasters do - you will probably find a few frayed and unraveling edges.
2:32 pm on Mar 18, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:June 28, 2013
posts:2973
votes: 521


And all this without touching on the fact that Google Analytics has made the referrals-per-search-engine report nigh on impossible to find, and has left Bing out of the referrers report (it is almost impossible to find a reference to Bing in GA these days!)

Weird. We see Bing referrals in GA every day. Maybe this issue is a glitch on one server?
9:58 pm on Mar 18, 2016 (gmt 0)

Full Member

10+ Year Member

joined:May 3, 2003
posts:273
votes: 22


I think this is a symptom of Google's attitude to webmasters - which is that, by and large, they don't give a damn.

Here's a case in point: On 19th April last year, there was a huge spike in AdSense revenue experienced by many webmasters - for us, it was 10x the average day. Google quickly identified the problem - and very efficiently (and quite correctly) ensured that we didn't get paid for all the fake clicks. But they failed to correct the stats in the AdSense reporting - which means that any report that includes that date is completely useless, and graphs that include that date show no trends whatsoever.

It's not that this was too hard to fix (they managed to identify the rogue clicks easily enough when it came to not paying for them) - it's just that they didn't care enough to do it.
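One hedged workaround for uncorrected stats like this, assuming you keep your own copy of the daily figures (the dates and values below are purely illustrative), is simply to exclude the known rogue date before computing any trend or average:

```python
from datetime import date

# Hypothetical (date, revenue) pairs; 2015-04-19 is the known fake-click spike.
revenue = {
    date(2015, 4, 17): 120.0,
    date(2015, 4, 18): 118.0,
    date(2015, 4, 19): 1200.0,  # rogue day: ~10x normal, never paid out
    date(2015, 4, 20): 121.0,
    date(2015, 4, 21): 117.0,
}
BAD_DATES = {date(2015, 4, 19)}

# Drop the rogue date before averaging so the trend is readable again.
clean = {d: v for d, v in revenue.items() if d not in BAD_DATES}
avg = sum(clean.values()) / len(clean)
print(round(avg, 2))  # 119.0, instead of an average distorted by the spike
```

It is crude, but it restores usable trend lines while the reporting side remains uncorrected.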

I don't think any of the problems you identified are too difficult for the 30,000 super-smart Googlers to fix - it's just that they don't care enough to do it.
3:37 am on Mar 19, 2016 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2005
posts:1781
votes: 40


7_Driver, I agree. I hate seeing that day in my graphs because such a high point makes reading a trend for the other days very difficult - the rest of the graph looks flatlined. Weirdly enough, my Adsense earnings have greatly decreased since that day; I have always wondered if there was a link.
2:15 pm on Mar 19, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1889
votes: 56


Just imagine the amount of data they have to work with. Being greedy and trying to collect every possible bit of data on your and my browsing habits - it may be that all of that is starting to bite back.

With all those sites barely floating past page 3 in the SERPs, polluted with Analytics code and getting no more than 50 hits a month - where a smart webmaster opened up a GWMT account for the client a decade ago and the client hasn't logged into it for the past few years - I am pretty sure the dataset must be humongous.

And I am pretty sure the introduction of ad blockers (iOS 9.2.1 is the most-used OS on the iPhones and tablets visiting my sites) is also throwing a nice, heavy curveball at the algorithms responsible for crunching the data for the reports in question. Salespeople at mobile stores are actually bragging about this built-in feature when trying to sell the phone - works like a charm.

So I would say no, it's not that they are not interested, but they have a boatload of skewed data on their hands, and unfortunately the shareholders come first.
12:53 am on Mar 20, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:9287
votes: 449


Google Data Quality Down
First, the term "quality" may mean different things to different people, so that needs to be established before proceeding. Then the entire premise is comparative (e.g. "down" compared to what, or when). If it can be determined that data quality was in fact higher (using our established definition) at some point in the past, then we can look at the possible factors that may have contributed to this outcome.
Is this now a systemic issue?
This question assumes one possible cause. I would need to be skilled enough to interpret the huge amount of data needed to come to that conclusion either way. I think most here will only compare their own demographic to arrive at an opinion.
2:08 am on Mar 20, 2016 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1052
votes: 228


One point to always keep in mind with Google and data is that, with few exceptions, Google only reveals a partial bucket-sort result or some sample percentage of the whole - typically without an accompanying accuracy level. Plus, one can never be sure whether it is the same buckets, the same number of buckets, from what total number of buckets, or the same sample percentage with the same level of accuracy as before. Statistically, much of the data Google shares is not fit for purpose.
Note: as soon as one is sampling subsets such as goals, revenue, transactions, etc, the probability of sampling error can become significant.
Note: GA Premium offers non-sampled data. Because paying customers don't accept sampled results (especially sampled by others and particularly without associated accuracy).
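The sampling-error point can be illustrated with a small generic simulation (this is not Google's actual sampling method, just a sketch): repeated small samples of the same sessions give noticeably different conversion-rate estimates, and the rarer the event being sampled (goals, transactions), the worse the scatter gets.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# 100,000 simulated sessions with a true 2% conversion rate.
TRUE_RATE = 0.02
sessions = [random.random() < TRUE_RATE for _ in range(100_000)]

def sampled_rate(sessions, fraction):
    """Estimate the conversion rate from a random sample of sessions."""
    sample = random.sample(sessions, int(len(sessions) * fraction))
    return sum(sample) / len(sample)

# Repeated 1% samples of the same data scatter noticeably around 0.02,
# even though the underlying sessions never change.
estimates = [sampled_rate(sessions, 0.01) for _ in range(5)]
print([round(e, 4) for e in estimates])
```

Without an accompanying accuracy figure, any one of those sampled numbers could be reported as "the" conversion rate.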

I haven't seen the phantom AdSense views folks have been mentioning (knock my wooden head); however, it sounds like a data filter has a problem.
Note: as G tends to deliver reports dynamically, it makes sense that some folks see more data variance than others: for some calls the filters work as desired, for others not. And a partial, intermittent failure is always more difficult to trace, identify, and rectify.
Note: G seems to like to extend data to offer more 'insights'. Part of that data extension seems to be an historic tendency to expect (as examples) www and non-www and, more recently, http and https to exist and by default contain duplicate data unless explicitly told otherwise. And then, presumably, to apply filters to conform data to reality. If one or more filters fail, strange things are seen in reports - and occasionally in results.

Google query results are increasingly hit and miss as to quality in many niches. As a searcher I have nicknamed Google GIGO, and it is now, at best, my third SE of choice. But then I am a reasonably knowledgeable, experienced searcher; the ads on top, followed by Wikipedia, Amazon, and YouTube, plus the eye-catching Answer/Knowledge Boxes and Google Shopping, are probably sufficient for a majority of the general public. No need to look below the fold or leave the G filter bubble. AOL can only drool.
1:19 pm on Mar 20, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3179
votes: 221


the ads on top followed by Wikipedia, Amazon, and YouTube plus the eye catching Answer/Knowledge Boxes and Google Shopping are probably sufficient for a majority of the general public.

Maybe google should put a check box below the search field:

-- I'm part of the general public, so give me dumbed-down results.
or
-- I'm above the level of the general public, so give me intelligent results.
8:49 am on Mar 21, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:May 3, 2015
posts: 291
votes: 125


7_driver is absolutely right. I actually came to WebmasterWorld after Google had exhausted my love for them. I mean really, how hard is it to assign someone to check up on what each team is doing and keep webmasters updated regularly? It would be a trivial cost, but Google doesn't really care. John Mu is cool, but he doesn't know what he's talking about half the time - it's about the vaguest information you can give without annoying most webmasters. Over the last few years they have absolutely destroyed any goodwill many webmasters had for them.
11:03 am on Mar 21, 2016 (gmt 0)

Preferred Member from DE 

Top Contributors Of The Month

joined:Aug 11, 2014
posts:519
votes: 161


Well, I am one of the privileged people using Analytics Premium, and the data is 100% accurate. Even take-the-data-with-a-pile-of-salt tools that link GSC to the landing pages in an attempt to unmask the not-provided data are working reasonably well. As for normal GA, the reliability of the data is upwards of 90%, assuming you have cleaned your profile of the obvious referral and direct spam, bots and other fake traffic.
5:44 pm on Mar 21, 2016 (gmt 0)

Preferred Member

5+ Year Member

joined:Jan 6, 2011
posts:478
votes: 1


Everyone should really be using something else besides, or in addition to, GA for their site(s). There are several options, but one of my longtime favorites is StatCounter. I think it's best for small-to-medium-traffic sites, but it's been great for high-traffic sites as well. In my experience, it's far more accurate and reliable than GA, and using the two together has helped me better understand my traffic.
4:04 pm on Mar 22, 2016 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts:3610
votes: 37


@EditorialGuy - I eventually found Bing referrals by going to Acquisition -> All Channels -> Organic Search -> Sources rather than via referrals. However, this just raised the question of why r.search.yahoo.com and bing.com are included in this report while duckduckgo.com, uk.search.yahoo.com, in.search.yahoo.com and ca.search.yahoo.com are not.

Also, does anyone have a clue as to where to find the breakdown between google.com, google.co.uk, google.gg etc.?

@keyplyr - I was basing "quality being down" on my own observations: over the last couple of years the number of errors I have seen in Google data has gone up considerably. Yes, they have made their products significantly more complex, and thus have many more reports and data items to deal with, but when errors start creeping in at the level seen recently, it is probably time to rein in the new features and consolidate.

As to it being systemic - a single cause could be a lack of appropriate testing procedures before go-live, failures in training new staff on the products they are working on, lack of funding to departments, and so on. The debate here is to try to decide whether this is the case, not necessarily to determine the cause.

@Panthro - my main use of GA is for experiments - I can't find anything out there that provides similar levels of functionality in the A/B testing arena.
7:46 pm on Mar 23, 2016 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts: 3610
votes: 37


Just had one of those really strange experiment results - this is probably the biggest swing I have seen yet.

All Users selected (no other segments showing on report) Conversion rate +9.45%
All Users selected (with Desktop, Mobile and Tablet segments showing as options on the report) Conversion rate -7.19%

It is this kind of data that really dents my confidence in Google Analytics.
2:54 am on Mar 30, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2001
posts: 1265
votes: 13


I don't think any of the problems you identified are too difficult for the 30,000 super-smart Googlers to fix - it's just that they don't care enough to do it.


I don't think it's lack of caring, it's size and the bureaucracy that comes with it.

I can anecdotally attest to the effects of this on both the Adwords and Adsense sides. They seem to have lost agility with both products. And the people working there (or those I've spoken with) don't understand the systems they're working with as well as people there once did. Too many degrees of separation.

Hopefully the Adwords redesign will include some behind the scenes changes as well.
7:36 am on Mar 30, 2016 (gmt 0)

Senior Member

joined:July 29, 2007
posts:1780
votes: 100


I had a few reports in which a simple back-and-revisit, without changing any settings at all, yielded different results in Analytics.

In 2016 I'm making some changes, I'm removing tracking instead of adding it. I find critical or server issues via my log files and I am focusing on ad performance more than page performance. That data is already gathered by the ad server which contains some visitor information as well. It's a bit more difficult to reveal some avenues for improvement but in this post-SEO era where only content matters I don't focus on more than creating the best content I can anyway.
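As a minimal sketch of that log-file approach - assuming combined-format access logs; the sample lines and paths below are invented for illustration - counting server errors per URL needs only a few lines:

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format access log line.
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def server_errors(lines):
    """Count 5xx responses per path from access-log lines."""
    errors = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("status").startswith("5"):
            errors[m.group("path")] += 1
    return errors

sample = [
    '1.2.3.4 - - [30/Mar/2016:07:36:00 +0000] "GET /index.html HTTP/1.1" 200 5123',
    '1.2.3.4 - - [30/Mar/2016:07:36:02 +0000] "GET /buy.html HTTP/1.1" 500 312',
]
print(server_errors(sample))  # Counter({'/buy.html': 1})
```

In practice you would feed it the real log file (e.g. `open("access.log")`), but the point stands: the critical errors are already in data you control, with no third-party tracking required.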

Works for me. Less tracking = more speed too, bonus.