
Forum Moderators: DixonJones & mademetop


Huge disparity in GA data, can anyone shed light?

3:07 pm on Feb 19, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


I'm trying to get some data out of GA but what I'm seeing makes no sense. I'm looking at the Pageviews number over 6 months for part of a site.

So, Behaviour > Site Content > All Pages, then apply a filter to view all pages in a folder. That gives 482,000.

When I apply any second dimension to split by traffic type I get an increase of 136,000!
- If I then apply 'Default Channel Grouping' I get 618,000.
- If I apply 'Source' or 'Source/Medium' I get 618,000 (slight difference <50).
- If I apply 'Traffic Type' I get 618,000 (again, slight difference <50)

The 'Unique Pageviews' also increases by 16,000 - from 252,000 to 268,000, with slight variations again <50.

There is a big self-referral problem, but I can't see how that would skew the data here. The total of all Pageviews should be the same as the total for all Pageviews organised by type, surely!

Can anyone help me understand why the figures are so radically different when I split the data into Pageviews by traffic type?
9:28 pm on Feb 20, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Do you see the "This report is based on 123,456 sessions (12.34% of sessions)." message at any point (underneath the date)? This is most likely to occur after adding a dimension or filter.
3:41 pm on Feb 23, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


Hi Andy - thanks for your reply, and sorry for my slow response. Different folder, different numbers (all real), but same problem:

Behaviour > Site Content > All Pages - then a filter to show everything in /folder/
Pageviews: 888,532 (1.52% of total) | Unique Pageviews: 450,910 (1.83% of total)

Now apply a Secondary dimension which is 'Source/Medium'
Pageviews: 1,071,633 (1.83% of total) | Unique Pageviews: 467,261 (1.90% of total)

So the numbers, and the percentage they are supposed to represent, both increase.

Change the Secondary dimension and the numbers sometimes vary from each other <50, but still way over the 'raw' initial total.

How can organising traffic into sources show a bigger total? I'm trying to understand whether this is a documented glitch in Analytics, something to allow for, or whether it points to something (else) wrong with the Analytics setup on the site. According to one audit tool, there are c. 35K pages on it without tracking code, and there are other issues.

I just need to try to understand possible reasons for what I'm seeing, the 'thought process' behind the numbers being displayed if you will.

1000 posts woooo! Not quite what I hoped for from my 1000th post but hey ho :)
6:38 pm on Feb 23, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Can I just clarify that you don't see the message about sampled data? :)

Congratulations on becoming a thousandaire! ;)
12:38 am on Feb 24, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:June 20, 2006
posts:2076
votes: 63


+ Once you've applied the Sec Dim, what exactly does the Sec Dim button say on it?
For example: "Secondary dimension: Source / Medium".

+ Top right, below the date and to the left of the graduation-hat icon, click the data-sampling icon and shift the slider to maximum precision for both reports... do the numbers still shift?
5:24 am on Feb 24, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


Thanks guys.

@ Andy - sorry, I missed that in your first post. Yes, there is a message about sampling!

@RhinoFish
+ Source, Source/Medium, Default Channel Grouping - anything that splits into subtotals provides the inflated figure for the total with a variance of <50
+ moved the slider from the default to far right, it made hardly any difference, again <50

I can't get rid of the sampling by adding a filter to isolate one source / medium, or by setting a smaller date range (1 month).

So, am I right to think that the most accurate 'real' numbers I can get will be to (a) take the first total without sampling, then (b) use the percentages of the subtotals once the secondary dimension is active to (c) split the first total up by those percentages?

Asked another way, is there any way to sidestep the sampling, or is the best I can do to try to calculate round it?
8:57 am on Feb 24, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


The only way to get rid of sampling is to "simplify" the query (fewer filters) or reduce the amount of data (a shorter date range). If it's critical that you get correct numbers, then grabbing weekly data and aggregating it is one way, for example. Otherwise, you might have success getting the data via a different report or reports that don't use filters (e.g. use built-in reports rather than building your own via filtering).

Basically, it's a limitation of Google's system that becomes a problem if you have a large dataset you need to analyse.
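
For anyone wanting to script the weekly-aggregation workaround, here's a minimal sketch. The `fetch` callback is a placeholder for however you actually export the data (CSV downloads, the Reporting API, etc.); the function names and dummy numbers below are mine, not anything from GA:

```python
# Sketch: pull short (hopefully unsampled) date ranges and aggregate them
# yourself. fetch(a, b) should return the pageview total for that range.
from datetime import date, timedelta

def weekly_ranges(start, end):
    """Yield (week_start, week_end) date pairs covering start..end inclusive."""
    cur = start
    while cur <= end:
        week_end = min(cur + timedelta(days=6), end)
        yield cur, week_end
        cur = week_end + timedelta(days=1)

def total_pageviews(start, end, fetch):
    """Sum the per-week totals returned by fetch(week_start, week_end)."""
    return sum(fetch(a, b) for a, b in weekly_ranges(start, end))

# Dummy data standing in for real exports: pretend each week had 1,000 views.
fake_export = lambda a, b: 1000
print(total_pageviews(date(2016, 1, 1), date(2016, 1, 28), fake_export))  # 4 weeks -> 4000
```

The same loop works for any slice size (daily, fortnightly) if weekly ranges still trigger sampling on a very large property.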
2:08 pm on Feb 24, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


I don't think exact numbers are that critical; it's more about keeping margins of error down. The main thing is to understand WHY it is happening, so thanks for clarifying.
6:42 pm on Feb 24, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:June 20, 2006
posts:2076
votes: 63


Please humor me.

+Once you've applied the Sec Dim, what exactly does the Sec Dim button say on it?
For example: "Secondary dimension: Source / Medium".
6:53 am on Feb 25, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


In the second set of numbers given above, 'Source/Medium'

If I apply 'Source' or 'Source/Medium' or 'Referral Path' then the inflated figures are exactly the same as each other.
If I apply 'Traffic Type' or 'Medium' then there is a slight identical difference in the inflated figures.
If I apply 'Default Channel Grouping' then there is another slight difference.

Default sampling rate is 1.86%.
5:14 pm on Feb 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:June 20, 2006
posts:2076
votes: 63


Does it exactly say "Source / Medium"?
Or does it exactly say "Secondary dimension: Source / Medium"?
7:08 pm on Feb 25, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


The latter
9:32 pm on Feb 26, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:June 20, 2006
posts:2076
votes: 63


As Andy L suggested above, does a shorter date range make the shift go away?
You might have to pull pieces and compile.

Before you filtered it to get 482,000, what was the data set size?
I think (like others) that applying the Sec Dim is causing resampling, and that gives a shift.
If the filter you're applying is not correlated to the Sec Dim, I could see large shifts happening, as long as your filtering is severe (>95% of data removed).
8:49 am on Feb 27, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 16, 2009
posts:1082
votes: 79


I tried from 6 months down to 1 month, but even if going to a week makes it stop, I don't need the exact numbers THAT much!

The original unfiltered size (whole site) is tens of millions.

Yes, no matter what Sec Dim I apply there's always sampling and the biggest percentage I can get is still under 3%. And yes, the filter I'm applying (which is to group pages) is not related to the Sec Dim (which is to group traffic source).

I'm going to take the % that each split represents with the Sec Dim active and then apply that to the total without the Sec Dim active. The main objective here is to benchmark parts of the site against other parts, so the relationships between the numbers are more important than the actual numbers being spot on.
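
That proportional split can be sketched in a few lines. Only the 888,532 unsampled total and the 1,071,633 sampled total come from this thread; the source/medium breakdown below is invented purely for illustration:

```python
# Take the unsampled total from the plain report, then distribute it using
# the share each source gets in the sampled (secondary-dimension) report.
unsampled_total = 888_532  # pageviews without the secondary dimension

# Hypothetical sampled subtotals by source/medium, summing to the inflated
# sampled total of 1,071,633 reported in the thread:
sampled = {
    "google / organic": 650_000,
    "(direct) / (none)": 300_000,
    "bing / organic": 121_633,
}
sampled_total = sum(sampled.values())

# Scale each source's sampled share back onto the unsampled total:
estimated = {src: unsampled_total * n / sampled_total
             for src, n in sampled.items()}

for src, pv in estimated.items():
    print(f"{src}: {pv:,.0f}")
```

By construction the estimates always sum back to the unsampled total, so the relative benchmarking between sources (or between site sections) is preserved even though each individual figure is still only as good as the sample behind it.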

Thanks for your replies.