Thanks for the follow-up, Robert. I did get to the bottom of the problem shortly after posting the above, mostly through lucky timing, but there is no real way to block it in advance even when you know what it is. The script that damages my stats is not intentionally trying to do that, but the end result is the same.
"fully renders" = visitor loads everything including favicon or mobile equivalent? That is, fully humanoid?
Do they accept cookies? Execute JavaScript?
Yes and yes: fully humanoid. It pulls the favicon and executes JavaScript (including AdSense). Since AdSense doesn't load without cookies enabled, I'm also going to say yes on cookies, but I haven't run it to test that.
While I think it would be a bad idea to name sites that run the script, I do think it's safe to say that the base code has been hosted on Google Code for some time and that its intended use is not malicious. Thankfully, it's not in widespread use.
The thread you linked, Robert, probably isn't about this particular script, but it is likely very similar. The footprint would be nearly identical.
I'm going to send a note to the Google Code team to let them know; they'll be able to see the same thing if they run it on a test site. I have a feeling they'll either remove the script or change the code to load third-party sites only once and work from that single load, by caching it or similar.
Footprints to look for:
- many visits in a relatively short amount of time, with a 100% bounce rate
- visits that vary widely in browser and in mobile vs. desktop
- IPs and referrers are of no use: the IPs change between visit groupings, and many IPs are used per event
- the entire page is loaded, including ads
- all hits in an event go to the same page, but each time the script is run any page can be selected
That doesn't give you much to work with in terms of effectively blocking it. Unfortunately, you can't even use code to slow down page loads when one particular page is requested more often than normal.
Why not? Because, according to the site text, the script will apparently wait up to an hour and retry. You might be able to mitigate some of the damage by temporarily disabling ads when a single page is loaded more often than expected in quick succession. I'm not sure if that's even an option for most webmasters, or whether such a script exists.
If you look at your analytics and break page stats down by the hour, you'll see the effect as a spike within a one-hour period. You can confirm it by a 100% bounce rate, or nearly 100% if some regular visitors happened to arrive at the same time. Unfortunately, that footprint isn't exclusive to this one script.
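To check your own stats for this pattern, one approach is to bucket visits by page and hour and flag buckets that are both large and almost entirely bounces. This is a minimal sketch, assuming you can export per-visit rows (timestamp, page, bounced) from your analytics. The `Visit` record, the `suspicious_hours` function, and both thresholds are hypothetical and would need tuning against your normal traffic.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Visit:
    """One row of a hypothetical analytics export: when, which page, bounced or not."""
    timestamp: datetime
    page: str
    bounced: bool


def suspicious_hours(visits, min_visits=50, min_bounce_rate=0.95):
    """Group visits by (page, hour) and flag buckets that look like a burst.

    A bucket is flagged when it has at least `min_visits` visits and a bounce
    rate of at least `min_bounce_rate`. Both thresholds are assumptions.
    Returns a sorted list of (page, hour_start, visits, bounce_rate).
    """
    buckets = defaultdict(lambda: [0, 0])  # (page, hour) -> [visits, bounces]
    for v in visits:
        # Truncate the timestamp to the start of its hour.
        hour = v.timestamp.replace(minute=0, second=0, microsecond=0)
        counts = buckets[(v.page, hour)]
        counts[0] += 1
        counts[1] += int(v.bounced)

    flagged = []
    for (page, hour), (total, bounces) in sorted(buckets.items()):
        rate = bounces / total
        if total >= min_visits and rate >= min_bounce_rate:
            flagged.append((page, hour, total, rate))
    return flagged
```

A burst of around 100 hits is easy to pick out this way on a small site, but the same bucket can disappear into normal traffic on a busy one, so the thresholds only make sense relative to your usual hourly numbers.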
I wouldn't even try to spot this particular problem on a heavily trafficked site; each event is only roughly 100 direct hits.
I'm not going to follow up any further on this. There is really nothing more that can be done but hand it off to Google, since they are hosting the code. Mentioning the site/code here would just make the problem spread, and there isn't a viable way to stop it that I know of. If one becomes available, I'm sure Robert will update the thread.
If anyone has a method of detecting an unexpectedly high number of requests to a single page within a set time, one that can temporarily turn advertising off for that page, I'd love to hear about it. Such code would be useful against many different bots for small and medium sites.
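For anyone who wants to experiment, here is a minimal sketch of that idea in Python: a sliding-window counter per page that suppresses ads for a cooldown period once a page gets too many requests. It assumes you render pages server-side and can choose whether to include the ad code on each request. The `AdThrottle` class, its thresholds, and the injectable `clock` are all hypothetical; the one-hour default cooldown is only chosen to cover the script's reported retry delay of up to an hour.

```python
import time
from collections import defaultdict, deque


class AdThrottle:
    """Decide per page whether ads should be shown, based on recent request volume.

    Hypothetical helper: if a single page receives more than `max_hits`
    requests within `window` seconds, ads are suppressed on that page for
    `cooldown` seconds. All thresholds are assumptions to tune.
    """

    def __init__(self, max_hits=30, window=600.0, cooldown=3600.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.cooldown = cooldown
        self.clock = clock
        self.hits = defaultdict(deque)  # page -> timestamps of recent requests
        self.suppressed_until = {}      # page -> time when ads may be shown again

    def record_and_check(self, page):
        """Record one request for `page` and return True if ads should be shown."""
        now = self.clock()
        q = self.hits[page]
        q.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) > self.max_hits:
            # Burst detected: (re)start the cooldown for this page only.
            self.suppressed_until[page] = now + self.cooldown
        return now >= self.suppressed_until.get(page, float("-inf"))
```

In a page handler you would call `record_and_check(path)` once per request and only emit the ad snippet when it returns `True`. The counts live in process memory, so a site served by several worker processes or servers would need a shared store for the same logic to see all requests.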