
Webmaster General Forum

    
Identifying sources of unattributed, direct traffic that skew stats
JS_Harris

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4600519 posted 12:11 am on Aug 4, 2013 (gmt 0)

Problem: my site is receiving sudden spurts of direct traffic every 1-2 days, and sometimes more than once per day. This has been going on for 3 weeks now, though I only started to dig into analytics and log files on Thursday. What I found:

- 130 to 390 direct bot visits, all to the index page with a 100% bounce rate within a few seconds of each other

But that's where this gets hard to track. The following is a breakdown of other signals:

- 60% mobile, 40% desktop
- 13 to 17 visits in groups from the same city within 4 seconds, 100% bounce
- No more than 3 sightings of the same IP address
- No more than 5 visits from any one browser/OS
- User agents are equally varied
- Visit groupings come from various countries including the U.S., China, Russia, the U.K. and Australia
- The same thing happens each time, but the IPs are always different, as are the user agents (browser/OS), and the cities change 80% of the time as well
- Only a handful of referrers have been provided, 14 in total over 3 weeks

Technical site difficulties ruled out
- This has been going on for 3 weeks
- all of the cities/browsers/OS have normal non-bot traffic as well
- Host has confirmed odd traffic pattern, assures site is running optimally on their end.
- A close look at server logs shows that when a page fully renders the visitor reloads the page with a different browser/OS but same IP, repeated x3

One of the referrers yesterday tipped me off to look for an image service, namely a thumbnail service and I found what I believe to be the cause. A site is offering to test websites on 160+ different browsers from desktop and mobile devices to provide a detailed compatibility report.

Apparently the owner doesn't realize that the urls his/her site is creating can be followed by others and each time a new set of tests is run on my site. The results are public, anyone visiting the url triggers more testing OR the service is updating their records frequently, or both.

Problem - since I have no real piece of information to lock onto with the visits themselves I am not sure how to block this service. I checked adsense today and waited with the analytics live feature and sure enough even adsense spikes by roughly the same amount of visits.

100% bounce rate, false impressions in adsense, this service has got to stop hitting my site asap. Suggestions of a non-legal nature? If I can shut it out with htaccess that would be ideal... but how?

 

Robert Charlton

WebmasterWorld Administrator WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4600519 posted 7:01 am on Aug 9, 2013 (gmt 0)

Note that the original poster, JS_Harris, identified the source of the problem soon after posting, so he didn't need feedback on his questions. His post, though, which has been sitting in our Google forum premod area while I've checked things out, rang some bells about other discussions we've had in WebmasterWorld about spurious traffic... and I've felt this post should be published, but with the condition from administrators here that (at least for now) we not identify the site or sites causing the problem.

Let's say simply that in this case, the source of the problem turned out to be a generic browser testing site, one which is intended to be benign and which used browsers on distributed computers to capture some data from their respective locations. I hope that doesn't mischaracterize what the setup is.

A thread that this post brought to mind for me is this one that's been running in Analytics from Feb 21, 2012 until now....

Logs Show Surge, but Not Human?
http://www.webmasterworld.com/analytics/4420174.htm [webmasterworld.com]

I doubt very much that this is the same thing as the surge thread. There appear to be many non-parallel aspects to that discussion and this one. At the same time, there appear to be some similarities, and it struck me that this might be of general enough interest to post here and prompt further discussion on the overall topic.

lucy24

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4600519 posted 8:16 am on Aug 9, 2013 (gmt 0)

- A close look at server logs shows that when a page fully renders the visitor reloads the page with a different browser/OS but same IP, repeated x3

"fully renders" = visitor loads everything including favicon or mobile equivalent? That is, fully humanoid?

Do they accept cookies? execute javascript?

Would it help to be able to block the back 2/3 of each visit?

JS_Harris

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4600519 posted 8:40 am on Aug 9, 2013 (gmt 0)

Thanks for the follow up Robert, I did get to the bottom of the problem shortly after posting the above, mostly by lucky timing, but there is no real way to block it in advance even though you know what it is. The script being run that damages my stats is not intentionally trying to do that but the end result is the same.

"fully renders" = visitor loads everything including favicon or mobile equivalent? That is, fully humanoid?

Do they accept cookies? execute javascript?

yes and yes, fully humanoid, pulls the favicon and executes javascript (incl. adsense). Since adsense doesn't load without cookies enabled I'm also going to say yes on cookies, but I haven't run it to test.

While I think it would be a bad idea to mention sites that run the script I do think it's safe to say that the base code has been hosted by Google Code for some time and the intended use is not malicious. Thankfully it's not in widespread use.

The thread you linked Robert isn't likely this particular script but it is likely very similar. The footprint would be near identical.

I'm going to send off a note to the Google Code team to let them know, they'll be able to see the same thing if they run it on a test site. I have a feeling they'll likely remove the script or change the code to load 3rd party sites only once and work from that single load by caching it or whatever.

Footprints, something to look for:
- many visits in a relatively short amount of time with a 100% bounce rate
- visits extremely varied in terms of browser used and mobile vs desktop.
- IPs and referrers are of no consequence, the IPs change between visit groupings and many IPs are used per event
- entire page is loaded including ads
- all hits will be to the same page but each time the script is run any page can be selected

That doesn't give you much to work with in terms of effectively blocking it. Unfortunately you can't even use code to slow down page loads when one particular page is requested more often than normal.

Why not? Because the script will apparently wait up to an hour and retry according to the site text. You might be able to mitigate some damage by disabling ads temporarily when a single page is loaded more often than expected in quick succession. I'm not sure if that's even an option for most webmasters, or if such a script exists.

If you look at your analytics and break page stats down by the hour you'll see the effects in the form of a spike during a one hour period. You can confirm it by seeing a 100% bounce rate, or nearly 100% if some regular visitors happened to visit at the same time. Unfortunately that isn't an exclusive footprint of just this one script.
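The hourly breakdown described above can also be done straight from the server logs. This is only a minimal sketch: it assumes an Apache combined-format access log, and the function names and the thresholds (`factor`, `min_hits`) are illustrative choices, not anything taken from the thread. It compares each hour against the median of the hours in which that page received any hits at all, so quiet hours with zero requests are not counted in the baseline.

```python
import re
from collections import Counter
from datetime import datetime
from statistics import median

# Pulls the timestamp and request path out of a combined-format log line.
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?:GET|POST|HEAD) (?P<path>\S+)')


def hourly_hits(lines):
    """Count hits per (hour, path) bucket."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hour = ts.replace(minute=0, second=0)
        counts[(hour, match.group("path"))] += 1
    return counts


def find_spikes(counts, factor=5, min_hits=50):
    """Return (hour, path, hits) for hours far above that page's median."""
    per_path = {}
    for (_hour, path), hits in counts.items():
        per_path.setdefault(path, []).append(hits)

    spikes = []
    for (hour, path), hits in sorted(counts.items()):
        baseline = median(per_path[path])
        if hits >= min_hits and hits >= factor * baseline:
            spikes.append((hour, path, hits))
    return spikes
```

Feeding it the lines of an access log (for example `find_spikes(hourly_hits(open("access.log")))`, where the file name is a placeholder) gives a list of suspicious hours to cross-check against the bounce rate in analytics. As noted above, a spike alone is not an exclusive footprint of this one script.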

I wouldn't even try to spot this particular problem on a heavily trafficked site, the scope is roughly 100 direct hits in size per event.

I'm not going to follow up any further on this, there is really nothing more that can be done but hand it off to Google since they are hosting the code. Mentioning the site/code here will just make the problem spread and there isn't a viable way to stop it that I know of. If one becomes available I'm sure Robert will update the thread.

If anyone has a method of detecting an unexpectedly high number of requests to a single page within a set time that can temporarily turn advertising off for that page I'd love to hear about it. Such code would be useful vs many different bots for small and medium sites.
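One possible shape for such a check is a per-page sliding-window counter. The sketch below is hypothetical and not an existing script: the class name `PageBurstGuard`, its parameters and the default thresholds are all assumptions, and it keeps its state in memory, so it only works as-is inside a single long-running process. The one-hour default cooldown is chosen because the testing script reportedly retries for up to an hour.

```python
import time
from collections import defaultdict, deque


class PageBurstGuard:
    """Mute ads on a page for a while after an unusual burst of requests."""

    def __init__(self, max_hits=30, window_seconds=60, cooldown_seconds=3600):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self._hits = defaultdict(deque)  # path -> timestamps of recent hits
        self._muted_until = {}           # path -> time when ads may resume

    def ads_allowed(self, path, now=None):
        """Record one request for `path`; return True if ads should be shown."""
        now = time.time() if now is None else now
        hits = self._hits[path]
        hits.append(now)

        # Forget requests that have fallen out of the sliding window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()

        # Too many requests inside the window: start (or extend) the cooldown.
        if len(hits) > self.max_hits:
            self._muted_until[path] = now + self.cooldown_seconds

        return now >= self._muted_until.get(path, 0.0)
```

The page template would call `guard.ads_allowed(request_path)` once per request and leave out the ad code whenever it returns False. Real visitors who arrive during a cooldown also see no ads, which is the trade-off for keeping the false impressions out of adsense.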

JS_Harris

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4600519 posted 10:06 am on Aug 9, 2013 (gmt 0)

I should add that it's very possible that what I ran into is isolated and/or that I am wrong from a technical point of view about the full impact potential (5000 events on 5000 pages causing 500,000 hits per script install daily). I hope I am wrong in fact, or that people at Google who know more than me are already filtering the negative effects of this out of adsense/rankings etc., which is likely. I don't think I'd have even posted at all if I had known the cause before making my first post; I suggested it be deleted when I was told it was being reviewed, since I had figured out the source. It's tough having a useful discussion without being able to identify the source of concern for all to evaluate themselves... which is probably why the post hung up in premod for a few days.

Still, hopefully a source of false and negative stats gets cleaned up as a result and some strange reports in your analytics/logs make a little more sense.

Robert - do you think setting up a temporary domain and running the script on it over and over might yield a full list of ips being used or other cumulative data that can be identified and filtered in this case? My logs cover several weeks and it wasn't enough. By nature the only person who should run this script on your site is you so filtering it would not inconvenience anyone but filtering it would clean up a lot of stats.

lucy24

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4600519 posted 8:18 pm on Aug 9, 2013 (gmt 0)

If anyone has a method of detecting an unexpectedly high number of requests to a single page within a set time

It's not just repeated requests, it's repeated requests with a different UA, right? This pattern isn't completely unheard-of among humans, but it's rare.

You'd need to log request headers and check whether a given IP has already been here, but is now showing up {with a different UA} or {without cookies}. That's assuming for the sake of discussion you're not so tightly focused that you might get legitimate unconnected visits from the same IP, for example different people in the same office.
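A rough sketch of that check, assuming the request headers have already been logged and reduced to `(ip, user_agent, has_cookie)` tuples in arrival order. The tuple layout and the function name are made up for illustration, and the same-office caveat applies: a flagged IP is only a candidate, not proof of a bot.

```python
from collections import defaultdict


def flag_repeat_visitors(requests):
    """Flag IPs that come back with a different UA or without a cookie.

    `requests` is an iterable of (ip, user_agent, has_cookie) tuples in the
    order the requests arrived. Returns {ip: set of reasons}.
    """
    seen_uas = defaultdict(set)
    flagged = defaultdict(set)

    for ip, user_agent, has_cookie in requests:
        if seen_uas[ip]:  # this IP has been here before
            if user_agent not in seen_uas[ip]:
                flagged[ip].add("different UA")
            if not has_cookie:
                flagged[ip].add("no cookie on return visit")
        seen_uas[ip].add(user_agent)

    return dict(flagged)
```

A first visit is never flagged, since a new visitor has no cookie yet. Only the second and later requests from the same IP are compared against what that IP sent before.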

Jonesy

5+ Year Member



 
Msg#: 4600519 posted 11:20 pm on Aug 10, 2013 (gmt 0)

You'd need to log request headers and check whether a given IP has already been here, but is now showing up {with a different UA} or {without cookies}.


Not a very likely circumstance for any one web site; but my wife and I -- both sitting in our home office -- would show up with the same IP and different UA's: she with some windows browser, and me with some linux browser.

Jonesy

lucy24

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4600519 posted 12:09 am on Aug 11, 2013 (gmt 0)

my wife and I -- both sitting in our home office -- would show up with the same IP and different UA's

Yes, that's the drawback. There are rare situations where one IP : multiple UAs could be a legitimate human activity, like you turning to your wife and saying "Check out this site, here's the URL". I mostly get it for unfinished ebooks, where someone's checking whether it looks the same in MSIE and another browser.

You said somewhere along the line that the follow-up requests can come up to an hour later, right? So you can't even say "If these two requests come more than five seconds apart, it's two humans and I won't worry about it." You could block obvious robots, but you'll be left with a lot of false negatives.

Robert Charlton

WebmasterWorld Administrator WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4600519 posted 12:35 am on Aug 12, 2013 (gmt 0)

With regard to lucy24's and Jonesy's observations about likelihood of repeated requests with a different UA...

You had posted earlier...
No more than 3 sightings of the same IP address

I'd taken the "no more than 3" here to suggest it was likely that no one user was using more than 3 different browsers.

JS - Without getting too much into the particulars of this particular test site... (more for fear of its being used for sabotage, btw, than about outing the site)... I'm still not exactly clear about the initial setup and about how your site happens to be on the list of sites this browser test site is checking. Did you happen to check your own site with it?... and is this then one of those arrangements where they post your url in some sort of public list of sites recently checked? I'm also not sure how you arrive at your "500,000 hits per script install daily" impact-potential number.

JS_Harris

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4600519 posted 11:43 pm on Aug 12, 2013 (gmt 0)

I'm still not exactly clear about the initial setup and about how your site happens to be on the list of sites this browser test site is checking.


I did not check my own site, someone else checked my site (repeatedly over four weeks and counting now). It's browser based and so anyone can type in any url to check.

I came to the 500,000 figure because every 'check' results in 100+ bot hits to the url and the site keeps track of recent checks. Looking at that page, and refreshing it, you can see that the site is running this check on 200+ urls per hour, which is near 5000 pages checked a day. 5000 pages x 100 hits each = 500,000 bot hits daily... and this is just one public install of this script.

If just 20 sites are running this Google Code script, and each checking just 5000 urls per day, you're looking at 10 million bot hits. Since the traffic appears legitimate I would be surprised if some people aren't running this script privately to pad their numbers, perhaps on a site they wish to sell and so want traffic stats boosted a bit.
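For anyone who wants to redo the estimate with their own numbers, the arithmetic can be written out as below. The inputs are the rough figures quoted in this thread (200 urls per hour, 100 hits per check, 20 installs), and the function names are just labels for this sketch. Note that 200 urls per hour works out to 4,800 checks a day, which the estimate rounds up to 5,000.

```python
def checks_per_day(urls_per_hour):
    """URLs checked per day at a steady hourly rate."""
    return urls_per_hour * 24


def bot_hits_per_day(checks, hits_per_check, installs=1):
    """Total bot hits per day across all installs of the script."""
    return checks * hits_per_check * installs


exact_checks = checks_per_day(200)                       # 4800
exact_hits = bot_hits_per_day(exact_checks, 100)         # 480000
rounded_hits = bot_hits_per_day(5000, 100)               # 500000
twenty_installs = bot_hits_per_day(5000, 100, installs=20)  # 10000000

print(exact_checks, exact_hits, rounded_hits, twenty_installs)
```

These are lower bounds on the quoted rates ("200+" urls and "100+" hits), and they assume the observed hourly rate holds around the clock.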

aboshakeeb



 
Msg#: 4600519 posted 7:25 pm on Sep 25, 2013 (gmt 0)

I went through this 2 years ago. The symptoms were a little different, but it was the same annoying direct hits.
The puzzle was solved after a year: it was a Trojan, TROJ_OBVOD.TA
More info here: [about-threats.trendmicro.com...]

The Trojan is still functioning and some machines are still infected.

Check if your site is listed above.

dougwilson



 
Msg#: 4600519 posted 3:04 pm on Oct 10, 2013 (gmt 0)

temporary:

# Assumes "that-site.com" shows up as the referer on the unwanted hits
RewriteEngine On
RewriteCond %{HTTP_REFERER} that-site\.com [NC]
RewriteRule ^ %{HTTP_REFERER} [R=302,L]

for a few days or a week and see if they get tired of not getting through

I do it sometimes with referer spam sites. After a while they seem to lose interest


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved