Moderators: Receptional & mademetop

Website Analytics - Tracking and Logging Forum

This 354 message thread spans 12 pages (this is page 4).
Logs Show Surge, but Not Human?

 9:23 pm on Feb 21, 2012 (gmt 0)

On one site I work with, I've seen traffic go from 10K visits/day to 40K. The additional traffic looks human at first glance: it is captured by Google Analytics, it comes from diverse consumer IPs in the US and Europe (but not Asia), and the bounce rate is high, but one out of ten visits or so loads another page.

On the non-human side, we have all of the traffic coming with no referrer, and it is all focused on a few pages that are hardly viral linkbait and would get one or two views on a good day. It's all IE (spread among 6 - 9), and a range of screen resolutions that look unusually aged (e.g., 1024x768).

Anecdotally, I've heard of a few other sites seeing this kind of traffic, but nobody knows what the purpose might be. It's not scraping, as it's the same pages that get hit. It's not intense enough to be an attack to take the site down, nor is the site likely to be the target of miscreants.

The level of traffic has gone up and down, but it's still happening.

Are any of your sites seeing this, and do you have any theories?

Any thoughts on screening this out of Analytics? It totally blows up time period comparisons.
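
One way to approach the screening question is to segment on the signature described above: no referrer, IE 6 through 9, and a landing page from the small set being hit. The following is a minimal sketch that assumes you can export visit records as plain objects; the field names (`referrer`, `userAgent`, `landingPage`) and the page paths are hypothetical placeholders, not anything from a particular analytics product.

```javascript
// Hypothetical list of the few pages receiving the surge; replace with your own.
const SUSPECT_PAGES = new Set(['/example-page-a', '/example-page-b']);

// Returns true when a visit record matches every part of the signature
// described in the post: no referrer, IE 6-9, and a suspect landing page.
function looksLikeSurgeTraffic(visit) {
  const noReferrer = !visit.referrer;
  // IE 6 through 9 identify themselves as "MSIE 6.0" ... "MSIE 9.0".
  const oldIE = /MSIE [6-9]\./.test(visit.userAgent || '');
  const suspectPage = SUSPECT_PAGES.has(visit.landingPage);
  return noReferrer && oldIE && suspectPage;
}

// Splits exported visits into a clean set and a suspect set so that
// period comparisons can be run on the clean set alone.
function splitVisits(visits) {
  const clean = [];
  const suspect = [];
  for (const v of visits) {
    (looksLikeSurgeTraffic(v) ? suspect : clean).push(v);
  }
  return { clean, suspect };
}
```

Real visitors who happen to match all three conditions would be dropped too, so this is only useful while those pages normally get very little traffic.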



 7:36 am on Mar 1, 2012 (gmt 0)

Does the bot hold cookies?


 8:38 am on Mar 1, 2012 (gmt 0)

netmeg: I understand seasonal, also understand bandwidth. Merely a query as to what number is hitting you that causes damage? I'm at 150k hits with null injury to site; are you experiencing more than that? Is the host small/shared/low budget?

This unvalued traffic is krappola, of course, but most sites can take it, and most with value traffic/ads, can survive. Intrigued by the death knell tone... just seeking more info which might help others. At present, this is a hairy thorn and a bit painful, but will not put me out of business.


 1:48 pm on Mar 1, 2012 (gmt 0)

On an AdSense supported site, it would put the advertisers "at significant risk." It would very likely get me kicked out of AdSense, if I left it (and they'd be justified in doing so). This is not an affiliate site, it's a community service event site. At peak season, the site sucks up an enormous amount of resources for a short period of time, which need to be paid for, but the nature of it is such that it's not easily otherwise monetized besides ads.

It also trashes my analytics, which I need to make business decisions (and possibly sell direct advertising). It obviously reads javascript, because it's being picked up by all my various stats and analytics programs. It kicks me up into the next two levels on various software packages that I purchase that are priced based on traffic, which is not a big deal as long as it can pay for itself, but adds up if it doesn't.

I don't know if it's an attack, or a mistake, or collateral damage, but it doesn't matter, I still have bills to pay. All I can do is put it on another domain, non-redirected, and start over.


 1:57 pm on Mar 1, 2012 (gmt 0)

@netmeg, thanks. All valid points and I get it. Thank you for the followup... not all sites are made to the same dimensions. Perfect sense.


 2:11 pm on Mar 1, 2012 (gmt 0)

I can't even redirect the old one, because it will just bring the traffic with it.

Can you salvage some of your good traffic by redirecting from pages that this whatever is not visiting? A page down from home etc?


 2:38 pm on Mar 1, 2012 (gmt 0)

Dunno yet. Have to talk this through with my developer. The old site is very well indexed, so introducing a new one with all the same content will have some challenges.


 2:43 pm on Mar 1, 2012 (gmt 0)

Not really my place to suggest, but if I read you correctly, you might consider replacing something with Piwik, just for the affected sites for now. It's free and superior to the other something, which ain't free beyond the basic level.

Ignore this if I've misread.


 9:20 pm on Mar 1, 2012 (gmt 0)

I've got piwik and it was a nasty surprise to find that robots are getting logged as humans. The original premise was that robots are on & off the page before the js even has time to activate-- but that wasn't taking into account the robots that deliberately read and act on the js. They don't arrive bearing signs that say I, Robot so you can't block them in advance.

Why it should be necessary to tell known quantities like b---- and g---- to stay the ### out of your analytics is a whole nother thread.


 9:42 pm on Mar 1, 2012 (gmt 0)

Yea I'm fine with the products I'm using. That's not the issue.

Found a thread over at the Google Groups (and boy are those hard to use now) with people having the same issue. Everyone seems to have been hit around February 21. Also poked around on the off chance that Symantec might be interested in this, but they won't talk to you if you're not using any of their products. Fail.

Then, because it's #conspiracytheorythursday, I started wondering if someone who got Pandalyzed and was really pissed at Google devised this insidious method of rendering Google Analytics data useless, and rendering me collateral damage. If it spreads? Who knows.

I mean, it could happen... right?


 3:32 am on Mar 2, 2012 (gmt 0)

I feel left out not receiving these bot hits!

What about putting a piece of javascript like this in the header?

//---bot check---
var browsername = navigator.appName; // not defined in the original post
var realuser = 10;
// subtracts where it should add; corrected in the update later in the thread
onmousemove = function() {realuser -= 10; onmousemove = null;};
if (document.referrer == '') realuser -= 4;
if (document.cookie == '') realuser -= 4;
if (Number(new Date()) < 1330647000000) realuser -= 6; // client clock earlier than about Mar 2, 2012
if (self.location == '') realuser -= 6;
if (top.location != self.location) realuser -= 6; // page is inside a frame
if (browsername == '') realuser -= 6;
if (navigator.appVersion == '') realuser -= 4;
if (browsername == 'Microsoft Internet Explorer') realuser -= 2;
realuser = (realuser > 0);
if (!realuser) realuser = confirm("Click OK if you know you're visiting www.yourwebsite.com");
if (!realuser) {
  alert('Think your machine may have a virus');
}
//---- ----

if (realuser) {
  alert('Hello real user!');
  //put adsense/analytics within this if
}


 2:02 pm on Mar 2, 2012 (gmt 0)

Unfortunately, since the page needs to load to deliver this code, the site's analytics would still be screwed up. And it's not clear the attacker would care if the page content differed.


 2:36 pm on Mar 2, 2012 (gmt 0)

@rogerd, I disagree.

If the analytics code is put within the last if statement it would produce a lot better results.

I suspect this is a bot running quietly on unsuspecting users' machines. The alert would inform the user of this, and may prompt them to remove the bot from their machine.


 4:18 pm on Mar 2, 2012 (gmt 0)

I believe my problem is over (hopefully). It peaked on the 22nd at over 6K hits to the home page. Each day after that, it became less and less. As of today, at 9:00am, there were 2 hits to my home page. As far as I know, this issue is over for me.

While not pointing fingers at anyone here, I do believe there is something peculiar going on here. After a particular response from above, I found a contacts page from a company listed in that response. There were many contact emails for the US, so I emailed them all (29 in total for different cities across the US) on 2/22/12. The email message did not point fingers, but simply indicated my concern, and asked if they might have anything to do with it. Oddly, I didn't receive a single response from that email. Weird, but I do know this... After that email went out, traffic started to dwindle, and as mentioned- today, I am back to normal.

Again, just stating what happened, and not putting a blame on anyone, or anything. It was just odd that I did not get a response from sending out 29 emails. Not even a "no way it could be us" response... (which is what I expected).


 4:30 pm on Mar 2, 2012 (gmt 0)

(I have no idea what you just said in that - "particular response from above" ? )

These are not actually users, and I'm not at all interested in putting in code that pops up on their machine to tell them they're infected. And it's a WAY inefficient method for removing the problem.

Unfortunately, we don't seem to be able to convince anyone that this is a serious problem that could become a *really* serious problem.


 4:40 pm on Mar 2, 2012 (gmt 0)

"particular response from above"... includes a company name that might be the cause of the problem.

and correction: Email went out 2/28


 8:08 pm on Mar 2, 2012 (gmt 0)

After having evaluated some data from a site being hit by this bot attack I believe that I can now effectively block this traffic.

Unfortunately, this is a website analytics forum and not the Apache or Spider ID forum, so I certainly can't provide any specifics here as it would be off topic and off charter ;)

However, I will say that traditional log files, as usual, were completely useless in figuring out a solution to the problem. When it comes to logging and analysis of bot traffic that looks like MSIE, if you aren't tracking all header details, you're just wasting time.
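
To illustrate what "tracking all header details" could look like in practice, here is a minimal sketch for a Node.js server. It is not the method hinted at above, which was never posted; it only shows one way to capture every request header, in the order and casing received, as one JSON line per request. The function names `headerLogLine` and `createLoggingServer` are made up for this example.

```javascript
const http = require('http');

// Builds one JSON log line that records every header the client sent.
function headerLogLine(req) {
  return JSON.stringify({
    time: new Date().toISOString(),
    ip: req.socket ? req.socket.remoteAddress : undefined,
    method: req.method,
    url: req.url,
    httpVersion: req.httpVersion,
    // rawHeaders is a flat [name, value, name, value, ...] array that keeps
    // the original casing and order, unlike the parsed req.headers object.
    rawHeaders: req.rawHeaders,
  });
}

// Creates a server that logs the headers of each request, then replies.
// `write` defaults to console.log; pass a file writer in real use.
function createLoggingServer(write = console.log) {
  return http.createServer((req, res) => {
    write(headerLogLine(req));
    res.end('ok');
  });
}

// To try it locally: createLoggingServer().listen(8080);
```

A standard access log usually keeps only the user agent and referrer, so details such as header order or a missing Accept-Language header only become visible once everything is recorded.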


 8:46 pm on Mar 2, 2012 (gmt 0)


[ applause ]


 9:04 pm on Mar 2, 2012 (gmt 0)

Again... my problem is fixed.

I addressed the same company as "staffjam" did. And his problem was fixed as well, immediately after he contacted them. hmm... Coincidence? Perhaps- All I know is that I am not experiencing this problem any more and they did not deny that this was happening.


 10:52 pm on Mar 2, 2012 (gmt 0)

dmember - sadly you're incorrect - the hits are still coming thick and fast at around 20k per day to the homepage. Quite a few people are looking into this now - but none of us are getting close to a solution (so it seems.)


 11:33 pm on Mar 2, 2012 (gmt 0)

Well, I guess that answers that question. It's weird, though, that no one contacted me about this out of the 29 US contacts. I have only gotten 2 hits to my home page today, but I will be sure to monitor it.

Anyway, I am sure Google will take a big part in correcting this, whoever/whatever is doing this. If and when they find out the source, I am sure they will twist someone's cap backwards.


 2:19 am on Mar 3, 2012 (gmt 0)

Guys, seriously, I'm in a bind here.

Can't publicly share what I found because if it is effective in stopping this problem, it's also not that hard for the source of the problem to fix their flaws.

I'm sharing some details with a few people under NDA at the moment, but I'm not sure how to proceed publicly. The reason is if the people doing this are capable of fixing what I found, which is very possible, we'll probably never stop them a second time.


 6:08 am on Mar 3, 2012 (gmt 0)

@incrediBILL, enough info has been released that effectiveness might already be compromised. For every defense there will be another attack, just like for every night there is another day... (sigh)

And enough info has been released to harden the attack vector to make it even more difficult to stop (by adding realism).


 8:06 am on Mar 3, 2012 (gmt 0)

@tangor - actually, nothing in this thread even touches on what I used to block the bots, so nothing has been released yet, at least not from my camp ;)

Assuming they're all the same bots hitting the various sites, which has yet to be established, I should be able to nuke them all as long as they keep doing what they're doing!


 2:46 pm on Mar 3, 2012 (gmt 0)

incrediBILL - can you let us know who is behind this? Do you have that information?


 3:23 pm on Mar 3, 2012 (gmt 0)

if you aren't tracking all header details, you're just wasting time.

Sorry Bill, guess I didn't get this the first time around. And even if that is not the magic bullet... the pharts have been put on alert to tighten up.

What does confuse me, is the flood... to what purpose are these hits? What's the benefit? And to whom?


 3:31 pm on Mar 3, 2012 (gmt 0)

If this is indeed the company identified above, my best guess is that it is a collection of comparative data that was assumed to be benign because it's not actually taking sites down. The nimrods behind it are likely unaware of the impact on engagement metrics, AdSense, and comparative analytics.

Or, it could be a screwup that got magnified by the size of the network. E.g., a link gets followed from somewhere, and then filters into some kind of a list that launches the bot army.

All total speculation.


 3:58 pm on Mar 3, 2012 (gmt 0)

I am still getting hit with it, but it's down about 80% from peak. Still rather annoying, but slowly dropping.


 3:53 am on Mar 4, 2012 (gmt 0)

OK, I'll toss out an .htaccess bone that will work for some and not others.

Considering it appears to be an international group of IPs and the language is set to whatever their browser is set to, you can potentially get rid of a bunch of the botnet just by blocking non-English browsers, assuming you have an English-only site.

# block MSIE non-en(glish) browsers
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} MSIE
RewriteCond %{HTTP:Accept-Language} !en [NC]
RewriteRule .* - [F]

That's a nice 403 for those in the botnet.

Simply change the language code for your own purposes if you're protecting a site in another language.

This won't block them all, but it'll send a bunch bouncing off the site so it should calm down the traffic a bit.
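
For anyone not on Apache, the Accept-Language condition above can be sketched as a plain function. `failsLanguageCheck` mirrors the rewrite condition: an unanchored, case-insensitive search for the language code, which also fails any request that sends no Accept-Language header at all. `acceptsLanguage` is a stricter variant added here as an assumption, not something from the post; it compares whole language tags so that a tag such as `ca-ES-valencia` does not pass just because it contains the letters "en". Both function names are made up for this example.

```javascript
// Mirrors the rewrite condition: true when the header does not contain the
// language code anywhere (case-insensitive). A missing header also fails.
function failsLanguageCheck(acceptLanguage, code = 'en') {
  const header = (acceptLanguage || '').toLowerCase();
  return !header.includes(code.toLowerCase());
}

// Stricter variant: split the header into language tags, drop the q-values,
// and accept only an exact match or a match on the primary subtag.
function acceptsLanguage(acceptLanguage, code = 'en') {
  const wanted = code.toLowerCase();
  return (acceptLanguage || '')
    .split(',')
    .map(part => part.split(';')[0].trim().toLowerCase())
    .some(tag => tag === wanted || tag.startsWith(wanted + '-'));
}
```

Since a request with no Accept-Language header fails either check, it is worth looking at which legitimate clients omit the header before relying on this.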


 9:31 am on Mar 4, 2012 (gmt 0)

An update on the previous (which had a bug). Should be put within the header tags.

//---bot check---
var browsername = navigator.appName; // not defined in the original post
var realuser = 10;
onmousemove = function() {realuser += 20; onmousemove = null;};
if (document.referrer == '') realuser -= 4;
if (document.cookie == '') realuser -= 4;
if (Number(new Date()) < 1330647000000) realuser -= 6; // client clock earlier than about Mar 2, 2012
if (self.location == '') realuser -= 6;
if (top.location != self.location) realuser -= 6; // page is inside a frame
var lang = navigator.userLanguage; // IE
if (!lang) lang = navigator.language; // other browsers
if (!lang) lang = ''; // avoid calling indexOf on undefined
if (lang == '') realuser -= 6;
if (lang.indexOf('en') < 0) realuser -= 4;
if (browsername == '') realuser -= 6;
if (navigator.appVersion == '') realuser -= 4;
if (browsername == 'Microsoft Internet Explorer') realuser -= 2;
realuser = (realuser > 0);
if (!realuser) realuser = confirm("Please click OK if you know you're visiting www.yourwebsite.com");
if (!realuser) {
  if (document.body) document.body.innerHTML = '';
  alert('Think your machine may have a virus');
}
//---- ----

if (realuser) {
  //put adsense/analytics within this if
}
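
One thing to note about the script as posted: the score is evaluated as soon as the script runs, so the mouse-movement bonus cannot have been applied by the time the decision is made. The following is a rough sketch of a variant that waits for the first mouse movement or a timeout before deciding. It keeps the same weights but leaves out the clock and empty-location checks for brevity. `scoreVisitor`, `watchVisitor`, `loadAnalytics`, and the 5 second timeout are all assumptions made for this example, and it is written in modern JavaScript, so the IE versions discussed in this thread would need older syntax and `attachEvent`.

```javascript
// Pure scoring function using the posted weights. Inputs come in as a plain
// object so the logic can be run and checked outside a browser.
function scoreVisitor(s) {
  let score = 10;
  if (s.mouseMoved) score += 20;
  if (!s.referrer) score -= 4;
  if (!s.cookie) score -= 4;
  if (s.framed) score -= 6;
  if (!s.language) score -= 10; // posted script applies both -6 and -4 here
  else if (s.language.toLowerCase().indexOf('en') < 0) score -= 4;
  if (!s.appName) score -= 6;
  if (!s.appVersion) score -= 4;
  if (s.appName === 'Microsoft Internet Explorer') score -= 2;
  return score;
}

// Browser wiring. loadAnalytics is your own function that injects the
// analytics or ad tags; it is called at most once, and only if the score
// is positive after the first mouse movement or after waitMs elapses.
function watchVisitor(win, loadAnalytics, waitMs = 5000) {
  const doc = win.document;
  const nav = win.navigator;
  let decided = false;
  const decide = (mouseMoved) => {
    if (decided) return;
    decided = true;
    const score = scoreVisitor({
      mouseMoved,
      referrer: doc.referrer,
      cookie: doc.cookie,
      framed: win.top !== win.self,
      language: nav.language || nav.userLanguage,
      appName: nav.appName,
      appVersion: nav.appVersion,
    });
    if (score > 0) loadAnalytics();
  };
  doc.addEventListener('mousemove', () => decide(true), { once: true });
  win.setTimeout(() => decide(false), waitMs);
}

// In a page: watchVisitor(window, function () { /* insert tags here */ });
```

Touch-only devices never fire a mouse movement, so they are judged on the remaining signals when the timeout expires.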


 1:16 am on Mar 6, 2012 (gmt 0)

Turns out, it can't be stopped after all. My site is out of business.


 2:43 am on Mar 6, 2012 (gmt 0)

Because of the bot, or were there other factors?
I've never failed in stopping bots hitting my sites. There must be something unique to them. Did you try my script above?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved