
Forum Moderators: Ocean10000 & incrediBILL & keyplyr

Most of Your Traffic is Not Human

     
8:40 pm on Jul 6, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549


It's a disappointing but eye-opening statistic that most of the traffic to our websites is not from actual people. In fact, well over half of our traffic is not human.

Bot traffic is in an uptrend. Most of it comes from bad bots, or at least from bots that are not beneficial to our interests (depending on your site model).

Here's the estimated breakdown*:
28% Search Engine & other good bots
10% Scrapers & Downloaders
5% Hacking tools & scripts
1-3% Automated link spam
12% Other impersonators

Analytics & site-reporting software is easily fooled by bots masquerading as humans. Detecting them is not what those tools are built to do.

*based on 10k daily page loads (YMMV)
12:20 am on July 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 163
votes: 12


I find blocking bad bots a difficult service to sell to customers. Most simply shrug their shoulders and say I have a solution looking for a problem. The solution they usually choose instead is to pay for more server capacity. Identifying and killing bots takes time and is expensive, and bot makers are creative. These companies will not acknowledge there is a problem, even when I point out a blatant bot (4k server hits in 6 hrs! That is not human). Bots hide and do their work in the shadows of the interweb, exposed only briefly when a select few of us do forensic log analysis.

It is great that you can offer web stats to prove that your sites cater to real humans. That is very novel!
12:31 am on July 10, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549


It is great that you can offer web stats to prove that your sites cater to real humans. That is very novel!
Back to my sites, for example: I sell ad space. I do allow some beneficial bots access, but potential clients can see stats showing real human traffic, with the bots filtered out of those stats.

It's true, many have no idea of the amount of non-human traffic and will compare my stats to other sites selling advertising. It's sometimes difficult to convince potential clients that the other sites are counting all traffic and not just humans.
1:45 am on July 10, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts: 1063
votes: 243


One easy solution is to set up a free Cloudflare account, then use Cloudflare's firewall function to issue a CAPTCHA challenge to traffic from suspect countries. For me, this has reduced the number of bot hits dramatically and eased server load. Plus there's the added CDN bonus. Simple and free.
In my case, I only do business with the US, Canada, UK and Australia. Everything else gets challenged. No more bots.
1:51 am on July 10, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549


samwest - that may work well for your specific interests.

However, consider there's a lot of useful information in raw access logs about which bots, from where, asking for what, using what platform, what methods, etc. This can be extremely valuable (for reasons I won't go into.)
4:12 am on July 10, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14624
votes: 392


...the collateral issue - having all that background noise constantly in the picture has to slow things down for actual visitors, and that is damage enough.


I agree. Wordfence is pretty good at setting thresholds for blocking certain kinds of activity related to hacking, which helps reduce server-wide slowdowns. However, there is much room for improvement.
1:48 pm on July 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 163
votes: 12


Site owners don't see a bad bot's attack until it cracks the site. Performance degradation is slow and unnoticeable, like slowly boiling a frog, so site owners don't notice. Besides, performance degradation can be blamed on the site software itself. When it gets really bad, hosting providers just tell the customer to upgrade their hosting package.

It is like house plumbing. Of course we know that plumbing is important, but out of sight, out of mind: most people won't consider a plumbing upgrade until a catastrophe happens.
1:53 pm on July 10, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14624
votes: 392


Site owners don't see a bad bot's attack until it cracks the site.


They're visible in 404 errors and many other ways.
1:56 pm on July 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 163
votes: 12


My customers would not know an access log if it hit them in the face. They very rarely even look at AWStats, as their eyes glaze over. For day-to-day operations they never see a 404, much less know what one even means.
6:01 pm on July 10, 2017 (gmt 0)

Full Member

joined:July 23, 2015
posts:254
votes: 76


@keyplyr
Example: A large number of botrunners hosting at AWS are marketing companies that gather data rolled into products they sell to help ecommerce clients develop ad campaigns.

If you publish AdSense or other ads, you would want your site data included in these products to facilitate ad placement and drive up bidding. This translates to greater income from ad clicks.


I used to rely on AdSense, but I haven't since 2013. The writing was on the wall even then. I am now the opposite: I actually SPEND on AdWords. As such, I don't want any products that facilitate bidding prices UP. AdWords ROI is thin enough as it is, let alone when somebody bids it up with bots against the rest of us trying to make a real living in the real world of commerce. So AWS is blocked.

If a publisher is good and relevant to our niche - and that is less than 1% - we have an affiliate program they can sign up for to send us relevant traffic that we can control. That way I don't have to pay Google a significant cut of my revenue.
7:56 pm on July 10, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549


@smilie - the example about beneficial bots coming from Amazon hosting was more of a generic statement and not so much specific to your current advertising model.

Point is, blocking with a broad brush isn't a good idea. Neither is whitelisting if you don't diligently watch your hourly/daily logs to see exactly who/what is getting blocked by that filter.

I find new agents almost daily, investigate who they are and whether there is any benefit, then decide whether to give them access.

@TorontoBoy - you have an opportunity to provide extended site management to your clients. Just have them read this thread :)
9:10 pm on July 10, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14246
votes: 551


I think I'm more lenient than keyplyr ;) except for a couple of specific categories.

Crudest possible test:
--page requests with 200 response vs. page requests getting 4xx response (I can ignore bona fide 404s because they're vastly outweighed by the manual 404 I serve to select robots, to say nothing of robots asking for long-gone pages).
On my teeny-weeny site it averages 2:1 permitted vs. not permitted for pages alone. When you consider that those 200s include authorized robots, that's a heck of a lot of non-humans. But if I look at all requests, not just pages:
--all requests with 200/304 response (to include static files such as images) vs. etcetera-as-above
the balance shifts to 6:1 in favor of authorized requests. That's an important difference if you're talking about overall server load.
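
For anyone wanting to run the same crude test, a rough Python pass over the raw access log computes the pages-only ratio. A minimal sketch, assuming a combined-format log; the filename and the "no extension or .html means page" heuristic are assumptions to adapt to your own site:

import re
from collections import Counter

# request line + status code, as they appear in a combined-format log
REQ = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

counts = Counter()
with open("access.log") as f:                 # hypothetical filename
    for line in f:
        m = REQ.search(line)
        if not m:
            continue
        path, status = m.group("path"), m.group("status")
        last = path.split("?")[0].rsplit("/", 1)[-1]
        is_page = ("." not in last) or last.endswith((".html", ".htm"))
        if not is_page:
            continue                          # pages only, as above
        if status == "200":
            counts["permitted"] += 1
        elif status.startswith("4"):
            counts["refused"] += 1

if counts["refused"]:
    print("pages, permitted : refused = %.1f : 1"
          % (counts["permitted"] / counts["refused"]))

Drop the page test and count 304 as permitted, and you get the all-requests version.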

On the other hand, some sites have to think about the truly malign robots, where an unintended 200 is only the beginning of your troubles.
11:41 pm on July 10, 2017 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:July 23, 2004
posts:508
votes: 48


I've been preaching about this for years - I get a lot of newbies who think they're doing great until I point this fact out to them.
Want a ton of traffic? Just tweet something, and here come the bots - hundreds of them, from all over the board. Want even more traffic than that? Just sign up for every other social networking platform and you'll have bots pouring out of your ears.

Social networking is the scourge of any of your metrics, regardless of type, style, or flavor.

25% of the net is profitable - the rest is just garbage.
5:14 am on July 11, 2017 (gmt 0)

Senior Member

joined:July 29, 2007
posts:1780
votes: 100


My question is: There must be significant money made by deploying so many bots. How are bot writers making so much money, who is paying them and why?


Data collection, plain and simple. You can profit from the data you collect by selling access to it, or by creating tools that use it and selling access to those. The worst offenders sell the data gathered about your site to the general public, including your competitors. The distinction between a "good" and a "bad" bot has blurred as well. Is a bot that gathers info about your every page good just because it is called Alexa, or evil just because it is called something else? The metric should be the number of visitors a particular bot generates for you; if it's zero, you should probably block it.

It gets worse when you consider that your site's contents and pages can be evaluated without anyone even visiting your site, because a cached copy is available elsewhere. It's a bit out of hand, in my opinion.

If a bot is not directly involved in driving traffic to my site, it's a bad bot, period. Facebook, for example, is proactively scanning pages to decide whether it approves of a page's morality or political affiliation before allowing it to appear in some areas. This is censorship, because it takes the decision-making out of their users' hands, so I consider all Facebook bots "bad" even though they are listed as good in metrics reports.

I've become heavy-handed on this front, in fact, and only whitelist a handful of search engines. I block a full 35% of the page requests I get at this point, which is absurdly high, but over 93% of those blocked requests are not human. Over time you get to know what to look for, if you monitor your own server logs and keep records. It's not fun, but unfortunately necessary.
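
For the whitelist itself, forward-confirmed reverse DNS is the usual way to check that a request claiming to be a major crawler really comes from that engine before letting it through. A rough Python sketch (it needs live DNS to run); the hostname suffixes here are samples only, so confirm them against each engine's own documentation:

import socket

# sample verified-hostname suffixes (Googlebot, Bingbot) - an assumption,
# check each search engine's documentation for the authoritative list
VERIFIED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_search_bot(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)          # reverse lookup
        if not host.endswith(VERIFIED_SUFFIXES):
            return False
        # forward-confirm: the name must resolve back to the same IP
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

print(is_verified_search_bot("66.249.66.1"))   # an IP in Googlebot's range

A spoofed UA fails this test immediately: the impostor's IP won't reverse-resolve to the engine's domain.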
5:57 pm on July 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 14, 2003
posts: 4309
votes: 35


Careful when you tell suits about this. They do not take kindly to seeing their numbers drop.

They are very happy not knowing this.
11:02 pm on July 17, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549


Agreed ogletree. Sadly, even with this knowledge, many will just continue to believe their web properties are getting that many visitors.

When promoting their stats, they may feel at a disadvantage using the *real* numbers since their competitors aren't, so the falsehoods go on and on.
4:04 am on July 18, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 163
votes: 12


It seems to me like the worst-kept secret and scam on the internet: bots goosing up the numbers. People can get really upset if their AWStats or Google Analytics numbers vary. Are site owners really that delusional, or is this more like an adult version of a popularity contest?

Why do site owners not recognize that if you do not control the bots hitting your site, your visit numbers might be totally bogus? I am often flabbergasted at their naivety, though I can excuse their ignorance.
1:20 pm on Aug 5, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:679
votes: 8


You can filter out bot traffic etc. in GA, right? I read that somewhere... I'll probably try it on mine, which is a huge mess, partly from my own oversight, which I'm fixing now.
2:44 pm on Aug 5, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 163
votes: 12


If a bot has an identifiable user agent (UA), then yes, you should be able to filter it out in GA. More often than not, though, bots - especially bad bots - masquerade under the anonymity of Mozilla and pretend to be human with a browser UA. GA will not identify these.
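
To illustrate, a naive token filter only ever catches the bots that announce themselves. A minimal Python sketch; the token list is a small sample, nowhere near complete:

import re

# common tokens that self-declared bots put in their UA strings
BOT_TOKENS = re.compile(r"bot|crawl|spider|scrape|curl|wget|python-requests",
                        re.IGNORECASE)

def declares_itself_a_bot(user_agent: str) -> bool:
    return bool(BOT_TOKENS.search(user_agent))

# identifiable - GA (or your own filter) can exclude it
print(declares_itself_a_bot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True

# a bad bot sending this plain browser UA sails straight through
print(declares_itself_a_bot(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))              # False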
4:49 pm on Aug 5, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts:1063
votes: 243


I see this every day on my signup forms... the bots sit there for 20 minutes trying to get past the CAPTCHA before timing out and leaving dejected.
7:39 pm on Aug 5, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549


You can filter out bot traffic etc. in GA right?
Google Analytics is not website security software. It is a traffic report, and a very poor one at that. It misses quite a lot.

The only way to keep track of what/who is hitting your server is by hourly/daily examination of your server's raw access logs. The more you do this, the better you get at identifying what is malicious, beneficial, or just useless to your interests.

There is a wealth of information in these forums to help you learn all this. Do the reading.
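
Even a crude daily tally of the raw log will surface the unfamiliar agents worth investigating. A minimal Python sketch, assuming combined log format and a hypothetical filename:

import re
from collections import Counter

# IP ... [timestamp] "request" status bytes "referer" "user agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

ips, agents = Counter(), Counter()
with open("access.log") as f:                  # hypothetical filename
    for line in f:
        m = COMBINED.match(line)
        if m:
            ips[m.group("ip")] += 1
            agents[m.group("ua")] += 1

print("top user agents:")
for ua, n in agents.most_common(10):
    print(f"{n:7d}  {ua}")
print("top IPs:")
for ip, n in ips.most_common(10):
    print(f"{n:7d}  {ip}")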
12:24 am on Aug 6, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 19, 2004
posts:679
votes: 8


Yes, agreed keyplyr. As I've been saying, I'm watching the logs now... I never knew invalid traffic was a problem on my site until AdSense sent an alert! :-(
12:54 am on Aug 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts:1063
votes: 243


No humans the past 48 hours, but man, the traffic shown on GART is crazy... and it's all non-converting, non-interactive. As posted in the August SERP thread, I had several surges of traffic today to one page. The site was at 1 or 2 visits all day when suddenly that one page jumped to over 100. Then back to zero in 5 minutes, then back up to about 120 for 5 minutes, and back to zero. What is causing this, I wonder. During these surges the GART map showed traffic from all points around the USA, so it was probably not bots, unless they are sophisticated enough to spoof locations.
If it's Google throttling switching off momentarily, then it really chaps my a$$ that they are holding that much traffic back.
Insanity!
2:00 am on Aug 8, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3451
votes: 182


probably not bots, unless they are sophisticated enough to spoof locations.
Sounds more like bot-nets: remotely controlled, compromised machines whose owners don't even know they visited. They all might wonder why their machines run so slowly. I've seen them in logs, and the activity looks like streaks of heavy traffic, but with requests for the different resources of one page coming from 15 different IPs, all with the same UA. Clearly not human visitors.
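
That signature - one UA, many IPs, a short burst - is straightforward to scan for. A rough Python sketch; the combined log format and the thresholds (15 IPs inside 5 minutes) are illustrative assumptions:

import re
from collections import defaultdict
from datetime import datetime, timedelta

COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

hits = defaultdict(list)                       # user agent -> [(time, ip), ...]
with open("access.log") as f:                  # hypothetical filename
    for line in f:
        m = COMBINED.match(line)
        if m:
            t = datetime.strptime(m.group("ts").split()[0],
                                  "%d/%b/%Y:%H:%M:%S")
            hits[m.group("ua")].append((t, m.group("ip")))

WINDOW, MIN_IPS = timedelta(minutes=5), 15
for ua, events in hits.items():
    events.sort()
    for i, (start, _) in enumerate(events):
        burst = {ip for t, ip in events[i:] if t - start <= WINDOW}
        if len(burst) >= MIN_IPS:
            print(f"possible bot-net: {len(burst)} IPs in 5 min, UA={ua!r}")
            break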
2:05 am on Aug 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:1238
votes: 367


@samwest, are you filtering traffic in GA by hostname? That is, excluding any traffic that has a hostname other than your domain?
3:16 am on Aug 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts:1063
votes: 243


@samwest are you filtering traffic in GA by hostname.

No, I'm not filtering anything.
What not2easy says makes sense. The traffic is definitely not human, as 100% of it just hits the page and leaves. If it were human, a percentage would be clicking the links on that page (there are enough of them) and navigating the site, some possibly buying.
It's like McDonald's when a big tour bus of elderly folks pulls in the lot and they all just come in to use the bathroom.

BTW - I use Cloudflare... shouldn't that filter bot-net IPs by validating against DNSBLs?
3:33 am on Aug 8, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10092
votes: 549



Tracking & Logging discussion should be moved to the Tracking & Logging Forum [webmasterworld.com].
1:43 pm on Aug 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:1238
votes: 367


@samwest I would recommend going back to the days with the traffic spike in GA and check the landing page report with Hostname (Behaviour/Hostname) set as the secondary dimension.
2:15 pm on Aug 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts:1063
votes: 243


@NickMNS - On that exact day it shows only my hostname, but the session duration is 10 seconds and the bounce rate is 91%. That hardly looks human, or else it is VERY poorly targeted for the term. It appears to be all organic and mostly mobile. It sure did bump my overall session stats... not that it had any value whatsoever. This surge appears every few weeks and almost looks as if it is intended just to pad my stats, to make it look like I actually get decent traffic. This has to be non-human garbage traffic.

@keyplyr - we ARE discussing identifying human vs. non-human traffic here.
3:11 pm on Aug 8, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 163
votes: 12


I have always wondered if bot operators are being paid to artificially inflate visit numbers for the purposes of ad revenue. I have no proof, but I have seen enough trash bot traffic to wonder what their objective is.
5:30 pm on Aug 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts:1063
votes: 243


wonder what is their objective.

I'm guessing that it's just to mess with people for the sake of messing with them. Sad.