
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Most of Your Traffic is Not Human

     
8:40 pm on Jul 6, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8975
votes: 409


It's a disappointing but eye-opening statistic that most of the traffic to our websites is not from actual people. In fact, well over half of our traffic is not human.

Bot traffic is in an uptrend. Most of this is from bad bots, or at least from bots that are not beneficial to our interests (depending on site model).

Here's the estimated breakdown*:
28% Search Engine & other good bots
10% Scrapers & Downloaders
5% Hacking tools & scripts
1-3% Automated link spam
12% Other impersonators

Analytics & site reporting software are easily fooled by bots masquerading as humans; detecting them is not what those tools are built to do.

*based on 10k daily page loads (YMMV)
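
For readers who want to sanity-check these proportions against their own logs, here is a minimal sketch, assuming an Apache/Nginx combined-format access log on stdin; the user-agent substrings and bucket names are illustrative guesses, not an authoritative bot list:

```python
#!/usr/bin/env python3
"""Rough bot-share estimate from a combined-format access log on stdin."""
import re
import sys
from collections import Counter

# Very rough, illustrative user-agent buckets - tune for your own logs.
BUCKETS = [
    ("good bots",  re.compile(r"googlebot|bingbot|duckduckbot|applebot", re.I)),
    ("scrapers",   re.compile(r"curl|wget|python-requests|scrapy|httpclient", re.I)),
    ("link spam",  re.compile(r"semalt|buttons-for-website", re.I)),
    ("other bots", re.compile(r"bot|crawler|spider|fetch", re.I)),
]

# In the combined log format the user agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
total = 0
for line in sys.stdin:
    m = UA_RE.search(line)
    ua = m.group(1) if m else ""
    total += 1
    for name, pattern in BUCKETS:
        if pattern.search(ua):
            counts[name] += 1
            break
    else:
        counts["probably human"] += 1

for name, n in counts.most_common():
    print(f"{name:16s} {n:8d}  {100 * n / max(total, 1):5.1f}%")
```

Run it as `python3 bot_share.py < access.log`. Note that user-agent bucketing misses impersonators that send browser-like agents (the "other impersonators" slice above), so treat the bot share it reports as a floor, not a ceiling.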
12:20 am on July 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 102
votes: 7


I find this issue of bad-bot killing difficult to sell to customers. Most simply shrug their shoulders and say I have a solution looking for a problem. The solution often taken is to pay more for a bigger server. Identifying and killing bots takes time and is expensive, and bot makers are creative. These companies do not acknowledge there is a problem, even when I point out a blatant bot (4k server hits in 6 hrs! That is not human). Bots are hidden and do their work in the shadows of the interweb, exposed only briefly by the forensic log analysis of a select few.

It is great that you can offer web stats to prove that your sites cater to real humans. That is very novel!
12:31 am on July 10, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8975
votes: 409


It is great that you can offer web stats to prove that your sites cater to real humans. That is very novel!
Back to my sites, for example: I sell ad space. I do allow some beneficial bots access, but potential clients can see stats showing real human traffic, with the bots filtered out of those stats.

It's true, many have no idea of the amount of non-human traffic and will compare my stats to other sites selling advertising. It's sometimes difficult to convince potential clients that the other sites are counting all traffic and not just humans.
1:45 am on July 10, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Feb 3, 2014
posts: 981
votes: 202


One easy solution is to set up a free Cloudflare account, then use Cloudflare's firewall function to issue a captcha challenge to traffic from suspect countries. For me, this has reduced the number of bot hits dramatically and eased server load. Plus the added CDN bonus. Simple and free.
In my case, I only do business with the US, Canada, UK and Australia. Everything else gets challenged. No more bots.
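
For anyone who prefers to script that rather than click through the dashboard, here is a rough sketch. The endpoint, field names, zone ID and token are assumptions based on Cloudflare's Firewall Rules API, so treat it as illustrative only; the same rule can be created by hand in the Firewall section of the dashboard.

```python
#!/usr/bin/env python3
"""Sketch: challenge all traffic from outside US/CA/GB/AU via Cloudflare."""
import requests

ZONE_ID = "your-zone-id"      # placeholder
API_TOKEN = "your-api-token"  # placeholder

rule = [{
    "filter": {
        # Challenge anything not coming from the four allowed countries.
        "expression": '(not ip.geoip.country in {"US" "CA" "GB" "AU"})'
    },
    "action": "challenge",
    "description": "Challenge traffic from outside target markets",
}]

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/firewall/rules",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=rule,
)
resp.raise_for_status()
print(resp.json())
```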
1:51 am on July 10, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8975
votes: 409


samwest - that may work well for your specific interests.

However, consider there's a lot of useful information in raw access logs about which bots, from where, asking for what, using what platform, what methods, etc. This can be extremely valuable (for reasons I won't go into.)
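
As one hedged example of mining that information out of raw logs, the sketch below groups a combined-format access log by user agent and reports hit counts, distinct IPs, and request methods; the field positions are an assumption about the standard combined LogFormat, so adjust the regex if yours differs:

```python
#!/usr/bin/env python3
"""Sketch: summarise who is asking for what, per user agent."""
import re
import sys
from collections import defaultdict

LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

agents = defaultdict(lambda: {"hits": 0, "ips": set(), "methods": set()})

for line in sys.stdin:
    m = LINE_RE.match(line)
    if not m:
        continue
    a = agents[m["ua"]]
    a["hits"] += 1
    a["ips"].add(m["ip"])
    a["methods"].add(m["method"])

# Top 20 agents by request volume.
for ua, a in sorted(agents.items(), key=lambda kv: -kv[1]["hits"])[:20]:
    print(f'{a["hits"]:6d} hits  {len(a["ips"]):4d} IPs  '
          f'{"/".join(sorted(a["methods"])):10s} {ua[:60]}')
```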
4:12 am on July 10, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14499
votes: 337


...the collateral issue: having all that background noise constantly in the picture has to slow things down for actual visitors, and that is damage enough.


I agree. WordFence is pretty good at setting thresholds for blocking certain kinds of hacking-related activity, which helps reduce server-wide slowdowns. However, there is much room for improvement.
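
To illustrate the general idea of threshold-based blocking, here is a generic sketch; it is not WordFence's actual implementation, and the window and limit values are arbitrary examples:

```python
"""Generic per-IP request-rate threshold (illustrative only)."""
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 120           # e.g. a sustained 2 requests/second trips the block
recent = defaultdict(deque)  # ip -> timestamps of recent requests
blocked = set()

def allow(ip, now=None):
    """Return False once an IP exceeds MAX_REQUESTS within WINDOW_SECONDS."""
    if ip in blocked:
        return False
    now = time.time() if now is None else now
    q = recent[ip]
    q.append(now)
    # Drop timestamps that have fallen out of the window.
    while q and q[0] < now - WINDOW_SECONDS:
        q.popleft()
    if len(q) > MAX_REQUESTS:
        blocked.add(ip)
        return False
    return True
```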
1:48 pm on July 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 102
votes: 7


Site owners don't see a bad bot's attack until it cracks the site. Performance degradation is slow and gradual, like slowly boiling a frog, so site owners don't notice, and it can just as easily be blamed on the site software itself. When it gets really bad, hosting providers just tell the customer to upgrade their hosting package.

It is like house plumbing. Of course we know that plumbing is important, but out of sight, out of mind: most people won't consider a plumbing upgrade until a catastrophe happens.
1:53 pm on July 10, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14499
votes: 337


Site owners don't see a bad bot's attack until it cracks the site.


They're visible in 404 errors and many other ways.
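
As a quick, hedged sketch of that 404 angle: count 404 responses per client IP in a combined-format log, and the probing bots usually float to the top.

```python
#!/usr/bin/env python3
"""Sketch: list the IPs generating the most 404s (combined log on stdin)."""
import sys
from collections import Counter

hits = Counter()
for line in sys.stdin:
    parts = line.split('"')
    if len(parts) < 3:
        continue
    after = parts[2].split()        # status code and bytes follow the request
    if not after or after[0] != "404":
        continue
    hits[line.split()[0]] += 1      # first field is the client IP

for ip, n in hits.most_common(20):
    print(f"{n:6d}  {ip}")
```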
1:56 pm on July 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 102
votes: 7


My customers would not know an access log if it hit them in the face. They very rarely even look at AWStats, as their eyes glaze over. For day-to-day operations they never see a 404, much less know what that even means.
6:01 pm on July 10, 2017 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 23, 2015
posts:240
votes: 72


@keyplyr
Example: A large number of botrunners hosting at AWS are marketing companies that gather data rolled into products they sell to help ecommerce clients develop ad campaigns.

If you publish Adsense or other ads you would want your site data included in these products to facilitate ad placement and drive up bidding. This translates to greater income from ad clicks.


I used to rely on AdSense, but I haven't since 2013; the writing was on the wall even then. I am now on the opposite side - I actually SPEND on AdWords. As such, I don't want any products that facilitate bidding prices UP, because AdWords ROI is barely there as it is, let alone when somebody bids it up with bots against the rest of us trying to make a real living in the real world of commerce. So AWS is blocked.

If a publisher is good and relevant to our niche - and that is less than 1% of them - we have an affiliate program they can sign up for and send us relevant traffic that we can control. That way I don't have to pay Google a significant cut of my revenue.
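
For those who also want to block AWS, one hedged way to do it is to build a deny list from Amazon's published ip-ranges.json; the Apache 2.4 output below is only one option among several (firewall rules or an nginx deny list would work just as well):

```python
#!/usr/bin/env python3
"""Sketch: emit an Apache 2.4 deny block for AWS EC2 address ranges."""
import json
import urllib.request

URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Collect the CIDR ranges Amazon labels as EC2 (where most bot runners live).
cidrs = sorted({p["ip_prefix"] for p in data["prefixes"] if p["service"] == "EC2"})

print("<RequireAll>")
print("    Require all granted")
for cidr in cidrs:
    print(f"    Require not ip {cidr}")
print("</RequireAll>")
```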
7:56 pm on July 10, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8975
votes: 409


@smilie - the example about beneficial bots coming from Amazon hosting was more of a generic statement and not so much specific to your current advertising model.

Point is, blocking with a broad brush isn't a good idea. Neither is whitelisting if you don't diligently watch your hourly/daily logs to see exactly who/what is getting blocked by that filter.

I find new agents almost daily, investigate who they are and whether there is any benefit, and decide whether to give them access.

@TorontoBoy - you have an opportunity to provide extended site management to your clients. Just have them read this thread :)
9:10 pm on July 10, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13834
votes: 484


I think I'm more lenient than keyplyr ;) except for a couple of specific categories.

Crudest possible test:
--page requests with 200 response vs. page requests getting 4xx response (I can ignore bona fide 404s because they're vastly outweighed by the manual 404 I serve to select robots, to say nothing of robots asking for long-gone pages).
On my teeny-weeny site it averages 2:1 permitted vs. not permitted for pages alone. When you consider that those 200s include authorized robots, that's a heck of a lot of non-humans. But if I look at all requests, not just pages:
--all requests with 200/304 response (to include static files such as images) vs. etcetera-as-above
the balance shifts to 6:1 in favor of authorized requests. That's an important difference if you're talking about overall server load.

On the other hand, some sites have to think about the truly malign robots, where an unintended 200 is only the beginning of your troubles.
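
Here is a minimal sketch of that crude test, assuming a combined-format log and a naive definition of "page" (paths ending in "/" or ".html"); adjust that test to your own URL scheme.

```python
#!/usr/bin/env python3
"""Sketch: 200 vs 4xx ratios for pages and for all requests."""
import sys

def fields(line):
    """Return (path, status) from a combined-format log line, or None."""
    parts = line.split('"')
    if len(parts) < 3:
        return None
    req = parts[1].split()
    after = parts[2].split()
    if len(req) < 2 or not after:
        return None
    return req[1], after[0]

page_ok = page_4xx = all_ok = all_4xx = 0
for line in sys.stdin:
    f = fields(line)
    if not f:
        continue
    path, status = f
    is_page = path.endswith("/") or path.endswith(".html")
    if status in ("200", "304"):
        all_ok += 1
        if is_page and status == "200":
            page_ok += 1
    elif status.startswith("4"):
        all_4xx += 1
        if is_page:
            page_4xx += 1

print(f"pages:        {page_ok} ok vs {page_4xx} 4xx "
      f"({page_ok / max(page_4xx, 1):.1f}:1)")
print(f"all requests: {all_ok} ok vs {all_4xx} 4xx "
      f"({all_ok / max(all_4xx, 1):.1f}:1)")
```

As in the post above, the ratio for all requests will usually look much healthier than the page-only ratio, because every legitimate page view drags a pile of static-file 200s along with it.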
11:41 pm on July 10, 2017 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:July 23, 2004
posts:505
votes: 45


I've been preaching about this for years -- I get a lot of newbies that think they're doing great until I point this fact out to them.
Want a ton of traffic? Just tweet something, and here come the bots - hundreds of them, from all over the board. Want even more traffic than that? Just sign up for every other social networking platform and you'll have bots pouring out of your ears.

Social networking is a scourge on your metrics, regardless of type, style, or flavor.

25% of the net is profitable - The rest is just garbage
5:14 am on July 11, 2017 (gmt 0)

Senior Member

joined:July 29, 2007
posts:1780
votes: 100


My question is: There must be significant money made by deploying so many bots. How are bot writers making so much money, who is paying them and why?


Data collection, plain and simple. You can profit from the data you collect by selling access to it, or by creating tools that use it and selling access to those. The worst offenders sell data gathered about your site to the general public, including your competitors. The distinction between "good" and "bad" bots has been blurred as well. Is a bot that gathers info about your every page good just because it is called Alexa, or evil just because it is called something else? The metric should be the number of visitors generated by allowing a particular bot; if it's zero, then you should probably block it.

It gets worse when you consider that your site's contents and pages are evaluated without the evaluator even visiting your site, because a cached copy is available elsewhere. It's a bit out of hand, in my opinion.

If a bot is not directly involved in driving traffic to my site, it's a bad bot, period. Facebook, for example, is proactively scanning pages to decide whether it approves of a page's morality or political affiliation before allowing it to appear in some areas. This is censorship because it takes the decision-making out of its users' hands, so I consider all Facebook bots "bad" even though they are listed as good on metrics reports.

I've become heavy-handed on this front, in fact, and only whitelist a handful of search engines. I block a full 35% of the page requests I get at this point, which is absurdly high, but over 93% of those are not human. Over time you get to know what to look for if you monitor your own server logs and keep records. It's not fun, but unfortunately necessary.
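
If you whitelist only a handful of search engines, you still have to weed out impersonators faking those user agents. Google and Bing both document reverse/forward DNS verification for this; here is a hedged sketch (the hostname suffix list is a partial example, not an exhaustive whitelist):

```python
#!/usr/bin/env python3
"""Sketch: verify a claimed search-engine crawler by reverse/forward DNS."""
import socket

CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_crawler(ip: str) -> bool:
    try:
        host = socket.gethostbyaddr(ip)[0]          # reverse lookup
    except OSError:
        return False
    if not host.endswith(CRAWLER_SUFFIXES):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward lookup
    except OSError:
        return False
    return ip in forward_ips                        # must round-trip to the same IP

if __name__ == "__main__":
    # Example address from a published Googlebot range; non-crawlers return False.
    print(is_verified_crawler("66.249.66.1"))
```

Anything that claims a whitelisted crawler's user agent but fails this check is a prime candidate for blocking.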
5:57 pm on July 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 14, 2003
posts: 4308
votes: 35


Careful when you tell suits about this. They do not take kindly to seeing their numbers drop.

They are very happy not knowing this.
11:02 pm on July 17, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8975
votes: 409


Agreed ogletree. Sadly, even with this knowledge, many will just continue to believe their web properties are getting that many visitors.

When promoting their stats, they may feel at a disadvantage using the *real* numbers since their competitors aren't, so the falsehoods go on and on.
4:04 am on July 18, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 102
votes: 7


It seems to me like the worst-kept secret and scam on the internet when bots are goosing up the numbers. People can really get upset if their AWStats or Google Analytics numbers vary. Are site owners really that delusional, or is this more like an adult version of a popularity contest?

Why do site owners not recognize that if you do not control bots on your site, your visit numbers might be totally bogus? I am often flabbergasted at their naivety, though I can excuse their ignorance.