Forum Moderators: Robert Charlton & goodroi


Traffic + Bots = Smoke & Mirrors


iamlost

11:50 pm on Nov 26, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I do like to set the cat among the pigeons...
So...
Double, double toil and trouble;
Fire burn and caldron bubble.

Here goes...

One of webdev's many dirty little secrets is that traffic is much less human and much more bot than most know or like to admit. The ad networks admit to 20%; a number of academic studies report over 50%.

Further, those studies consistently indicate that the smaller the site, the greater the bot-to-human ratio as a percentage of site traffic, and the larger the site, the lower; graphed from small to large sites, the bot percentage drops from over 80% to under 20%.

Similarly, the smaller the site, the less bot-aware and less mitigation-capable the webdev typically is.
Note: inexpensive web hosts typically block only half, and even the rest miss a quarter.

Note: the current 4 generations of bots:
1. simple scripts; unable to handle cookies or JavaScript.
2. headless browsers, e.g. PhantomJS; can handle cookies and execute JS.
3. full browsers; able to simulate human-like behaviours.
4. full browsers; advanced human-like behaviours, random UA and IP.
---may be rotating hijacked computers.
Note: very few methodologies notice this type.
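Not from the original post, but a minimal sketch of how the first generation gets caught: serve a small JavaScript beacon and compare the access log against beacon hits. Clients that request pages yet never execute the JS behave like simple scripts. The beacon path and log format here are assumptions.

# Sketch: flag clients that fetch pages but never fire the JS beacon --
# behaviour typical of generation-1 bots. Beacon path and log format are
# hypothetical; adjust to the actual site.
import re
from collections import defaultdict

LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

def suspected_simple_bots(access_log_path, beacon_path="/beacon.gif"):
    page_hits = defaultdict(int)   # page requests per client IP
    beacon_ips = set()             # IPs that executed JS and fetched the beacon
    with open(access_log_path) as log:
        for line in log:
            m = LOG_LINE.match(line)
            if not m:
                continue
            ip, url = m.groups()
            if url.startswith(beacon_path):
                beacon_ips.add(ip)
            elif not url.startswith(("/images/", "/css/", "/js/")):
                page_hits[ip] += 1
    return {ip: n for ip, n in page_hits.items() if ip not in beacon_ips}

for ip, n in sorted(suspected_simple_bots("access.log").items(), key=lambda kv: -kv[1])[:20]:
    print(f"{ip}\t{n} page requests, no JS execution")

Generations 3 and 4 pass a test like this easily, which is exactly the point: most off-the-shelf filtering only catches the bottom of the list.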

With this as background, let's run a scenario: a small site, 100 uniques a day, wholly Google organic.

100 visitors, half are unrecognized bots;
50 visitors, 5% conversion rate;
2.5 conversions a day on average.

Of course, "on average" is not a constant but a statistical calculation over time.

In reality the unrecognised bot percentage fluctuates, the conversion quality of the human traffic fluctuates, etc. So while the monthly number of conversions may stay within a band, e.g. 2.5 × 30 = 75 plus or minus, the daily numbers are susceptible to significant variation.

With a site an order of magnitude larger, 1000 uniques a day, the susceptibility to variation is less in your face:
1000 visitors, half unrecognised bots;
500 visitors, 5% conversion rate;
25 conversions a day on average.

If the first site loses half its conversions, 2 or 3 a day dropping to 0 or 1, that is much more apparent than the second suffering similarly, 25 dropping to 12. Add in simple statistical variation and the first is far more likely than the second to see periods with none at all.
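To make the statistical point concrete, here is a small simulation of my own (not part of the original argument), treating each day's conversions as a Poisson draw around the site's average:

# Sketch: simulate a year of daily conversions for the two hypothetical sites.
# Assumes conversions behave roughly like a Poisson process around the mean.
import math
import random

def simulate(daily_uniques, bot_share=0.5, conversion_rate=0.05, days=365, seed=42):
    rng = random.Random(seed)
    mean = daily_uniques * (1 - bot_share) * conversion_rate  # expected conversions/day
    zero_days = 0
    total = 0
    for _ in range(days):
        # Poisson draw via the Knuth method (fine for small means).
        limit, k, p = math.exp(-mean), 0, 1.0
        while p > limit:
            k += 1
            p *= rng.random()
        conversions = k - 1
        total += conversions
        zero_days += conversions == 0
    return mean, total / days, zero_days

for uniques in (100, 1000):
    mean, observed, zeros = simulate(uniques)
    print(f"{uniques} uniques/day: expected {mean:.1f} conversions/day, "
          f"observed {observed:.1f}, zero-conversion days: {zeros}/365")

With a mean of 2.5, a zero-conversion day has probability e^-2.5, roughly 8%, so the small site sees a couple of them most months; with a mean of 25 it essentially never happens, which is why the same underlying wobble looks like an anomaly on the small site and like noise on the large one.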

The lower a site's traffic, and the greater the impact of unrecognised bots on it, the more readily many reported anomalies may appear. And, as with the Google sandbox of fifteen years ago, they may not actually exist, being merely artifacts of other occurrences.

aristotle

1:57 am on Nov 27, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



4. full browsers; advanced human-like behaviours, random UA and IP.
---may be rotating hijacked computers.

Looks like what I like to call "slow-motion botnet activity". I see at least one of these in the logs of most of my sites. They stay around for years. I've never been able to figure out their purpose. The first time I recognized one, I was afraid that it was about to be used for an impending DDoS attack. But they just hang around doing nothing except using up a little bandwidth and taking up space in the logs.

tangor

3:17 am on Nov 27, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, small sites are getting hammered. Have one where 34.7% are obviously human and 64.7% are obviously bots ... and 0.5% are ambiguous ...

Had a heck of a time dealing with clients who looked at the raw logs I produced (miles long) and then looked at the summary reports (one page each). Always got the question "How is that possible? Look at the numbers!"

Every month I had to explain things all over again. "The index page alone has 6 images plus the page itself... that's seven requests, and 'this many' viewed it last month, and of those, 'that many' were robots."

Sigh. Save us from the bean counters!

not2easy

3:37 am on Nov 27, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Oh, if only my human traffic would increase in proportion to the increase I see in non-human traffic, I would be much better off. For sure. Bot traffic grows aggressively; human traffic grows by small steps. I can attest that your 50% attribution to bots is conservative.

JesterMagic

12:02 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Interesting how bot traffic is more prevalent on small sites. I guess that is because these bot networks are not targeting a specific site but just travelling the net randomly, so they have a better chance of finding small sites.

What are the purposes of these bots? I figure most must be scrapers of content.

My site doesn't run any PPC, so artificially producing clicks that cost/generate money for others is not a reason to visit mine.

Other reasons I find include referral spam and probing for security holes (usually easy to spot in logs).

brotherhood of LAN

12:16 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>What are the purposes of these bots?

Entirely possible it's competition thinking user signals could potentially affect your site.

Could also be bots engaging in fraud; attempting to appear like legitimate traffic to all the tracking methods across many sites by visiting neutral places. Just ideas.

goodroi

1:48 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Countless purposes. Here are only some of the reasons...
- looking to steal content
- looking for illegal use of trademarks/copyrights so they can sue you
- looking for site vulnerabilities to hack you
- looking for contact forms to spam you
- college kids experimenting on crawling
- market research being conducted by professionals
- looking to scrape contacts for lead generation
- hosting company looking for ToS violations
- SEOs trying to reverse engineer your site
- webmasters trying to sabotage the analytics of the competition
- uptime monitoring service

Plus many, many other reasons.

not2easy

3:21 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Great list there goodroi! This one - "looking for site vulnerabilities to hack you" - might be no surprise to WP webmasters, but the number of requests I get on plain old HTML sites for "/wp-login.php" and "/xmlrpc.php" from scripted (humanoid-UA) bots is amazing. They don't even target WP sites specifically.

That xmlrpc.php file (in case it is unfamiliar) is a default WP file that offers a method to post remotely to the site. It would need to be set up by the site owner and requires a separate login. With scripts and lazy passwords, they must find success or they would not be out in such numbers.
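Those probes are easy to pull out of a raw access log on a site that doesn't run WordPress, since every hit on a WP-only path is by definition a probe. A minimal sketch (log path and format are assumptions):

# Sketch: count requests for WordPress-specific paths on a non-WordPress site.
import re
from collections import Counter

PROBE_PATHS = ("/wp-login.php", "/xmlrpc.php", "/wp-admin")
LOG_LINE = re.compile(r'^(\S+) .*?"(?:GET|POST) (\S+)')

probes = Counter()
with open("access.log") as log:
    for line in log:
        m = LOG_LINE.match(line)
        if m and m.group(2).startswith(PROBE_PATHS):
            probes[m.group(1)] += 1   # probes per client IP

for ip, hits in probes.most_common(20):
    print(f"{ip}\t{hits} WordPress probes")

The resulting IP list makes an obvious feed for a deny rule, since nothing legitimate ever needs those paths on such a site.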

aristotle

4:39 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



goodroi - excellent list
If I remember correctly, one of the early types was called "email harvesters", which looked for email addresses for mailing lists. There may also be some that look for phone numbers.

engine

4:57 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Great list, goodroi, thanks.
Back in the day, log spamming was rife, and I've still seen a few even today, but not so many.

I suspect the number involved in looking for vulnerabilities or content theft to carry out some kind of nefarious activity has increased over the years. It's just a suspicion because it's very difficult to trace the exact purpose, even if you can trace the culprit.

lucy24

6:49 pm on Nov 27, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



bot traffic is more prevalent on small sites
It’s not more prevalent in absolute terms--far from it. It represents a higher proportion of traffic because the number of robots visiting a site is comparatively constant, while the number of humans can vary by many orders of magnitude.

Tiny site: 10 robots : 10 humans in the course of a day
Small site: 20 robots : 100 humans during the same day
and so on up the list. The smaller the site, the greater the robot : human ratio.

Back in the day, log spamming was rife, and i've still seen a few even today
In the case of referer spam, most have shifted over to spamming analytics, since it's less work for the robot with the same (or better) end result. In fact, a fair number of people who ought to know better don’t even understand the concept of referer spam as it applies to raw logs; they think the term is defined as an analytics activity.

Edit: I’m currently keeping extra close track of logs as part of an HTTPS move. One thing I’ve discovered is that, over time, most redirects apply only to refererless human visits. Excluding the ones that I already know about, or that are unambiguously human, these redirected to-all-appearances humans tend to come in without any cookies. (Repeat visitors would have something involving Piwik.) Since it strains credulity to suggest that everyone with a page bookmarked--the likeliest explanation for refererless requests--is also blocking cookies, this leads to the suspicion that some of those apparent humans are, in fact, very talented robots. Hmph.
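For what it's worth, the filter involved is roughly this (my own sketch, not lucy24's; it assumes a custom log format that appends the Referer, User-Agent and Cookie headers as quoted fields): pull out redirected requests that arrived with neither a referer nor any cookie.

# Sketch: find redirected requests that carry neither a referer nor any
# cookie -- the "to-all-appearances human" population described above.
# Assumes a log format ending in "%{Referer}i" "%{User-Agent}i" "%{Cookie}i".
import re

LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)" "(?P<cookie>[^"]*)"'
)

with open("access.log") as log:
    for line in log:
        m = LOG_LINE.match(line)
        if not m:
            continue
        redirected = m.group("status") in ("301", "302")
        no_referer = m.group("referer") in ("", "-")
        no_cookie = m.group("cookie") in ("", "-")
        if redirected and no_referer and no_cookie:
            print(m.group("ip"), m.group("url"), m.group("ua"))

A genuine repeat visitor would normally carry at least the analytics cookie (Piwik, in this case), so a steady stream of cookieless, refererless "humans" is a reasonable hint that some of them are robots.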