
Forum Moderators: phranque


Bot Mitigation: Is there a Certification?

     
1:39 pm on July 11, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 579
votes: 60


Bot mitigation, as I define it: monitoring the raw access log/request headers, detecting rogue bots, determining whether those bots are beneficial to your site, and banning them if they are not.
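As a minimal sketch of that monitor/detect/ban loop (the log format assumed is Apache/nginx combined; the "rogue" user-agent patterns and sample lines are purely illustrative, not a recommended blocklist):

```python
import re

# Apache/nginx combined log format: IP, identity, user, [time], "request",
# status, bytes, "referer", "user-agent" -- the final quoted field is the UA.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
)

# Illustrative patterns only -- a real list is built from your own logs.
ROGUE_UA_PATTERNS = [r"(?i)python-requests", r"(?i)scrapy", r"(?i)\bcurl\b"]

def rogue_ips(log_lines):
    """Return the set of client IPs whose user-agent matches a rogue pattern."""
    flagged = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, ua = m.groups()
        if any(re.search(p, ua) for p in ROGUE_UA_PATTERNS):
            flagged.add(ip)
    return flagged

# Two fabricated log lines: a scripted client and an ordinary browser.
sample = [
    '203.0.113.9 - - [11/Jul/2018:13:39:00 +0000] "GET / HTTP/1.1" 200 512 "-" "python-requests/2.19.1"',
    '198.51.100.7 - - [11/Jul/2018:13:40:00 +0000] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0; rv:61.0)"',
]
print(rogue_ips(sample))  # {'203.0.113.9'}
```

The UA string is trivially faked, of course, so in practice pattern matching like this is only the first pass; IP-range and behaviour checks back it up.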

Is there some certification for this, or is it just another part of being a webmaster? This mitigation skill does not seem to be very widespread, or am I incorrect?
5:52 pm on July 11, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15941
votes: 889


By “certification”, do you mean is it something you can put on your CV and About Us page alongside “SEO Services”? Come to think of it, there’s no formal SEO Expert Certification either, so go ahead and put it on your list. Nobody’s going to refute it.

Idle query: What percentage of
(a) webmasters
(b) website owners
have ever once looked at their
(a) raw access logs
(b) request headers
? Obviously < 50% in all cases. Less than 10%? Less than 2%? (This site isn't set up for surveys, and it would be a non-representative sample anyway.)
6:05 pm on July 11, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


It seems like recruiters need a piece of certification paper to believe a skill; doing the task and having the experience is no longer enough. I just thought I would ask.

When I ask other webmasters about their raw access logs, their eyes glaze over.
6:36 pm on July 11, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4566
votes: 366


That's funny because when I first discovered my raw access logs, they fascinated me. That was many years ago, but I still find that they give me far more actionable information than any other metric. They have devolved so you aren't getting search terms as readily, but back then they really helped me understand what visitors were interested in finding. Then I discovered bots. Just as useful, less pleasant.

I can't imagine who or what would certify that skillset though, since one size does not fit all. Maybe look into security groups?
7:15 pm on July 11, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15941
votes: 889


Well, it's like mail-order diplomas, isn't it. You can always find someone to give you a piece of paper. (“Woo hoo! I passed the w3schools course and I’ve got documentation to say so.”) A few decades back, you could work through the programming manual for the language of your choice and it counted for nothing, but if you took an Elementary Programming class--covering exactly the same material as the 100-page manual that you could read for free--then suddenly you’ve got a Certification.
12:38 am on July 12, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10599
votes: 1127


Certified by whom?

One can claim to be anything. Now if you are thinking about going into business as a certified bot hunter ... that's something else. Good Luck! (and happy hunting!)
12:40 am on July 12, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10599
votes: 1127


Chuckles ... had the thought after hitting submit.

We need something like the second amendment: The Right To Hunt and Whack Bots Shall Not Be Infringed.
1:01 am on July 12, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


I get the feeling that companies have little appreciation for the effort and knowledge required to keep these bots down to a dull roar. I feel like an insect exterminator. Are we providing a solution to a problem that need not be fixed?

When I try to explain what and why I do this it is as if I have three heads. Some certification would provide legitimacy for the effort.

I got into this body of knowledge because bots were overwhelming my site and resources, and my host was about to shut me down. Was I just unlucky?
1:12 am on July 12, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10599
votes: 1127


Your "certification" is PROVING to would be clients how much they are LOSING when bots run rampant on their sites. If you have the actual numbers and CAN PROVE them, you have a new business potential.

Sadly, the average stuffed shirt, or college grad, has no clue (or if they do, keep mum about it ... being guilty of the same). Uphill battle selling this to any who do not have their own money invested in the web. Burning OPM is a long standing practice.*

*Other People's Money
1:14 am on July 12, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 413


Would certification from a French educational establishment do? If so, I can tell you who to make the money order out to :)

Seriously.. if your prospective clients are the sort of people who, even though they don't understand it, would pay you just because you had a piece of paper, do you really want them as clients? At some point they would screw up (unless they let you have total and sole control of their sites, servers, whatever) and when they did, you would get the blame, as they would not understand what they had done wrong, and it is always easier to blame "the other person" whose work and purpose one does not truly understand.

It sounds like you would really need to offer a service consisting of "bot proof hosting" combined with server/site management and even some system administration.. some hosts already offer something similar as part of their "fully managed server" packages.
1:19 am on July 12, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 413


iBill was working on a similar product/service a few years back; I don't think he ever brought it to a finished thing though.. it was "Crawl Wall".
1:25 am on July 12, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10599
votes: 1127


I remember that! Nifty idea ... too many variables to be ironed out to be fully functional. Something for another to consider, eh? (Not me, have more interesting things to do, like making money!)
5:09 pm on July 12, 2018 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1352
votes: 444


What the others have said: +n.

Bot wrangling is a lonely existence far out on the world wide web: get along little botties...

Many/most webdevs and even some ad-type-persons love bots because they greatly inflate 'visitor' numbers and bragging rights, particularly on smaller sites. That many/most of these same webdevs and ad-types then complain vocally (the webdevs because apparent revenue regularly gets clawed back, the ad-types because they don't get the desired outcomes) is an indication of something or other about human nature.

And, of course, identifying bots is not a simple one-size-fits-all exercise. While blocking malware carriers is pretty much considered a good thing, some webdevs (and too many marketers) believe that scrapers are beneficial (of course, too many webdevs' sites are 'quality' built on scraped and/or mashed content). At the other end are the SEs and their various crawlers, where only a few webdevs think blocking is a valid proposition. And in the middle are all the media bots determining this, that, and the other for themselves and assorted third-party clients. How many of which should be blocked, and why?

It all comes down to (1) technical competency (in relatively short supply) and (2) business model (too often MIA). Some simply cannot do what they'd prefer; most haven't a clue. When Google Analytics is considered 'the' cutting edge and logfiles are beyond the event horizon of 99% of webdevs, then bot wrangling is simply a whole other universe.

Back in the day I thought WordPress was the great demon that let the paint-by-number crowd loose on the web. Maybe I've grown accustomed, but the real great demon is including a dozen complete frameworks, each needed for one or two simple features, none self-hosted, such that a bloated 2MB site becomes a 25MB blivet... Yet this 'standard' of webdev is being taught at colleges and universities around the globe. I shudder at what they might teach regarding bots...

Oh, and as bot wrangling done well is a significant competitive advantage, I prefer it remain an arcane art rather than a popular science!

Note: I've been selling the value of 'bot-mitigation' for over a decade as part of my presentations to ad agencies, CMOs, et al. By itself it's just another technical 'thang'; as part of a methodology with guarantees of traffic quality and volume it is a whole other 'thing'. Sell the outcome benefits, not the mechanics.
7:22 pm on July 12, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


I don't have as many web clients as I did a few years ago since I'm really not that interested in the work any longer.

Convincing clients that most of that big traffic they love to brag about is actually bots, has always proven to be a futile effort. They just don't want the bubble busted.

Most of Your Traffic is Not Human [webmasterworld.com]

Explaining to clients that bot management is not a 'one time install' that they can buy and then rest easy, is also undoable.

Blocking Methods [webmasterworld.com]

As I said, I really don't want to spend the time grepping through clients' server logs every day, so for the handful of sites I do work on from time to time, I don't even bother to write any security code beyond a basic robots.txt.

Access methods, UAs, IP ranges, protocols, server configs... all are in constant flux. What I did 6 months ago is often not applicable today.

Every 90 days I go through 6k IP ranges I have rules for, and validate they are still the same. Easily 18% to 26% have completely changed category and I have to reevaluate how to handle the existing rules for them.
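A rule set that size pretty much has to be machine-readable to be revalidated at all. A minimal sketch of CIDR-based lookups using Python's stdlib `ipaddress` module (the two ranges here are hypothetical placeholders, not real rules):

```python
import ipaddress

# Hypothetical deny list -- stand-ins for the ~6k ranges a real rule set holds,
# which would live in a file and be rebuilt on each revalidation pass.
DENY_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_denied(ip_str):
    """True if the address falls inside any denied CIDR range."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in DENY_RANGES)

print(is_denied("192.0.2.77"))     # True: inside 192.0.2.0/24
print(is_denied("198.51.100.200")) # False: the /25 covers only .0-.127
```

A linear scan over a few thousand ranges is fine for offline log analysis; for live blocking the same list would normally be pushed into the server config or firewall instead.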

New UAs enter the netscape, older ones vanish. Some get repurposed, some just add or change behavior. About 30% pretend to be human, faking browsers.

Search Engine Spider & User Agent ID Forum [webmasterworld.com]

This is something mandated for my own web properties, but I would never want to do for others.
2:30 pm on July 20, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


I believe bot killing helps my site by reducing unnecessary (my view) and potentially harmful traffic, but if the market is unwilling to pay for such a service, I wonder if the problem is a lack of understanding/communicating the benefits or the lack of need for such a service?

Is my time better spent learning to juggle?
2:58 pm on July 20, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4566
votes: 366


There is likely reluctance to turn over raw server logs to a 3rd party, particularly since the entities that might benefit clearly don't ever look at their own logs and likely don't understand at all what information you can gather from them. Commercial entities might be concerned about protecting PII; logs don't give out names, addresses, etc., but they don't know that if they aren't managing their own logs. You might find bloggers who would give it a try, if it's free for the first report. Other than starting at the bottom, I don't see it being a viable 'profession'. Don't forget that since they don't know exactly what you're doing, any stupid moves they caused would be attributed to your work. :(
3:08 pm on July 20, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 413


Is my time better spent learning to juggle?

Depends on what you are going to juggle, how you are going to juggle, where you are going to perform, and in front of whom.
Could be highly lucrative ;)
6:53 pm on July 20, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


Depends on what you are going to juggle, how you are going to juggle, where you are going to perform, and in front of whom.
Could be highly lucrative ;)

Haha, I did not see that coming.

If there are more incidences of cyberattacks [nytimes.com] shouldn't there be more need for mitigation? Or maybe detecting and blocking bots does not slow down or stop these attacks?

These cyberattacks should have left an audit trail of reconnaissance (probing for software and vulnerabilities), should they not? And that reconnaissance should have been caught in the raw access log?
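It usually is. A minimal sketch of pulling probe requests out of an access log (the probe-path list is illustrative; a real one grows out of watching your own 404s):

```python
import re

# Paths commonly probed by vulnerability scanners -- illustrative, not exhaustive.
PROBE_PATHS = ("/wp-login.php", "/.env", "/phpmyadmin", "/xmlrpc.php")

# Pull the requested path out of the quoted request field of a log line.
REQ_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def probe_hits(log_lines):
    """Count requests for known probe targets, keyed by requested path."""
    counts = {}
    for line in log_lines:
        m = REQ_RE.search(line)
        if m and any(m.group(1).startswith(p) for p in PROBE_PATHS):
            counts[m.group(1)] = counts.get(m.group(1), 0) + 1
    return counts

# Fabricated lines: two scanner probes and one ordinary page view.
sample = [
    '198.51.100.7 - - [20/Jul/2018:06:53:00 +0000] "GET /wp-login.php HTTP/1.1" 404 162 "-" "-"',
    '198.51.100.7 - - [20/Jul/2018:06:53:01 +0000] "GET /.env HTTP/1.1" 404 162 "-" "-"',
    '203.0.113.5 - - [20/Jul/2018:06:54:00 +0000] "GET /about.html HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
]
print(probe_hits(sample))  # {'/wp-login.php': 1, '/.env': 1}
```

One IP hitting several of these paths in quick succession, on a site that runs none of that software, is about as clear a reconnaissance signature as logs give you.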
8:26 pm on July 20, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15941
votes: 889


Is my time better spent learning to juggle?
Depends what you mean by “better”, doesn’t it. Acquiring a new skill is never a bad thing.

:: vague mental association with Lord Vetinari ::
8:58 pm on July 20, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10599
votes: 1127


Juggle all you like, just keep whackin' the moles! Can there be a biz? Probably. Sort of like protection rackets as there is no "product", but what the hey? If one has the time one can certainly explore the possibilities!
9:04 pm on July 20, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3514
votes: 87


Is my time better spent learning to juggle?


OT, but funnily enough learning to juggle three balls is relatively easy; I do it often and I find it very therapeutic!
9:15 pm on July 20, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 413


^^^ Indeed, it induces a sort of "zen*" ..whilst juggling**, the "maya" is muted.

*Short word for unfocussed focussing away from distractions.

**There are other ways to mute or avoid / nullify maya.
12:00 am on July 22, 2018 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1352
votes: 444



Is there some certification for this, or is it just another part of being a webmaster? This mitigation skill does not seem to be very widespread, or am I incorrect?

It's just another part of being a webmaster.
It is not even narrowly spread, imo; more like waved vaguely in the general direction of...


...if the market is unwilling to pay for such a service, I wonder if the problem is a lack of understanding/communicating the benefits or the lack of need for such a service?

1. many webdevs are not only unwilling but often unable to pay for pretty much any service.

2. many webdevs are technically incapable of much beyond cut-n-paste 'coding'.

3. webdev denial about traffic quality (including percentage of bots) is as great as that about content quality.

4. given the ad premium I can charge for 'clean' traffic some advertisers are understanding, however I've been banging the drum for over a decade now and it ofttimes felt - feels - like I am beating my head against a brick wall.

I still get told that 'they' can get ad space far cheaper just about everywhere else (the ad variation of "my nephew can build a site for far less")... too many chase price rather than value, which means, ipso facto, that they don't get, or refuse to see, the benefit. In all my years of business, both B&M and online, I've always been amazed at how many willfully ignorant, niche/business-incompetent folk manage to make a success of it.

As mentioned previously, Bill Atchison aka incredibill, who recently passed, had been working, years ago, on a commercial bot defence program. Unfortunately, ill health put an end to its development. Given my experiences developing and maintaining one for my sites (with significant in/direct help from Bill), I believe that the combination of effective, simple, easy, customisable, and inexpensive may be several goals too far.


Are we providing a solution to a problem that need not be fixed?

These aren't the droids we're looking for.
You can go about your business.
Move along... move along.


What happens when the vast majority of sites' traffic is dirty is that advertisers adjust their pricing accordingly. A quick rule of thumb for most SME sites is that at least half their traffic is bots; it can exceed 90%. And that is why most third-party content/display ads are so cheap: the marketplace has discounted them. So, if you are wholly invested in third-party ad network revenue, going to the trouble of 'cleaning' out the bots may have a negative ROI. Your traffic is a better deal, but unless that becomes known and advertisers whitelist your site, your revenue per impression remains bog standard, now with less traffic, so overall it's down accordingly.
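The arithmetic in that paragraph can be made concrete with a toy calculation (the figures are made up: 100k impressions, 50% bots, a $1 market CPM that already discounts for expected bot traffic):

```python
def ad_revenue(impressions, cpm):
    """Revenue in dollars for a given impression count and CPM rate."""
    return impressions * cpm / 1000

dirty = ad_revenue(100_000, 1.00)   # bots included, bog-standard rate
clean = ad_revenue(50_000, 1.00)    # bots removed, rate unchanged: revenue halves
premium = ad_revenue(50_000, 2.00)  # bots removed AND advertisers pay a 'clean' premium
print(dirty, clean, premium)  # 100.0 50.0 100.0
```

Which is the negative-ROI point above in miniature: cleaning only pays once the premium rate materialises, and that takes selling, not just filtering.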

If you are doing affiliate marketing and/or you sell (or plan to sell) ad space directly, then it is well worth doing. Another benefit/feature that needs to be demonstrated and sold.

All that aside, bouncing bots has, as also previously mentioned, the benefits of lowering server connection and bandwidth overhead, decreasing potential scraping, malware infestation, and other nasties. Basically it's the online equivalent of mitigating graffiti, vandalism, shoplifting, robbery... nah, no reason to do any of that!


Is my time better spent learning to juggle?

Once a webdev always a juggler.
1:36 am on July 22, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


Thanks for the unvarnished truth.

I am hoping that, with more cyberattacks occurring, bot mitigation will become more appreciated. I believe that with regular scrutiny of bot activity we can detect and thwart these attacks. Am I wrong?

It is clear that site owners love the increased unnecessary activity and risk that bots generate, fake or not. Some people prefer the pouffy skirt, even though it is filled with just air. To each her own.
3:56 am on July 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10599
votes: 1127


One reason why this forum exists: [webmasterworld.com...]

I bash bots for much the same reason why I wash after visiting the necessary. It's just the right thing to do.
2:14 pm on July 22, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


I bash bots for much the same reason why I wash after visiting the necessary. It's just the right thing to do.

Obviously then most people do not wash their hands! Don't touch anything, or you might get a ...virus?

Biologically, a pathogen will grow and attempt to overwhelm the body (as in septicaemia), or the body's immune system will respond (white blood cells) and kill it. I don't as yet see the white blood cell growth.
5:18 pm on July 22, 2018 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1352
votes: 444


The uncomfortable truth is that, aside from a few web publishers, no one wants viable bot 'defences'. 20-25% of total web traffic is so-called 'good' bots, a definition that typically means SE bots, site monitors, feed fetchers, marketing tool crawlers, et al. If there were a practical 'off the shelf' bot defence solution à la Google Analytics, how many of those data-mining, self-declared 'good' bots would be blocked?

We know that half, plus or minus, of bots are openly 'bad'. And yet:

* The leader in all things web, Google, hasn't come out with a solution. Is it because:
1. a viable solution is beyond their technical capability?
or
2. a viable solution and their business model are incompatible?

* The various webdev tool companies talk all around the subject, so the fact that they never offer a proactive solution is hidden by the info noise. Is it because:
1. a viable solution is beyond their technical capability?
or
2. a viable solution and their business model are incompatible?

Remember the ad impressions scandal of 2014 that gave impressions, aka ads 'served', a new sibling: the 'viewable' impression, i.e. half an ad seen for one second to be billable? It was an open secret that no one who was cleaning up (myself included) spoke about publicly; apparently the broader webdev community was oblivious? I certainly spoke about it privately in my pitches to sell direct ad space, while ensuring that my 'CTR' ads were high on the page and 'CPM' ads were down low, where they still brought home the bacon.

Most webdevs are oblivious, a huge step below ignorant: it's not just that they don't know but, worse, they don't know they don't know important, even critical, information about their revenue sources. Yet, as with impressions, it wasn't secret per se; the information wasn't hidden away, simply not explicitly laid out and labelled, which is, regrettably, a requirement for too many. It was after all a huge scam involving not just the ad networks but ad agencies and all sorts of middleman agencies, as well as publishers, aware or not, skimming it in.

If bot defences are to be made real beyond an individual's custom efforts or those of CDNs/hosts/ISPs (who may have selective ulterior motives) it will take an event such as Kraft's Julie Fleischer's overturning of the impressions apple cart. Humans are great at (minimal) correcting behaviour after a cataclysmic event, few can be bothered prior.

It is difficult to get a man to understand something, when his salary depends on his not understanding it.
---Upton Sinclair
5:50 pm on July 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15941
votes: 889


And then there’s the extra complication: the overwhelming majority of site owners genuinely don’t understand the difference between a GA “visit” report and an actual request to their site. It’s easier if analytics lives on a site you control. If you request a page plus piwik and nothing else, with no evidence that you were here the day before yesterday (so everything else is still cached), you haven’t fooled me. (Besides, you’ve most likely been blocked in the first place, so there’s no bogus “visit” to display.) Do potential advertisers who insist on seeing GA figures understand this? Probably not.

:: idly wondering if anyone has ever tried running GA in parallel with locally-hosted analytics in order to compare the two sets of putative figures ::
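The comparison itself is trivial once both sources are reduced to daily counts; a minimal sketch with entirely made-up figures (the gap is typically bots, blocked or script-less clients, and cached visits that never fire the tracker):

```python
def compare_counts(log_counts, ga_counts):
    """Per-day difference: server-log page views minus GA-reported sessions.
    Positive values are visits GA never saw."""
    days = sorted(set(log_counts) | set(ga_counts))
    return {d: log_counts.get(d, 0) - ga_counts.get(d, 0) for d in days}

# Hypothetical daily figures from the two sources.
logs = {"2018-07-20": 940, "2018-07-21": 1010, "2018-07-22": 875}
ga   = {"2018-07-20": 410, "2018-07-21": 388,  "2018-07-22": 402}
print(compare_counts(logs, ga))  # {'2018-07-20': 530, '2018-07-21': 622, '2018-07-22': 473}
```

The hard part is not the subtraction but normalising the two definitions of a "visit" (page views vs sessions, HTML-only vs all assets) before comparing.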
7:01 pm on July 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


has ever tried running GA in parallel with locally-hosted analytics in order to compare the two sets of putative figures
Yes, and I posted numerous times about how much GA misses and misinterprets. That's why I stopped using GA several years ago; that, and the added page load.

If you want the truth, use the server logs.
7:23 pm on July 22, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 579
votes: 60


The corollary of this issue is, could you make more money as a bot runner? It seems like the ad industry, Google, and site owners all really encourage their inflated GA stats, and few of them encourage bot killing and the truth about actual human interactions.

dearth of humans -> cannot find enough humans -> how to create humans? -> bots