Moderators: May really be a "general" question. Your call.

Background:

Their action:
GET robots.txt
GET front page
GET front page again, with a second user-agent
GET page linked from front page
My action:
Deny from aa.bb.128.0/17
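For context, here is roughly how that block sits in an Apache config. A minimal sketch, assuming an .htaccess or vhost with nothing else in play, and with the aa.bb octets masked just as above; the first form is the classic 2.2-style syntax the Deny line comes from, the second is the Apache 2.4 equivalent.

# Apache 2.2 style
Order Allow,Deny
Allow from all
Deny from aa.bb.128.0/17

# Apache 2.4 equivalent (Require syntax)
<RequireAll>
Require all granted
Require not ip aa.bb.128.0/17
</RequireAll>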
They've been coming around for ages, once a month or so, dismissed as "no skin off my nose". They always ask for robots.txt, the front page, and every page linked directly from the front page. Distinguishing trait: picking up a second copy of the front page with a different UA, this one mobile. (Belated thought: if they received different content on that front page, would they ask for duplicates of all the other pages too?)
And your point is...?

#1 This is the first time they've shown up on my test site. Unlike my real site, this one's robots.txt includes the element
User-Agent: *
Disallow: /
On my real site, the disallowed directories are deeper: at least two steps away from the front page. The robot never gets that far. Here the whole SITE is roboted-out, giving the robot the opportunity to ignore any "keep out" signs.
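For contrast, a hedged sketch of what the real site's robots.txt looks like; the directory names here are invented for illustration. Since nothing within one link of the front page is disallowed, a crawler that only fetches the front page and its direct links never has a rule to disobey.

User-agent: *
Disallow: /section/archive/
Disallow: /section/private/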
#2 The offending robot belongs to Verisign. But wait! Aren't they supposed to be the good guys? Did I miss a chapter?
The question: Is anyone above the law? Do some automated agents perform a function so important that they're allowed to ignore robots.txt? Maybe not these guys in particular, but someone. For example, a virus checker isn't much use if it dutifully stays out whenever it meets a "Disallow". (This does not prevent me from blocking AV devices if they annoy me or I suspect they're bogus. But still.)
Punch line: Here's what the link to the inner directory looks like.
.honey {display: none;}
...
<p class = "honey">
<a href = "/directory/"> ...et cetera
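For anyone who wants the whole trap in one place, a minimal sketch; the link text and the /directory/ path are stand-ins, as above, and the robots.txt lines assume the target directory is itself disallowed (on the test site the Disallow: / already covers it). Humans never see the link, compliant robots never follow it, so any request for /directory/ is a self-identification.

/* stylesheet: hide the honeypot from human visitors */
.honey {display: none;}

<!-- front page: the link only a crawler will ever find -->
<p class="honey">
<a href="/directory/">hidden link</a>
</p>

# robots.txt: the same directory is off-limits to well-behaved robots
User-agent: *
Disallow: /directory/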