Forum Moderators: open
UA: Healthbot/Health_and_Longevity_Project_(HealthHaven.com)
IP: 98.165.214.nnn (dynamic Cox USA)
Robots.txt: No
Still trying to discover how it got in, since the headers were blank.
Not nice!
Distributed bot for anyone to run, with financial incentive.
I previously had it whitelisted based on earlier experience and sympathies, which is how it got in. After it ignored robots.txt today and turned out to be a potential grub-type bot, it's now banned.
The bot isn't high-profile enough to be used for spoofing and it's fairly easy to spot in the logs, with that unusual UA.
As to Cox limiting their users' traffic - maybe they do, but I see a lot of trapped Cox IPs from the US and Canada.
In a nutshell: since the majority of legit UAs begin with "Mozilla," 403 all UAs that don't. Then selectively whitelist your choice of okay bots/hosts/UAs whose names begin with something other than Mozilla. For example:
RewriteCond %{HTTP_USER_AGENT} !^Google-Sitemaps
RewriteCond %{HTTP_USER_AGENT} !^Googlebot
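For context, the two conditions above only work as exceptions to a broader rule; a complete ruleset along these lines might look like the following sketch (the `!^Mozilla` condition and the closing RewriteRule are my additions to round out the example, not part of the original post):

```apache
# Sketch of the approach described above: forbid any UA that does NOT
# begin with "Mozilla", except for explicitly whitelisted bots.
# All conditions are ANDed together by default.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^Mozilla
RewriteCond %{HTTP_USER_AGENT} !^Google-Sitemaps
RewriteCond %{HTTP_USER_AGENT} !^Googlebot
RewriteRule .* - [F]
```

Read it as: if the UA doesn't start with Mozilla, AND isn't Google-Sitemaps, AND isn't Googlebot, return 403 Forbidden. Each whitelisted bot gets its own negated RewriteCond line.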
It's still a lot of work weeding out the bots hiding behind ^Mozilla. But it's nice knowing you're preemptively protected from the likes of Healthbot, Java, VRTServers' triplet _viewer bots, and literally scores and scores of other bad, non-Mozilla UAs.
The problem anyway isn't non-Mozilla UAs but Mozilla UAs that are really scrapers, injectors, and similar malevolent swine. My trap caters to these as well as to badly behaved bots such as this one which, as I said, I originally decided was a good one.
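The trap idea mentioned above generally works like this (a generic sketch, not necessarily this poster's actual setup; the decoy path is made up): disallow a decoy URL in robots.txt and link to it invisibly. Compliant crawlers never fetch it; any client that does has ignored robots.txt and outs itself, whatever its UA claims.

```apache
# Generic bot-trap sketch (illustrative only; /trap/ is a hypothetical path).
# In robots.txt:
#     User-agent: *
#     Disallow: /trap/
# Then link to /trap/ invisibly from a page, and forbid anything that
# requests it anyway (or log the IP for a later ban):
RewriteEngine On
RewriteRule ^trap/ - [F]
```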
In a nutshell, since the majority of legit UAs begin with "Mozilla," 403 all UAs that don't. Then selectively whitelist your choice of okay bots/hosts/UAs with names beginning with other than Mozilla. For example:
RewriteCond %{HTTP_USER_AGENT} !^Google-Sitemaps
RewriteCond %{HTTP_USER_AGENT} !^Googlebot
Pfui,
Are you using IPs for the second condition, or rather UAs? Does it look something like:
RewriteCond %{HTTP_USER_AGENT} ^name
RewriteCond %{HTTP_USER_AGENT} !^Google-Sitemaps
RewriteRule .* - [F]
TIA