acapbot/0.1

         

keyplyr

9:41 pm on Feb 3, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




The AWS range is blocked, of course, but this one gave me a chuckle: "treat like Googlebot".


72.21.217.64 - - [02/Feb/2015:10:12:52 -0800] "GET /example.html HTTP/1.1" 403 944 "-" "Mozilla/5.0 (compatible;acapbot/0.1;treat like Googlebot)"
72.21.217.64 - - [02/Feb/2015:10:12:52 -0800] "GET /robots.txt HTTP/1.1" 200 1461 "-" "Apache-HttpClient/4.3 (java 1.5)"

I also block "HTTPClient" and "java", but allow everything to access robots.txt.
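For anyone who wants to see that rule spelled out, here is a rough sketch in Python. It is not my actual server config: the function name `status_for` and the case-insensitive substring match are assumptions made for illustration, and the AWS range block is left out entirely.

```python
import re

# Assumed blocklist: the two substrings named in the post, matched
# case-insensitively (the real matching rules may differ).
BLOCKED_UA = re.compile(r"httpclient|java", re.IGNORECASE)


def status_for(path: str, user_agent: str) -> int:
    """Return the HTTP status this sketch would send: 200 or 403."""
    # robots.txt is always served, whatever the user agent says.
    if path == "/robots.txt":
        return 200
    # Everything else is refused when the user agent matches the blocklist.
    if BLOCKED_UA.search(user_agent):
        return 403
    return 200


print(status_for("/robots.txt", "Apache-HttpClient/4.3 (java 1.5)"))
print(status_for("/example.html", "Apache-HttpClient/4.3 (java 1.5)"))
```

Checking the robots.txt exemption first is what lets the second log line above come back as 200 even though its user agent matches both blocked strings.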

trintragula

11:07 am on Feb 4, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



From that same subnet, I've seen:

Mozilla/5.0 (compatible; AMZNKAssocBot/4.0 +http://affiliate-program.amazon.com)
Apache-HttpClient/4.3 (java 1.5)
Jakarta Commons-HttpClient/3.0
Mozilla/5.0 (compatible;acapbot/0.1;treat like Googlebot)
Mozilla/5.0 (compatible;acapbot/0.1.;treat like Googlebot)


None of them have been caught behaving badly, however.

I've seen the Jakarta and Apache user agents visiting from other ranges too, though generally more recently.

lucy24

7:28 pm on Feb 4, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



treat like Googlebot

According to The Rules*, this means that anything in robots.txt pertaining to user-agent "Googlebot" should also be honored by this unwanted visitor. Whether they themselves follow the rule is, of course, a different question.


* As dredged up by phranque or someone like him.
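A rough sketch of what honoring that would look like on the crawler's side, using Python's standard `urllib.robotparser`. This is only one reading of the rule: the robots.txt content, the `robots_name` helper, and the "treat like" regex are all invented for illustration, and nothing here says acapbot actually works this way.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, made up for this sketch.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

# Assumed token format: "treat like <name>" somewhere in the user agent.
TREAT_LIKE = re.compile(r"treat like\s+([\w-]+)", re.IGNORECASE)


def robots_name(user_agent: str, own_name: str) -> str:
    """Pick the name to look up in robots.txt.

    Uses the "treat like" target when the user agent carries one,
    otherwise falls back to the crawler's own name.
    """
    match = TREAT_LIKE.search(user_agent)
    return match.group(1) if match else own_name


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

ua = "Mozilla/5.0 (compatible;acapbot/0.1;treat like Googlebot)"
name = robots_name(ua, "acapbot")
print(name, parser.can_fetch(name, "/private/page.html"))
```

Looked up under its own name, the bot would fall through to the catch-all group and be allowed into /private/; looked up as Googlebot, it is shut out.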

keyplyr

8:11 pm on Feb 4, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



None of them have been caught behaving badly, however.

IMO hosting at AWS is behaving badly :)