| 9:04 pm on Jan 26, 2013 (gmt 0)|
I have blocked all UAs containing "nutch" for well over 10 years without any adverse effect.
This generic bot can be used by any unaccountable agent for any unknown purpose. The accountable ones should customize and rename it so their bot UA shows they are on the level, IMO.
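For anyone who wants to try the same thing, here is a minimal sketch of a case-insensitive substring block. It is not anyone's actual server config: the function name and the deny-list tuple are made up for illustration, and "nutch" is the only token taken from the post.

```python
# Hypothetical deny-list of UA substrings; only "nutch" comes from the post above.
BLOCKED_UA_SUBSTRINGS = ("nutch",)


def is_blocked_ua(user_agent: str) -> bool:
    """Return True if the UA contains any blocked substring, ignoring case."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


if __name__ == "__main__":
    # Made-up user-agent strings, for illustration only.
    for ua in ("Apache-Nutch/1.6", "Mozilla/5.0 (Windows NT 6.1)"):
        print(ua, "->", "blocked" if is_blocked_ua(ua) else "allowed")
```

Matching on a lowercase copy catches renamed variants that keep "Nutch" somewhere in the string, which is the point of blocking on a substring rather than an exact UA.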
| 5:14 pm on Jan 31, 2013 (gmt 0)|
|Note to self... if I ever write a scraper, name it after something universally popular, like a record-breaking Ferrari. |
Scrapers are one thing but when RDNS points to that...
And as always: AS21844 220.127.116.11/15 ThePlanet.com Internet Services, Inc.
| 6:33 am on Feb 6, 2013 (gmt 0)|
Mozilla/5.0 (Windows;) NimbleCrawler 1.12 obeys UserAgent NimbleCrawler For problems contact: crawler@health
Mmmwell... For a given definition of "nimble", anyway ;)
| 2:38 am on Feb 9, 2013 (gmt 0)|
Verbatim-- or rather, litteratim-- again:
18.104.22.168 - - [08/Feb/2013:00:39:40 -0800] "GET /robots.txt HTTP/1.0" 200 1005 "-" "Web front page analyser. robots.txt complaint (email@example.com)"
I can't decide whether I do, or do not, want that to be a typo :(
| 10:34 pm on Feb 9, 2013 (gmt 0)|
Well, 204.236.128/17 is amazon aws and anything with a gmail address is automatically suspicious in my book... Kill it. :)
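A quick sketch of checking whether an address falls inside that range, using Python's standard ipaddress module. The shorthand 204.236.128/17 is written out as 204.236.128.0/17; the function name and the test addresses are made up for illustration.

```python
import ipaddress

# The post's shorthand "204.236.128/17", written out in full.
BLOCKED_NETWORKS = [ipaddress.ip_network("204.236.128.0/17")]


def in_blocked_range(ip: str) -> bool:
    """Return True if the address belongs to any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # Illustrative addresses, not taken from any log.
    print(in_blocked_range("204.236.200.10"))
    print(in_blocked_range("204.236.127.255"))
```

A /17 starting at 204.236.128.0 covers 204.236.128.0 through 204.236.255.255, so anything with a third octet below 128 falls outside it.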
| 11:09 pm on Feb 9, 2013 (gmt 0)|
|Well, 204.236.128/17 is amazon aws and anything with a gmail address is automatically suspicious in my book... Kill it. :) |
It requested robots.txt. I allow *almost* everything to get robots.txt, even the Amazon ranges, so when I looked at this post yesterday, I figured she did also.
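Roughly, the policy described here could be sketched like this. It is an assumption-laden illustration, not a real config: the function name, the status codes returned, and the single blocked network are all hypothetical, and the "almost" in the post (the few agents refused even robots.txt) is not modelled.

```python
import ipaddress

# Hypothetical blocked range, reusing the one mentioned in this thread.
BLOCKED = ipaddress.ip_network("204.236.128.0/17")


def decide(ip: str, path: str) -> int:
    """Return an HTTP status code: 200 to serve the request, 403 to refuse it."""
    # robots.txt is exempt from the range block, so even blocked
    # crawlers can read the rules they are expected to follow.
    if path == "/robots.txt":
        return 200
    if ipaddress.ip_address(ip) in BLOCKED:
        return 403
    return 200
```

Checking the path before the range is the whole design choice: a blocked bot still gets to see the Disallow lines, and what it does next tells you whether it honors them.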
| 10:09 am on Feb 10, 2013 (gmt 0)|
The question is academic, because it didn't ask for anything else after robots.txt. (I checked. I do have the range blocked.) And I didn't hear any complaints about it either.
| 7:34 pm on Feb 10, 2013 (gmt 0)|
Print and frame it! An AWS bot that obeys robots.txt! :)
| 12:56 am on Feb 11, 2013 (gmt 0)|
|Print and frame it! An AWS bot that obeys robots.txt! :) |
Chances are about as good as my cat's talking to me starting to make sense; to this day it sounds like MYAU to me.
BTW, has anybody heard of a reliable myau translator web service?
| 9:51 pm on May 12, 2013 (gmt 0)|
22.214.171.124 - - [12/May/2013:07:39:51 -0700] "GET /hovercraft/images/wormapple.jpg HTTP/1.1" 200 32565 "-" "rarely used"
I expect this is perfectly true.
:: detour to raw logs ::
126.96.36.199 - - [11/May/2013:19:20:16 -0700] "GET /rats/images/ourhouse/LivRm5.jpg HTTP/1.1" 301 600 "-" "rarely used"
188.8.131.52 - - [11/May/2013:19:20:16 -0700] "GET /boilerplate/sorry.html HTTP/1.1" 200 1441 "-" "rarely used"
Huh. Fancy that.
:: detour to confirm hunch that these are Ukrainian IPs ::
Nope. They're not even the same country. What gives?
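For the "detour to raw logs" step, here is a minimal sketch of pulling every request with a given UA out of combined-format log lines and grouping them by IP. The regex assumes the standard Apache combined log layout seen in the lines above; the function name is hypothetical, and the sample lines use documentation-range addresses and made-up paths rather than the real ones.

```python
import re
from collections import defaultdict

# Apache "combined" log format: ip, identd, user, [time], "request",
# status, size, "referer", "user-agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def ips_by_user_agent(lines, wanted_ua):
    """Map each IP that sent exactly `wanted_ua` to its (status, request) pairs."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("ua") == wanted_ua:
            hits[m.group("ip")].append((m.group("status"), m.group("request")))
    return dict(hits)


if __name__ == "__main__":
    # Made-up sample lines; the IPs are from the documentation ranges.
    sample = [
        '192.0.2.1 - - [11/May/2013:19:20:16 -0700] "GET /a.jpg HTTP/1.1" 301 600 "-" "rarely used"',
        '198.51.100.7 - - [12/May/2013:07:39:51 -0700] "GET /b.jpg HTTP/1.1" 200 32565 "-" "rarely used"',
    ]
    for ip, requests in ips_by_user_agent(sample, "rarely used").items():
        print(ip, requests)
```

Grouping by IP makes it obvious at a glance when one odd UA is arriving from unrelated networks, which is what prompted the "What gives?" above.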
| 7:37 pm on May 13, 2013 (gmt 0)|
184.108.40.206 is Vodafone Ireland.
220.127.116.11 is SuddenLink US - all 75.n.n.n are (basically) ARIN (USA, Canada etc).
So likely compromised machines on DSL lines running a scan of some kind.
| 9:04 pm on May 13, 2013 (gmt 0)|
Yeah, the 75.108 threw me because I personally know people there; it's one of the local ISPs. But the UA is, uhm, rarely seen ;)
| 4:20 am on Jun 4, 2013 (gmt 0)|
Oi! Moderators! This was supposed to be the kitchen-sink thread. How 'bout you restore the name and kick it down to Foo instead? :)
Happened to pull the Error Log just now to check something else, and met:
[Mon Jun 03 21:10:54 2013] [error] [client 18.104.22.168] client denied by server configuration: /home/user/example.com/, referer: http://r-e-f-e-r-e-r.com/
That about sums it up. (It's a Latvian robot, btw. Lives at 22.214.171.124/21)
| 11:41 am on Jun 4, 2013 (gmt 0)|
You should be particularly wary of any UNTRUSTED Nokia agent coming from cell phone proxies in India, China, or the Middle East. The Iranians always use this agent to hand-check sites in their system. It means your site has been targeted for malicious activity, most likely from operations in Europe.
Block UNTRUSTED and they'll be forced to proxy in through Opera Mini. Then you can catch them red-handed using the X-Forwarded-For header.
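A rough sketch of that two-step idea, under stated assumptions: the function name and return labels are hypothetical, headers are assumed to arrive as a plain dict with exactly these key spellings, and the first entry of X-Forwarded-For is treated as the original client. That header is supplied by the proxy or client and can be forged, so take the value as a hint, not proof.

```python
def classify(headers: dict) -> str:
    """Return "block", "allow", or "log:<client>" for a request's headers."""
    ua = headers.get("User-Agent", "")
    if "UNTRUSTED" in ua:
        return "block"
    if "Opera Mini" in ua:
        # Behind a proxy, X-Forwarded-For lists the chain of addresses,
        # with the original client first.
        forwarded = headers.get("X-Forwarded-For", "")
        client = forwarded.split(",")[0].strip()
        return f"log:{client}" if client else "log:unknown"
    return "allow"
```

Logging rather than blocking the Opera Mini requests is deliberate here: the aim in the post is to record the address behind the proxy, not to turn away every Opera Mini user.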