| 2:43 am on Nov 9, 2010 (gmt 0)|
NetcraftSurveyAgent has been around for a few years at least, originally hailing from lager.netcraft.com using the same UA and typically HEAD-requesting:
That path's above the webspace, but the file is accessible (and the URI not easily blocked, imho) because the file's an Apache OS image. I think it's really, really sneaky that Netcraft probes that way.
Also, Netcraft's been bot-running from AWS since, oh, early this year. Regardless of host, it never asks for robots.txt. Then again, the two robots.txt files they use (search their site for: robots.txt) are syntactically incorrect/ineffectual.
[edited by: Pfui at 2:45 am (utc) on Nov 9, 2010]
| 2:45 am on Nov 9, 2010 (gmt 0)|
AWS-based bot-runners are so clever. Not.
| 11:14 pm on Nov 9, 2010 (gmt 0)|
Three seconds, three HEAD request hits, zero robots.txt:
All using the site-hosted-on-Amazon (220.127.116.11), social news reader/story aggregator, currently-in-private-beta app:
Summify (Summify/1.0; +http://summify.com)
| 11:50 am on Nov 27, 2010 (gmt 0)|
Mozilla/5.0 (compatible; PaperLiBot/2.1)
| 2:01 pm on Dec 1, 2010 (gmt 0)|
A Chinese bot by any other name...
vik-robot/Nutch-1.0 (vikspider; http://vik.com; email@example.com)
robots.txt? Yes, but after hitting root.
Previously (in this thread; mssg #4052607 by dstiles; 01-2010):
Chen Li/Nutch-1.0 (Nutch spiderman; http://chenli.com.cn; firstname.lastname@example.org)
| 10:03 pm on Dec 24, 2010 (gmt 0)|
New (to me) AWS range at RIPE:
18.104.22.168 - 22.214.171.124
First hit from it today pretending to be Moz/4 MSIE 7 on XP with bad headers.
| 8:27 pm on Dec 25, 2010 (gmt 0)|
+ 126.96.36.199 - 188.8.131.52
Full Amazon range: 46.137/16
| 8:53 pm on Dec 27, 2010 (gmt 0)|
How did I miss that one!
Actually, I'll now block the full /16 as 192/18 is Amazon Data Services Ireland. Must have had an off-day before. :)
| 1:46 am on Dec 28, 2010 (gmt 0)|
Thanks for the Amazon Ireland info :)
| 10:44 pm on Feb 5, 2011 (gmt 0)|
Another Irish Amazon range. This one not listed as AWS/Cloud but the same email domain.
184.108.40.206 - 220.127.116.11
| 4:37 pm on Feb 20, 2011 (gmt 0)|
Fresh scraper UA: Qryos
inetnum: 18.104.22.168 - 22.214.171.124
descr: Amazon Web Services, Elastic Compute Cloud, EC2, SG
| 8:11 pm on Feb 20, 2011 (gmt 0)|
126.96.36.199 - 188.8.131.52
DNS Stuff doesn't give the CIDR on this range, but I'm assuming it is:
| 10:32 pm on Feb 20, 2011 (gmt 0)|
Thanks for the heads-up - first Asian AWS I've got! :)
The full range is actually 122.248.192 - 184.108.40.206
| 9:30 pm on Feb 22, 2011 (gmt 0)|
|The full range is actually 122.248.192 - 220.127.116.11 |
OK then: deny from 18.104.22.168/18
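For anyone who only has the first and last address of a range from Whois, here is a minimal Python sketch of working out the CIDR. It assumes the range under discussion runs from 122.248.192.0 to 122.248.255.255 (the end address is inferred from the /18, not copied from a record), and `range_to_cidrs` is just an illustrative helper name.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# Assumed endpoints for the Singapore range (see the note above).
print(range_to_cidrs("122.248.192.0", "122.248.255.255"))
```

A range that does not fall on a clean boundary comes back as several blocks rather than one, which is a handy sanity check on a guessed CIDR.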
| 4:26 pm on Mar 8, 2011 (gmt 0)|
Another range, this time Singapore (compiled from 4 Whois records):
inetnum: 22.214.171.124 - 126.96.36.199
descr: Amazon Web Services, Elastic Compute Cloud, EC2, SG
remarks: The activity you have detected originates from a dynamic hosting environment.
| 2:18 am on Apr 23, 2011 (gmt 0)|
Ban them as amazonaws.com via htaccess and be done with it.
| 8:12 pm on Apr 23, 2011 (gmt 0)|
On an old Windows 2003 system?
MS-DOS was designed on the principle: take the best from Linux/Unix; take the best from CP/M; throw all that away; now make it up as you go along. It's taken over 30 years for MS to add something even close to rewrite. Sadly, in my early web days I listened to a Microsoft-qualified guy and left Linux. I have too much ASP library code now to return to a Linux server.
I wouldn't be sure they would always show up as amazonaws.com anyway.
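To make that caveat concrete, here is a rough Python sketch of a hostname-based check. The function names are made up for illustration, and the sample hostname in the comment is only an example of the kind of name a reverse lookup might return. Any address without a usable PTR record simply comes back as "not AWS", which is the weakness of relying on the hostname alone.

```python
import socket

def looks_like_aws(hostname):
    """True if a reverse-DNS name is amazonaws.com or a subdomain of it."""
    name = hostname.rstrip(".").lower()
    return name == "amazonaws.com" or name.endswith(".amazonaws.com")

def reverse_name(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def is_aws_by_rdns(ip):
    """Hostname-only test: misses any address with no matching PTR record."""
    name = reverse_name(ip)
    return name is not None and looks_like_aws(name)

# Example (hypothetical hostname):
# looks_like_aws("ec2-1-2-3-4.compute-1.amazonaws.com") is True
```

Matching on ".amazonaws.com" with the leading dot avoids catching unrelated domains that merely end in the same letters.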
| 3:44 am on Apr 28, 2011 (gmt 0)|
Here are the most recent Amazon ranges:
|Dear Amazon EC2 customer, |
We are pleased to announce that as part of our ongoing expansion, we have added a new public IP range. The current Amazon EC2 public address ranges are:
US East (Northern Virginia):
188.8.131.52/20 (184.108.40.206 - 220.127.116.11)
18.104.22.168/19 (22.214.171.124 - 126.96.36.199)
188.8.131.52/18 (184.108.40.206 - 220.127.116.11)
18.104.22.168/17 (22.214.171.124 - 126.96.36.199)
188.8.131.52/16 (184.108.40.206 - 220.127.116.11)
18.104.22.168/18 (22.214.171.124 - 126.96.36.199)
188.8.131.52/16 (184.108.40.206 – 220.127.116.11)
18.104.22.168/17 (22.214.171.124 - 126.96.36.199)
188.8.131.52/18 (184.108.40.206 - 220.127.116.11)
18.104.22.168/15 (22.214.171.124 - 126.96.36.199)
188.8.131.52/16 (184.108.40.206 - 220.127.116.11)
US West (Northern California):
18.104.22.168/18 (22.214.171.124 - 126.96.36.199)
188.8.131.52/18 (184.108.40.206 – 220.127.116.11)
18.104.22.168/17 (22.214.171.124 - 126.96.36.199)
188.8.131.52/17 (184.108.40.206 - 220.127.116.11)
18.104.22.168/18 (22.214.171.124 - 126.96.36.199)
188.8.131.52/20 (184.108.40.206 - 220.127.116.11)
18.104.22.168/17 (22.214.171.124 - 126.96.36.199)
Asia Pacific (Singapore):
188.8.131.52/18 (184.108.40.206 - 220.127.116.11)
18.104.22.168/18 (22.214.171.124 - 126.96.36.199)
Asia Pacific (Tokyo):
188.8.131.52/18 (184.108.40.206 - 220.127.116.11)
18.104.22.168/19 (22.214.171.124 - 126.96.36.199) NEW
| 7:37 pm on Apr 28, 2011 (gmt 0)|
Got all of those (he said smugly!) :)
I've also lumped in 188.8.131.52 - 184.108.40.206 (Ireland data services unspecified) with the AWS. I don't need them calling.
| 7:51 pm on Apr 28, 2011 (gmt 0)|
I knew you did dstiles (she grinned wickedly)
| 3:19 pm on May 1, 2011 (gmt 0)|
The following hit my site today. Although its range belongs to Amazon, it is apparently not AWS. I blocked the range anyway.
UA: Jakarta Commons-HttpClient/3.0
Blocked range: 220.127.116.11 - 18.104.22.168 (22.214.171.124/19)
| 5:16 am on May 2, 2011 (gmt 0)|
Jakarta Commons-HttpClient/3.0 is often used to check link validity. Amazon may be verifying links in their A9 index. Just a thought.
| 4:55 pm on May 2, 2011 (gmt 0)|
Could be, but I block that UA anyway (jakarta AND HttpClient), plus most "random" link checkers.
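A tiny Python sketch of that kind of UA rule, assuming "jakarta AND HttpClient" means both tokens have to appear in the string (if either token alone is meant to trigger the block, swap `and` for `or`). The function name is made up for illustration.

```python
def is_blocked_ua(user_agent):
    """Case-insensitive test: block when both tokens appear in the UA."""
    ua = user_agent.lower()
    return "jakarta" in ua and "httpclient" in ua

print(is_blocked_ua("Jakarta Commons-HttpClient/3.0"))
```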
| 11:23 am on May 25, 2011 (gmt 0)|
50.19.134.zz - - [25/May/2011:04:59:21 -0600] "GET / HTTP/1.0" 403 316 "Mysite/MyPage" "Firefox/2.0"
| 9:19 pm on May 25, 2011 (gmt 0)|
126.96.36.199 - 188.8.131.52 (50.16/14)
Firefox 2 is so old it's a danger to everyone, even were it genuine.
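A quick way to double-check that a logged address really falls inside a block before denying it, sketched in Python. The last octet of the test address is a placeholder (the log entry above hides it), and `in_block` is an illustrative name.

```python
import ipaddress

# 50.16/14 written out in full dotted-quad form.
AWS_BLOCK = ipaddress.ip_network("50.16.0.0/14")

def in_block(addr, block=AWS_BLOCK):
    """True if addr lies inside the given network."""
    return ipaddress.ip_address(addr) in block

# First and last address covered by the /14.
print(AWS_BLOCK[0], "-", AWS_BLOCK[-1])

# Placeholder final octet; only the 50.19.134 part comes from the log.
print(in_block("50.19.134.1"))
```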
| 6:47 pm on Jun 2, 2011 (gmt 0)|
This is a fascinating topic! Thank you Pfui for starting it.
I've been looking at these cloud-based bots for a while now, and the only reason I haven't yet pulled the plug on anything hosted at AWS is that I'm trying to think of any legitimate use of it by someone I'd be interested in (an actual human, Googlebot, msnbot, Slurp).
The only legit use I can think of is that there should be (a bunch of) corporate VPNs with cloud-based access points which may actually have real users behind them.
Am I missing any other legitimate cloud-computing based traffic?
So, do you guys just disallow any IP that belongs to AWS (I am tempted to, to be honest) or do you determine based on IP/behavior? I can think of a situation where Amazon simply recycles IPs after a virtual machine is shut down and so there must be quite a churn of IPs in this huge system.
What's the best way to disallow such a huge swath, anyway : at the firewall level, Apache config or .htaccess ?
| 10:01 pm on Jun 2, 2011 (gmt 0)|
I (and many others) block ALL AWS ranges - in fact I go further and block all Amazon ranges, AWS or not. And any other cloud I can detect (there are quite a few - MS, Google, various ISPs...).
If you have a suitable firewall you could use that. If you have an apache server then use .htaccess (I think - I'm not a linux server person). There are others who are better qualified to answer that and I think it's been answered a few times hereabouts.
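For the .htaccess route, here is a small Python sketch that turns a list of CIDRs into Apache 2.2-style "Deny from" lines, merging any block that sits inside a bigger one. The ranges are ones mentioned earlier in the thread written out in full; treating "192/18" as 46.137.192.0/18 is my assumption, and `deny_lines` is an illustrative name. Newer Apache versions use different access directives, so take the output format as a starting point only.

```python
import ipaddress

def deny_lines(cidrs):
    """Collapse overlapping networks and emit one 'Deny from' line each."""
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    return [f"Deny from {net}" for net in nets]

ranges = [
    "46.137.192.0/18",   # assumed to be the Ireland sub-range
    "46.137.0.0/16",
    "50.16.0.0/14",
    "122.248.192.0/18",
]
print("\n".join(deny_lines(ranges)))
```

Collapsing first keeps the file short: the /18 disappears because the /16 already covers it.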
| 5:29 am on Jun 3, 2011 (gmt 0)|
|What's the best way to disallow such a huge swath, |
It may appear to be a large range of IPs; however, considering the vastness of the overall www, AWS is small fry.
| 4:00 pm on Jun 13, 2011 (gmt 0)|
Just passing by to report yet another --
Mozilla/5.0 (compatible; q1; +http://www.qleeq.com; email@example.com)
| 8:43 pm on Jun 13, 2011 (gmt 0)|
Good to see you around again, pfui!
174.129/16 - block. :)
| 12:03 am on Jun 14, 2011 (gmt 0)|
@ dstiles Just an FYI concerning the IP range config: 174.129/16
That does not work on all unix/apache set-ups. Out of the 3 hosted servers I use, on 2 of them I must write it like this: 184.108.40.206/16.
Just thought I'd post this for those who mistakenly cut 'n' paste without doing their research.
And agreed, good to see you around again, pfui.
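For anyone converting shorthand ranges by hand, a minimal Python sketch that pads an abbreviated CIDR out to the full four-octet form. `expand_cidr` is a made-up helper name, and this only handles IPv4.

```python
import ipaddress

def expand_cidr(short):
    """Pad an abbreviated IPv4 CIDR such as '174.129/16' to four octets."""
    addr, _, prefix = short.partition("/")
    octets = addr.split(".")
    if not prefix or not 1 <= len(octets) <= 4:
        raise ValueError(f"not a CIDR: {short!r}")
    octets += ["0"] * (4 - len(octets))
    # strict=True rejects input whose host bits are set, e.g. '174.129.1/16'.
    net = ipaddress.ip_network(".".join(octets) + "/" + prefix, strict=True)
    return str(net)

print(expand_cidr("174.129/16"))
```

The strict check catches the common slip of pairing an address with a prefix that is too short for it.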