Amazon AWS Hosts Bad Bots

Continuation Thread

         

incrediBILL

11:16 pm on May 16, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is a continuation from the previous thread:
[webmasterworld.com...]

Post about spiders coming from Amazon's AWS hosting.

keyplyr

4:42 am on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One reason the referrer is important is it's a pretty good indicator the hit came from FB and not another origin (even though referrer can be spoofed.)

According to my logs, in the last 10 days deny from 54.144/12 has blocked a huge number of FB app iPhone users. I'm putting in a rule allowing for the app in this range.

trintragula

8:51 am on Apr 13, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Are they sending any proxy headers?
I'm seeing mostly Silk, some gnip and some squid proxies from within AWS.

I've also let through traffic from Mobicip - a cloud-based web-filtering app for iPads and iPhones that has also appeared from that range. One feature of Mobicip is that the cloud service sends X-Forwarded-For: 127.0.0.1. Unfortunately it doesn't send the actual client IP or a Via header. Here's an example UA:
Mozilla/5.0 (iPad; CPU OS 8_1_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12B440 Version/8.1 Safari/8536.25 Mobicip/NNNNNNNNN
I've obfuscated the trailing number in case it's a customer number or something identifying.
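
For anyone logging proxy headers, here is a minimal Python sketch of how one might test whether an X-Forwarded-For value carries a usable client address at all. The function name `usable_client_ip` is made up for illustration, and treating only globally routable addresses as "usable" is my assumption, not anything Mobicip documents.

```python
import ipaddress

def usable_client_ip(xff_header):
    """Return the first globally routable IP in an X-Forwarded-For value, or None.

    Hypothetical helper: loopback (127.0.0.1), private and otherwise
    non-routable addresses are treated as "no real client IP supplied".
    """
    if not xff_header:
        return None
    for part in xff_header.split(","):
        candidate = part.strip()
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # not an IP at all, e.g. "unknown"
        if ip.is_global:
            return str(ip)
    return None
```

With the header described above, `usable_client_ip("127.0.0.1")` gives `None`, so that request would be judged on its AWS address and UA alone.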

Just a heads-up for those less conservative - there are still plenty of things to block in 54.144/12:
54.144.63.nnn
Sun, 12 Apr 2015 16:21:00 GMT
Mozilla/5.0 (compatible; linkdexbot/2.2; +http://www.linkdex.com/bots/)

I've also seen recently:
Mozilla/5.0 (compatible; linkdexbot/2.2; +http://www.linkdex.com/bots/)
Porkbun/Mustache (Website Analysis; http://porkbun.com; tech@porkbun.com)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)
Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/bots/)
Mozilla/5.0 (compatible; DomainAppender /1.0; +http://www.profound.net/domainappender)
Pinterest/0.1 +http://pinterest.com/
Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-PT; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
(blank useragent)

keyplyr

10:04 am on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@trintragula, thanks for the heads-up on Mobicip. I added them to my allowed filter :)

Hits from that FB guy came from all over 54.144/12 during later sessions. Since these AWS client account ranges are acutely dynamic, I suspect my filter will be changing a lot as well, adding more ranges, UAs & referrers.

That's nice. I was getting a bit bored with the other 50 things I had going on at the moment.
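
The kind of filter described here (block the range, then punch holes for known-good UAs) might be sketched in Python as below. This is only an illustration under assumptions: `decide`, `BLOCKED_RANGES` and `ALLOWED_UA_PATTERNS` are hypothetical names, the only allow pattern is the Mobicip token from the UA posted earlier, and a UA is easy to forge, so this is not a "safe" test by itself.

```python
import ipaddress
import re

# 54.144/12 written out in full, as in the deny rule mentioned above.
BLOCKED_RANGES = [ipaddress.ip_network("54.144.0.0/12")]

# Assumed allow list: UAs carrying the "Mobicip/<digits>" token.
ALLOWED_UA_PATTERNS = [re.compile(r"Mobicip/\d+")]

def decide(remote_ip, user_agent):
    """Return "allow" or "deny" for one request (hypothetical helper)."""
    ip = ipaddress.ip_address(remote_ip)
    if not any(ip in net for net in BLOCKED_RANGES):
        return "allow"  # outside the blocked ranges: not this filter's business
    ua = user_agent or ""  # a blank UA never matches an allow pattern
    if any(pattern.search(ua) for pattern in ALLOWED_UA_PATTERNS):
        return "allow"
    return "deny"
```

Adding more ranges, UAs or a referrer check means extending the two lists or adding another test before the final `deny`.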

dstiles

8:01 pm on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've added part of the relevant UAs to my own Amazon range for 54.144 (and will, I suppose, have to extend this to other ranges) BUT this seems a very dangerous allowance. It's easy to forge a UA and all the baddies using Amazon are doubtless aware of that. Does anyone have a "safe" way of allowing in real people from this stupid service?

I thought Google "utility" IP ranges were bad enough, but a cloud that allows anything to be on any IP is simply encouraging criminal activity. Surely a few small IP ranges could have been set aside from the millions of IPs Amazon owns, as they did for Nokia?

trintragula

10:31 pm on Apr 13, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I would do some analysis of the traffic that would be blocked by your other methods if you took down the AWS IP range blocks - it may be better than you think.
I've been doing the same thing in reverse: using collected IP ranges to evaluate and tweak the methods I'm using.

keyplyr

2:37 am on Apr 18, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




@trintragula - to get a little more info on UAs used by Mobicip, I emailed them. This is their reply:
Thank you for getting in touch with us. The following is the User Agent that Mobicip's iOS and Android browsers use.
Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile/10B141

They do not mention the UA you posted which includes "Mobicip."

I replied to them that it would help webmasters allow for their scans if the UA would always include "Mobicip".

trintragula

10:19 am on Apr 18, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Perhaps one day we'll at least get the good guys to get this right. (yeah, I know, dream on...)

keyplyr

11:43 am on Apr 29, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RE: Mobicip

Received another email reply to my inquiry regarding UA:
We apologize for the delay in getting back. Our dev team checked and confirmed that the User Agent for Mobicip iOS contains "Mobicip/".
This will be our sample User Agent : "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_3 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11B507 Version/7.0 Safari/8536.25 Mobicip/2029575776"

Additionally, we send "X-Forwarded-For" header with the actual user's IP address.

Please let us know if this answers your question. Otherwise we will be happy to assist further.

So they either use more than one UA string, or the earlier reply was from someone who did not have the correct info, or only the iOS UA contains "Mobicip" and the Android one does not :)

trintragula

4:59 pm on Apr 29, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



So far I've only seen the one Mobicip visitor since I've been recording proxy headers. That one didn't actually send the client IP, but then with cloud you never really know who you're talking to...
I've seen iPad and iPhone variants, but not any Android yet.
I'll keep my eyes open.

trintragula

8:33 pm on May 2, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



While researching AWS recently, I found this:
[ip-ranges.amazonaws.com...]

which is described here:
[docs.aws.amazon.com...]

keyplyr

9:57 am on May 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Guess it was only a matter of time before the amazing superhero spun his web my way:

54.159.114.117 - - [05/May/2015:13:08:59 -0700] "GET /robots.txt HTTP/1.1" 200 1511 "-" "SpiderMan/1.0"
54.159.114.117 - - [05/May/2015:13:08:59 -0700] "GET / HTTP/1.1" 403 913 "-" "SpiderMan/1.0"

blend27

5:02 pm on May 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



While researching AWS recently, I found this:
[ip-ranges.amazonaws.com...]

which is described here:
[docs.aws.amazon.com...]

So they caved in by publishing the Ranges. Excellent move!

I wish all COLOS, HOSTING companies would do this...

Now for the rest of us here is a goldmine(Z)

Get a copy, parse it into ... whatever....


ARIN: ftp://ftp.arin.net/pub/rr/arin.db (BIG FILE)
RIPE: ftp://ftp.ripe.net/ripe/stats/membership/alloclist.txt (BIG File)

Some Rules:

where finalstring is any string concatenated or not..

if
finalstring contains 'host'
or finalstring contains 'cloud'
or finalstring contains 'server'
or finalstring contains 'web'
or finalstring contains 'Dedicated'
or finalstring contains 'Elastic'
or finalstring contains 'mail'
or finalstring contains ' colo'

DIG IN ;)
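
The rules above stop at the "if", so here is one way they might be finished off in Python. It is a sketch under assumptions: I take "finalstring" to be the owner/description text of a registry record, I assume a match means "probably a hosting range", and I assume the match should ignore case (the mixed capitals in the list suggest so). `looks_like_hosting` is a made-up name.

```python
# Keywords from the rules above, lower-cased; the leading space in " colo"
# is kept exactly as posted.
HOSTING_KEYWORDS = (
    "host", "cloud", "server", "web",
    "dedicated", "elastic", "mail", " colo",
)

def looks_like_hosting(finalstring):
    """True if the record text contains any of the hosting keywords."""
    text = finalstring.lower()  # assumption: matching is case-insensitive
    return any(keyword in text for keyword in HOSTING_KEYWORDS)
```

Plain substring matching like this will throw false positives, so treat a hit as a candidate for review rather than an automatic block.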

keyplyr

9:09 pm on May 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So they caved in by publishing the Ranges. Excellent move!
Not to diminish their importance for our use, but that forum w/ the AWS IP lists has been up for several years.

trintragula

10:41 pm on May 6, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



From their blog - November 21st 2014:
Many of our customers have asked us for a detailed list of the IP address ranges assigned to and used by AWS. While the use cases vary from customer to customer, they generally involve firewalls and other forms of network access controls. In the past we have met this need by posting human-readable information to the EC2, S3, SNS, and CloudFront Forums.

IP Ranges in JSON Form
I am happy to announce that this information is now available in JSON form at [ip-ranges.amazonaws.com...] The information in this file is generated from our internal system-of-record and is authoritative. You can expect it to change several times per week and should poll accordingly.

What's different about this is that it's automated and machine readable, so the information can be downloaded and used automatically.
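
As a rough idea of what "used automatically" could mean, here is a Python sketch that turns a saved copy of the JSON into deny lines. The field names (`prefixes`, `ip_prefix`, `region`, `service`, `createDate`, `syncToken`) are as I understand the published format, so check them against the AWS docs. The sample data is made up, and `deny_lines` is a hypothetical helper.

```python
import json

# Made-up sample in the shape of ip-ranges.json; the region and service
# values here are illustrative, not copied from the real file.
SAMPLE = """
{
  "syncToken": "0000000000",
  "createDate": "2015-05-14-00-00-00",
  "prefixes": [
    {"ip_prefix": "52.95.52.0/22", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "52.2.0.0/15", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "52.2.0.0/15", "region": "us-east-1", "service": "EC2"}
  ]
}
"""

def deny_lines(doc, service=None):
    """Turn the 'prefixes' list into sorted, de-duplicated 'deny from' lines.

    If service is given, only entries with that service value are used.
    """
    wanted = {
        entry["ip_prefix"]
        for entry in doc["prefixes"]
        if service is None or entry["service"] == service
    }
    return ["deny from " + prefix for prefix in sorted(wanted)]

if __name__ == "__main__":
    for line in deny_lines(json.loads(SAMPLE)):
        print(line)
```

The set is there because the same prefix can appear under more than one service. Given the holes needed for legit users, output like this is better fed into a review step than straight into a live config.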

keyplyr

11:14 pm on May 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Except lately there are so many holes in AWS ranges needed to let legit users through, it really takes manual oversight on a daily basis.

trintragula

12:44 pm on May 17, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



As AWS are now reporting their ranges in machine readable format, I thought I'd take advantage of that.

A bit of diff(1) and sed(1) (and, surprisingly, sort(1) and uniq(1)) yields the following new ranges from the last couple of weeks:

52.2.0.0/15
52.76.0.0/17
52.95.52.0/22

These are the additions that occurred on May 14th from the previous update on April 27th.

It would not be difficult, I think, to download these ranges automatically, and integrate them with a block list.
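
For those without diff(1) and friends to hand, the same comparison can be sketched in Python, assuming you have two saved snapshots already loaded with `json.load` and that each has a `prefixes` list of objects with an `ip_prefix` field. The function names are mine, not anything from AWS.

```python
import ipaddress

def prefix_set(doc):
    """Unique IPv4 prefixes in one snapshot (duplicates across services collapse)."""
    return {entry["ip_prefix"] for entry in doc["prefixes"]}

def diff_snapshots(old_doc, new_doc):
    """Return (added, removed) prefix lists, each sorted by network address."""
    old, new = prefix_set(old_doc), prefix_set(new_doc)
    added = sorted(new - old, key=ipaddress.ip_network)
    removed = sorted(old - new, key=ipaddress.ip_network)
    return added, removed
```

Using sets does the job that sort(1) and uniq(1) did in the shell version, and sorting by `ipaddress.ip_network` gives numeric rather than text order.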

One wrinkle with that is that they are supplied only via https, which is fine for the browser, but can be awkward for wget(1) and curl(1) if your build lacks SSL support or current CA certificates.

keyplyr

10:09 pm on May 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@trintragula - I don't know where those dates came from, but I can assure you I've had these ranges blocked for years:

52.0.0.0/11
52.0.0.0 - 52.31.255.255
52.64.0.0/11
52.64.0.0 - 52.95.255.255
52.80.0.0/12
52.80.0.0 - 52.95.255.255

So Amazon had the larger ranges registered, maybe just not the sub-nets designated for specific purpose.
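
A quick way to confirm that the three "new" subnets fall inside the ranges listed above is Python's standard ipaddress module. A small sketch, where `covering_block` is a made-up helper and `subnet_of` needs Python 3.7 or later:

```python
import ipaddress

# The ranges already blocked, as listed above.
BLOCKED = [ipaddress.ip_network(n)
           for n in ("52.0.0.0/11", "52.64.0.0/11", "52.80.0.0/12")]

# The additions reported from the JSON file.
NEW = [ipaddress.ip_network(n)
       for n in ("52.2.0.0/15", "52.76.0.0/17", "52.95.52.0/22")]

def covering_block(net, blocked=BLOCKED):
    """Return the first blocked range that fully contains net, or None."""
    for block in blocked:
        if net.subnet_of(block):
            return block
    return None

if __name__ == "__main__":
    for net in NEW:
        print(net, "is covered by", covering_block(net))
```

All three come back covered. Note also that 52.80.0.0/12 sits entirely inside 52.64.0.0/11, so the third blocked entry is redundant with the second.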

trintragula

7:25 am on May 18, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Most likely these will be ranges that Amazon has only recently reassigned to be part of AWS. So with these updates we'll see a mixture of ranges that were previously already Amazon but not AWS, and new ranges not seen before.
The dates are at the top of the JSON files. It's the change in the date that prompted me to check for differences.
I mostly wanted to highlight the fact that they are actually maintaining this list and that it can be mechanically processed. I guess to make updates interesting here I'd need to filter out any ranges that are already assigned to Amazon generally. Although of course that too is a moving target...