Amazon AWS Hosts Bad Bots

Continuation Thread

         

incrediBILL

11:16 pm on May 16, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is a continuation from the previous thread:
[webmasterworld.com...]

Post about spiders coming from Amazon's AWS hosting.

keyplyr

4:16 am on Nov 20, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just a heads-up: I am blocking the Amazon ranges listed in these forums. Occasionally I get reports from Kindle users that they are being blocked, that they do not see my site's images, or that certain functions of my sites are not working properly. These reports come only from Kindle users.

The IPs (if they include them in the report) are spread across various Amazon ranges and I now suspect these are being assigned dynamically, for various reasons.

There were always tales of Amazon caching images and taking other resource-saving measures for their Kindle clients, but until lately I didn't notice any backlash from blocking AWS. Either my sites are becoming more popular with Kindle users, or the number of Kindle users has grown enough to make this kind of impact.

dstiles

7:32 pm on Nov 20, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've only seen a few kindles in the past and if they use amazon IPs they're blocked (I block all amazon ranges I know of) but I sometimes see amazon as the forwarding IP of a proxy. Haven't had any complaints yet.

dstiles

4:53 pm on Feb 4, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New range, registered in January:

52.0.0.0 - 52.31.255.255
52.0.0.0/11
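
For anyone maintaining block lists by hand, here is a minimal sketch of how a start/end range like the one above can be turned into CIDR notation with Python's standard `ipaddress` module. The helper name `range_to_cidrs` is made up for illustration.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in nets]

# The range posted above collapses to a single /11.
print(range_to_cidrs("52.0.0.0", "52.31.255.255"))  # ['52.0.0.0/11']
```

A range that does not sit on a clean bit boundary comes back as several blocks rather than one.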

blend27

5:46 pm on Feb 4, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



JUST FYI:
Amazon Kindle has a few variations of UAs.
[docs.aws.amazon.com...]

I have several users/customers who have placed orders on one of my e-com sites using Kindles, all on their own WiFi.

Have not seen that many from Amazon ranges - but a few for sure.

trintragula

4:59 pm on Feb 5, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



The first from that range showed up on my site yesterday. Proximic.
And another today purporting to be Chrome/35, but not behaving like it...
No sign of Silk.

blend27

3:30 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some days I really, really wish there were more of a movement against the kind of behavior that Amazon Inc. allows. Something along the lines of a few developers getting together and writing a plugin, in several different scripting languages, that not only blocks requests from all Amazon ranges but also sends an email to ec2-abuse@amazon.com for every request made from their servers to the site where the script is installed.

54.201.86.30 just tried every possible combination of every known PHP-based exploit on one of my sites.

72 requests in 4 seconds.

UA: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)

If only Jeffrey Preston "Jeff" Bezos knew how much bad publicity this thread generates in a tech-savvy webmaster community...

wilderness

3:51 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



blend,
AWS couldn't care less about what their customers are doing, unless they fail to pay the bills.
AWS made their bones on offering this type of hosting.

blend27

4:02 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I know they do and I know they did, but when the rest of the net fails to load on Kindle/Silk browser that is another story. That is hitting them where the monies are.

Imagine buying a Kindle Fire HDX 8.9 for Valentine's Day as a gift for someone, only to hear from that someone a few days later that half the Net won't work on it half the time.

I know, wishful thinking.

dstiles

7:28 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've just applied some logic to the Amazon ranges to detect access via a proxy named "HTTP/1.1 silk" in order to allow silk devices into my sites whilst blocking the rest of amazon (apart from nokia ranges mentioned hereabouts).

What is the difference between kindle and silk?

Does anyone know of other browsing devices using a legit proxy through amazon?
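
A rough sketch of the kind of Silk exception described at the top of this post. It is not the poster's actual rule: it assumes the proxy name shows up in the request's `Via` header, and the one-entry `AMAZON_NETS` list and the function names are hypothetical.

```python
import ipaddress

# Hypothetical block list; 52.0.0.0/11 is the range posted earlier in this thread.
AMAZON_NETS = [ipaddress.ip_network("52.0.0.0/11")]

def is_amazon(ip):
    """True if the address falls inside any listed Amazon range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in AMAZON_NETS)

def allow_request(ip, headers):
    """Allow non-Amazon traffic; allow Amazon traffic only if a Silk proxy is named.

    Assumption: the proxy identifies itself in the Via header.
    """
    if not is_amazon(ip):
        return True
    return "silk" in headers.get("Via", "").lower()
```

Headers are supplied by the client, so a bot inside AWS could send the same value; this only lets through traffic that claims to be Silk.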

wilderness

8:27 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



but when the rest of the net fails to load on Kindle/Silk


blend,
Websites blocking server farms (AWS or otherwise) are in the minuscule category.
Most webmasters are not even aware of 'raw access logs'.

keyplyr

8:52 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Silk is kindle, but not all kindles are silk. - Dalai Lama

lucy24

9:50 pm on Feb 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New range, registered in January:
52.0.0.0/11

wtf? They've bought up everything Merck was willing to sell, and now they're starting on DuPont?

There's a dissertation in there somewhere.

keyplyr

12:40 am on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Expanding a little on the Kindle question... I asked my student who works for Amazon. He writes code for some of the Kindles. He said Silk is the proxy service that Kindle uses for caching images, scripts, etc... files under a certain size that cache easily. Not especially for web pages, but he didn't know exactly.

This (Amazon range) proxy can be turned on/off in the Kindle web browser. Not all Kindle devices offer this feature. Sorry, don't know which do/don't but I assume only the later built models are equipped with Silk.

I've had this guy access my site using a couple different Kindles. Even though I block all known Amazon EC, EC2 & AWS ranges including any other range operated by Amazon, he has no problem with my site. I assume because his Kindle connects to the internet from his ISP.

When he works from his home computer, but connected to their system, he says he is blocked and gets my custom Forbidden 403 page.

trintragula

11:55 am on Feb 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



On an average day recently, I've been seeing about 700 page requests from AWS, and generally they all get blocked - which usually makes them my blocker's biggest customer... Good traffic would get through - though most days there isn't any good traffic from AWS.

I saw a couple of odd Silk UAs last year:
"Mozilla/5.0 (Linux; U; Android android-version; locale; product-model Build/product-build) AppleWebKit/webkit-version (KHTML, like Gecko) Silk/browser-version like Chrome/chrome-version Safari/webkit-version"

"Mozilla/5.0 (Linux; U; locale; product-model Build/product-build) AppleWebKit/webkit-version (KHTML, like Gecko) Silk/browser-version Safari/webkit-version Silk-Accelerated=cloud-browsing-state"

I've not altered the strings, and they came with the quotes... and were from AWS.
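
One way to flag template UAs like these, as a sketch only. The placeholder tokens are copied from the two strings quoted above; the function name and the choice to treat wrapping quotes as a signal are assumptions.

```python
import re

# Literal placeholder tokens seen in the two strings quoted above.
PLACEHOLDER = re.compile(
    r"\b(?:android|webkit|browser|chrome)-version\b|\bproduct-(?:model|build)\b"
)

def looks_like_template(ua):
    """True if the UA still has unsubstituted placeholders or literal wrapping quotes."""
    quoted = len(ua) > 1 and ua.startswith('"') and ua.endswith('"')
    return bool(PLACEHOLDER.search(ua)) or quoted

template_ua = (
    '"Mozilla/5.0 (Linux; U; Android android-version; locale; product-model '
    "Build/product-build) AppleWebKit/webkit-version (KHTML, like Gecko) "
    'Silk/browser-version like Chrome/chrome-version Safari/webkit-version"'
)
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36"
)
print(looks_like_template(template_ua), looks_like_template(browser_ua))  # True False
```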

lucy24

3:12 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've not altered the strings, and they came with the quotes

It's the bot-runner's equivalent of "I comma insert your name here comma".

dstiles

8:18 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the comments on silk, guys.

One more amazon-related question:

Flipboard and FlipboardProxy, accessing from Amazon. Do these provide a useful service, i.e. are they herding traffic towards our sites? I notice that FlipboardProxy is not actually a proxy access. It's just part of the UA.

blend27

3:17 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New range, registered in January:

And already trying to send its 007 agent to my sites:

52.16.142.153
ec2-52-16-142-153.eu-west-1.compute.amazonaws.com
UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36
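
A small sketch for pulling the address and region out of a reverse-DNS name like the one above. It handles only the hostname form quoted in this post; the function name is hypothetical and no DNS lookup is performed.

```python
import ipaddress
import re

# Matches only the form quoted above: ec2-A-B-C-D.<region>.compute.amazonaws.com
EC2_HOST = re.compile(
    r"^ec2-(\d+)-(\d+)-(\d+)-(\d+)\.([a-z0-9-]+)\.compute\.amazonaws\.com$"
)

def parse_ec2_hostname(host):
    """Return (ip, region) for a matching hostname, or None if it does not match."""
    m = EC2_HOST.match(host.lower())
    if not m:
        return None
    try:
        ip = ipaddress.ip_address(".".join(m.group(1, 2, 3, 4)))
    except ValueError:
        return None  # octets out of range, so not a real address
    return str(ip), m.group(5)

print(parse_ec2_hostname("ec2-52-16-142-153.eu-west-1.compute.amazonaws.com"))
# ('52.16.142.153', 'eu-west-1')
```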

keyplyr

8:23 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've always blocked Flipboard et al. Correct, not a proxy. IMO just a social parasite wanting to take direct traffic from my site.

lucy24

10:47 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Got a couple of mysterious favicon requests from the new 52 range:
52.0.150.166
52.1.98.62
Two unrelated humanoid UAs, no other requests (this is a side site, meaning that log entries can be minutes or even hours apart, no possibility of overlap or ambiguity). Are they running some kind of proxy on the side?

keyplyr

11:28 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



AWS does offer proxy addons.

keyplyr

10:39 am on Mar 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Was 54.64.0.0/15
Now is 54.64.0.0/13

Don't know when this happened exactly. Maybe everyone else here had the range updated except me. Noticed it when this thing got through:

54.69.5.150 - - [21/Mar/2015:00:01:11 -0700] "GET /example.html HTTP/1.1" 200 7668 "http://news.google.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)"

The referrer is fake.
Robots.txt: no

Scraped 6 web pages along with browser sniffing & bootstrap scripting & css for mobile, but none of the other files associated with these pages... seems odd, but it's blocked now.

This is not something that most webmasters will want accessing their sites. It's been around for a few years, coming from a couple different hosts. There are a couple WW threads about it. Bad news IMO.
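
A quick check of why the old entry missed this hit, using Python's standard `ipaddress` module. The helper name `covered` is made up for illustration.

```python
import ipaddress

def covered(ip, cidr):
    """True if the address falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

hit = "54.69.5.150"  # the address from the log line above
print(covered(hit, "54.64.0.0/15"))  # False: the /15 stops at 54.65.255.255
print(covered(hit, "54.64.0.0/13"))  # True: the /13 runs through 54.71.255.255
```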

keyplyr

11:31 am on Mar 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I had been poking several holes in AWS 54.192.0.0 - 54.255.255.255 for Nokia Express mobile networks:
54.209.248.0 - 54.209.251.255
54.236.252.0 - 54.236.255.255
54.244.56.0 - 54.244.63.255
54.246.252.0 - 54.246.255.255

WHOIS still says these ranges are Nokia Express, but lately I have been seeing a lot of bad actors. I have decided to close it up and deny the entire AWS range: 54.192.0.0/10. Sorry Nokia, but you need to move to a better neighborhood.
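
The "holes" arrangement can be sketched as an allow list checked before the deny range. This is only an illustration of the idea, not the actual server config; the names `DENY`, `ALLOW` and `is_denied` are hypothetical.

```python
import ipaddress

DENY = ipaddress.ip_network("54.192.0.0/10")

# The four Nokia Express ranges listed above, as (first, last) pairs.
HOLES = [
    ("54.209.248.0", "54.209.251.255"),
    ("54.236.252.0", "54.236.255.255"),
    ("54.244.56.0", "54.244.63.255"),
    ("54.246.252.0", "54.246.255.255"),
]
ALLOW = [
    net
    for first, last in HOLES
    for net in ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
]

def is_denied(ip, use_holes=True):
    """Deny anything in 54.192.0.0/10 unless it sits in an allowed hole."""
    addr = ipaddress.ip_address(ip)
    if use_holes and any(addr in net for net in ALLOW):
        return False
    return addr in DENY

print([str(net) for net in ALLOW])
# ['54.209.248.0/22', '54.236.252.0/22', '54.244.56.0/21', '54.246.252.0/22']
```

Closing the holes, as described above, corresponds to calling `is_denied(ip, use_holes=False)`.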

lucy24

8:23 pm on Mar 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Was 54.64.0.0/15
Now is 54.64.0.0/13

Well, ###! I'd missed even the /15, and still had the whole thing down as Merck. But the good news is, the remainder of 72-95 is Amazon Ireland (why always Ireland? Is it a tax thing?) so most people can now proceed directly to
54.64.0.0/11

Quick detour to free lookup says that
54.0.0.0/10
and
54.96.0.0/11
54.128.0.0/12
are still Merck. At least this week.

Obvious follow-up leads to:
Yikes!
When I wasn't looking,
52.64.0.0/12
went over to Amazon Australia
52.80.0.0/12
(in /13 and /14 subsegments) also Amazon.

The remainder (that is, 96 and up) is still DuPont. At least this week.

keyplyr

11:02 pm on Mar 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



why always Ireland? Is it a tax thing?

I assume it is. Why else would Zuckerberg put all his eggs there.

I agree with 54.64/11, thanks

dstiles

2:27 pm on Apr 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another Amazon range. There is a world shortage of IPv4 addresses and this company is hoarding the things! :(

52.64.0.0 - 52.95.255.255
52.64.0.0/12
52.80.0.0/14
52.84.0.0/14
52.88.0.0/13
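
If you would rather carry one line than four, adjacent blocks can be merged with `ipaddress.collapse_addresses` from the Python standard library. A minimal sketch using the four blocks listed above:

```python
import ipaddress

blocks = ["52.64.0.0/12", "52.80.0.0/14", "52.84.0.0/14", "52.88.0.0/13"]
merged = ipaddress.collapse_addresses(ipaddress.ip_network(b) for b in blocks)
merged_cidrs = [str(net) for net in merged]
print(merged_cidrs)  # ['52.64.0.0/11']
```

The single /11 covers the same 52.64.0.0 - 52.95.255.255 span.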

keyplyr

11:43 pm on Apr 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



* Possible Game Changer *

Just had a guy complain that he was being blocked when following a Facebook link to my site via his iPhone, so I set up a test page and had him try again so I could get his browser/IP info:

54.149.96.34 - - [12/Apr/2015:16:31:08 -0700] "GET /TEST.html HTTP/1.1" 403 968 "http://m.facebook.com" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508 [FBAN/MessengerForiOS;FBAV/25.0.0.4.14;FBBV/8936291;FBDV/iPhone5,3;FBMD/iPhone;FBSN/iPhone OS;FBSV/8.2;FBSS/2; FBCR/Verizon;FBID/phone;FBLC/en_US;FBOP/5]"

The AWS range I had blocked is:

54.144.0.0/12
54.144.0.0 - 54.159.255.255

So not only are Kindle users sometimes blocked because of AWS filtering, now it's iPhone.

[UPDATE]
Spoke with this guy again. He says he's using the native Facebook app on his iPhone. So it looks as though it is this app that resides within AWS ranges.

Pfui

12:43 am on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, drat. Yet another hoop to jump through. (frowns)

Hmm. A quick bit that might help distinguish, uh, something... My 'plain' Fb referrers include a trailing slash:

http://m.facebook.com/


BUT -- hits from that stupid-crazy-long Fb phone UA (like your tester's REF) do not:

http://m.facebook.com


Huh. Usually hits from non-trailing-slash Hosts are fake.

Great. Now I'm even more confused.

Pfui

12:49 am on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Spoke with this guy again. He says he's using the native Facebook app on his iPhone. So it looks as though it is this app that resides within AWS ranges.

Apparently not all, fortunately. I just noted fresh iPhone/Fb app/no-trailing-slash hits from two other ISPs:

AT&T Mobility
Orange Home UK

(Does he work for Amazon?)

[edited by: Pfui at 12:51 am (utc) on Apr 13, 2015]

keyplyr

12:50 am on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Trailing slash (or not) may be an admin config for log files.

No, I'm almost certain he does not work for Amazon. His mobile network is Verizon (attribute in UA.) It's the app that connects to FB via AWS.

This actually does not surprise me. Most of the hits I've been blocking are from apps that lease server space at AWS. But when it is a native FB app that comes installed on iPhones, it's a very big deal.

A normal day's 403s for my site is 1500 to 2k; too many to manually verify. The expendable collateral damage margin is a slippery slope.

lucy24

3:51 am on Apr 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why do you need to consider the referer when the UA itself says it's the Facebook app? (Can't remember how long ago it was, but I've met it before.)
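
For reference, a sketch of reading the bracketed Facebook fields out of a UA like the one in the log line above. The function name is hypothetical, and since the UA is client-supplied it can be forged like any other header.

```python
import re

def facebook_app_fields(ua):
    """Return the key/value pairs from a bracketed [FBxx/...;...] UA suffix, or {}."""
    m = re.search(r"\[(FB[A-Z]{2}/[^\]]*)\]", ua)
    if not m:
        return {}
    fields = {}
    for part in m.group(1).split(";"):
        key, sep, value = part.strip().partition("/")
        if sep:
            fields[key] = value
    return fields

fb_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 "
    "(KHTML, like Gecko) Mobile/12D508 [FBAN/MessengerForiOS;FBAV/25.0.0.4.14;"
    "FBBV/8936291;FBDV/iPhone5,3;FBMD/iPhone;FBSN/iPhone OS;FBSV/8.2;FBSS/2; "
    "FBCR/Verizon;FBID/phone;FBLC/en_US;FBOP/5]"
)
fields = facebook_app_fields(fb_ua)
print(fields["FBAN"], fields["FBCR"])  # MessengerForiOS Verizon
```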