
Forum Moderators: Ocean10000 & incrediBILL


Amazon AWS Hosts Bad Bots

amazonaws.com

     
12:45 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



1.) Back in 2008, I noticed a lot of bad bots hailing from amazonaws.com, and in January 2009 I started a thread about what hid behind that early cloud:

amazonaws.com plays host to wide variety of bad bots [webmasterworld.com...]

Since that time, 270-plus reports/messages further document that the Amazon AWS Host name and Amazon AWS's countless IPs continue to be what forum mod IncrediBILL aptly termed:

"Cesspool."

This thread continues the saga of amazonaws.com and its spawn.

2.) The AWS cesspool is home to countless hundreds of bots, the vast majority of which ignore robots.txt. Home to hundreds more bots cloaked as regular UAs. Home to infected machines and bad programming, and all the ills to others that cloud anonymity affords.

And in recent weeks, home to bots with no UA at all... [webmasterworld.com...] Note the double-quotes at the end where a UA, or at least a hyphen, should be:

ec2-50-17-87-218.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00] "GET /dir/filename.html HTTP/1.1" 403 1471 "-" ""

Today, the 'blank bot' -- what I've started thinking of as the AWSbot -- was the most frequent AWS 'visitor' to my main site. Four Hosts, four hits to different files, four 403s. robots.txt? NO
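
For anyone who wants to pull these 'blank bot' hits out of a log, here is a minimal sketch. It assumes the combined-style line shown above (request, status, size, referer, UA, each quoted where shown); the names LOG_TAIL and has_blank_ua are made up for illustration. It flags only a truly empty UA field, not the usual hyphen.

```python
import re

# Tail of a combined-style log line:
# "request" status size "referer" "user-agent"
LOG_TAIL = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"\s*$'
)

def has_blank_ua(line):
    """Return True when the UA field is an empty pair of double quotes."""
    m = LOG_TAIL.search(line)
    if m is None:
        return False  # line does not look like the assumed format
    return m.group("ua") == ""

# The sample line from this post.
sample = ('ec2-50-17-87-218.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00] '
          '"GET /dir/filename.html HTTP/1.1" 403 1471 "-" ""')
print(has_blank_ua(sample))
```

A line whose UA is logged as "-" is deliberately not flagged, since that is what a missing UA normally looks like.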
1:42 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



the 'blank bot' -- what I've started thinking of as the AWSbot

Gee, that's funny. I always think of it as the faviconbot ;)

How 'bout the new browser [webmasterworld.com]?

We sought from the start to tap into the power and capabilities of the AWS infrastructure

Now there's a sales pitch to make your blood run cold. And, as noted in that thread, it means messing about with your Allows and Denys so you don't end up locking out unsuspecting humans.
1:52 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I didn't want to mix up AWS bad bot sightings/reports in this thread with discussions of AWS (ww)world domination, Amazon's new Silk and Fire, etc. Check out the just-posted:

Amazon AWS gunning for Google? [webmasterworld.com...]
3:10 am on Oct 8, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Two hits to html files, ~15 secs apart.

ec2-50-19-197-197.compute-1.amazonaws.com
HTTP_Request2/2.0.0RC1 (http://pear.php.net/package/http_request2) PHP/5.3.2-1ubuntu4.9

robots.txt? NO
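
Since these Host names embed the IP, a short sketch of pulling the address back out may save a lookup. It assumes only the two hostname shapes seen in this thread (ec2-A-B-C-D.compute-1.amazonaws.com and ec2-A-B-C-D.REGION.compute.amazonaws.com); the name ec2_ip is hypothetical. A reverse-DNS name is only as trustworthy as the lookup behind it, so treat the result as a hint.

```python
import ipaddress
import re

# ec2-A-B-C-D, an optional region label, then compute or compute-1
EC2_HOST = re.compile(
    r'^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.'
    r'(?:[a-z0-9-]+\.)?compute(?:-1)?\.amazonaws\.com$'
)

def ec2_ip(hostname):
    """Return the IPv4 address embedded in an EC2 host name, or None."""
    m = EC2_HOST.match(hostname.strip().lower())
    if m is None:
        return None
    try:
        return ipaddress.IPv4Address(".".join(m.groups()))
    except ValueError:
        return None  # an octet was out of range

print(ec2_ip("ec2-50-19-197-197.compute-1.amazonaws.com"))
```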
3:26 am on Oct 8, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Not all of AWS's UAs are obvious bots:

ec2-184-72-188-54.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729; .NET4.0E)

robots.txt? NO
4:00 pm on Oct 9, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-184-73-116-52.compute-1.amazonaws.com
Mozilla

robots.txt? NO

ec2-50-19-197-197.compute-1.amazonaws.com
HTTP_Request2/2.0.0RC1 (http://pear.php.net/package/http_request2) PHP/5.3.2-1ubuntu4.9

robots.txt? NO
12:24 am on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Today's worst AWS assault:

10 amazonaws.com servers
=> 26 unique, non-contiguous .html files, 1 .cgi file, 0 robots.txt
=> 27 403s in 9 secs

FWIW, sorted by server (per log program), so times overlap. All ostensibly using:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1

ec2-50-18-13-33.us-west-1.compute.amazonaws.com
07:07:16 /dir2/file.html

ec2-50-18-27-118.us-west-1.compute.amazonaws.com
07:07:16 /dir/file27.html
07:07:16 /dir/file47.html
07:07:16 /dir1/file11.html

ec2-184-72-19-151.us-west-1.compute.amazonaws.com
07:07:14 /dir/file30.html
07:07:15 /dir/file42.html
07:07:16 /dir/file52.html

ec2-50-18-140-3.us-west-1.compute.amazonaws.com
07:07:14 /dir/file25.html
07:07:15 /dir/file38.html
07:07:16 /dir/file45.html

ec2-204-236-189-32.us-west-1.compute.amazonaws.com
07:07:13 /dir/file29.html
07:07:13 /dir/file13.html
07:07:14 /dir/file07.html
07:07:15 /dir/file41.html
07:07:15 /dir/file40.html
07:07:16 /dir/file48.html

ec2-50-18-85-139.us-west-1.compute.amazonaws.com
07:07:12 /dir2/dir/file.cgi
07:07:16 /dir/file51.html
07:07:16 /dir3/file.html

ec2-50-18-30-123.us-west-1.compute.amazonaws.com
07:07:09 /dir/file14.html
07:07:14 /dir/file19.html
07:07:16 /dir/file49.html

ec2-204-236-175-96.us-west-1.compute.amazonaws.com
07:07:08 /dir4/file.html

ec2-204-236-181-50.us-west-1.compute.amazonaws.com
07:07:07 /dir/file08.html

ec2-184-72-10-186.us-west-1.compute.amazonaws.com
07:07:07 /
07:07:15 /dir/file35.html
07:07:15 /dir/file32.html
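
A quick way to double-check the totals is to tally the listing above. This is only a sketch: the LISTING text is transcribed from this post, and the function name summarize is made up for illustration.

```python
from datetime import datetime

# Hosts and requests transcribed from the listing above.
LISTING = """\
ec2-50-18-13-33.us-west-1.compute.amazonaws.com
07:07:16 /dir2/file.html
ec2-50-18-27-118.us-west-1.compute.amazonaws.com
07:07:16 /dir/file27.html
07:07:16 /dir/file47.html
07:07:16 /dir1/file11.html
ec2-184-72-19-151.us-west-1.compute.amazonaws.com
07:07:14 /dir/file30.html
07:07:15 /dir/file42.html
07:07:16 /dir/file52.html
ec2-50-18-140-3.us-west-1.compute.amazonaws.com
07:07:14 /dir/file25.html
07:07:15 /dir/file38.html
07:07:16 /dir/file45.html
ec2-204-236-189-32.us-west-1.compute.amazonaws.com
07:07:13 /dir/file29.html
07:07:13 /dir/file13.html
07:07:14 /dir/file07.html
07:07:15 /dir/file41.html
07:07:15 /dir/file40.html
07:07:16 /dir/file48.html
ec2-50-18-85-139.us-west-1.compute.amazonaws.com
07:07:12 /dir2/dir/file.cgi
07:07:16 /dir/file51.html
07:07:16 /dir3/file.html
ec2-50-18-30-123.us-west-1.compute.amazonaws.com
07:07:09 /dir/file14.html
07:07:14 /dir/file19.html
07:07:16 /dir/file49.html
ec2-204-236-175-96.us-west-1.compute.amazonaws.com
07:07:08 /dir4/file.html
ec2-204-236-181-50.us-west-1.compute.amazonaws.com
07:07:07 /dir/file08.html
ec2-184-72-10-186.us-west-1.compute.amazonaws.com
07:07:07 /
07:07:15 /dir/file35.html
07:07:15 /dir/file32.html
"""

def summarize(listing):
    """Return (host count, request count, seconds from first to last hit)."""
    hosts = {}
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(".amazonaws.com"):
            current = line          # a host line starts a new group
            hosts[current] = []
        else:
            stamp, path = line.split(None, 1)
            hosts[current].append((datetime.strptime(stamp, "%H:%M:%S"), path))
    times = [t for hits in hosts.values() for t, _ in hits]
    span = int((max(times) - min(times)).total_seconds())
    return len(hosts), len(times), span

print(summarize(LISTING))  # (10, 27, 9)
```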

##
2:32 am on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Must be my lucky day for hits from 50.18. --

ec2-50-18-23-16.us-west-1.compute.amazonaws.com
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; MALC)

16:42:59 /
16:43:00 /index.php
16:43:00 /index.php
16:43:01 /index.html
16:43:02 /index.html

Pure probe. There are no files by those names in that directory.
2:12 pm on Oct 19, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Two seconds apart to the same rarely directly-hit file. Coincidence?

ec2-204-236-161-233.us-west-1.compute.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)

02:38:30 /dir/filename.html
robots.txt? NO

ec2-50-16-74-139.compute-1.amazonaws.com
Mozilla/5.0 (compatible; Topicmarks/1.0)

02:38:32 /dir/filename.html
robots.txt? NO

Diffbot (old-timer): [google.com...]
Topicmarks (just posted): [webmasterworld.com...]
10:39 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had a visit from a log spammer coming from a new (to me) aws range : 107.20.0.0 - 107.23.255.255

Though it's not a crawler per se I thought I'd mention the range.
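
For anyone whose block list takes CIDR rather than start-end ranges, a minimal sketch of the conversion using Python's standard ipaddress module (the helper name range_to_cidrs is made up for illustration):

```python
import ipaddress

def range_to_cidrs(first, last):
    """Collapse an inclusive start-end IPv4 range into CIDR blocks."""
    nets = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
    return [str(net) for net in nets]

print(range_to_cidrs("107.20.0.0", "107.23.255.255"))  # ['107.20.0.0/14']
```

A range that does not fall on a clean boundary comes back as several smaller blocks rather than one.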
10:53 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



What were its IP and UA, please? TIA
1:10 pm on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



IP : 107.22.51.16
UA : Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0)

and, irony, log spamming for a web site for webmasters
1:09 am on Oct 22, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Thanks for the details, Staffa.
---
This next sighting makes sense seeing as how Amazon owns Alexa:

ec2-174-129-237-157.compute-1.amazonaws.com
ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)

robots.txt? Yes
2:35 am on Oct 22, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I've now completed the final step to leading a 100% Amazon free life style. A very liberating feeling ;)

Since the events a few months ago when Amazon abandoned their California sale affiliates (causing me a long week's work to re-architect 3 good-size web sites), added to the never-ending AWS nuisance, added to bogus Alexa ranking practices, added to the announcement that the Amazon marketplace would no longer give A-Z guarantees beyond 5 events, added to the rate increase w/ Amazon CC, added to their unwillingness to credit my card when one of their vendors reneged on a sale, ad infinitum...

All AWS IP ranges blocked, all Amazon IP ranges blocked, all Alexa IP ranges blocked, accounts of any Amazon affiliates doing business with us closed, all Amazon customer accounts closed/deleted, all contact info, browser favorites and any other connection to Amazon now deleted.

[edited by: keyplyr at 2:46 am (utc) on Oct 22, 2011]

9:44 pm on Oct 22, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Well done! Next week, google. :)
7:45 pm on Oct 25, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-184-72-115-86.compute-1.amazonaws.com
DuckDuckPreview/1.0; (+http://duckduckgo.com/duckduckpreview.html)

robots.txt? NO

Previously, about DuckDuckBot: [webmasterworld.com...]

The UA's URL says they "grab pages on behalf of our users and display to them parts of those pages most relevant to their queries." Not. DuckDuckGo's hair-splitting 'not crawler, not spider' claims to the contrary, that AWS bot hit was not a real-time query "user."
9:46 pm on Oct 25, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Always had that one down as a goodie (although not preview, which is a new one on me). Had an email exchange with the owner a while ago, as well, which seemed to go well.

If they've moved operations to AWS they won't find me again, though.
11:46 pm on Oct 25, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Got twitter-swarmed a bit ago. In addition to a boatload of AWS bots, two particularly bad ones:

ec2-184-73-108-194.compute-1.amazonaws.com
MetaURI API/2.0 +metauri.com
robots.txt? NO
ERROR: Client sent malformed Host header <-- x2

ec2-50-18-24-18.us-west-1.compute.amazonaws.com
percbotspider
robots.txt? NO
ERROR: Client sent malformed Host header <-- x2
12:45 am on Oct 26, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



DuckDuck got some mention here when they first launched; seemed like a clever start-up. Too bad they're now coming from AWS.
9:28 pm on Oct 26, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Got several hits today on one site with the UA:

Test Spider 0.2

Imaginative! Hit with requests for a few long-standing pages, some long-missing pages and some never-there sitemap files. Blocked, of course.
10:59 am on Oct 30, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



New Twitter-swarmer:

ec2-50-18-170-80.us-west-1.compute.amazonaws.com
NewsTrust

robots.txt? NO

And MetaURI is getting worse. Out of five hits, it blew THREE errors this time:

ec2-50-17-88-207.compute-1.amazonaws.com
MetaURI API/2.0 +metauri.com

[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header
[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header
[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header
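
To keep a running score of which clients blow this error, here is a rough sketch. It assumes error-log lines in the shape pasted above ("[client IP] Client sent malformed Host header"); newer server versions may log the client differently, so the pattern would need adjusting. The names ERR and malformed_host_counts are hypothetical.

```python
import re
from collections import Counter

# Matches the client field and message as they appear in the lines above.
ERR = re.compile(r'\[client (?P<ip>[^\]\s]+)\] Client sent malformed Host header')

def malformed_host_counts(lines):
    """Count malformed-Host-header errors per client address."""
    counts = Counter()
    for line in lines:
        m = ERR.search(line)
        if m:
            counts[m.group("ip")] += 1
    return counts

# The three lines from this post.
sample = [
    "[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header",
    "[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header",
    "[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header",
]
print(malformed_host_counts(sample).most_common())
```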
8:28 pm on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-174-129-37-252.compute-1.amazonaws.com
wf_crawler (http://www.websitefigures.com)

robots.txt? NO

More details in the just-posted "wf_crawler" [webmasterworld.com...]
8:33 pm on Nov 12, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Yet another AWS somethingorother:

ec2-175-41-250-151.ap-northeast-1.compute.amazonaws.com
ceron.jp/1.0

robots.txt? NO

[robtex.com...]
12:16 pm on Nov 19, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



rDNS: ec2-50-112-27-181.us-west-2.compute.amazonaws.com
UA: Mozilla/5.0 (compatible; Bender; http://benderthewebrobot.tumblr.com)
robots.txt: no

Image scraper. I didn't have this range blocked. Maybe new?

50.112.0.0 - 50.112.255.255
50.112.0.0/16
3:09 pm on Nov 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the heads up

175.41 and 50.112 are ranges new to me
8:16 pm on Nov 19, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



FWIW: [webmasterworld.com...]
9:01 pm on Nov 19, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



The 50.112/16 was new to me, too. Thanks.
3:04 am on Nov 21, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Yet another you-know-whatter:

ec2-174-129-32-219.compute-1.amazonaws.com
TweetReports.com

robots.txt? NO
1:57 pm on Nov 25, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Same bot, same behavior, two minutes apart:

ec2-184-72-68-95.compute-1.amazonaws.com
ec2-50-17-154-105.compute-1.amazonaws.com
SemrushBot/0.9

robots.txt? Yes

Previously about SemrushBot... [webmasterworld.com...]
9:37 pm on Nov 25, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



A month ago AWS announced that they've added a new US West (Oregon) Region. Possible new IP ranges?

[aws.amazon.com...]
This 88 message thread spans 3 pages: 88
 
