

Search Engine Spider and User Agent Identification Forum

This 88 message thread spans 3 pages.
Amazon AWS Hosts Bad Bots
amazonaws.com
Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 12:45 am on Sep 30, 2011 (gmt 0)

1.) Back in 2008, I noticed a lot of bad bots hailing from amazonaws.com and by January, 2009, I started a thread about what hid behind that early cloud:

amazonaws.com plays host to wide variety of bad bots [webmasterworld.com...]

Since that time, 270-plus reports/messages further document that the Amazon AWS Host name and Amazon AWS's countless IPs continue to be what forum mod IncrediBILL aptly termed:

"Cesspool."

This thread continues the saga of amazonaws.com and its spawn.

2.) The AWS cesspool is home to countless hundreds of bots, the vast majority of which ignore robots.txt. Home to hundreds more bots cloaked as regular UAs. Home to infected machines and bad programming, and all the ills to others that cloud anonymity affords.

And in recent weeks, home to bots with no UA at all... [webmasterworld.com...] Note the double-quotes at the end where a UA, or at least a hyphen, should be:

ec2-50-17-87-218.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00] "GET /dir/filename.html HTTP/1.1" 403 1471 "-" ""

Today, the 'blank bot' -- what I've started thinking of as the AWSbot -- was the most frequent AWS 'visitor' to my main site. Four Hosts, four hits to different files, four 403s. robots.txt? NO
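A minimal sketch of how the 'blank bot' could be spotted automatically, assuming the access log uses the same field layout as the sample line quoted above (host, two dashes, timestamp, request, status, size, referer, UA). The pattern and the function name `is_blank_ua_from_aws` are illustrative assumptions, not anything the poster described using.

```python
import re

# Matches the log layout of the sample line quoted in the post:
# host ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def is_blank_ua_from_aws(line):
    """Return True when an amazonaws.com host sent an empty ("") User-Agent."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return False  # line does not follow the assumed layout
    return m.group("host").endswith(".amazonaws.com") and m.group("ua") == ""


sample = (
    'ec2-50-17-87-218.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00] '
    '"GET /dir/filename.html HTTP/1.1" 403 1471 "-" ""'
)
print(is_blank_ua_from_aws(sample))
```

A line whose UA field is a hyphen ("-") is deliberately not flagged, since the post distinguishes the empty double-quotes from the usual hyphen.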

 

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4368965 posted 1:42 am on Sep 30, 2011 (gmt 0)

the 'blank bot' -- what I've started thinking of as the AWSbot

Gee, that's funny. I always think of it as the faviconbot ;)

How 'bout the new browser [webmasterworld.com]?

We sought from the start to tap into the power and capabilities of the AWS infrastructure

Now there's a sales pitch to make your blood run cold. And, as noted in that thread, it means messing about with your Allows and Denys so you don't end up locking out unsuspecting humans.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 1:52 am on Sep 30, 2011 (gmt 0)

I didn't want to mix up AWS bad bot sightings/reports in this thread with discussions of AWS (ww)world domination, Amazon's new Silk and Fire, etc. Check out the just-posted:

Amazon AWS gunning for Google? [webmasterworld.com...]

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 3:10 am on Oct 8, 2011 (gmt 0)

Two hits to html files, ~15 secs apart.

ec2-50-19-197-197.compute-1.amazonaws.com
HTTP_Request2/2.0.0RC1 (http://pear.php.net/package/http_request2) PHP/5.3.2-1ubuntu4.9

robots.txt? NO
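The recurring "robots.txt? NO" check in these reports amounts to asking whether a host ever requested /robots.txt. A rough sketch of that bookkeeping, assuming the hits have already been reduced to (host, path) pairs; the function name and the sample paths are hypothetical.

```python
def hosts_skipping_robots(hits):
    """Return the hosts that appear in `hits` but never requested /robots.txt.

    `hits` is an iterable of (host, path) pairs taken from an access log.
    """
    seen = set()
    fetched = set()
    for host, path in hits:
        seen.add(host)
        # Ignore any query string when comparing the path.
        if path.split("?", 1)[0] == "/robots.txt":
            fetched.add(host)
    return sorted(seen - fetched)


# Hypothetical hits: the first host is the one reported above, the paths are made up.
hits = [
    ("ec2-50-19-197-197.compute-1.amazonaws.com", "/dir/a.html"),
    ("ec2-50-19-197-197.compute-1.amazonaws.com", "/dir/b.html"),
    ("polite.example.org", "/robots.txt"),
    ("polite.example.org", "/dir/a.html"),
]
print(hosts_skipping_robots(hits))
```

This only shows whether robots.txt was requested, not whether its rules were obeyed afterwards.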

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 3:26 am on Oct 8, 2011 (gmt 0)

Not all of AWS's UAs are obvious bots:

ec2-184-72-188-54.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729; .NET4.0E)

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 4:00 pm on Oct 9, 2011 (gmt 0)

ec2-184-73-116-52.compute-1.amazonaws.com
Mozilla

robots.txt? NO

ec2-50-19-197-197.compute-1.amazonaws.com
HTTP_Request2/2.0.0RC1 (http://pear.php.net/package/http_request2) PHP/5.3.2-1ubuntu4.9

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 12:24 am on Oct 17, 2011 (gmt 0)

Today's worst AWS assault:

10 amazonaws.com servers
=> 26 unique, non-contiguous .html files, 1 .cgi file, 0 robots.txt
=> 27 403s in 9 secs

FWIW, sorted by server (per log program) thus times overlap. All ostensibly using:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1

ec2-50-18-13-33.us-west-1.compute.amazonaws.com
07:07:16 /dir2/file.html

ec2-50-18-27-118.us-west-1.compute.amazonaws.com
07:07:16 /dir/file27.html
07:07:16 /dir/file47.html
07:07:16 /dir1/file11.html

ec2-184-72-19-151.us-west-1.compute.amazonaws.com
07:07:14 /dir/file30.html
07:07:15 /dir/file42.html
07:07:16 /dir/file52.html

ec2-50-18-140-3.us-west-1.compute.amazonaws.com
07:07:14 /dir/file25.html
07:07:15 /dir/file38.html
07:07:16 /dir/file45.html

ec2-204-236-189-32.us-west-1.compute.amazonaws.com
07:07:13 /dir/file29.html
07:07:13 /dir/file13.html
07:07:14 /dir/file07.html
07:07:15 /dir/file41.html
07:07:15 /dir/file40.html
07:07:16 /dir/file48.html

ec2-50-18-85-139.us-west-1.compute.amazonaws.com
07:07:12 /dir2/dir/file.cgi
07:07:16 /dir/file51.html
07:07:16 /dir3/file.html

ec2-50-18-30-123.us-west-1.compute.amazonaws.com
07:07:09 /dir/file14.html
07:07:14 /dir/file19.html
07:07:16 /dir/file49.html

ec2-204-236-175-96.us-west-1.compute.amazonaws.com
07:07:08 /dir4/file.html

ec2-204-236-181-50.us-west-1.compute.amazonaws.com
07:07:07 /dir/file08.html

ec2-184-72-10-186.us-west-1.compute.amazonaws.com
07:07:07 /
07:07:15 /dir/file35.html
07:07:15 /dir/file32.html

##
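Since the EC2 host names in the list above embed the IP address, a short sketch can recover the address and group the hosts by their first two octets. The regular expression is an assumption based only on the two host name shapes seen in this thread (`...compute-1.amazonaws.com` and `...<region>.compute.amazonaws.com`); the function name is hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Assumed shape: ec2-A-B-C-D.<location label>.amazonaws.com
EC2_RE = re.compile(
    r'^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.'
    r'(?P<zone>[a-z0-9.-]+)\.amazonaws\.com$'
)


def parse_ec2_host(host):
    """Return (IPv4Address, location label) or None if the name does not fit."""
    m = EC2_RE.match(host.lower())
    if m is None:
        return None
    try:
        ip = ipaddress.ip_address(".".join(m.group(1, 2, 3, 4)))
    except ValueError:
        return None  # an octet was out of range
    return ip, m.group("zone")


# The ten hosts from the report above.
hosts = [
    "ec2-50-18-13-33.us-west-1.compute.amazonaws.com",
    "ec2-50-18-27-118.us-west-1.compute.amazonaws.com",
    "ec2-184-72-19-151.us-west-1.compute.amazonaws.com",
    "ec2-50-18-140-3.us-west-1.compute.amazonaws.com",
    "ec2-204-236-189-32.us-west-1.compute.amazonaws.com",
    "ec2-50-18-85-139.us-west-1.compute.amazonaws.com",
    "ec2-50-18-30-123.us-west-1.compute.amazonaws.com",
    "ec2-204-236-175-96.us-west-1.compute.amazonaws.com",
    "ec2-204-236-181-50.us-west-1.compute.amazonaws.com",
    "ec2-184-72-10-186.us-west-1.compute.amazonaws.com",
]

# Count hosts per leading two octets, e.g. "50.18".
prefixes = Counter()
for h in hosts:
    parsed = parse_ec2_host(h)
    if parsed is not None:
        prefixes[".".join(str(parsed[0]).split(".")[:2])] += 1
print(prefixes.most_common())
```

The host name is only what reverse DNS reports; confirming it with a forward lookup is a separate step not shown here.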

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 2:32 am on Oct 17, 2011 (gmt 0)

Must be my lucky day for hits from 50.18. --

ec2-50-18-23-16.us-west-1.compute.amazonaws.com
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; MALC)

16:42:59 /
16:43:00 /index.php
16:43:00 /index.php
16:43:01 /index.html
16:43:02 /index.html

Pure probe. There are no files by those names in that directory.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 2:12 pm on Oct 19, 2011 (gmt 0)

Two seconds apart to the same rarely directly-hit file. Coincidence?

ec2-204-236-161-233.us-west-1.compute.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)

02:38:30 /dir/filename.html
robots.txt? NO

ec2-50-16-74-139.compute-1.amazonaws.com
Mozilla/5.0 (compatible; Topicmarks/1.0)

02:38:32 /dir/filename.html
robots.txt? NO

Diffbot (old-timer): [google.com...]
Topicmarks (just posted): [webmasterworld.com...]

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4368965 posted 10:39 am on Oct 21, 2011 (gmt 0)

I had a visit from a log spammer coming from a new (to me) aws range : 107.20.0.0 - 107.23.255.255

Though it's not a crawler per se I thought I'd mention the range.
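For anyone who blocks by CIDR rather than by start/end address, here is a small sketch that converts the range above using Python's standard `ipaddress` module. The helper name `range_to_cidrs` is just for illustration.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive first-last IPv4 range into a list of CIDR strings."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in nets]


print(range_to_cidrs("107.20.0.0", "107.23.255.255"))  # ['107.20.0.0/14']
```

A range that does not fall on a single network boundary comes back as several CIDR blocks, which is why the function returns a list.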

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 10:53 am on Oct 21, 2011 (gmt 0)

What were its IP and UA, please? TIA

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4368965 posted 1:10 pm on Oct 21, 2011 (gmt 0)

IP : 107.22.51.16
UA : Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0)

and, irony, log spamming for a web site for webmasters

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 1:09 am on Oct 22, 2011 (gmt 0)

Thanks for the details, Staffa.
---
This next sighting makes sense seeing as how Amazon owns Alexa:

ec2-174-129-237-157.compute-1.amazonaws.com
ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)

robots.txt? Yes

keyplyr

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4368965 posted 2:35 am on Oct 22, 2011 (gmt 0)

I've now completed the final step to leading a 100% Amazon-free lifestyle. A very liberating feeling ;)

Since the events a few months ago when Amazon abandoned their California sale affiliates (causing me a long week's work to re-architect 3 good-sized web sites) added to the never-ending AWS nuisance, added to bogus Alexa ranking practices, added to the announcement that the Amazon market place would no longer give A-Z guarantees beyond 5 events, added to the rate increase w/ Amazon CC, added to their unwillingness to credit my card when one of their vendors reneged on a sale, ad infinitum...

All AWS IP ranges blocked, all Amazon IP ranges blocked, all Alexa IP ranges blocked, accounts of any Amazon affiliates doing business with us closed, all Amazon customer accounts closed/deleted, all contact info, browser favorites and any other connection to Amazon now deleted.

[edited by: keyplyr at 2:46 am (utc) on Oct 22, 2011]

dstiles

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4368965 posted 9:44 pm on Oct 22, 2011 (gmt 0)

Well done! Next week, google. :)

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 7:45 pm on Oct 25, 2011 (gmt 0)

ec2-184-72-115-86.compute-1.amazonaws.com
DuckDuckPreview/1.0; (+http://duckduckgo.com/duckduckpreview.html)

robots.txt? NO

Previously, about DuckDuckBot: [webmasterworld.com...]

The UA's URL says they "grab pages on behalf of our users and display to them parts of those pages most relevant to their queries." Not. DuckDuckGo's hair-splitting 'not crawler, not spider' claims to the contrary, that AWS bot hit was not a real-time query "user."

dstiles

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4368965 posted 9:46 pm on Oct 25, 2011 (gmt 0)

Always had that one down as a goodie (although not preview, which is a new one on me). Had an email exchange with the owner a while ago, as well, which seemed to go well.

If they've moved operations to AWS they won't find me again, though.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 11:46 pm on Oct 25, 2011 (gmt 0)

Got twitter-swarmed a bit ago. In addition to a boatload of AWS bots, two particularly bad ones:

ec2-184-73-108-194.compute-1.amazonaws.com
MetaURI API/2.0 +metauri.com
robots.txt? NO
ERROR: Client sent malformed Host header <-- x2

ec2-50-18-24-18.us-west-1.compute.amazonaws.com
percbotspider
robots.txt? NO
ERROR: Client sent malformed Host header <-- x2

keyplyr

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4368965 posted 12:45 am on Oct 26, 2011 (gmt 0)

DuckDuck got some mention here when they first launched; seemed like a clever start-up. Too bad they're now coming from AWS.

dstiles

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4368965 posted 9:28 pm on Oct 26, 2011 (gmt 0)

Got several hits today on one site with the UA:

Test Spider 0.2

Imaginative! Hit with requests for a few long-standing pages, some long-missing pages and some never-there sitemap files. Blocked, of course.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 10:59 am on Oct 30, 2011 (gmt 0)

New Twitter-swarmer:

ec2-50-18-170-80.us-west-1.compute.amazonaws.com
NewsTrust

robots.txt? NO

And MetaURI is getting worse. Out of five hits, it blew THREE errors this time:

ec2-50-17-88-207.compute-1.amazonaws.com
MetaURI API/2.0 +metauri.com

[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header
[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header
[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header
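A quick sketch for tallying these errors per client, assuming the error log lines look like the three quoted above. The pattern and function name are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the "[error] [client IP] Client sent malformed Host header" lines above.
ERR_RE = re.compile(
    r'\[error\] \[client (?P<ip>[0-9.]+)\] Client sent malformed Host header'
)


def count_malformed_host(lines):
    """Count 'malformed Host header' errors per client IP."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            counts[m.group("ip")] += 1
    return counts


log_lines = [
    "[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header",
    "[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header",
    "[21:30:23 2011] [error] [client 50.17.88.207] Client sent malformed Host header",
]
print(count_malformed_host(log_lines))
```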

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 8:28 pm on Nov 12, 2011 (gmt 0)

ec2-174-129-37-252.compute-1.amazonaws.com
wf_crawler (http://www.websitefigures.com)

robots.txt? NO

More details in the just-posted "wf_crawler" [webmasterworld.com...]

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 8:33 pm on Nov 12, 2011 (gmt 0)

Yet another AWS somethingorother:

ec2-175-41-250-151.ap-northeast-1.compute.amazonaws.com
ceron.jp/1.0

robots.txt? NO

[robtex.com...]

keyplyr

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4368965 posted 12:16 pm on Nov 19, 2011 (gmt 0)

rDNS: ec2-50-112-27-181.us-west-2.compute.amazonaws.com
UA: Mozilla/5.0 (compatible; Bender; http://benderthewebrobot.tumblr.com)
robots.txt: no

Image scraper. I didn't have this range blocked. Maybe new?

50.112.0.0 - 50.112.255.255
50.112.0.0/16
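Checking whether a visitor falls inside one of the ranges reported in this thread can be sketched with the standard `ipaddress` module. The list below holds only two blocks mentioned here (50.112.0.0/16 as given above, and 107.20.0.0/14 as the CIDR form of the 107.20.0.0 - 107.23.255.255 range reported earlier); it is an assumption for illustration, not a complete AWS list, and the names are hypothetical.

```python
import ipaddress

# Only ranges reported in this thread; NOT a complete list of AWS address space.
REPORTED_AWS_NETS = [
    ipaddress.ip_network("107.20.0.0/14"),
    ipaddress.ip_network("50.112.0.0/16"),
]


def in_reported_ranges(ip):
    """Return True if `ip` falls inside any of the reported blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in REPORTED_AWS_NETS)


# The address embedded in the rDNS name reported above.
print(in_reported_ranges("50.112.27.181"))
```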

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4368965 posted 3:09 pm on Nov 19, 2011 (gmt 0)

Thanks for the heads up

175.41 and 50.112 are ranges new to me

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 8:16 pm on Nov 19, 2011 (gmt 0)

FWIW: [webmasterworld.com...]

dstiles

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4368965 posted 9:01 pm on Nov 19, 2011 (gmt 0)

The 50.112/16 was new to me, too. Thanks.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 3:04 am on Nov 21, 2011 (gmt 0)

Yet another you-know-whatter:

ec2-174-129-32-219.compute-1.amazonaws.com
TweetReports.com

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4368965 posted 1:57 pm on Nov 25, 2011 (gmt 0)

Same bot, same behavior, two minutes apart:

ec2-184-72-68-95.compute-1.amazonaws.com
ec2-50-17-154-105.compute-1.amazonaws.com
SemrushBot/0.9

robots.txt? Yes

Previously about SemrushBot... [webmasterworld.com...]

keyplyr

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4368965 posted 9:37 pm on Nov 25, 2011 (gmt 0)

A month ago AWS made an announcement they've added a new US West (Oregon) Region. Possible new IP ranges?

[aws.amazon.com...]
