
Forum Moderators: Ocean10000 & incrediBILL


amazonaws.com plays host to wide variety of bad bots

Most recently seen: Gnomit

   
3:04 am on Jan 18, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation marks in UA (closing quote only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>
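Cosmetic slips like the lone closing quotation mark above (and the spaced-out closing parens noted in the botmobi UA further down) are easy to flag mechanically. Here is a minimal Python sketch, assuming you only want red flags to review by hand rather than an automatic verdict. `ua_quirks` is a hypothetical helper, not part of any tool mentioned in this thread:

```python
def ua_quirks(ua: str) -> list:
    """Flag a few cosmetic oddities in a User-Agent string (heuristic only)."""
    quirks = []
    # An odd number of double quotes means one was opened or closed without its partner.
    if ua.count('"') % 2:
        quirks.append("unbalanced double quotes")
    # Track parenthesis depth; a negative depth means a ")" came before its "(".
    depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        quirks.append("unbalanced parentheses")
    # A space right before a closing parenthesis, as in "bot.html) )".
    if " )" in ua:
        quirks.append("space before closing parenthesis")
    return quirks
```

Treat a hit as a reason to look closer, not as grounds to block on its own.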

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO
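The "robots.txt? YES/NO" tally above can be produced straight from an access log. Here is a rough Python sketch. It assumes Apache's "combined" log format, where a literal quote inside a field is logged as `\"`, and it keys on the User-Agent string as the list above does. `robots_report` is a hypothetical name:

```python
import re

# One quoted field of an Apache "combined" log line; \" inside a field is an escaped quote.
QUOTED = r'"((?:[^"\\]|\\.)*)"'
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r"^(\S+) \S+ \S+ \[[^\]]*\] " + QUOTED + r" \d{3} \S+ " + QUOTED + " " + QUOTED
)

def robots_report(lines):
    """Map each User-Agent string to True if it ever requested /robots.txt, else False."""
    report = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines in some other format
        request, ua = m.group(2), m.group(4)
        parts = request.split()
        path = parts[1] if len(parts) > 1 else ""
        # setdefault records every UA seen, even ones that never ask for robots.txt.
        report[ua] = report.setdefault(ua, False) or path == "/robots.txt"
    return report
```

Keying on the UA alone means a bot that fetches robots.txt from one EC2 address and pages from another still counts as YES. Key on `(host, ua)` instead if you want per-address answers.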

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
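Blocking amazonaws.com "no matter what" by hostname means checking the reverse-DNS name of each visitor. Here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the suffix check insists on a dot boundary so that a name like `notamazonaws.com` does not match:

```python
import socket

def is_amazonaws_host(hostname: str) -> bool:
    """True if hostname is amazonaws.com or any subdomain of it (case-insensitive)."""
    h = hostname.lower().rstrip(".")  # normalise case and a trailing root dot
    return h == "amazonaws.com" or h.endswith(".amazonaws.com")

def visitor_is_amazonaws(ip: str) -> bool:
    """Reverse-resolve ip and test the resulting name; False if there is no PTR record."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return False
    return is_amazonaws_host(hostname)
```

Forward confirmation (resolving the name back to the same IP) matters when a hostname is used to allow a crawler in. For blocking, a forged amazonaws.com PTR only gets the forger blocked, so a plain reverse lookup is enough. Reverse lookups are slow, so cache the results or run this offline against your logs rather than on every request.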

4:08 am on Jan 16, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Long story short? Block .amazonaws.com :)

IP range-wise, the IPs are 'in' the Host names.* Now as to how many there are, let alone what they are, I'm sorry but I'll have to leave that compilation as a sweat equity exercise for the bot-curious/obsessed at this time. Suffice it to say that akin to any country -- and numbering more than many countries'! -- Amazon's cloud-related IPs are neither contiguous nor non-expanding.

*The second post in the MetaURI [webmasterworld.com] thread shows more detail, including an atypical example of the same UA using the exact same AWS IP over a period of time.
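Because EC2 hostnames of this kind spell out the IP with dashes, you can recover the address from the name alone, without another DNS lookup. Here is a small Python sketch. It assumes the `ec2-A-B-C-D.` prefix seen in the log lines in this thread; names in any other scheme simply return None. `ip_from_ec2_hostname` is a hypothetical helper:

```python
import ipaddress
import re

# Assumed naming pattern, e.g. ec2-67-202-57-30.compute-1.amazonaws.com
EC2_HOST = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.", re.IGNORECASE)

def ip_from_ec2_hostname(hostname: str):
    """Return the IPv4Address encoded in an EC2-style hostname, or None if none is found."""
    m = EC2_HOST.match(hostname)
    if not m:
        return None
    try:
        return ipaddress.IPv4Address(".".join(m.groups()))
    except ipaddress.AddressValueError:
        return None  # e.g. an "octet" above 255
```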

11:17 pm on Jan 16, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Emphasis mine. See link below for more info:

ec2-174-129-120-104.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)

robots.txt? NO
URI: /favicon.ico
Related: Yahoo's cloaked crawler(s) [webmasterworld.com]

4:41 pm on Jan 17, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-174-129-237-42.compute-1.amazonaws.com
OpenCalaisSemanticProxy

robots.txt? NO

4:55 pm on Jan 18, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-204-236-247-88.compute-1.amazonaws.com
@hourlypress

robots.txt? NO

Twitter-related.

6:39 pm on Jan 18, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Two hits 20 seconds apart; UA not too cleverly cloaked:

ec2-75-101-147-15.compute-1.amazonaws.com
Firefox

robots.txt? NO

6:02 am on Jan 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just found this long thread. Don't understand why amazonaws.com is visiting so many sites and so often. Can someone please explain it?

I see amazonaws.com in many of my sites referral logs with lots of visits week after week and month after month.

Are they owned by amazon.com? Why do they want to visit my sites in the first place? How do they know about my URLs? Sometimes they visit even before anyone else does, or before the site has had time to get listed in Google.

How would they even know my URL was just put online? Sometimes they are my #1 traffic source with both newer and older sites. How and why is this happening? Who are they?

7:02 am on Jan 20, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Just found this long thread. Don't understand why amazonaws.com is visiting so many sites and so often. Can someone please explain it?

Read the thread. It's explained.
11:20 pm on Jan 20, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Long thread short, the title tells it all:

amazonaws.com plays host to wide variety of bad bots

That statement was true exactly one year+three days ago when I started this thread. Now, 125-plus posts later, and scores and scores and scores of bad, rude, iffy, test, and/or worthless bot hits later, that statement's an understatement.

So if your main traffic source is amazonaws.com-related, I'm sorry but none of that resource-eating traffic -- NONE of it -- is real people visiting in real time.

Block amazonaws.com and you'll stop feeding its plague of locusts.

5:14 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Not sure how this fits into the Amazon scenario:

IP: 216.113.169.nnn
Standard Headers: Accept All only
UA: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.2a1pre) Gecko/20090402 Firefox/3.6a1pre (.NET CLR 3.5.30729)
Referer: http:// www. google. com
HTTP_CONNECTION: Keep-Alive
Robots: haven't checked

The hit (home page of one site only) was trapped on several points that usually indicate an exploit attempt.

eBay, Inc
OrgID: EBAY
NetRange: 216.113.160.0 - 216.113.191.255

8:13 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Hmm. I may be missing something but I don't see a connection between your info and amazonaws.com or anything Amazon.

If you suspect a cloaked/rogue spider, you could start a new thread with more details about the visit. Alternatively, it could've just been a zombied machine belonging to an eBay employee.

9:48 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Hmm. I'll just go and bury my head in a bucket, then, shall I?

Not being a user of either, I totally confused Amazon and eBay. Sorry. :(

2:15 am on Jan 22, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member




Alas, because amazonaws.com is a never-ending source of new bad bots, I suspect this thread -- and my many bot-sightings in it -- will continue (...ad nauseam, sorry:)
8:55 pm on Jan 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a note about a bot identifying itself as PostRank. Somebody above mentioned that it made one HEAD request and then left.

However, I've just seen that bot (same AWS IP range) request /wp-admin/install.php. I can't see any good reason for a bot to want that...

-- b

10:55 pm on Jan 26, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I see a lot of PHP accesses through folders such as admin. All, as far as I've seen, are attempts to gain access to the server through weaknesses (e.g. no password) in admin files of one sort or another, often for phpMyAdmin or other control panels.

As I noted elsewhere (I think in this thread), I have seen AWS used either directly (via a logged-on account) or indirectly (via infected servers) for "standard botnet" exploit attempts.
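Requests like /wp-admin/install.php or phpMyAdmin paths can be flagged by path alone, whatever UA or host they arrive from. Here is a minimal Python sketch. The pattern list is an illustrative assumption, not a complete or standard list, and `looks_like_probe` is a hypothetical name:

```python
import re

# Illustrative (not exhaustive) admin/installer paths a legitimate crawler has no reason to request.
PROBE_PATTERNS = [
    re.compile(r"^/wp-admin/install\.php", re.IGNORECASE),
    re.compile(r"/phpmyadmin(/|$)", re.IGNORECASE),
    re.compile(r"^/(admin|administrator)/.*\.php", re.IGNORECASE),
]

def looks_like_probe(path: str) -> bool:
    """True if the request path matches any of the probe patterns above."""
    return any(p.search(path) for p in PROBE_PATTERNS)
```

If you actually run WordPress or phpMyAdmin, exempt your own addresses before acting on a match.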

2:06 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



The enormous number of "php" requests from the discussed IP range(s) is why I've been intrigued, and why I finally nuked it a week or so back. I could have approached it the other way around via referer and accomplished the same thing, but life is too short and there are too many rogue bots on that service.
2:25 am on Jan 30, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-75-101-243-212.compute-1.amazonaws.com
curl/7.18.2 (i486-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18

robots.txt? NO

9:09 pm on Feb 2, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-204-236-242-36.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )

robots.txt? YES

[Above info also posted as new thread.]

Curiously, per WHOIS, the bot-runner's site appears to be hosted by Amazonaws.com:

seoprofiler.com => 174.129.8.145 => Amazon.com/amazonaws.com
[Amazon Web Services, Elastic Compute Cloud, EC2]

Interesting how a company can claim a dynamically assigned IP as its permanent address...
7:13 pm on Feb 7, 2010 (gmt 0)

5+ Year Member



New: AMAZON-EC2-7 = 184.72.0.0/15
8:21 pm on Feb 7, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Well spotted! My very first ban of the 184 range. :)
10:57 am on Feb 11, 2010 (gmt 0)

5+ Year Member



These are the CIDR blocks I have for amazonaws's bad bots at this point. Does this seem to cover all of them or are there more?

67.202.0.0/18 # "do not delete" - amazonaws.com's bad bots
75.101.128.0/17 # "do not delete" - amazonaws.com's bad bots
79.125.0.0/18 # "do not delete" - amazonaws.com's bad bots - Ireland
174.129.0.0/16 # "do not delete" - amazonaws.com's bad bots
184.72.0.0/15 # "do not delete" - amazonaws.com's bad bots
204.236.128.0/17 # "do not delete" - amazonaws.com's bad bots

Thomas
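To check whether a given visitor falls inside blocks like these (for example the seoprofiler.com address 174.129.8.145 mentioned above), Python's standard `ipaddress` module does the CIDR arithmetic. Here is a minimal sketch covering only the six blocks in this post. `in_blocked_range` is a hypothetical helper:

```python
import ipaddress

# The six CIDR blocks listed in the post above.
BLOCKS = [ipaddress.ip_network(c) for c in (
    "67.202.0.0/18", "75.101.128.0/17", "79.125.0.0/18",
    "174.129.0.0/16", "184.72.0.0/15", "204.236.128.0/17",
)]

def in_blocked_range(ip: str) -> bool:
    """True if ip falls inside any of the listed blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKS)
```

Anything outside the listed blocks returns False, so the list needs updating whenever Amazon adds ranges.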
1:11 pm on Feb 11, 2010 (gmt 0)

5+ Year Member



+
216.182.224.0/20
72.44.32.0/19
7:06 pm on Feb 11, 2010 (gmt 0)

5+ Year Member



My laundry list of ec2 trash includes the following.

deny from 67.202.0.0/18 "Amazon ec2-Cloud"
deny from 72.44.32.0/19 "Amazon ec2-Cloud"
deny from 75.101.128.0/17 "Amazon ec2-Cloud"
deny from 79.125.0.0/18 "Amazon ec2-Cloud"
deny from 174.129.0.0/16 "Amazon ec2-Cloud"
deny from 184.72.0.0/15 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"
deny from 204.236.128.0/17 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"

I have seen virtually every real and fake user agent included, and NEVER once have I been able to figure out why I should allow them to index my site. I spoke with Amazon Services about a year ago, explained why I am blocking them, and requested that they force a user-ID tag of some sort to identify the user for abuse reasons. They explained to me that this will never happen, so I told them that's too bad and their services will always be blocked as well. I honestly do not think they care, or understand that they are only enabling web scrapers, spammers and other trash to ruin their integrity. Again, I do not think they care so long as they are getting paid.
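A list like the one above can be de-duplicated (204.74.108.0/24 appears twice) and written out programmatically. One caution: Apache does not allow a comment on the same line as a directive, so the trailing "Amazon ec2-Cloud" labels are better placed on their own `#` line. Here is a minimal Python sketch that collapses duplicates and overlaps with the standard `ipaddress` module and emits Apache 2.2-style `deny from` lines. `apache_deny_lines` is a hypothetical name:

```python
import ipaddress

# Deny list as posted above, including the duplicated 204.74.108.0/24 entry.
POSTED = [
    "67.202.0.0/18", "72.44.32.0/19", "75.101.128.0/17", "79.125.0.0/18",
    "174.129.0.0/16", "184.72.0.0/15", "204.74.108.0/24", "204.236.128.0/17",
    "204.74.108.0/24",
]

def apache_deny_lines(cidrs, label="Amazon ec2-Cloud"):
    """Collapse duplicate/overlapping blocks and emit one 'deny from' line per block.
    The label goes on its own comment line, since Apache rejects same-line comments."""
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    return [f"# {label}"] + [f"deny from {net}" for net in nets]

print("\n".join(apache_deny_lines(POSTED)))
```

This only removes duplicates; it does not judge whether each block actually belongs to Amazon.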
8:04 pm on Feb 11, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



CLOUD: Creepy Litigious Outrageous User-agent Dwelling
8:30 pm on Feb 11, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I've got the Ireland one 79.125.0.0/127

204.74.108.0/24 (which you list twice) resolves here to:
UltraDNS Corp ULTRADNS-GLOBAL-2
204.74.96.0 - 204.74.108.255
108 seems to be mostly unused apart from loads of name servers on 1.
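Both points here can be checked with Python's standard `ipaddress` module. An impossible prefix such as /127 on an IPv4 address is rejected outright, and the UltraDNS range quoted above can be rewritten as CIDR blocks to confirm that 204.74.108.0/24 sits inside it. Here is a minimal sketch. `check_cidr` is a hypothetical helper:

```python
import ipaddress

def check_cidr(text: str):
    """Return the network for a CIDR string, or None if it is not a valid IPv4/IPv6 network."""
    try:
        return ipaddress.ip_network(text)
    except ValueError:  # e.g. "/127" on an IPv4 address, or host bits set
        return None

# The UltraDNS range from the WHOIS record above, expressed as CIDR blocks.
ultradns = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("204.74.96.0"),
    ipaddress.IPv4Address("204.74.108.255"),
))
print([str(net) for net in ultradns])
# → ['204.74.96.0/21', '204.74.104.0/22', '204.74.108.0/24']
print(any(ipaddress.ip_network("204.74.108.0/24").subnet_of(net) for net in ultradns))
# → True
```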
4:44 pm on Feb 14, 2010 (gmt 0)

5+ Year Member



Thanks, thetrasher, for those two. And dstiles, the Ireland one was 79.125.0.0/17, not /18 as I had it. I assumed you meant 17 not 127?

67.202.0.0/18 # "do not delete" - amazonaws.com's bad bots
72.44.32.0/19 # "do not delete" - amazonaws.com's bad bots
75.101.128.0/17 # "do not delete" - amazonaws.com's bad bots
79.125.0.0/17 # "do not delete" - amazonaws.com's bad bots - Ireland
174.129.0.0/16 # "do not delete" - amazonaws.com's bad bots
184.72.0.0/15 # "do not delete" - amazonaws.com's bad bots
204.236.128.0/17 # "do not delete" - amazonaws.com's bad bots
216.182.224.0/20 # "do not delete" - amazonaws.com's bad bots

That's a lot of IP addresses, but I cannot think of a single reason not to block them all.

Thomas
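For a sense of scale, an IPv4 /n block holds 2^(32-n) addresses. The short Python check below applies that to the list above using the standard `ipaddress` module, which also collapses any overlaps (there are none here):

```python
import ipaddress

FINAL = [
    "67.202.0.0/18", "72.44.32.0/19", "75.101.128.0/17", "79.125.0.0/17",
    "174.129.0.0/16", "184.72.0.0/15", "204.236.128.0/17", "216.182.224.0/20",
]

# Collapse first so overlapping blocks are not counted twice.
nets = list(ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in FINAL))
total = sum(n.num_addresses for n in nets)
print(f"{len(nets)} blocks, {total:,} addresses")
# → 8 blocks, 323,584 addresses
```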
5:27 pm on Feb 14, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



> I assumed you meant 17 not 127?

Sorry. It had been a long day. :)
6:06 pm on Feb 15, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-75-101-245-135.compute-1.amazonaws.com
Twisted PageGetter

robots.txt? NO

See also: Twisted PageGetter [webmasterworld.com] (09/2009)
3:24 am on Feb 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



CLOUD: Creepy Litigious Outrageous User-agent Dwelling


CLOUD: 403
1:00 am on Feb 18, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-174-129-153-217.compute-1.amazonaws.com
HTMLParser/2.0

robots.txt? NO
1:15 am on Feb 18, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-174-129-167-253.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50728)

robots.txt? NO

What bothers me about this one is not just its non-bot UA, but that its hits only went to two html files and some of their graphics files, and knew just where to look before arriving. I've blocked amazonaws.com for over a year -- basically from the first day I saw it -- so the directory paths didn't come from my server. Hmm.