
Forum Moderators: Ocean10000 & incrediBILL


amazonaws.com plays host to wide variety of bad bots

Most recently seen: Gnomit

     
3:04 am on Jan 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Uneven apostrophes in UA (only closing)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>
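
The UA oddity noted above (a closing quote with no opener) is the kind of thing a log filter can flag automatically. A minimal sketch, assuming you only care about parentheses, square brackets and double quotes; the function name `ua_delimiter_problems` is made up for illustration:

```python
# Delimiter pairs to check; chosen for illustration only.
PAIRS = {"(": ")", "[": "]"}

def ua_delimiter_problems(ua):
    """Return a list of human-readable delimiter problems found in a UA string."""
    problems = []
    for opener, closer in PAIRS.items():
        depth = 0
        for ch in ua:
            if ch == opener:
                depth += 1
            elif ch == closer:
                if depth == 0:
                    problems.append(f"stray {closer!r}")
                else:
                    depth -= 1
        if depth:
            problems.append(f"unclosed {opener!r}")
    # A lone double quote usually means a sloppy copy/paste in the bot's config.
    if ua.count('"') % 2:
        problems.append("odd number of double quotes")
    return problems

gnomit = ('Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; '
          '+http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"')
print(ua_delimiter_problems(gnomit))
```

A well-formed UA returns an empty list, so anything non-empty is worth a second look rather than an automatic ban.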

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.

4:08 am on Jan 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Long story short? Block .amazonaws.com :)

IP range-wise, the IPs are 'in' the Host names.* Now as to how many there are, let alone what they are, I'm sorry but I'll have to leave that compilation as a sweat equity exercise for the bot-curious/obsessed at this time. Suffice it to say that akin to any country -- and numbering more than many countries'! -- Amazon's cloud-related IPs are neither contiguous nor non-expanding.

.
*The second post in the MetaURI [webmasterworld.com] thread shows more detail, including an atypical example of the same UA using the exact same AWS IP over a period of time.
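
To illustrate the point that the IPs are 'in' the host names: a minimal Python sketch, assuming the `ec2-A-B-C-D.<something>.amazonaws.com` pattern seen in the host names quoted in this thread. The function name `ip_from_ec2_hostname` is hypothetical, and other AWS host name formats may exist that this does not cover.

```python
import ipaddress
import re

# Pattern assumed from host names in this thread,
# e.g. ec2-67-202-57-30.compute-1.amazonaws.com
_EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.(?:[a-z0-9-]+\.)*amazonaws\.com\.?$",
    re.IGNORECASE,
)

def ip_from_ec2_hostname(hostname):
    """Return the IPv4 address embedded in an EC2-style host name, or None."""
    match = _EC2_HOST.match(hostname.strip())
    if not match:
        return None
    try:
        return ipaddress.IPv4Address(".".join(match.groups()))
    except ipaddress.AddressValueError:
        return None  # e.g. an "octet" above 255

print(ip_from_ec2_hostname("ec2-67-202-57-30.compute-1.amazonaws.com"))
```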

11:17 pm on Jan 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Emphasis mine. See link below for more info:

ec2-174-129-120-104.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)

robots.txt? NO
URI: /favicon.ico
Related: Yahoo's cloaked crawler(s) [webmasterworld.com]

4:41 pm on Jan 17, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-174-129-237-42.compute-1.amazonaws.com
OpenCalaisSemanticProxy

robots.txt? NO

4:55 pm on Jan 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-204-236-247-88.compute-1.amazonaws.com
@hourlypress

robots.txt? NO

Twitter-related.

6:39 pm on Jan 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Two hits 20 seconds apart; UA not too cleverly cloaked:

ec2-75-101-147-15.compute-1.amazonaws.com
Firefox

robots.txt? NO

6:02 am on Jan 20, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 20, 2004
posts:703
votes: 0


Just found this long thread. Don't understand why amazonaws.com is visiting so many sites and so often. Can someone please explain it?

I see amazonaws.com in many of my sites' referral logs with lots of visits week after week and month after month.

Are they owned by amazon.com? Why do they want to visit my sites in the first place? How do they know about my URLs? Sometimes they visit even before anyone else does or the site has time to get listed in Google.

How would they even know my URL was just put online? Sometimes they are my #1 traffic source with both newer and older sites. How and why is this happening? Who are they?

7:02 am on Jan 20, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5820
votes: 64


> Just found this long thread. Don't understand why amazonaws.com is visiting so many sites and so often. Can someone please explain it?

Read the thread. It's explained.
11:20 pm on Jan 20, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Long thread short, the title tells it all:

amazonaws.com plays host to wide variety of bad bots

That statement was true exactly one year+three days ago when I started this thread. Now, 125-plus posts later, and scores and scores and scores of bad, rude, iffy, test, and/or worthless bot hits later, that statement's an understatement.

So if your main traffic source is amazonaws.com-related, I'm sorry but none of that resource-eating traffic -- NONE of it -- is real people visiting in real time.

Block amazonaws.com and you'll stop feeding its plague of locusts.
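
For anyone blocking by host name rather than by IP range, the check itself is just a suffix match. A rough sketch, assuming the host name comes from a reverse DNS lookup on the visitor's IP; both function names are made up for illustration, and a reverse lookup on its own can be faked unless you also confirm it with a forward lookup:

```python
import socket

def is_amazonaws_host(hostname):
    """True if the host name is amazonaws.com or any subdomain of it."""
    host = hostname.strip().rstrip(".").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")

def reverse_lookup(ip):
    """Best-effort reverse DNS; returns None when the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

print(is_amazonaws_host("ec2-174-129-120-104.compute-1.amazonaws.com"))
```

Matching on the leading dot matters: a plain "ends with amazonaws.com" test would also catch unrelated domains that merely end in the same letters.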

5:14 pm on Jan 21, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


Not sure how this fits into the Amazon scenario:

IP: 216.113.169.nnn
Standard Headers: Accept All only
UA: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.2a1pre) Gecko/20090402 Firefox/3.6a1pre (.NET CLR 3.5.30729)
Referer: http:// www. google. com
HTTP_CONNECTION: Keep-Alive
Robots: haven't checked

The hit (home page of one site only) was trapped on several points that usually indicate an exploit attempt.

eBay, Inc
OrgID: EBAY
NetRange: 216.113.160.0 - 216.113.191.255

8:13 pm on Jan 21, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Hmm. I may be missing something but I don't see a connection between your info and amazonaws.com or anything Amazon.

If you suspect a cloaked/rogue spider, you could start a new thread with more details about the visit. Alternatively, it could've just been a zombied machine belonging to an eBay employee.

9:48 pm on Jan 21, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


Hmm. I'll just go and bury my head in a bucket, then, shall I?

Not being a user of either, I totally confused Amazon and Ebay. Sorry. :(

2:15 am on Jan 22, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1



Alas, because amazonaws.com is a never-ending source of new bad bots, I suspect this thread -- and my many bot-sightings in it -- will continue (...ad nauseam, sorry:)
8:55 pm on Jan 26, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2003
posts:728
votes: 0


Just a note about a bot identifying itself as PostRank. Somebody above mentioned that it made one HEAD request and then left.

However, I've just seen that bot (same AWS IP range) request /wp-admin/install.php. I can't see any good reason for a bot to want that...

-- b

10:55 pm on Jan 26, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


I see a lot of PHP accesses through folders such as admin. All, as far as I've seen, are attempts to gain access to the server through faults (e.g. no password) in admin files of one kind or another, often for phpMyAdmin or other control panels.

As I noted elsewhere (I think in this thread): I have seen AWS used either directly (via a logged-on account) or indirectly (via infected servers) for "standard botnet" exploit attempts.
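
Requests like the ones described above can be picked out of a log with a simple path test. A minimal sketch; the fragment list is an assumption based only on the paths mentioned in these posts, and `looks_like_probe` is a hypothetical name:

```python
# Illustrative fragments only, taken from the requests described above;
# a real list would be tuned to the software actually installed.
PROBE_FRAGMENTS = ("/wp-admin/install.php", "phpmyadmin")

def looks_like_probe(request_path):
    """True if the requested path contains a known exploit-probe fragment."""
    path = request_path.lower()
    return any(fragment in path for fragment in PROBE_FRAGMENTS)

print(looks_like_probe("/wp-admin/install.php"))
```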

2:06 am on Jan 27, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6163
votes: 284


The enormous number of "php" requests via the discussed IP range(s) is why I've been intrigued, and why I finally nuked it a week or so back. I could have approached it the other way around via referer and accomplished the same thing, but life is too short and there are too many rogue bots on that service.
2:25 am on Jan 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-75-101-243-212.compute-1.amazonaws.com
curl/7.18.2 (i486-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18

robots.txt? NO

9:09 pm on Feb 2, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-204-236-242-36.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )

robots.txt? YES

[Above info also posted as new thread.]

Curiously, per WHOIS, the bot-runner's site appears to be hosted by Amazonaws.com:

seoprofiler.com => 174.129.8.145 => Amazon.com/amazonaws.com
[Amazon Web Services, Elastic Compute Cloud, EC2]

Interesting how a company can claim a dynamically assigned IP as its permanent address...
7:13 pm on Feb 7, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:June 25, 2005
posts:179
votes: 1


New: AMAZON-EC2-7 = 184.72.0.0/15
8:21 pm on Feb 7, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


Well spotted! My very first ban of the 184 range. :)
10:57 am on Feb 11, 2010 (gmt 0)

New User

5+ Year Member

joined:Mar 14, 2009
posts:14
votes: 0


These are the CIDR blocks I have for amazonaws's bad bots at this point. Does this seem to cover all of them or are there more?

67.202.0.0/18 # "do not delete" - amazonaws.com's bad bots
75.101.128.0/17 # "do not delete" - amazonaws.com's bad bots
79.125.0.0/18 # "do not delete" - amazonaws.com's bad bots - Ireland
174.129.0.0/16 # "do not delete" - amazonaws.com's bad bots
184.72.0.0/15 # "do not delete" - amazonaws.com's bad bots
204.236.128.0/17 # "do not delete" - amazonaws.com's bad bots

Thomas
1:11 pm on Feb 11, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:June 25, 2005
posts:179
votes: 1


+
216.182.224.0/20
72.44.32.0/19
7:06 pm on Feb 11, 2010 (gmt 0)

New User

5+ Year Member

joined:Jan 27, 2009
posts:9
votes: 0


My laundry list of ec2 trash includes the following.

deny from 67.202.0.0/18 "Amazon ec2-Cloud"
deny from 72.44.32.0/19 "Amazon ec2-Cloud"
deny from 75.101.128.0/17 "Amazon ec2-Cloud"
deny from 79.125.0.0/18 "Amazon ec2-Cloud"
deny from 174.129.0.0/16 "Amazon ec2-Cloud"
deny from 184.72.0.0/15 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"
deny from 204.236.128.0/17 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"

I have seen virtually every user agent, real and fake, from these ranges, and NEVER once have I been able to figure out why I should allow them to index my site.

I spoke with Amazon Services about a year ago, explained why I am blocking them, and requested that they force a user-ID tag of some sort to identify the user for abuse reasons. They told me that this will never happen, so I said that's too bad, and that their services will always be blocked as well.

I honestly do not think they care, or understand that they are only enabling web scrapers, spammers and other trash to ruin their integrity. Again, I do not think they care so long as they are getting paid.
8:04 pm on Feb 11, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5820
votes: 64


CLOUD: Creepy Litigious Outrageous User-agent Dwelling
8:30 pm on Feb 11, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


I've got the Ireland one 79.125.0.0/127

204.74.108.0/24 (which you list twice) resolves here to:
UltraDNS Corp ULTRADNS-GLOBAL-2
204.74.96.0 - 204.74.108.255
108 seems to be mostly unused apart from loads of name servers on 1.
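
The duplicate entry noted above is easy to catch mechanically. A small sketch, assuming the list is kept as plain `deny from <CIDR>` lines like the ones posted; the function name `duplicate_cidrs` is made up, and the trailing quoted labels are simply ignored:

```python
from collections import Counter

# Deny lines as posted above, labels kept verbatim.
DENY_LINES = """\
deny from 67.202.0.0/18 "Amazon ec2-Cloud"
deny from 72.44.32.0/19 "Amazon ec2-Cloud"
deny from 75.101.128.0/17 "Amazon ec2-Cloud"
deny from 79.125.0.0/18 "Amazon ec2-Cloud"
deny from 174.129.0.0/16 "Amazon ec2-Cloud"
deny from 184.72.0.0/15 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"
deny from 204.236.128.0/17 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"
"""

def duplicate_cidrs(text):
    """Return CIDR blocks that appear more than once in 'deny from' lines."""
    cidrs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            cidrs.append(parts[2])
    return [cidr for cidr, count in Counter(cidrs).items() if count > 1]

print(duplicate_cidrs(DENY_LINES))
```

This only finds exact repeats; it says nothing about who owns a range, which still needs a WHOIS check like the one above.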
4:44 pm on Feb 14, 2010 (gmt 0)

New User

5+ Year Member

joined:Mar 14, 2009
posts:14
votes: 0


Thanks, thetrasher, for those two; and dstiles, the Ireland one was 79.125.0.0/17, not /18 like I had. I assumed you meant 17 not 127?

67.202.0.0/18 # "do not delete" - amazonaws.com's bad bots
72.44.32.0/19 # "do not delete" - amazonaws.com's bad bots
75.101.128.0/17 # "do not delete" - amazonaws.com's bad bots
79.125.0.0/17 # "do not delete" - amazonaws.com's bad bots - Ireland
174.129.0.0/16 # "do not delete" - amazonaws.com's bad bots
184.72.0.0/15 # "do not delete" - amazonaws.com's bad bots
204.236.128.0/17 # "do not delete" - amazonaws.com's bad bots
216.182.224.0/20 # "do not delete" - amazonaws.com's bad bots

That's a lot of IP addresses, but I cannot think of a single reason not to block them all.

Thomas
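
For anyone who wants to test a logged IP against the list above without touching the server config, here is a minimal Python sketch using the standard `ipaddress` module. The block list is copied from this post and is only a snapshot from this thread; `in_blocked_range` is a hypothetical helper name.

```python
import ipaddress

# CIDR blocks copied from the list above (a snapshot from this thread).
AWS_BLOCKS = [ipaddress.ip_network(cidr) for cidr in (
    "67.202.0.0/18",
    "72.44.32.0/19",
    "75.101.128.0/17",
    "79.125.0.0/17",
    "174.129.0.0/16",
    "184.72.0.0/15",
    "204.236.128.0/17",
    "216.182.224.0/20",
)]

def in_blocked_range(ip):
    """True if the address falls inside any of the listed blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in AWS_BLOCKS)

# Total number of addresses covered by the list.
total_addresses = sum(net.num_addresses for net in AWS_BLOCKS)

print(in_blocked_range("174.129.120.104"), total_addresses)
```

Feeding it the IPs embedded in the ec2-... host names reported earlier in the thread is a quick way to see whether the list covers them.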
5:27 pm on Feb 14, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


> I assumed you meant 17 not 127?

Sorry. It had been a long day. :)
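
For the record, the difference between the prefixes discussed above can be checked with Python's standard `ipaddress` module. A small sketch; `describe` is a made-up helper name:

```python
import ipaddress

def describe(cidr):
    """Summarise an IPv4 CIDR block, or explain why it is invalid."""
    try:
        net = ipaddress.ip_network(cidr)
    except ValueError as exc:
        return f"{cidr}: invalid ({exc})"
    return f"{cidr}: {net[0]} - {net[-1]} ({net.num_addresses} addresses)"

for cidr in ("79.125.0.0/18", "79.125.0.0/17", "79.125.0.0/127"):
    print(describe(cidr))
```

The /17 covers twice as many addresses as the /18, and /127 is rejected outright because an IPv4 prefix cannot exceed 32 bits.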
6:06 pm on Feb 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-75-101-245-135.compute-1.amazonaws.com
Twisted PageGetter

robots.txt? NO

See also: Twisted PageGetter [webmasterworld.com] (09/2009)
3:24 am on Feb 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1667
votes: 36


> CLOUD: Creepy Litigious Outrageous User-agent Dwelling


CLOUD: 403
1:00 am on Feb 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-174-129-153-217.compute-1.amazonaws.com
HTMLParser/2.0

robots.txt? NO
1:15 am on Feb 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-174-129-167-253.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50728)

robots.txt? NO

What bothers me about this one is not just its non-bot UA, but that its hits only went to two html files and some of their graphics files, and knew just where to look before arriving. I've blocked amazonaws.com for over a year -- basically from the first day I saw it -- so the directory paths didn't come from my server. Hmm.