amazonaws.com plays host to wide variety of bad bots

Most recently seen: Gnomit

     
3:04 am on Jan 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2065
votes: 2


ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation mark in UA (closing only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.

12:03 am on June 14, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


@ dstiles Just an FYI concerning the IP range config: 174.129/16

That does not work on all unix/apache set-ups. On 2 of the 3 hosted servers I use, I must write it like this: 174.129.0.0/16.

Just thought I'd post this for those who mistakenly cut'n paste without doing their research.

And agreed, good to see you around again, pfui.
6:39 am on June 14, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10481
votes: 1100


pfui! Dear Heart, happy to see you posting again! Missed you.
1:30 pm on June 14, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5507
votes: 5


keyplyr,
Regarding "174.129.0.0/16" ?

FWIW, all that trailing crap is not necessary for entire Class Groups (in this example a Class B).

174.129. will function the same as 174.129.0.0/16
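
For anyone juggling the three spellings mentioned above (174.129., 174.129/16 and 174.129.0.0/16), here is a minimal Python sketch that expands the shorthand forms into one canonical network so they can be compared. The helper name normalize_block is hypothetical, and the rule "each octet given counts for 8 bits" is an assumption that mirrors how the partial-IP form is described in this thread, not a statement about any particular server.

```python
import ipaddress

def normalize_block(spec: str) -> ipaddress.IPv4Network:
    """Expand shorthand such as '174.129.' or '174.129/16' to a full IPv4 network.

    Hypothetical helper for illustration only.
    """
    addr, _, prefix = spec.partition("/")
    octets = [o for o in addr.split(".") if o]
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"cannot parse {spec!r}")
    # With no explicit prefix, assume each octet supplied contributes 8 bits.
    prefix_len = int(prefix) if prefix else 8 * len(octets)
    # Pad the missing octets with zeros: '174.129' becomes '174.129.0.0'.
    padded = ".".join(octets + ["0"] * (4 - len(octets)))
    # strict=True rejects input with host bits set, e.g. '174.129.1.0/16'.
    return ipaddress.ip_network(f"{padded}/{prefix_len}", strict=True)

for spec in ("174.129.", "174.129/16", "174.129.0.0/16"):
    print(spec, "->", normalize_block(spec))
```

All three inputs come out as the same network, which is the point wilderness makes; writing the long form simply makes the breadth of the block explicit.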
9:19 pm on June 14, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


I was not saying what the "code" should be, keyplyr, as I don't use .htaccess. I was just reporting the IP class. My own system requires a full range - eg 174.129.0.0 - 174.129.255.255. Tedious but not as bad as you might suppose. One day I may automate data entry to deal with /16 and such. :)
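
The "automate data entry to deal with /16" idea can be sketched in a few lines of Python using the standard ipaddress module. This is only an illustration of the conversion from CIDR to a first/last pair; the function name cidr_to_range is made up here, and nothing is implied about how the system described above stores its ranges.

```python
import ipaddress

def cidr_to_range(cidr: str) -> tuple:
    """Return (first, last) address strings for a CIDR block (hypothetical helper)."""
    net = ipaddress.ip_network(cidr, strict=True)
    # network_address is the lowest address, broadcast_address the highest.
    return str(net.network_address), str(net.broadcast_address)

print(" - ".join(cidr_to_range("174.129.0.0/16")))
```

For 174.129.0.0/16 this prints the same full range quoted in the post, 174.129.0.0 - 174.129.255.255.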
7:22 am on June 15, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


174.129. will function the same as 174.129.0.0/16 - wilderness

Thanks, I'm aware of that. The reason I write the full code is so I know the breadth of the block, especially when the host later splits the range to include new companies. Not so obvious in the above examples, but in more specific blocks very helpful.
2:07 am on July 17, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


ec2-174-129-79-111.compute-1.amazonaws.com
HTTP_Request2/2.0.0RC1 (http://pear.php.net/package/http_request2) PHP/5.3.2-1ubuntu4.9

robots.txt? NO
2:21 am on July 17, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


ec2-174-129-171-219.compute-1.amazonaws.com
PostPost/1.0 (+http://postpo.st/crawlers)

robots.txt? Yes

Cutesy URL TLD .st = São Tomé and Príncipe. Twitter-traveler. More info in "PostPost" thread.
8:39 pm on July 17, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


And now, from the "Twitter-Swarmer Still Won't Take "No" for an Answer" department:

In 25 seconds, the following three hosts used two different-version UAs to hit the same file four times (& botbait three times) via GET and HEAD and all despite 302s and 403s:

ec2-50-16-177-215.compute-1.amazonaws.com
ec2-107-20-14-9.compute-1.amazonaws.com
ec2-174-129-57-129.compute-1.amazonaws.com

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; +http://flipboard.com/browserproxy)
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)

Oh, almost forgot...

All that came after successfully getting -- and getting fully Disallowed in -- robots.txt in the first second. Sheesh, if you ain't gonna heed it, why bother to read it?

FWIW, I chide flipboard.com and AWS for such shoddy, erm, webmanship. (Read: "What jerks.")

See also, from Aug. 17th of last year: "flipboard" [webmasterworld.com...]
9:05 pm on July 17, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


Repeat offender Twitter-swarmer.

ec2-184-73-218-115.compute-1.amazonaws.com
Strawberryj.am

robots.txt? NO

Another cutesy, and poorly coded, UA. TLD .am = Armenia. See also "Strawberryj.am" thread.
5:00 pm on July 21, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


New Amazon range to block, first seen today:

107.20.0.0 - 107.23.255.255

Initial access on 107.20.15.169 with no UA. A good start to a new IP range.
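
For those who block with CIDR notation rather than start/end pairs, the range above can be converted with Python's ipaddress.summarize_address_range. A minimal sketch follows; the wrapper name range_to_cidrs is hypothetical.

```python
import ipaddress

def range_to_cidrs(first: str, last: str) -> list:
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(n) for n in nets]

print(range_to_cidrs("107.20.0.0", "107.23.255.255"))
```

The range is aligned on a four-network boundary, so it collapses to the single block 107.20.0.0/14.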

Neither my Linux Network Tools Whois nor robtex could resolve the IP range, although it was registered in March this year. Nice to know the internet and its tools are up to date. :(

Arin's DNS for the range says the following, for anyone wanting to report Amazon's many badly-behaved bots...

---------------------
The activity you have detected originates from a dynamic hosting environment.
For fastest response, please submit abuse reports at [aws-portal.amazon.com...]
For more information regarding EC2 see:
[ec2.amazonaws.com...]
All reports MUST include:
* src IP
* dest IP (your IP)
* dest port
* Accurate date/timestamp and timezone of activity
* Intensity/frequency (short log extracts)
* Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
---------------------
8:38 am on July 22, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Thanks dstiles
3:04 am on July 26, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


ec2-184-72-178-50.compute-1.amazonaws.com
Mozilla/5.0 (compatible; Bender; http://benderthewebrobot.tumblr.com)
robots.txt? Yes

See also: Bender the web crawler [webmasterworld.com...]
8:03 pm on July 26, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


I wouldn't ever see that one. 184.72/15 is blocked in the IIS "firewall".
4:20 am on July 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


ec2-46-137-129-41.eu-west-1.compute.amazonaws.com
Zend_Http_Client
robots.txt? NO

See also just-started "Zend_Http_Client" thread.
9:33 am on July 28, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


RE: ec2-46-137-129-41.eu-west-1.compute.amazonaws.com

I've had 46.137.0.0/16 blocked but have been unsure if there's a more specific range. Can't find much info. Anyone?

See also just-started "Zend_Http_Client" thread.

Looked, didn't find it.
2:09 pm on July 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


I started a separate "Zend_Http_Client" [webmasterworld.com...] thread right before I posted in this one. As of this writing, the standalone's still pending mod approval.
8:47 pm on July 28, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


keyplyr - I blocked the whole /16 - it's AWS in Ireland.
1:20 am on July 29, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Thanks dstiles
9:22 pm on July 29, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


From the zdnet security blog today...

"Amazon's cloud services systematically exploited by cybercriminals

"Security researchers from Kaspersky Labs have spotted yet another SpyEye crimeware variant using Amazon's Simple Storage Service (Amazon S3) for command and control purposes.

"...Does crimeware in the cloud have a future? Most certainly..."
6:14 pm on Aug 5, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


I don't know which is worse, that this UA is fake, or real...

ec2-184-73-173-106.compute-1.amazonaws.com
ia_archiver

So much for reading and heeding robots.txt...

08/05 nn:34:50 /robots.txt 200
08/05 nn:34:50 /sitemap.xml 403
08/05 nn:34:50 /sitemap_index.xml 403
08/05 nn:34:51 /sitemap.xml.gz 403
08/05 nn:34:51 /sitemap_index.xml.gz 403
08/05 nn:34:52 /sitemap.txt 403
08/05 nn:34:52 /sitemap.rss 403
08/05 nn:34:53 /sitemap.atom 403
08/05 nn:34:53 / 403

Jerks.
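
For contrast, this is roughly what "reading and heeding" looks like on the crawler side, sketched with Python's standard urllib.robotparser. The robots.txt content below (a blanket Disallow) is an assumption for illustration; the actual file served in the log above is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for illustration: everything disallowed for every agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Paths taken from the log excerpt above.
for path in ("/sitemap.xml", "/sitemap_index.xml", "/"):
    allowed = parser.can_fetch("ia_archiver", path)
    print(path, "allowed" if allowed else "disallowed")
```

A polite bot would stop after the first check. The one in the log fetched robots.txt and then requested eight more URLs in three seconds anyway.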
9:56 pm on Aug 5, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Several new IP Ranges are listed on the AWS blog itself: [forums.aws.amazon.com...]
11:40 pm on Aug 5, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


1.) For folks denying the IP cesspool that is AWS, I hope you're also using --

deny from amazonaws

-- and/or mod_rewriting amazonaws because AWS does not include ALL of their 'current public address ranges' on keyplyr's handy link. For example:

ec2-50-17-0-111.compute-1.amazonaws.com
a.k.a. Comment Spammer 50.17.0.111 [projecthoneypot.org...]
a.k.a. AWS Net Range 50.16.0.0 - 50.19.255.255

If you don't do reverse IP lookups...

2.) The following list -- numerical, so I can easily eyeball entries in .htaccess -- consists of the IPs in AWS's 07-29-11 geographically-arrayed announcement, minus 50.17.0.0/16 (& unknown others of AWS's gazillion IPs). You might want to add 50.17.0.0/16, or even better, consolidate all the 50s into:

deny from 50.16.0.0/14

## 07-29-11: Amazon EC2 Public IP Ranges
## [forums.aws.amazon.com...]
deny from 46.51.128.0/18
deny from 46.51.192.0/20
deny from 46.51.216.0/21
deny from 46.51.224.0/19
deny from 46.137.0.0/17
deny from 46.137.128.0/18
deny from 46.137.224.0/19
deny from 50.16.0.0/15
deny from 50.18.0.0/16
deny from 50.19.0.0/16
deny from 67.202.0.0/18
deny from 72.44.32.0/19
deny from 75.101.128.0/17
deny from 79.125.0.0/17
deny from 103.4.8.0/21
deny from 107.20.0.0/15
deny from 122.248.192.0/18
deny from 174.129.0.0/16
deny from 175.41.128.0/18
deny from 175.41.192.0/18
deny from 176.32.64.0/19
deny from 176.34.128.0/17
deny from 184.72.0.0/18
deny from 184.72.64.0/18
deny from 184.72.128.0/17
deny from 184.73.0.0/16
deny from 204.236.128.0/18
deny from 204.236.192.0/18
deny from 216.182.224.0/20
##
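
The consolidation suggested above can be checked mechanically. Here is a small Python sketch using ipaddress.collapse_addresses on the 50.x and 184.x entries copied from the list; it is a sanity check under the assumption that the entries are exactly as posted, not a replacement for reviewing the list by hand.

```python
import ipaddress

# Entries copied verbatim from the deny list above.
blocks = [
    "50.16.0.0/15", "50.18.0.0/16", "50.19.0.0/16",
    "184.72.0.0/18", "184.72.64.0/18", "184.72.128.0/17", "184.73.0.0/16",
]
nets = [ipaddress.ip_network(b) for b in blocks]

# collapse_addresses merges adjacent and overlapping networks.
collapsed = [str(n) for n in ipaddress.collapse_addresses(nets)]
print(collapsed)

# Check whether the comment spammer's address is covered by the first entry.
print(ipaddress.ip_address("50.17.0.111") in nets[0])
```

The 50s do merge into 50.16.0.0/14 and the 184s into 184.72.0.0/15. Note also that 50.16.0.0/15 spans 50.16.0.0 through 50.17.255.255, so the list as posted already covers 50.17.0.111.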
9:52 pm on Aug 6, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


I have a slightly different list which excluded 103 (only recently released for use) and 176, both of which now added - many thanks!

Total list, including ranges shown in DNS as Amazon (not AWS)...

8.18.144.0 - 8.18.145.255
46.51.128.0 - 46.51.255.255
46.137.0.0 - 46.137.255.255
50.16.0.0 - 50.19.255.255
67.202.0.0 - 67.202.63.255
72.44.32.0 - 72.44.63.255
75.101.128.0 - 75.101.255.255
79.125.0.0 - 79.125.127.255
87.238.80.0 - 87.238.87.255
103.4.8.0 - 103.4.15.255
107.20.0.0 - 107.23.255.255
122.248.192.0 - 122.248.255.255
174.129.0.0 - 174.129.255.255
175.41.128.0 - 175.41.255.255
176.32.64.0 - 176.32.127.255
176.34.128.0 - 176.34.255.255
184.72.0.0 - 184.73.255.255
199.255.192.0 - 199.255.195.255
204.236.128.0 - 204.236.255.255
207.171.160.0 - 207.171.191.255
216.182.224.0 - 216.182.239.255
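
A list kept as start/end pairs like the one above can be searched quickly by converting each address to an integer and bisecting on the start values. The Python sketch below uses only a handful of the ranges, copied from the list; the names RANGES and is_blocked are hypothetical, and the lookup assumes the ranges do not overlap.

```python
import ipaddress
from bisect import bisect_right

# A few ranges copied from the list above (not the complete list).
RANGES = [
    ("46.137.0.0", "46.137.255.255"),
    ("50.16.0.0", "50.19.255.255"),
    ("107.20.0.0", "107.23.255.255"),
    ("174.129.0.0", "174.129.255.255"),
    ("184.72.0.0", "184.73.255.255"),
]

# Sort by start address so bisect can find the candidate range.
_table = sorted(
    (int(ipaddress.ip_address(a)), int(ipaddress.ip_address(b))) for a, b in RANGES
)
_starts = [start for start, _ in _table]

def is_blocked(ip: str) -> bool:
    """True if ip falls inside one of the (non-overlapping) ranges."""
    n = int(ipaddress.ip_address(ip))
    i = bisect_right(_starts, n) - 1  # last range starting at or before n
    return i >= 0 and n <= _table[i][1]

for ip in ("50.17.0.111", "107.20.15.169", "8.8.8.8"):
    print(ip, is_blocked(ip))
```

The first two addresses appear earlier in this thread and fall inside the listed ranges; 8.8.8.8 is just an outside address for comparison.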
3:30 am on Aug 13, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


Among other not-okay things, note the 400 error (bad request/syntax):

ec2-204-236-194-99.compute-1.amazonaws.com - - [1n/Aug/2011:12:34:56 -0700] "HEAD HTTP/1.1" 400 - "-" "-"
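
Lines like that one can be flagged automatically. Below is a minimal Python sketch that parses a combined-format log line with a regular expression and reports a request that lacks the usual three parts (method, path, protocol) or arrives with a blank user agent. The regex and the function name inspect are assumptions written for this example; real log formats vary by server configuration.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def inspect(line: str) -> dict:
    """Return a few red-flag fields for one log line (hypothetical helper)."""
    m = LOG_LINE.match(line)
    if m is None:
        raise ValueError("not a combined-format log line")
    parts = m["request"].split()
    return {
        "host": m["host"],
        "status": int(m["status"]),
        # A normal request line has exactly: METHOD PATH PROTOCOL
        "malformed_request": len(parts) != 3,
        "blank_agent": m["agent"] in ("", "-"),
    }

# The line quoted above, day digit redacted as in the original post.
sample = ('ec2-204-236-194-99.compute-1.amazonaws.com - - '
          '[1n/Aug/2011:12:34:56 -0700] "HEAD HTTP/1.1" 400 - "-" "-"')
print(inspect(sample))
```

Here the request is "HEAD HTTP/1.1" with no path at all, which matches the 400 the server returned.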
4:44 pm on Aug 18, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


ec2-184-73-8-96.compute-1.amazonaws.com
AlexionResearchBot/Nutch-1.3

robots.txt? Yes
11:48 pm on Aug 19, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


Twitter swarmer/reader/whatever.

ec2-175-41-196-238.ap-northeast-1.compute.amazonaws.com
Crowsnest/0.5 (+http://www.crowsnest.tv/)

robots.txt? NO

See also: Crowsnest [webmasterworld.com...]
2:36 pm on Aug 24, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3282
votes: 19


Another IP range detected today...

72.21.192.0 - 72.21.223.255

The IP 72.21.217.n was used as a proxy for a UA of MSIE-6, which is in itself highly deprecated. Headers were consistent with either a battened-down proxy or a bot.

The Forwarded-For IP was 208.53.158.nnn which is an IP belonging to FDC Servers - already banned because it... um... er... servers?

208.53.128.0 - 208.53.191.255 (and others)
7:31 pm on Aug 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


A quick scan of my notes for that 72.21.217. shows that last June, when Amazon-owned IMDb confirmed a site-related entry, this combo hit root:

72.21.217.0
Mozilla/4.0

Then in July, another confirmation, and a neighboring IP plus a bad UA --

72.21.217.64
libwww-perl/5.805

(Problem is, I never know what Amazon/IMDb/AWS is going to send, or when. So either I leave the door wide open all the time to anything they want and anything they use, or -- not.)
9:06 am on Sept 9, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2065
votes: 2


FWIW... Hit 20 minutes apart, faking two really, really old UAs:

ec2-46-137-71-213.eu-west-1.compute.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1

ec2-75-101-129-73.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.8.1) Gecko/20061010 Firefox/2.0

No robots.txt, of course.
11:27 pm on Sept 15, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


204.236.143.78 percbotspider

rDNS: ec2-204-236-143-78.us-west-1.compute.amazonaws.com
204.236.128.0 to 204.236.255.255
204.236.128.0/17

robots.txt: no
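
The EC2 rDNS names quoted throughout this thread embed the IP in the hostname, so a log that records only hostnames can still be matched against numeric ranges. A minimal Python sketch follows; the regex and the name ip_from_ec2_host are assumptions based on the hostnames seen here. A PTR record is set by whoever controls the address space, so treat the hostname as a hint and confirm it with a forward lookup before relying on it.

```python
import ipaddress
import re

# Pattern inferred from hostnames in this thread, e.g.
# ec2-204-236-143-78.us-west-1.compute.amazonaws.com
EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.[a-z0-9.-]*amazonaws\.com$"
)

def ip_from_ec2_host(host: str):
    """Return the IPv4 address embedded in an EC2-style hostname, or None."""
    m = EC2_HOST.match(host.lower())
    if not m:
        return None
    try:
        return ipaddress.ip_address(".".join(m.groups()))
    except ValueError:  # an octet above 255
        return None

host = "ec2-204-236-143-78.us-west-1.compute.amazonaws.com"
ip = ip_from_ec2_host(host)
print(ip, ip in ipaddress.ip_network("204.236.128.0/17"))
```

For the host above this yields 204.236.143.78, which falls inside the 204.236.128.0/17 block given in the post.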