
Forum Moderators: Ocean10000 & incrediBILL


amazonaws.com plays host to a wide variety of bad bots

Most recently seen: Gnomit

   
3:04 am on Jan 18, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation mark in UA (closing only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you, or newly renamed. Here are the actual UA strings, in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
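
For anyone who wants to do the "block amazonaws.com no matter what" check in a script rather than in server config, here is a minimal Python sketch. It only looks at the hostname string, in the form seen at the top of this post (ec2-67-202-57-30.compute-1.amazonaws.com). The function names and the regex are my own assumptions for illustration, not anything Amazon documents, and reverse DNS can be faked, so treat a match as a hint unless you also confirm it with a forward lookup.

```python
import re
from typing import Optional

# Assumed shape of EC2 reverse-DNS names, based only on the examples in this
# thread: ec2-A-B-C-D.<region labels>.amazonaws.com
EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})"
    r"\.(?:[a-z0-9-]+\.)*amazonaws\.com$",
    re.IGNORECASE,
)


def is_amazonaws_host(hostname: str) -> bool:
    """True if the name is amazonaws.com or any subdomain of it."""
    host = hostname.strip().rstrip(".").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def ec2_host_to_ip(hostname: str) -> Optional[str]:
    """Recover the dotted IP embedded in an ec2-... hostname, if any."""
    match = EC2_HOST.match(hostname.strip().rstrip("."))
    if match is None:
        return None
    octets = match.groups()
    if any(int(octet) > 255 for octet in octets):
        return None
    return ".".join(octets)


if __name__ == "__main__":
    name = "ec2-67-202-57-30.compute-1.amazonaws.com"
    print(is_amazonaws_host(name), ec2_host_to_ip(name))
```

The suffix test uses ".amazonaws.com" with the leading dot so that a lookalike such as notamazonaws.com is not treated as a match.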

9:11 am on Sep 21, 2011 (gmt 0)

5+ Year Member



dstiles, thanks for your list. You wouldn't happen to have that list ready to go with CIDR ranges by any chance? :)

<added>
OK, I worked it out. I have:</added>
8.18.144.0/23
46.51.128.0/17
46.137.0.0/16
50.16.0.0/14
67.202.0.0/18
72.21.192.0/19
72.44.32.0/19
75.101.128.0/17
79.125.0.0/17
87.238.80.0/21
103.4.8.0/21
107.20.0.0/14
122.248.192.0/18
174.129.192.0/18
175.41.128.0/17
176.32.64.0/18
176.34.128.0/17
184.72.0.0/15
199.255.192.0/22
204.236.128.0/17
207.171.128.0/18
216.182.224.0/20
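
In case it helps, a list like this can be tested against a visitor IP with Python's standard ipaddress module. This is only a sketch: the ranges are copied exactly as posted above (September 2011), so they are neither current nor complete, and the names POSTED_RANGES and matching_range are made up for the example.

```python
import ipaddress
from typing import Optional

# Ranges exactly as posted in this thread (2011). Assumption: this list is
# illustrative only and will not match Amazon's present-day allocations.
POSTED_RANGES = [
    "8.18.144.0/23", "46.51.128.0/17", "46.137.0.0/16", "50.16.0.0/14",
    "67.202.0.0/18", "72.21.192.0/19", "72.44.32.0/19", "75.101.128.0/17",
    "79.125.0.0/17", "87.238.80.0/21", "103.4.8.0/21", "107.20.0.0/14",
    "122.248.192.0/18", "174.129.192.0/18", "175.41.128.0/17",
    "176.32.64.0/18", "176.34.128.0/17", "184.72.0.0/15",
    "199.255.192.0/22", "204.236.128.0/17", "207.171.128.0/18",
    "216.182.224.0/20",
]

# Parse once; ip_network raises ValueError if a range has host bits set.
NETWORKS = [ipaddress.ip_network(cidr) for cidr in POSTED_RANGES]


def matching_range(ip: str) -> Optional[str]:
    """Return the first posted range containing ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for net in NETWORKS:
        if addr in net:
            return str(net)
    return None


if __name__ == "__main__":
    for candidate in ("67.202.57.30", "107.20.87.100", "192.0.2.1"):
        print(candidate, matching_range(candidate))
```

A linear scan is fine for a couple of dozen ranges; a much longer list would want a sorted structure or a proper firewall rule instead.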
6:19 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Speaking of 107.20.0.0/14 [107.20.0.0 - 107.23.255.255] --

I can't recall ever seeing a visit from anybody where the UA was nothing, nada, zip at the server log level. Usually, Apache (pre-v2) inserts a hyphen when the field's empty.

Leave it to one of amazonaws's slimier denizens to get around that in the last set of quotes:

ec2-107-20-87-100.compute-1.amazonaws.com - - [00/Sep/2011:00:00:00 -0n00] "GET /dir/filename.html HTTP/1.1" 403 1453 "-" ""
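
To pull lines like that out of a log, something along these lines might do. It is a rough Python sketch that assumes the combined log format shown above, with host, bracketed timestamp, quoted request, status, size, quoted referer, and quoted UA in that order. The regex and the name classify_user_agent are my own, and the pattern does not handle escaped quotes inside a field.

```python
import re
from typing import Optional

# Assumed layout: the combined log format as it appears in the line above.
# Limitation: a field containing an escaped quote (\") will not parse.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def classify_user_agent(line: str) -> Optional[str]:
    """Return 'empty', 'hyphen', or 'present'; None if the line won't parse."""
    match = COMBINED.match(line.strip())
    if match is None:
        return None
    ua = match.group("ua")
    if ua == "":
        return "empty"    # the truly blank "" case described above
    if ua == "-":
        return "hyphen"   # the usual placeholder for a missing UA
    return "present"


if __name__ == "__main__":
    sample = (
        'ec2-107-20-87-100.compute-1.amazonaws.com - - '
        '[00/Sep/2011:00:00:00 -0n00] "GET /dir/filename.html HTTP/1.1" '
        '403 1453 "-" ""'
    )
    print(classify_user_agent(sample))
```

Keeping "empty" and "hyphen" as separate results makes it easy to count how often the blank-quotes variety actually shows up.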
9:29 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Sorry, Santapaws, my database runs on ip-low to ip-high, not CIDR. When I quote CIDR I have to either work it out or quote directly from a DNS report.
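
For going back and forth between ip-low/ip-high pairs and CIDR, Python's standard ipaddress module can do the arithmetic. A minimal sketch follows; the helper names range_to_cidrs and cidr_to_range are just illustrative. Note that a range which is not aligned on a power-of-two boundary comes back as several CIDR blocks rather than one.

```python
import ipaddress
from typing import List, Tuple


def range_to_cidrs(ip_low: str, ip_high: str) -> List[str]:
    """Smallest set of CIDR blocks that exactly covers ip_low..ip_high."""
    first = ipaddress.ip_address(ip_low)
    last = ipaddress.ip_address(ip_high)
    # summarize_address_range raises ValueError if first > last.
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def cidr_to_range(cidr: str) -> Tuple[str, str]:
    """First and last address of a CIDR block, as dotted strings."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1])


if __name__ == "__main__":
    # The range quoted elsewhere in this thread.
    print(range_to_cidrs("107.20.0.0", "107.23.255.255"))
    print(cidr_to_range("107.20.0.0/14"))
```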
12:09 am on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



One of the niftiest, geekiest free services ever: http://ip2cidr.com/
12:48 am on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Its mirror is also good for faster entries:

http://www.ip2cidr.info/convert_ip_to_cidr.htm
8:21 pm on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I run Internet Protocol Calculator on Ubuntu. Even faster. :)
8:53 pm on Sep 22, 2011 (gmt 0)

5+ Year Member



I've had this nifty, free tool installed on my computer for years:

[kgsoft.com...]

Screenshot [i1-win.softpedia-static.com]
10:30 pm on Sep 23, 2011 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



OK, this thread is too long in the tooth; time to start a new one.
This 278-message thread spans 10 pages.