Forum Moderators: open
- robots.txt? NO
- Uneven apostrophes in UA (only closing)
- site in UA yields this oh-so-descriptive info:
<html>
<head>
</head>
<body>
</body>
</html>
----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:
NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO
feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO
Twitturly / v0.5
robots.txt? NO
YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO
YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes
Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO
PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES
EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO
TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO
Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO
Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES
yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO
Mozilla/5.0
robots.txt? NO
Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES
TinEye
robots.txt? NO
Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES
nnn/ttt (n)
robots.txt? YES
AideRSS/1.0 (aiderss.com)
robots.txt? NO
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO
----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO
WebClient
robots.txt? YES
----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:
Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO
Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES
Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES
Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO
zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES
zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES
Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO
-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
[edited by: Pfui at 2:45 am (utc) on Nov 9, 2010]
Dear Amazon EC2 customer,
We are pleased to announce that as part of our ongoing expansion, we have added a new public IP range. The current Amazon EC2 public address ranges are:
US East (Northern Virginia):
216.182.224.0/20 (216.182.224.0 - 216.182.239.255)
72.44.32.0/19 (72.44.32.0 - 72.44.63.255)
67.202.0.0/18 (67.202.0.0 - 67.202.63.255)
75.101.128.0/17 (75.101.128.0 - 75.101.255.255)
174.129.0.0/16 (174.129.0.0 - 174.129.255.255)
204.236.192.0/18 (204.236.192.0 - 204.236.255.255)
184.73.0.0/16 (184.73.0.0 – 184.73.255.255)
184.72.128.0/17 (184.72.128.0 - 184.72.255.255)
184.72.64.0/18 (184.72.64.0 - 184.72.127.255)
50.16.0.0/15 (50.16.0.0 - 50.17.255.255)
50.19.0.0/16 (50.19.0.0 - 50.19.255.255)
US West (Northern California):
204.236.128.0/18 (216.236.128.0 - 216.236.191.255)
184.72.0.0/18 (184.72.0.0 – 184.72.63.255)
50.18.0.0/17 (50.18.0.0 - 50.18.127.255)
EU (Ireland):
79.125.0.0/17 (79.125.0.0 - 79.125.127.255)
46.51.128.0/18 (46.51.128.0 - 46.51.191.255)
46.51.192.0/20 (46.51.192.0 - 46.51.207.255)
46.137.0.0/17 (46.137.0.0 - 46.137.127.255)
Asia Pacific (Singapore)
175.41.128.0/18 (175.41.128.0 - 175.41.191.255)
122.248.192.0/18 (122.248.192.0 - 122.248.255.255)
Asia Pacific (Tokyo)
175.41.192.0/18 (175.41.192.0 - 175.41.255.255)
46.51.224.0/19 (46.51.224.0 - 46.51.255.254)NEW