- robots.txt? NO
- Uneven apostrophes in UA (only closing)
- site in UA yields this oh-so-descriptive info:
<html>
<head>
</head>
<body>
</body>
</html>
----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you, or newly renamed. Here are the actual UA strings, in no particular order:
NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO
feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO
Twitturly / v0.5
robots.txt? NO
YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO
YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes
Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO
PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES
EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO
TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO
Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO
Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES
yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO
Mozilla/5.0
robots.txt? NO
Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES
TinEye
robots.txt? NO
Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES
nnn/ttt (n)
robots.txt? YES
AideRSS/1.0 (aiderss.com)
robots.txt? NO
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO
----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO
WebClient
robots.txt? YES
----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:
Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO
Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES
Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES
Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO
zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES
zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES
Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO
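For anyone wanting to produce the same "robots.txt? YES/NO" tally from their own logs, here is a minimal sketch. It assumes you have already parsed your access log into (user agent, request path) pairs; the function name and the sample paths are hypothetical, not from any of the hits listed above.

```python
def robots_txt_report(hits):
    """Map each UA string to True if it ever requested /robots.txt.

    `hits` is an iterable of (user_agent, request_path) pairs, assumed
    to be pre-parsed from an access log (the log format is up to you).
    """
    report = {}
    for ua, path in hits:
        # Ignore any query string when comparing the path.
        asked = path.split("?", 1)[0] == "/robots.txt"
        report[ua] = report.get(ua, False) or asked
    return report


# Hypothetical sample data: the UA strings are from the list above,
# the request paths are made up for illustration.
sample_hits = [
    ("TinEye", "/"),
    ("EnaBot/1.2 (http://www.enaball.com/crawler.html)", "/robots.txt"),
    ("EnaBot/1.2 (http://www.enaball.com/crawler.html)", "/page.html"),
]

for ua, asked in robots_txt_report(sample_hits).items():
    print(ua)
    print("robots.txt?", "YES" if asked else "NO")
```

Note that a YES here only means the file was requested, not that the bot obeyed it.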
-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
IP range-wise, the IPs are 'in' the Host names.* Now as to how many there are, let alone what they are, I'm sorry but I'll have to leave that compilation as a sweat equity exercise for the bot-curious/obsessed at this time. Suffice it to say that, like any country's IPs -- and outnumbering many countries'! -- Amazon's cloud-related IPs are neither contiguous nor fixed in number.
.
*The second post in the MetaURI [webmasterworld.com] thread shows more detail, including an atypical example of the same UA using the exact same AWS IP over a period of time.
ec2-174-129-120-104.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)
robots.txt? NO
URI: /favicon.ico
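Since the IP is 'in' the Host name, it can be pulled back out mechanically. The following is a rough sketch that assumes the ec2-A-B-C-D.<something>.amazonaws.com pattern seen in the host name above; the function name is hypothetical, and other AWS host name patterns may exist that this does not cover.

```python
import ipaddress
import re

# Matches the leading "ec2-A-B-C-D." part of the host name.
EC2_HOST = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ip_from_ec2_host(hostname):
    """Return the IPv4 address embedded in an EC2-style host name, or None."""
    host = hostname.rstrip(".").lower()
    if not host.endswith(".amazonaws.com"):
        return None
    match = EC2_HOST.match(host)
    if not match:
        return None
    try:
        # ip_address() rejects octets above 255.
        return ipaddress.ip_address(".".join(match.groups()))
    except ValueError:
        return None


print(ip_from_ec2_host("ec2-174-129-120-104.compute-1.amazonaws.com"))
```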
Related: Yahoo's cloaked crawler(s) [webmasterworld.com]
I see amazonaws.com in many of my sites' referral logs with lots of visits week after week and month after month.
Are they owned by amazon.com? Why do they want to visit my sites in the first place? How do they know about my URLs? Sometimes they visit even before anyone else does or the site has time to get listed in Google.
How would they even know my URL was just put online? Sometimes they are my #1 traffic source with both newer and older sites. How and why is this happening? Who are they?
amazonaws.com plays host to wide variety of bad bots
That statement was true exactly one year+three days ago when I started this thread. Now, 125-plus posts later, and scores and scores and scores of bad, rude, iffy, test, and/or worthless bot hits later, that statement's an understatement.
So if your main traffic source is amazonaws.com-related, I'm sorry but none of that resource-eating traffic -- NONE of it -- is real people visiting in real time.
Block amazonaws.com and you'll stop feeding its plague of locusts.
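One way to act on that, sketched in Python under some assumptions: do a reverse DNS lookup on the visiting IP and refuse anything whose host name falls under amazonaws.com. The function names are hypothetical, and how you wire the result into your server (deny rule, 403, etc.) is up to your setup. Reverse lookups are slow, so in practice you would probably cache the results.

```python
import socket


def is_amazonaws_host(hostname):
    """True if the host name is amazonaws.com or any subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def should_block(ip, reverse_lookup=socket.gethostbyaddr):
    """Reverse-resolve `ip` and report whether it belongs to amazonaws.com.

    `reverse_lookup` is injectable so the logic can be tried without
    network access; it must return a tuple whose first item is the host name.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No reverse DNS entry (or lookup failure): not identifiable this way.
        return False
    return is_amazonaws_host(hostname)
```

The suffix test is anchored on the dot so that a look-alike such as notamazonaws.com does not match.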
IP: 216.113.169.nnn
Standard Headers: Accept All only
UA: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.2a1pre) Gecko/20090402 Firefox/3.6a1pre (.NET CLR 3.5.30729)
Referer: http:// www. google. com
HTTP_CONNECTION: Keep-Alive
Robots: haven't checked
The hit (home page of one site only) was trapped on several points that usually indicate an exploit attempt.
eBay, Inc
OrgID: EBAY
NetRange: 216.113.160.0 - 216.113.191.255
If you suspect a cloaked/rogue spider, you could start a new thread with more details about the visit. Alternatively, it could've just been a zombied machine belonging to an eBay employee.
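For checking whether a logged IP falls inside a whois NetRange like the one quoted above, a quick sketch using Python's standard ipaddress module follows. The last octet of the test address is made up, since the original was logged as 216.113.169.nnn, and the helper name is hypothetical.

```python
import ipaddress

# NetRange from the whois record quoted above.
first = ipaddress.ip_address("216.113.160.0")
last = ipaddress.ip_address("216.113.191.255")

# Collapse the start/end pair into CIDR blocks (handy for deny rules).
cidrs = list(ipaddress.summarize_address_range(first, last))


def in_netrange(ip):
    """True if `ip` falls inside any CIDR block of the NetRange."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in cidrs)


print([str(net) for net in cidrs])
print(in_netrange("216.113.169.1"))  # last octet is a placeholder
```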
As I noted elsewhere (I think in this thread): I have seen AWS used either directly (via a logged-on account) or indirectly (via infected servers) for "standard botnet" exploit attempts.