Forum Moderators: Ocean10000 & incrediBILL

amazonaws.com plays host to a wide variety of bad bots

Most recently seen: Gnomit

   
3:04 am on Jan 18, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation mark in UA (closing only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
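
For anyone who wants to do the same kind of hostname-based blocking, here is a minimal Python sketch, assuming you already have the reverse-DNS name from your logs. The function names (`is_amazonaws_host`, `ec2_ip_from_rdns`) are made up for illustration. The pattern is based only on the `ec2-a-b-c-d.<region>.amazonaws.com` names quoted in this thread, and rDNS alone can be faked, so treat a match as a hint rather than proof.

```python
import re
from typing import Optional

# Pattern inferred from hostnames quoted in this thread, e.g.
#   ec2-67-202-57-30.compute-1.amazonaws.com
#   ec2-[...].eu-west-1.compute.amazonaws.com
# It is an assumption, not an official AWS naming spec.
_EC2_RDNS = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})"
    r"\.(?:[a-z0-9-]+\.)*amazonaws\.com\.?$",
    re.IGNORECASE,
)


def is_amazonaws_host(hostname: str) -> bool:
    """True if the name is amazonaws.com or a subdomain of it."""
    h = hostname.strip().rstrip(".").lower()
    return h == "amazonaws.com" or h.endswith(".amazonaws.com")


def ec2_ip_from_rdns(hostname: str) -> Optional[str]:
    """Recover the dotted IP embedded in an ec2-a-b-c-d rDNS name, if any."""
    m = _EC2_RDNS.match(hostname.strip())
    if m is None:
        return None
    octets = [int(g) for g in m.groups()]
    if any(o > 255 for o in octets):
        return None
    return ".".join(str(o) for o in octets)
```

Comparing the IP recovered from the name with the connecting IP is a cheap consistency check before you 403.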

11:24 am on Jul 11, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Here is some other info; not sure if it was posted before, but I see lots of IPs from amazonaws used as Tor proxy servers. These may be transparent proxies serving spam/scrapers worldwide.

URL
www DOT torproxylist DOT com
without spaces and real dots instead of DOT.

5:12 pm on Jul 11, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



enigma, I'm confused. Do you have a log entry you could post, please? TIA

7:31 pm on Jul 11, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I'm a Tor [en.wikipedia.org] noob but a quick search of related info yielded 14 publicly accessible amazonaws.com Hosts/IPs running open proxies, arguably in violation of the AWS Customer Agreement, e.g., section 5.4.5. Network:

You may not operate network services such as:
Open proxies.

(etc.)

Two of the following IPs, the 79s, map to --

ec2-[yada-yada].eu-west-1.compute.amazonaws.com

-- and the remainder to this thread's (in)famous:

ec2-[yada-yada].compute-1.amazonaws.com

67.202.11.nnn
67.202.30.nn
67.202.44.nnn
67.202.47.nn
67.202.37.nnn
75.101.155.nnn
75.101.201.nn
79.125.50.nn
79.125.60.nn
174.129.110.nnn
174.129.140.nnn
174.129.156.nnn
174.129.145.nnn
174.129.210.nnn

11:46 am on Jul 12, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Pfui, if you check the list on that site I posted, you will see quite a few domains that belong to amazonaws (as well as to various hosts and ISPs).

So basically someone runs Tor on his system or server and provides a portal to others. Your server and my server see only the IP of the portal/proxy, with no indication of anything else, as these are transparent.

I just caught one doing it because it used the standard HTTP ports, so when I scanned port 80 it did respond. When I searched for info about that particular IP, I found that site with the Tor list. The list includes several amazonaws IPs.

7:35 pm on Jul 31, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



And the bots go on and on and on and on. Multiple-file request sessions x2 days on backwater sites. Per usual, undeterred by 403s, ditto 404s, even 301s (to 127.0.0.1:)

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3

07/30 04:18:09 /
07/30 04:18:28 /
07/30 04:19:03 /m/
07/30 04:19:05 /mobile/
07/30 04:19:06 /mobi/
07/30 04:19:06 /iphone/
07/30 04:19:09 /pda/
07/30 04:19:25 /m/
07/30 04:19:28 /mobile/
07/30 04:19:32 /mobi/
07/30 04:19:33 /iphone/
07/30 04:19:33 /pda/

[edited by: Pfui at 7:46 pm (utc) on July 31, 2009]

7:46 pm on Jul 31, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Another example of the kind of activity that ticks me off no matter who or what is doing it. I don't mind reddit per se. I DO mind no robots.txt then 20 requests to the exact same file. All 403'd to no avail, per usual w/ amazonaws.

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/5.0 (compatible; redditbot/1.0; +http://www.reddit.com/feedback)

07/27 09:59:07
07/27 09:59:09
07/27 10:00:11
07/27 10:00:12
07/27 10:01:06
07/27 10:01:08
07/27 10:02:08
07/27 10:02:09
07/27 10:03:08
07/27 10:03:09
07/27 10:04:07
07/27 10:04:08
07/27 10:05:08
07/27 10:05:09
07/27 10:06:11
07/27 10:06:12
07/27 10:07:13
07/27 10:07:14
07/27 10:08:08
07/27 10:08:10

8:06 pm on Jul 31, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



And one more, sadly inevitable what with so many AWS servers in play...

Here's a zombied [en.wikipedia.org] amazonaws.com machine that was part of a small spam-botnet with Chinese fellow travelers:

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
07/31 09:50:17

121.28.7.nnn
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
07/31 09:50:20

210.52.58.nn
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
07/31 09:50:25

(Botnets adore that UA, so much so that I 403 it from the get-go.)

12:58 am on Aug 4, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada]-159.compute-1.amazonaws.com
OMGCrawler 1.0

robots.txt? YES

3:21 am on Aug 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OMGCrawler visited me on the 3rd too. I kicked it out cause it's from AWS.

6:56 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada].compute-1.amazonaws.com
GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)

robots.txt? YES

See also the GingerCrawler thread: GingerCrawler/1.0 [webmasterworld.com]

5:27 am on Aug 17, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada]-98.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727;)

robots.txt? NO

Note the misconfigured UA 'ending.'

9:23 am on Aug 28, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada].compute-1.amazonaws.com
Acquia Crawler

robots.txt? Yes BUT... Three minutes after home page grab.

This just in (from -0700)... 403s to all files but robots.txt do not dissuade this new pest hailing from multiple AWS hosts:

08/28 00:40:22 /
08/28 00:43:43 /robots.txt
08/28 01:30:59 /
08/28 01:34:06 /robots.txt
08/28 01:51:42 /

3:57 am on Sep 15, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada].compute-1.amazonaws.com
LWP::Simple/5.808

robots.txt? NO

Twitter-related.

[edited by: Pfui at 4:03 am (utc) on Sep. 15, 2009]

3:59 am on Sep 15, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada].compute-1.amazonaws.com
bitlybot

robots.txt? NO

Twitter-related.

4:00 am on Sep 15, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada].compute-1.amazonaws.com
PycURL/7.18.2

robots.txt? NO

Twitter-related.

4:48 pm on Sep 21, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-[yada-yada].compute-1.amazonaws.com
LargeSmall Crawler (LargeSmall; [onespot.com;...] info@onespot.com)

robots.txt? Yes

11:59 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



LargeSmall [webmasterworld.com] is a pest but this kind of activity is just asinine --

ec2-174-129-236-193.compute-1.amazonaws.com
larbin_2.6.3 (larbin2.6.3@unspecified.mail)

09/24 13:56:58 /robots.txt
09/24 14:00:55 /robots.txt
09/24 14:03:55 /robots.txt
09/24 14:09:55 /robots.txt
09/24 14:13:22 /robots.txt
09/24 14:23:35 /robots.txt
09/24 14:50:42 /robots.txt
09/24 15:01:36 /robots.txt
09/24 15:08:40 /robots.txt
09/24 15:12:41 /robots.txt

O, if only I had a nickel for every useless, log-filling hit from amazonaws.com!
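
A quick way to spot this sort of hammering in your own logs: a minimal Python sketch, assuming lines shaped like the excerpt above (`MM/DD HH:MM:SS /path`). The function name `repeated_paths` and the default threshold of 3 are made up for illustration; tune to taste.

```python
from collections import Counter
from typing import Dict, Iterable


def repeated_paths(log_lines: Iterable[str], threshold: int = 3) -> Dict[str, int]:
    """Count hits per path and keep only paths requested `threshold`+ times.

    Expects whitespace-separated lines: date, time, path.
    Lines with fewer than three fields are skipped.
    """
    counts: Counter = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 3:
            counts[parts[2]] += 1
    return {path: n for path, n in counts.items() if n >= threshold}
```

Feed it the lines for one host at a time, otherwise popular pages will trip the threshold on ordinary traffic.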

3:01 pm on Sep 27, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



This time bitlybot requested robots.txt -- and just as promptly ignored it.

ec2-174-129-227-79.compute-1.amazonaws.com
bitlybot
09/27 04:49:16 /robots.txt
09/27 04:49:17 /dir/filename.html
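
To check "fetched robots.txt, then ignored it" mechanically rather than by eye, here is a rough Python sketch using the standard library's `urllib.robotparser`. The helper names (`build_parser`, `disallowed_hits`) are hypothetical, and you supply your own robots.txt text; nothing here reflects the actual rules on the site in question.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that you already have on disk or in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def disallowed_hits(
    parser: RobotFileParser, user_agent: str, paths: Iterable[str]
) -> List[str]:
    """Return the requested paths that the rules forbid for this UA.

    /robots.txt itself is skipped, since fetching it is always fair game.
    """
    return [
        p
        for p in paths
        if p != "/robots.txt" and not parser.can_fetch(user_agent, p)
    ]
```

Pass in the paths a bot requested after its robots.txt fetch; anything that comes back is a request the rules told it not to make.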

2:14 pm on Oct 1, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-75-101-138-11.compute-1.amazonaws.com
mefashpesh (pishpush.com)

robots.txt? NO

4:44 pm on Oct 1, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-174-129-158-130.compute-1.amazonaws.com
taptubot *** please read [taptu.com...] ***

robots.txt? Yes

4:49 pm on Oct 4, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Firefox/1.6a1? Oh, please.

ec2-67-202-51-187.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.2 x64; en-US; rv:1.9a1) Gecko/20060214 Firefox/1.6a1

robots.txt? No

9:23 am on Oct 6, 2009 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



UA: MetaURI API +metauri.com
rDNS: ec2-75-101-232-27.compute-1.amazonaws.com. [Verified]
robots.txt: No

2:41 am on Oct 23, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



ec2-75-101-221-99.compute-1.amazonaws.com
ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)

robots.txt? Yes BUT -- ignored it.

Last Feb. (up-thread; mssg.#: 3848081), the preceding UA was A-OK w/ robots.txt. No longer, at least not when run by amazonaws.com.

Still fully compliant when run from archive.org using this one:

ia310738.us.archive.org
ia_archiver-web.archive.org

9:28 pm on Oct 23, 2009 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Why do you think this is genuine ia_archiver? I don't accept the UA anyway but could it be fake?

10:07 pm on Oct 23, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I just report 'em as I see 'em emerge from amazonaws.com's cloud cover. I don't know if they're fake or not.

Given the current "google.com -- spoof? spider? botnet zombie? employee? [webmasterworld.com]" mystery sightings, I guess everything could be fake.

5:34 pm on Oct 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[api.samepoint.com;...] admin@samepoint.com
174.129.119.nnn
ec2-174-129-119-nnn.compute-1.amazonaws.com
-----
Address: Amazon Web Services, Elastic Compute Cloud, EC2
NetRange: 174.129.0.0 - 174.129.255.255
-----
ROBOTS.TXT? No
-----

Took the default root page and one xml file then left.

5:37 pm on Oct 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



PostRank/2.0 (postrank.com)
174.129.141.nnn
ec2-174-129-141-nnn.compute-1.amazonaws.com
-----
Address: Amazon Web Services, Elastic Compute Cloud, EC2
NetRange: 174.129.0.0 - 174.129.255.255
-----
ROBOTS.TXT? No
-----

Did one HEAD request and left.
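
If you would rather block by range than by hostname, the NetRange quoted in these two posts can be turned into CIDR form and tested with Python's standard `ipaddress` module. A minimal sketch; the function names (`networks_from_netrange`, `ip_in_networks`) are made up for illustration, and the range shown is only the one quoted above, not a full list of AWS ranges.

```python
import ipaddress
from typing import Iterable, List


def networks_from_netrange(first: str, last: str) -> List[ipaddress.IPv4Network]:
    """Convert a whois-style 'first - last' NetRange into CIDR networks."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
        )
    )


def ip_in_networks(ip: str, networks: Iterable[ipaddress.IPv4Network]) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in net for net in networks)
```

For the range above, `networks_from_netrange("174.129.0.0", "174.129.255.255")` collapses to the single network 174.129.0.0/16.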

4:05 am on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have per page set to the Max allowed.

A Proposed WebmasterWorld Feature, for this thread particularly: if I click on #3 in the main nav menu from the main threads link list in this part of the universe, take me to the last post on page 3 minus 1.

amazonaws is tracked and 403d as always here.

7:25 pm on Nov 6, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Another botlike smackdown from amazonaws. Atypical pattern for most bots -- single files hit twice -- but clearly going off a hit list because not all files in /dir were hit (ditto any of thousands of files on the site). Most of the hit pages had been Twitter mentions/tweeted, but not all.

ec2-174-129-193-62.compute-1.amazonaws.com
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.5) Gecko/2008120121 Firefox/3.0.5

robots.txt? NO

18:45:31 /dir/file07.html
18:45:32 /dir/file07.html
18:45:33 /dir/file01.html
18:45:34 /dir/file01.html
18:45:36 /dir/file06.html
18:45:36 /dir/file06.html
18:45:38 /dir/file04.html
18:45:38 /dir/file04.html
18:45:39 /dir/file02.html
18:45:40 /dir/file02.html
18:45:41 /dir/file05.html
18:45:42 /dir/file05.html
18:45:43 /dir/file03.html
18:45:44 /dir/file03.html
18:45:45 /dir/file09.html
18:45:46 /dir/file09.html
18:45:48 /dir/file08.html
18:45:48 /dir/file08.html
18:45:50 /dir/file10.html
18:45:51 /dir/file10.html

FWIW: Alleged UA is old; Mac FF is currently 3.5.5.

2:35 am on Nov 11, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



(Emphasis mine...)

ec2-174-129-58-178.compute-1.amazonaws.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10

robots.txt? NO
Fake ref? YES: http://www.google.com/search?q=sitename.com/

Aside:

UAs with that User-Agent: intro swarmed out of nowhere about a year ago, as I recall. Used to see multiple scores a day; now maybe once or twice, tops. (Never did figure out who/what miscoded the string and made its hits so easy to send packing.) UAs ran the gamut. Here's a very partial listing:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.2.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; InfoPath.2)
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
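
The string-level tells mentioned in this thread (the literal "User-Agent:" intro, a lone closing quote, an unclosed bracket, a dangling ";)" ending) are easy to test for. A rough Python sketch follows; the function name `ua_red_flags` and the flag labels are hypothetical, and a clean result proves nothing, since plenty of bots send perfectly well-formed UAs.

```python
from typing import List


def ua_red_flags(ua: str) -> List[str]:
    """Return labels for formatting oddities in a User-Agent value."""
    flags = []
    # The header name leaked into the header value.
    if ua.lower().startswith("user-agent:"):
        flags.append("header-name-in-value")
    # An odd number of double quotes means one is unmatched.
    if ua.count('"') % 2:
        flags.append("unbalanced-quote")
    if ua.count("(") != ua.count(")"):
        flags.append("unbalanced-paren")
    if ua.count("[") != ua.count("]"):
        flags.append("unbalanced-bracket")
    # A semicolon directly before ')' leaves an empty trailing field.
    if ";)" in ua.replace(" ", ""):
        flags.append("empty-field-before-paren")
    return flags
```

An empty list means none of these particular quirks were found, nothing more.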

This 278 message thread spans 10 pages.
 
