
Forum Moderators: Ocean10000 & incrediBILL


amazonaws.com plays host to wide variety of bad bots

Most recently seen: Gnomit

     
3:04 am on Jan 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


ec2-67-202-57-30.compute-1.amazonaws.com
Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; +http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"

- robots.txt? NO
- Unbalanced quotation mark in UA (closing only)
- site in UA yields this oh-so-descriptive info:

<html>
<head>
</head>
<body>
</body>
</html>

----- ----- ----- ----- -----
FWIW, bona fide amazonaws.com hosts spewed at least 33 bots on two of my sites in recent months. (Does someone get paid per bot or something?) Some bots may be new to some of you; or newly renamed. Here are the actual UA strings; in no particular order:

NetSeer/Nutch-0.9 (NetSeer Crawler; [netseer.com;...] crawler@netseer.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
[Note ru.]
robots.txt? NO

feedfinder/1.371 Python-urllib/1.16 +http://www.aaronsw.com/2002/feedfinder/
robots.txt? NO

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1
robots.txt? NO

Twitturly / v0.5
robots.txt? NO

YebolBot (compatible; Mozilla/5.0; MSIE 7.0; Windows NT 6.0; rv:1.8.1.11; mailTo:thunder.chang@gmail.com)
robots.txt? NO

YebolBot (Email: yebolbot@gmail.com; If the web crawling affects your web service, or you don't like to be crawled by us, please email us. We'll stop crawling immediately.)
[Whattaya think robots.txt is for, huh?]
robots.txt? YES ... Four times in 45 minutes

Attributor/Dejan-1.0-dev (Test crawler; [attributor.com;...] info at attributor com)
robots.txt? NO

PRCrawler/Nutch-0.9 (data mining development project)
robots.txt? YES

EnaBot/1.2 (http://www.enaball.com/crawler.html)
robots.txt? YES

Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
[Note spaced-out closing parens]
robots.txt? YES

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09
robots.txt? NO

TheRarestParser/0.2a (http://therarestwords.com/)
robots.txt? NO

Mozilla/5.0 (compatible; D1GArabicEngine/1.0; crawlmaster@d1g.com)
robots.txt? NO

Clustera Crawler/Nutch-1.0-dev (Clustera Crawler; [crawler.clustera.com;...] cluster@clustera.com)
robots.txt? YES

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
robots.txt? YES

yacybot (i386 Linux 2.6.16-xenU; java 1.6.0_02; America/en) [yacy.net...]
robots.txt? NO

Mozilla/5.0
robots.txt? NO

Spock Crawler (http://www.spock.com/crawler)
robots.txt? YES

TinEye
robots.txt? NO

Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; [netseer.com...] crawler@netseer.com)
robots.txt? YES

nnn/ttt (n)
robots.txt? YES

AideRSS/1.0 (aiderss.com)
robots.txt? NO

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
robots.txt? NO

----- ----- ----- ----- -----
These two UAs alternated multiple times one afternoon:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
robots.txt? NO

WebClient
robots.txt? YES

----- ----- ----- ----- -----
And finally, way too many offerings from "Paul," who's apparently unable to make up his mind, UA name-wise:

Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com
robots.txt? NO

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)
robots.txt? YES

Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com) [email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]
robots.txt? YES

Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com,crawl@powerset.com]
robots.txt? NO

-----
Slippery little suckers indeed. Thank goodness I block amazonaws.com no matter what.
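For anyone who wants to apply the same "block amazonaws.com no matter what" rule in a script, here is a minimal sketch. It assumes you already have the reverse-DNS hostname for the visitor; the function names `is_amazonaws` and `ec2_ip` are made up for illustration, and the pattern only covers the `ec2-A-B-C-D.<zone>.amazonaws.com` form quoted in this thread.

```python
import re
from typing import Optional

# Hostname form seen in this thread: ec2-A-B-C-D.<zone>.amazonaws.com
EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.[a-z0-9.-]+\.amazonaws\.com\.?$",
    re.IGNORECASE,
)


def is_amazonaws(hostname: str) -> bool:
    """True if the hostname is amazonaws.com or any subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def ec2_ip(hostname: str) -> Optional[str]:
    """Recover the dotted IP embedded in an ec2-... hostname, if present."""
    match = EC2_HOST.match(hostname)
    if not match:
        return None
    octets = match.groups()
    if any(int(octet) > 255 for octet in octets):
        return None
    return ".".join(octets)


host = "ec2-67-202-57-30.compute-1.amazonaws.com"
print(is_amazonaws(host), ec2_ip(host))
```

Keep in mind that a reverse-DNS name is whatever the owner of the IP block publishes, so a hostname match on its own is a hint rather than proof.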

11:24 am on July 11, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 30, 2007
posts:1394
votes: 0


Here is some other info (not sure if it was posted before): I see lots of IPs from amazonaws used as Tor proxy servers. These may be transparent proxies serving spam/scraping worldwide.

URL
www DOT torproxylist DOT com
without spaces and real dots instead of DOT.

5:12 pm on July 11, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


enigma, I'm confused. Do you have a log entry you could post, please? TIA

7:31 pm on July 11, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


I'm a Tor [en.wikipedia.org] noob but a quick search of related info yielded 14 publicly accessible amazonaws.com Hosts/IPs running open proxies, arguably in violation of the AWS Customer Agreement, e.g., section 5.4.5. Network:

You may not operate network services such as:
Open proxies.

(etc.)

Two of the following IPs, the 79s, map to --

ec2-[yada-yada].eu-west-1.compute.amazonaws.com

-- and the remainder to this thread's (in)famous:

ec2-[yada-yada].compute-1.amazonaws.com

67.202.11.nnn
67.202.30.nn
67.202.44.nnn
67.202.47.nn
67.202.37.nnn
75.101.155.nnn
75.101.201.nn
79.125.50.nn
79.125.60.nn
174.129.110.nnn
174.129.140.nnn
174.129.156.nnn
174.129.145.nnn
174.129.210.nnn

11:46 am on July 12, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 30, 2007
posts:1394
votes: 0


Pfui, if you check the list on that site I posted, you will see quite a few domains there that belong to amazonaws (as well as to various other hosts and ISPs).

So basically, someone runs Tor on their own system or server and provides a portal to others. All that your server and mine see is the IP of the portal/proxy, with no indication of anything else, as these are transparent.

I only caught one doing it because it used the standard HTTP ports, so when I scanned port 80 it responded. When I searched for info about that particular IP, I found that site with the Tor list, and the list includes several amazonaws IPs.

7:35 pm on July 31, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


And the bots go on and on and on and on. Multiple-file request sessions x2 days on backwater sites. Per usual, undeterred by 403s, ditto 404s, even 301s (to 127.0.0.1):

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3

07/30 04:18:09 /
07/30 04:18:28 /
07/30 04:19:03 /m/
07/30 04:19:05 /mobile/
07/30 04:19:06 /mobi/
07/30 04:19:06 /iphone/
07/30 04:19:09 /pda/
07/30 04:19:25 /m/
07/30 04:19:28 /mobile/
07/30 04:19:32 /mobi/
07/30 04:19:33 /iphone/
07/30 04:19:33 /pda/

[edited by: Pfui at 7:46 pm (utc) on July 31, 2009]

7:46 pm on July 31, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Another example of the kind of activity that ticks me off no matter who or what is doing it. I don't mind reddit per se. I DO mind no robots.txt then 20 requests to the exact same file. All 403'd to no avail, per usual w/ amazonaws.

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/5.0 (compatible; redditbot/1.0; +http://www.reddit.com/feedback)

07/27 09:59:07
07/27 09:59:09
07/27 10:00:11
07/27 10:00:12
07/27 10:01:06
07/27 10:01:08
07/27 10:02:08
07/27 10:02:09
07/27 10:03:08
07/27 10:03:09
07/27 10:04:07
07/27 10:04:08
07/27 10:05:08
07/27 10:05:09
07/27 10:06:11
07/27 10:06:12
07/27 10:07:13
07/27 10:07:14
07/27 10:08:08
07/27 10:08:10

8:06 pm on July 31, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


And one more, sadly inevitable what with so many AWS servers in play...

Here's a zombied [en.wikipedia.org] amazonaws.com machine that was part of a small spam-botnet with Chinese fellow travelers:

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
07/31 09:50:17

121.28.7.nnn
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
07/31 09:50:20

210.52.58.nn
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
07/31 09:50:25

(Botnets adore that UA, so much so that I 403 it from the get-go.)

12:58 am on Aug 4, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-[yada-yada]-159.compute-1.amazonaws.com
OMGCrawler 1.0

robots.txt? YES

3:21 am on Aug 5, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


OMGCrawler visited me on the 3rd too. I kicked it out cause it's from AWS.

6:56 pm on Aug 11, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-[yada-yada].compute-1.amazonaws.com
GingerCrawler/1.0 (Language Assistant for Dyslexics; www.gingersoftware.com/crawler_agent.htm; support at ginger software dot com)

robots.txt? YES

See also the GingerCrawler thread: GingerCrawler/1.0 [webmasterworld.com]

5:27 am on Aug 17, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-[yada-yada]-98.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727;)

robots.txt? NO

Note the misconfigured UA 'ending.'
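Malformed delimiters like the trailing `;)` here, or the stray closing quote on the Gnomit UA at the top of the thread, are easy to test for mechanically. The sketch below is one possible check, not anyone's production filter; the function name `ua_delimiter_problems` and the wording of the messages are invented for this example.

```python
import re


def ua_delimiter_problems(ua: str) -> list[str]:
    """Return a list of delimiter oddities found in a User-Agent string."""
    problems = []
    # A double quote that appears an odd number of times has no partner.
    if ua.count('"') % 2 == 1:
        problems.append("odd number of double quotes")
    # Track parenthesis nesting depth from left to right.
    depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("closing paren without opener")
                depth = 0
    if depth > 0:
        problems.append("unclosed paren")
    # A separator directly before ")" means the last token is empty.
    if re.search(r";\s*\)", ua):
        problems.append("empty token before closing paren")
    return problems


GNOMIT = ('Mozilla/5.0 (compatible; X11; U; Linux i686 (x86_64); en-US; '
          '+http://gnomit.com/) Gecko/2008092416 Gnomit/1.0"')
MSIE7 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727;)"

print(ua_delimiter_problems(GNOMIT))
print(ua_delimiter_problems(MSIE7))
```

A hit from a check like this is only a signal worth logging; some legitimate clients send sloppy strings too.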

9:23 am on Aug 28, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-[yada-yada].compute-1.amazonaws.com
Acquia Crawler

robots.txt? Yes BUT... Three minutes after home page grab.

This just in (from -0700)... 403s to all files but robots.txt do not dissuade this new pest hailing from multiple AWS hosts:

08/28 00:40:22 /
08/28 00:43:43 /robots.txt
08/28 01:30:59 /
08/28 01:34:06 /robots.txt
08/28 01:51:42 /

3:57 am on Sept 15, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-[yada-yada].compute-1.amazonaws.com
LWP::Simple/5.808

robots.txt? NO

Twitter-related.

[edited by: Pfui at 4:03 am (utc) on Sep. 15, 2009]

3:59 am on Sept 15, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


ec2-[yada-yada].compute-1.amazonaws.com
bitlybot

robots.txt? NO

Twitter-related.

4:00 am on Sept 15, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


ec2-[yada-yada].compute-1.amazonaws.com
PycURL/7.18.2

robots.txt? NO

Twitter-related.

4:48 pm on Sept 21, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-[yada-yada].compute-1.amazonaws.com
LargeSmall Crawler (LargeSmall; [onespot.com;...] info@onespot.com)

robots.txt? Yes

11:59 pm on Sept 24, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


LargeSmall [webmasterworld.com] is a pest but this kind of activity is just asinine --

ec2-174-129-236-193.compute-1.amazonaws.com
larbin_2.6.3 (larbin2.6.3@unspecified.mail)

09/24 13:56:58 /robots.txt
09/24 14:00:55 /robots.txt
09/24 14:03:55 /robots.txt
09/24 14:09:55 /robots.txt
09/24 14:13:22 /robots.txt
09/24 14:23:35 /robots.txt
09/24 14:50:42 /robots.txt
09/24 15:01:36 /robots.txt
09/24 15:08:40 /robots.txt
09/24 15:12:41 /robots.txt

O, if only I had a nickel for every useless, log-filling hit from amazonaws.com!
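To put a number on that, the ten larbin timestamps above can be fed to a few lines of Python. This is a rough sketch: the log lines carry no year, so 2009 is assumed from the post date, and `parse_hit` and `refetch_stats` are names made up for the example.

```python
from datetime import datetime

# Timestamps copied from the larbin log above (all for /robots.txt).
LARBIN_HITS = [
    "09/24 13:56:58", "09/24 14:00:55", "09/24 14:03:55", "09/24 14:09:55",
    "09/24 14:13:22", "09/24 14:23:35", "09/24 14:50:42", "09/24 15:01:36",
    "09/24 15:08:40", "09/24 15:12:41",
]


def parse_hit(stamp: str, year: int = 2009) -> datetime:
    """Parse an 'MM/DD HH:MM:SS' log stamp; the year is an assumption."""
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def refetch_stats(stamps):
    """Summarise how often the same file was re-requested."""
    times = sorted(parse_hit(s) for s in stamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "hits": len(times),
        "span_seconds": (times[-1] - times[0]).total_seconds() if times else 0.0,
        "shortest_gap_seconds": min(gaps) if gaps else None,
    }


print(refetch_stats(LARBIN_HITS))
```

For this log the span works out to a little under 76 minutes, with the closest pair of fetches three minutes apart.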

3:01 pm on Sept 27, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


This time bitlybot requested robots.txt -- and just as promptly ignored it.

ec2-174-129-227-79.compute-1.amazonaws.com
bitlybot
09/27 04:49:16 /robots.txt
09/27 04:49:17 /dir/filename.html
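Whether a request like the second one actually violates robots.txt can be checked with Python's standard `urllib.robotparser`. The robots.txt content below is hypothetical (the thread does not show the site's real file), and `allowed` is just a helper name for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real robots.txt is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir/
"""


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(allowed(ROBOTS_TXT, "bitlybot", "/dir/filename.html"))  # False
print(allowed(ROBOTS_TXT, "bitlybot", "/index.html"))         # True
```

Running every logged request that follows a robots.txt fetch through a check like this separates bots that read the file and obey it from bots that merely read it.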

2:14 pm on Oct 1, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-75-101-138-11.compute-1.amazonaws.com
mefashpesh (pishpush.com)

robots.txt? NO

4:44 pm on Oct 1, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-174-129-158-130.compute-1.amazonaws.com
taptubot *** please read [taptu.com...] ***

robots.txt? Yes

4:49 pm on Oct 4, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Firefox/1.6a1? Oh, please.

ec2-67-202-51-187.compute-1.amazonaws.com
Mozilla/5.0 (Windows; U; Windows NT 5.2 x64; en-US; rv:1.9a1) Gecko/20060214 Firefox/1.6a1

robots.txt? No

9:23 am on Oct 6, 2009 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5819
votes: 64


UA: MetaURI API +metauri.com
rDNS: ec2-75-101-232-27.compute-1.amazonaws.com. [Verified]
robots.txt: No

2:41 am on Oct 23, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


ec2-75-101-221-99.compute-1.amazonaws.com
ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)

robots.txt? Yes BUT -- ignored it.

Last Feb. (up-thread; msg. #3848081), the preceding UA was A-OK w/ robots.txt. No longer, at least not when run from amazonaws.com.

Still fully compliant when run from archive.org using this one:

ia310738.us.archive.org
ia_archiver-web.archive.org

9:28 pm on Oct 23, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3092
votes: 2


Why do you think this is genuine ia_archiver? I don't accept the UA anyway but could it be fake?

10:07 pm on Oct 23, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


I just report 'em as I see 'em emerge from amazonaws.com's cloud cover. I don't know if they're fake or not.

Given the current "google.com -- spoof? spider? botnet zombie? employee? [webmasterworld.com]" mystery sightings, I guess everything could be fake.
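One common way to narrow down the "fake or not" question is forward-confirmed reverse DNS: look up the hostname for the IP, look up the addresses for that hostname, and accept the name only if the original IP comes back. The sketch below uses the standard `socket` calls; the function names and the idea of passing in an expected domain suffix are illustrative assumptions, and which suffix counts as "genuine" for a given bot is for the site owner to decide.

```python
import socket


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return the rDNS hostname for ip if it resolves back to ip, else None."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:                        # covers socket.herror / gaierror
        return None
    return hostname if ip in addresses else None


def host_matches(ip, expected_suffix, **resolvers):
    """True if ip has a forward-confirmed name under expected_suffix."""
    hostname = forward_confirmed(ip, **resolvers)
    if hostname is None:
        return False
    host = hostname.rstrip(".").lower()
    suffix = expected_suffix.lower()
    return host == suffix or host.endswith("." + suffix)
```

The `reverse` and `forward` parameters exist only so the logic can be exercised without live DNS; in normal use you would call `host_matches(ip, "archive.org")` and let the defaults do the lookups. A UA claiming one operator that arrives from a confirmed host under a different domain is at least worth a second look.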

5:34 pm on Oct 26, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


[api.samepoint.com;...] admin@samepoint.com
174.129.119.nnn
ec2-174-129-119-nnn.compute-1.amazonaws.com
-----
Address: Amazon Web Services, Elastic Compute Cloud, EC2
NetRange: 174.129.0.0 - 174.129.255.255
-----
ROBOTS.TXT? No
-----

Took the default root page and one xml file then left.
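The NetRange quoted above converts directly into a CIDR block with Python's standard `ipaddress` module, which makes range-based blocking simple to script. A minimal sketch, covering only the one range given in this post (`in_ec2_range` is a made-up helper name):

```python
import ipaddress

# NetRange as quoted above: 174.129.0.0 - 174.129.255.255
FIRST = ipaddress.ip_address("174.129.0.0")
LAST = ipaddress.ip_address("174.129.255.255")
EC2_RANGE = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_ec2_range(ip: str) -> bool:
    """True if ip falls inside the NetRange quoted in this post."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in EC2_RANGE)


print(EC2_RANGE)                        # [IPv4Network('174.129.0.0/16')]
print(in_ec2_range("174.129.119.1"))    # True
```

Other amazonaws addresses mentioned in the thread (67.202.x, 75.101.x, 79.125.x) sit outside this block, so their ranges would need to be looked up and added separately.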

5:37 pm on Oct 26, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


PostRank/2.0 (postrank.com)
174.129.141.nnn
ec2-174-129-141-nnn.compute-1.amazonaws.com
-----
Address: Amazon Web Services, Elastic Compute Cloud, EC2
NetRange: 174.129.0.0 - 174.129.255.255
-----
ROBOTS.TXT? No
-----

Did one HEAD request and left.

4:05 am on Oct 27, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1667
votes: 36


I have posts per page set to the max allowed.

A proposed WebmasterWorld feature, for this thread particularly: if I click on #3 in the main nav menu of the main threads link list in this part of the universe, take me to the last post on page 3 minus 1.

amazonaws is tracked and 403d as always here.

7:25 pm on Nov 6, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Another botlike smackdown from amazonaws. Atypical pattern for most bots -- single files hit twice -- but clearly going off a hit list because not all files in /dir were hit (ditto any of thousands of files on the site). Most of the hit pages had been Twitter mentions/tweeted, but not all.

ec2-174-129-193-62.compute-1.amazonaws.com
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.5) Gecko/2008120121 Firefox/3.0.5

robots.txt? NO

18:45:31 /dir/file07.html
18:45:32 /dir/file07.html
18:45:33 /dir/file01.html
18:45:34 /dir/file01.html
18:45:36 /dir/file06.html
18:45:36 /dir/file06.html
18:45:38 /dir/file04.html
18:45:38 /dir/file04.html
18:45:39 /dir/file02.html
18:45:40 /dir/file02.html
18:45:41 /dir/file05.html
18:45:42 /dir/file05.html
18:45:43 /dir/file03.html
18:45:44 /dir/file03.html
18:45:45 /dir/file09.html
18:45:46 /dir/file09.html
18:45:48 /dir/file08.html
18:45:48 /dir/file08.html
18:45:50 /dir/file10.html
18:45:51 /dir/file10.html

FWIW: Alleged UA is old; Mac FF is currently 3.5.5.

2:35 am on Nov 11, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


(Emphasis mine...)

ec2-174-129-58-178.compute-1.amazonaws.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10

robots.txt? NO
Fake ref? YES: http://www.google.com/search?q=sitename.com/

Aside:

UAs with that User-Agent: intro swarmed out of nowhere about a year ago, as I recall. Used to see multiple scores a day; now maybe once or twice, tops. (Never did figure out who/what miscoded the string and made its hits so easy to send packing.) UAs ran the gamut. Here's a very partial listing:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.2.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; InfoPath.2)
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
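A UA value that itself begins with the literal text `User-Agent:` is about as easy a tell as they come. The sketch below shows one way to express that rule, together with the exact-match 403 for the bare MSIE 6.0/SV1 string mentioned earlier in the thread; `status_for` and `BLOCKED_EXACT` are names invented here, and how the status is actually returned depends on your server setup.

```python
# Exact UA strings to refuse outright (the one the thread says botnets favor).
BLOCKED_EXACT = {
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
}


def status_for(ua: str) -> int:
    """Return the HTTP status to send for a given User-Agent value."""
    ua = ua.strip()
    # The header name leaked into the header value: a miscoded client.
    if ua.lower().startswith("user-agent:"):
        return 403
    if ua in BLOCKED_EXACT:
        return 403
    return 200


print(status_for("User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"))
```

The prefix test catches every string in the list above regardless of which browser the rest of the value imitates.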
