Pfui

msg:4061855 | 4:08 am on Jan 16, 2010 (gmt 0) |
Long story short? Block .amazonaws.com :) IP range-wise, the IPs are 'in' the Host names.* Now as to how many there are, let alone what they are, I'm sorry but I'll have to leave that compilation as a sweat equity exercise for the bot-curious/obsessed at this time. Suffice it to say that akin to any country -- and numbering more than many countries'! -- Amazon's cloud-related IPs are neither contiguous nor non-expanding.
*The second post in the MetaURI [webmasterworld.com] thread shows more detail, including an atypical example of the same UA using the exact same AWS IP over a period of time.
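As a rough illustration of the IPs being 'in' the Host names, the sketch below pulls the dotted-quad address out of an EC2-style reverse-DNS name. It assumes the `ec2-A-B-C-D.<zone>.amazonaws.com` pattern seen in this thread's sightings; the function name `ip_from_ec2_host` is made up for this example, and other Amazon host-name formats are not handled.

```python
import ipaddress
import re

# Assumed pattern: ec2-A-B-C-D.<zone>.amazonaws.com, as seen in this thread's sightings.
_EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.[a-z0-9.-]+\.amazonaws\.com\.?$",
    re.IGNORECASE,
)


def ip_from_ec2_host(hostname):
    """Return the IPv4 address embedded in an EC2-style host name, or None."""
    match = _EC2_HOST.match(hostname.strip())
    if match is None:
        return None
    candidate = ".".join(match.groups())
    try:
        # Reject impossible octets such as 999.
        return str(ipaddress.IPv4Address(candidate))
    except ValueError:
        return None


if __name__ == "__main__":
    print(ip_from_ec2_host("ec2-174-129-120-104.compute-1.amazonaws.com"))
```

This only decodes the name; it does not confirm that the address really reverse-resolves to that host.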
|
Pfui

msg:4062180 | 11:17 pm on Jan 16, 2010 (gmt 0) |
Emphasis mine. See link below for more info:
ec2-174-129-120-104.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)
robots.txt? NO
URI: /favicon.ico
Related: Yahoo's cloaked crawler(s) [webmasterworld.com]
|
Pfui

msg:4062445 | 4:41 pm on Jan 17, 2010 (gmt 0) |
ec2-174-129-237-42.compute-1.amazonaws.com
OpenCalaisSemanticProxy
robots.txt? NO
|
Pfui

msg:4062932 | 4:55 pm on Jan 18, 2010 (gmt 0) |
ec2-204-236-247-88.compute-1.amazonaws.com
@hourlypress
robots.txt? NO
Twitter-related.
|
Pfui

msg:4062992 | 6:39 pm on Jan 18, 2010 (gmt 0) |
Two hits 20 seconds apart; UA not too cleverly cloaked:
ec2-75-101-147-15.compute-1.amazonaws.com
Firefox
robots.txt? NO
|
trader

msg:4064036 | 6:02 am on Jan 20, 2010 (gmt 0) |
Just found this long thread. Don't understand why amazonaws.com is visiting so many sites and so often. Can someone please explain it? I see amazonaws.com in many of my sites' referral logs with lots of visits week after week and month after month. Are they owned by amazon.com? Why do they want to visit my sites in the first place? How do they know about my URLs? Sometimes they visit even before anyone else does or the site has time to get listed in Google. How would they even know my URL was just put online? Sometimes they are my #1 traffic source with both newer and older sites. How and why is this happening? Who are they?
|
keyplyr

msg:4064056 | 7:02 am on Jan 20, 2010 (gmt 0) |
| Just found this long thread. Don't understand why amazonaws.com is visiting so many sites and so often. Can someone please explain it? |
| Read the thread. It's explained.
|
Pfui

msg:4064604 | 11:20 pm on Jan 20, 2010 (gmt 0) |
Long thread short, the title tells it all:
amazonaws.com plays host to wide variety of bad bots
That statement was true exactly one year+three days ago when I started this thread. Now, 125-plus posts later, and scores and scores and scores of bad, rude, iffy, test, and/or worthless bot hits later, that statement's an understatement. So if your main traffic source is amazonaws.com-related, I'm sorry but none of that resource-eating traffic -- NONE of it -- is real people visiting in real time. Block amazonaws.com and you'll stop feeding its plague of locusts.
|
dstiles

msg:4065128 | 5:14 pm on Jan 21, 2010 (gmt 0) |
Not sure how this fits into the Amazon scenario:
IP: 216.113.169.nnn
Standard Headers: Accept All only
UA: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.2a1pre) Gecko/20090402 Firefox/3.6a1pre (.NET CLR 3.5.30729)
Referer: http:// www. google. com
HTTP_CONNECTION: Keep-Alive
Robots: haven't checked
The hit (home page of one site only) was trapped on several points that usually indicate an exploit attempt.
eBay, Inc
OrgID: EBAY
NetRange: 216.113.160.0 - 216.113.191.255
|
Pfui

msg:4065242 | 8:13 pm on Jan 21, 2010 (gmt 0) |
Hmm. I may be missing something but I don't see a connection between your info and amazonaws.com or anything Amazon. If you suspect a cloaked/rogue spider, you could start a new thread with more details about the visit. Alternatively, it could've just been a zombied machine belonging to an eBay employee.
|
dstiles

msg:4065286 | 9:48 pm on Jan 21, 2010 (gmt 0) |
Hmm. I'll just go and bury my head in a bucket, then, shall I? Not being a user of either, I totally confused Amazon and Ebay. Sorry. :(
|
Pfui

msg:4065417 | 2:15 am on Jan 22, 2010 (gmt 0) |
Alas, because amazonaws.com is a never-ending source of new bad bots, I suspect this thread -- and my many bot-sightings in it -- will continue (...ad nauseam, sorry:)
|
bedlam

msg:4068421 | 8:55 pm on Jan 26, 2010 (gmt 0) |
Just a note about a bot identifying itself as PostRank. Somebody above mentioned that it made one HEAD request and then left. However, I've just seen that bot (same AWS IP range) request /wp-admin/install.php. I can't see any good reason for a bot to want that... -- b
|
dstiles

msg:4068538 | 10:55 pm on Jan 26, 2010 (gmt 0) |
I see a lot of php accesses through folders such as admin. All, as far as I've seen, are attempts to gain access to the server through faults (e.g. no password) in whatever admin files are there, often for phpmyadmin or other control panels. As I noted elsewhere (I think in this thread): I have seen AWS used either directly (via a logged-on account) or indirectly (via infected servers) for "standard botnet" exploit attempts.
|
tangor

msg:4068657 | 2:06 am on Jan 27, 2010 (gmt 0) |
The enormous number of "php" requests via the discussed IP range(s) is why I've been intrigued, and why I finally nuked it a week or so back. I could have approached it the other way around via referer and accomplished the same thing, but life is too short and there are too many rogue bots on that service.
|
Pfui

msg:4070839 | 2:25 am on Jan 30, 2010 (gmt 0) |
ec2-75-101-243-212.compute-1.amazonaws.com
curl/7.18.2 (i486-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
robots.txt? NO
|
Pfui

msg:4073031 | 9:09 pm on Feb 2, 2010 (gmt 0) |
ec2-204-236-242-36.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )
robots.txt? Yes
[Above info also posted as new thread.]
Curiously, per WHOIS, the bot-runner's site appears to be hosted by Amazonaws.com:
seoprofiler.com => 174.129.8.145 => Amazon.com/amazonaws.com [Amazon Web Services, Elastic Compute Cloud, EC2]
Interesting how a company can claim a dynamically assigned IP as its permanent address...
|
thetrasher

msg:4075879 | 7:13 pm on Feb 7, 2010 (gmt 0) |
New: AMAZON-EC2-7 = 184.72.0.0/15
|
dstiles

msg:4075914 | 8:21 pm on Feb 7, 2010 (gmt 0) |
Well spotted! My very first ban of the 184 range. :)
|
tpeacock

msg:4078220 | 10:57 am on Feb 11, 2010 (gmt 0) |
These are the CIDR blocks I have for amazonaws's bad bots at this point. Does this seem to cover all of them or are there more?
67.202.0.0/18 # "do not delete" - amazonaws.com's bad bots
75.101.128.0/17 # "do not delete" - amazonaws.com's bad bots
79.125.0.0/18 # "do not delete" - amazonaws.com's bad bots - Ireland
174.129.0.0/16 # "do not delete" - amazonaws.com's bad bots
184.72.0.0/15 # "do not delete" - amazonaws.com's bad bots
204.236.128.0/17 # "do not delete" - amazonaws.com's bad bots
Thomas
|
thetrasher

msg:4078296 | 1:11 pm on Feb 11, 2010 (gmt 0) |
+
216.182.224.0/20
72.44.32.0/19
|
Lain_se

msg:4078570 | 7:06 pm on Feb 11, 2010 (gmt 0) |
My laundry list of ec2 trash includes the following.
deny from 67.202.0.0/18 "Amazon ec2-Cloud"
deny from 72.44.32.0/19 "Amazon ec2-Cloud"
deny from 75.101.128.0/17 "Amazon ec2-Cloud"
deny from 79.125.0.0/18 "Amazon ec2-Cloud"
deny from 174.129.0.0/16 "Amazon ec2-Cloud"
deny from 184.72.0.0/15 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"
deny from 204.236.128.0/17 "Amazon ec2-Cloud"
deny from 204.74.108.0/24 "Amazon ec2-Cloud"
I have seen virtually every real and fake user agent included and NEVER once have I been able to figure out why I should allow them to index my site. I spoke with Amazon Services about a year ago and explained to them why I am blocking them and requested that they force a user-ID tag of some sort to identify the user for abuse reasons and they explained to me that this will never happen... and so I stated that's too bad and your services will always be blocked as well. I honestly do not think they care and do not understand that they are only enabling web scrapers, spammers and other trash to ruin their integrity. Again I do not think they care so long as they are getting paid.
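For anyone maintaining a list like the one above by hand, here is a minimal sketch that builds the same kind of `Deny from` lines from a list of CIDR strings. It is only an illustration: the helper name `deny_lines` is hypothetical, it skips repeated blocks, and it puts the label on its own `#` line because Apache's documentation says comments may not share a line with a directive.

```python
import ipaddress


def deny_lines(cidrs, label):
    """Build Apache 2.2-style 'Deny from' lines, one per unique CIDR block."""
    seen = set()
    lines = ["# " + label]  # the label gets its own comment line
    for cidr in cidrs:
        # ip_network raises ValueError on a malformed block (for example a /127 on IPv4).
        network = ipaddress.ip_network(cidr)
        if network in seen:
            continue  # skip blocks that appear more than once
        seen.add(network)
        lines.append("Deny from " + str(network))
    return lines


if __name__ == "__main__":
    blocks = [
        "67.202.0.0/18", "72.44.32.0/19", "75.101.128.0/17",
        "79.125.0.0/18", "174.129.0.0/16", "184.72.0.0/15",
        "204.74.108.0/24", "204.236.128.0/17", "204.74.108.0/24",
    ]
    print("\n".join(deny_lines(blocks, "Amazon ec2-Cloud")))
```

The script emits whatever it is given, so whether each block really belongs to Amazon still has to be checked separately.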
|
keyplyr

msg:4078601 | 8:04 pm on Feb 11, 2010 (gmt 0) |
CLOUD: Creepy Litigious Outrageous User-agent Dwelling
|
dstiles

msg:4078615 | 8:30 pm on Feb 11, 2010 (gmt 0) |
I've got the Ireland one: 79.125.0.0/127
204.74.108.0/24 (which you list twice) resolves here to:
UltraDNS Corp ULTRADNS-GLOBAL-2 204.74.96.0 - 204.74.108.255
108 seems to be mostly unused apart from loads of name servers on 1.
|
tpeacock

msg:4080067 | 4:44 pm on Feb 14, 2010 (gmt 0) |
Thanks, thetrasher, for those two. And dstiles, the Ireland one was 79.125.0.0/17, not /18 like I had. I assumed you meant 17 not 127?
67.202.0.0/18 # "do not delete" - amazonaws.com's bad bots
72.44.32.0/19 # "do not delete" - amazonaws.com's bad bots
75.101.128.0/17 # "do not delete" - amazonaws.com's bad bots
79.125.0.0/17 # "do not delete" - amazonaws.com's bad bots - Ireland
174.129.0.0/16 # "do not delete" - amazonaws.com's bad bots
184.72.0.0/15 # "do not delete" - amazonaws.com's bad bots
204.236.128.0/17 # "do not delete" - amazonaws.com's bad bots
216.182.224.0/20 # "do not delete" - amazonaws.com's bad bots
That's a lot of IP addresses but I cannot think of a single reason not to block them all.
Thomas
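To check a logged IP against the blocks listed above, and to see just how many addresses they add up to, something along these lines could be used. It is a sketch only: the names `BLOCKS`, `matching_block` and `total_addresses` are invented for the example, and the list is simply the eight blocks from this post, not an authoritative inventory of Amazon's ranges.

```python
import ipaddress

# The eight blocks from the list above (a 2010 snapshot, not a complete inventory).
BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "67.202.0.0/18", "72.44.32.0/19", "75.101.128.0/17", "79.125.0.0/17",
        "174.129.0.0/16", "184.72.0.0/15", "204.236.128.0/17", "216.182.224.0/20",
    )
]


def matching_block(ip):
    """Return the listed CIDR block containing ip, or None if no block matches."""
    address = ipaddress.ip_address(ip)
    for block in BLOCKS:
        if address in block:
            return str(block)
    return None


def total_addresses():
    """Count the addresses covered by all listed blocks (the blocks do not overlap)."""
    return sum(block.num_addresses for block in BLOCKS)


if __name__ == "__main__":
    print(matching_block("184.73.16.198"))
    print(total_addresses())
```

With these eight blocks the total comes to 323,584 addresses.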
|
dstiles

msg:4080073 | 5:27 pm on Feb 14, 2010 (gmt 0) |
> I assumed you meant 17 not 127? Sorry. It had been a long day. :)
|
Pfui

msg:4080665 | 6:06 pm on Feb 15, 2010 (gmt 0) |
ec2-75-101-245-135.compute-1.amazonaws.com
Twisted PageGetter
robots.txt? NO
See also: Twisted PageGetter [webmasterworld.com] (09/2009)
|
blend27

msg:4081004 | 3:24 am on Feb 16, 2010 (gmt 0) |
| CLOUD: Creepy Litigious Outrageous User-agent Dwelling |
| CLOUD: 403
|
Pfui

msg:4082662 | 1:00 am on Feb 18, 2010 (gmt 0) |
ec2-174-129-153-217.compute-1.amazonaws.com
HTMLParser/2.0
robots.txt? NO
|
Pfui

msg:4082671 | 1:15 am on Feb 18, 2010 (gmt 0) |
ec2-174-129-167-253.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50728)
robots.txt? NO
What bothers me about this one is not just its non-bot UA, but that its hits only went to two html files and some of their graphics files, and knew just where to look before arriving. I've blocked amazonaws.com for over a year -- basically from the first day I saw it -- so the directory paths didn't come from my server. Hmm.
|
blend27

msg:4083220 | 11:54 pm on Feb 18, 2010 (gmt 0) |
ec2-184-73-16-198.compute-1.amazonaws.com
Nutch/Nutch-1.0-dev+(A+Nutch-based+crawler.;+http://lucene.apache.org/nutch/bot.html;+nutch-agent+AT+lucene.apache.org)
robots.txt? Yes - ignored it.
Went after Homepage and left with a fat 403.
|