
Blocking all traffic from amazonaws.com

I want to block all traffic

Kratos

5:47 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi guys, if one were to block ALL traffic coming from amazonaws.com, would this be the correct .htaccess code?

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^(.*)$ - [F]

I found the above code in another thread from some years ago (only one user mentioned the code), and I take it that the following lines are the main ones blocking the traffic from amazonaws.com

RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]

The other UAs were thought, in that thread, to belong to bots operating from amazonaws.com.

I'm simply interested in blocking anything and everything coming from amazonaws.com even if it's humans. Would the above .htaccess code work or would I need something else?

Thanks!

wilderness

7:04 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]


These are not defined with efficient syntax.

Additionally, the domain lookup will have two adverse effects:
1) it will slow your site down, because each request will trigger a DNS lookup to resolve the visitor's domain;
2) the domain lookup changes the format of standard raw logs from IPs to domains, which most don't care for, as it requires extra work (and a different method) of looking up IPs.
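For illustration, the same pair of conditions can be written more tightly; this is a sketch only (the unanchored patterns avoid the needless leading `^.*`, and the bare RewriteRule skips the pointless capture):

```apache
RewriteEngine On
# unanchored patterns avoid the needless leading ^.*
RewriteCond %{HTTP_REFERER} amazonaws\.com [NC,OR]
# note: %{REMOTE_HOST} still forces the DNS lookup described above
RewriteCond %{REMOTE_HOST} \.amazonaws\.com$ [NC]
# no need to capture the whole URL just to discard it
RewriteRule ^ - [F]
```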

Here's a link to the North American Amazon IPs [whois.arin.net]; I'm pretty sure PFUI's lengthy Amazon AWS thread includes the non-NA IPs as well.

lucy24

8:13 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If humans are using amazonaws, that's their problem. Don't use %{REMOTE_HOST} if you can possibly help it.

It is generally far more efficient to block via IP ranges, using the formulation

Deny from 54

Caution! This is not literally true, since some parts of 54 still belong to Merck and there are a few legitimate human bits having to do with some mobile provider or other.

:: shuffling papers ::

I've currently got

Deny from 54.72.0.0/13 54.80.0.0/12 54.144.0.0/12 54.160.0.0/12 54.176.0.0/12 54.192.0.0/10

but I may have missed a few pieces. There's also AWS at

:: further shuffling of papers ::

Deny from 23.20.0.0/14
Deny from 50.16.0.0/14 50.112.0.0/15
Deny from 67.202.0.0/18
Deny from 75.101.128.0/17
Deny from 96.127.0.0/17
Deny from 107.20.0.0/14
Deny from 174.129.0.0/17 174.129.128.0/18
Deny from 176.34
Deny from 184.72.0.0/15 184.169.128.0/17
Deny from 223.16.0.0/14

(All of 174.129 range is actually AWS; I've poked a hole for the Wayback Machine. ymmv)

Kratos

8:20 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks for the reply!

I will consider the IPs as I was reading several threads prior to posting this thread and my head was spinning from all the stuff being discussed. I'm a noob when it comes to .htaccess and I just want to block everything from amazonaws.com on some very small sites.

Shall I include the IP ranges from the link in my cPanel deny-IP box? Anything that is put there goes into .htaccess.

If I simply wanted to ban anything coming from amazonaws.com without doing the IPs, would there be any simple way to tell the server to deny anything from amazonaws.com and serve a 403? No IP ranges; simply come from amazonaws.com to my site and get banned.

While I will go through more threads and probably end up using those IP ranges on other sites, I'd like to find, if possible, the easiest and most direct way to block amazonaws.com.
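For reference, Apache's access control does accept a (partial) domain name, though it forces the double reverse-DNS lookup wilderness cautioned about above, so the IP ranges are the faster route. A sketch:

```apache
# slow but direct: Deny accepts a (partial) domain name.
# Apache must do a double reverse-DNS lookup on every request to apply it.
Order Allow,Deny
Allow from all
Deny from amazonaws.com
```

The robots.txt exception discussed further down the thread applies here as well.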

Thanks again for any replies!

Kratos

8:21 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks for the reply, lucy24! I just saw your reply after I posted mine. I've seen that you're very active in the threads on blocking AmazonAWS, and I've read about the 54. issue you mention before. It looks like you guys have collected some serious data on the bad traffic from amazonaws.com. In my case, I just don't want to bother with any Kindle readers who may want to browse the sites I'd use the code on, as these sites are very small and in a language not many Kindle readers use.

Shall I simply then paste those IP ranges and code you posted to stop all amazonaws.com traffic?

Thanks

keyplyr

4:17 am on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



All Amazon and ready to go:

Order deny,allow
SetEnvIf Request_URI ^/robots\.txt$ allowall
deny from 23.20.0.0/14 46.51.128.0/17 46.137.0.0/16 50.16.0.0/14 50.112.0.0/16 52.0.0.0/11 54.64.0.0/15 54.66.0.0/16 54.72.0.0/13 54.80.0.0/12 54.144.0.0/12 54.160.0.0/11 54.192.0.0/10 67.202.0.0/18 72.21.192.0/19 72.44.32.0/19 75.101.128.0/17 79.125.0.0/18 87.238.80.0/21 87.238.84.0/23 103.4.8.0/21 107.20.0.0/14 122.248.192.0/18 156.154.64.0/22 156.154.68.0/23 174.129.0.0/16 175.41.128.0/18 175.41.192.0/18 175.41.224.0/19 176.32.64.0/19 176.34.0.0/16 178.236.0.0/20 184.72.0.0/15 184.169.128.0/17 185.48.120.0/22 204.236.128.0/17 216.182.224.0/20
allow from env=allowall

lucy24

6:25 am on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Kratos, note that keyplyr's sites operate in "Deny,Allow" mode, meaning that "Allow" can override "Deny". If instead you use "Allow,Deny" --most people do, ongoing argument, different thread-- you have to use a different mechanism to poke a hole for robots.txt:

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

Or, if you prefer, "Deny,Allow". Weird but true, in this context it doesn't matter.

keyplyr

6:35 am on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If instead... most people do

Comforting to be reassured that I am not most people ;)

wilderness

7:17 am on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Comforting to be reassured that I am not most people


keyplyr,
If you, I, or any other forum participant fit the 'normal' category, we'd NOT even be participating in this forum.

keyplyr

8:27 am on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



wilderness... well I knew you weren't normal :)

Kratos

11:33 am on Mar 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hey guys, thanks for the replies. What I don't understand is the robots.txt bit in the .htaccess suggested by both @keyplyr and @lucy24.

Is that piece of code/directive that I referenced used to allow the bots coming from those amazonaws IPs to go through robots.txt and, if they're blocked through robots.txt, then they will be blocked by .htaccess (and vice versa)?

So if a bad bot from amazonaws comes and isn't listed in robots.txt as free to go in, it will be blocked. But if a bot from amazonaws is listed in robots.txt as allowed to go in, then .htaccess will not block it?

Is there any actual bot traffic from amazonaws that is useful anyway? All I can think of is the Google Page Speed tool using amazonaws IPs (haven't tested that though, read about it here), but aside from that, all amazonaws traffic is either hacking bots or those pesky SEO bots that take snapshots of your site and all your HTML attributes. Not to mention scrapers, but I find most scraping traffic comes from Russia with love.

wilderness

11:49 am on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So if a bad bot from amazonaws comes and isn't listed in robots.txt as free to go in, it will be blocked. But if a bot from amazonaws is listed in robots.txt as allowed to go in, then .htaccess will not block it?


Apples and oranges.

robots.txt is a request to compliant bots to follow your wishes.

htaccess is a server-side function which the visitor (or bot) must adhere to (at least as served).

Kratos

12:07 pm on Mar 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, I'm aware of the differences between robots.txt and .htaccess (and how robots.txt is futile at stopping bad bots). What I was referring to in my previous reply was what the following line meant:

SetEnvIf Request_URI ^/robots\.txt$ allowall

Also, as lucy24 posted:

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

I don't know what the above lines of code are for; would it be possible to tell me what they do, so I can learn? Thanks

wilderness

12:14 pm on Mar 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is intended to allow robots.txt to be viewable by everyone, regardless of later lines that disallow all other access.
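Put together, the mechanism looks something like this (a minimal sketch; the single CIDR range stands in for keyplyr's full list):

```apache
Order Deny,Allow
# tag any request for robots.txt with the "allowall" variable
SetEnvIf Request_URI ^/robots\.txt$ allowall
# deny the AWS range...
Deny from 54.192.0.0/10
# ...but let tagged requests through, so compliant bots can still read robots.txt
Allow from env=allowall
```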

Kratos

1:14 pm on Mar 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Great! Thanks again for the help.