Block amazonaws.com with .htaccess?

Can't block 'em

   
7:23 pm on Feb 13, 2009 (gmt 0)

5+ Year Member



I've tried the following, but I can't seem to block amazonaws.com visitors to my sites. This is in the .htaccess file in my root directory (public_html) on a hosted server:

# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} amazonaws\.com [NC]
RewriteRule .* - [F]

<Files GET POST PUT>
order allow, deny
deny from .amazonaws.com
</files>

Any other ideas? It seems that scammers and such are using the Amazon cloud for cheap power to steal sites, etc.

- jim

7:40 pm on Feb 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looking at my blocked log, they are coming in with the following UAs:

AISearchBot
woriobot
heritrix
NetSeer
Nutch

I have them all blocked via .htaccess anyway.

9:34 pm on Feb 13, 2009 (gmt 0)

5+ Year Member



Hmmm, don't know what UA is or what to do with those :-)

1:57 am on Feb 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteEngine On

RewriteCond %{HTTP_REFERER} amazonaws\.com [OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^.*$ - [F]

3:26 am on Feb 14, 2009 (gmt 0)

5+ Year Member



Ahh, User Agent = UA. I'm studying this stuff, but it is far from coming together for me. Still, I understand what you've given me, and I pasted it into my .htaccess. THANKS! I'll get back to you on how it worked in a couple of days.

- jim

6:32 pm on Feb 17, 2009 (gmt 0)

5+ Year Member



I put the .htaccess file in the main root of the hosted service, in /www, and in /www/mysite, and got no results. I'm working with a programmer, but he seems a bit light on the subject.

5:11 pm on Feb 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Make sure your hosting company supports .htaccess and allows overrides (the AllowOverride setting in the httpd.conf file).

5:21 pm on Feb 18, 2009 (gmt 0)

5+ Year Member



I started wondering about that this morning because we are getting nowhere. Thanks for the comment, because that helps me explain it to them.

5:31 pm on Feb 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's an easy way to test whether your blocking by User Agent is working: use Firefox, install the "User Agent Switcher" plugin, add one of the UA strings you are trying to block, and visit your site.
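
If you'd rather script the same check, here is a minimal Python sketch. It sends a request that announces one of the UA strings from this thread and reports the HTTP status; a 403 would suggest the block fired. The function names and the example.com URL are placeholders for illustration, not anything from your setup.

```python
import urllib.error
import urllib.request

# User-agent tokens taken from the list earlier in this thread.
BLOCKED_AGENTS = ["AISearchBot", "woriobot", "heritrix", "NetSeer", "Nutch"]


def build_request(url, user_agent):
    """Return a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Fetch url with the spoofed User-Agent and return the HTTP status code."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; 403 means "Forbidden".
        return err.code


# Example usage (placeholder URL, replace with your own site):
#   for agent in BLOCKED_AGENTS:
#       print(agent, status_for("http://www.example.com/", agent))
```

Run it from a machine that isn't otherwise blocked, so that the status you see reflects the User-Agent rule alone.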

6:35 pm on Feb 19, 2009 (gmt 0)

5+ Year Member



My programmer tried your suggestion. It seems that the amazonaws apps are faking browsers. I'm really clueless, but we can't seem to stop them. He found amazonaws IPs, so we are trying that now.

Maybe 30% of my page views are now from hosts like ec2-174-129-115-45.compute-1.amazonaws.com, and there are dozens of these addresses in the log.
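
For anyone collecting the IPs by hand: EC2 host names of the form ec2-a-b-c-d.compute-1.amazonaws.com embed the address a.b.c.d, so they can be pulled out of a log with a short script. This is only a sketch in Python; it assumes the client host name is the first whitespace-separated field on each log line, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches EC2 reverse-DNS names such as
# ec2-174-129-115-45.compute-1.amazonaws.com and captures the four octets.
EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.[a-z0-9.-]*amazonaws\.com$",
    re.IGNORECASE,
)


def ec2_host_to_ip(host):
    """Return the dotted IP embedded in an EC2 host name, or None."""
    match = EC2_HOST.match(host.strip())
    return ".".join(match.groups()) if match else None


def count_ec2_hits(log_lines):
    """Count hits per EC2 address.

    Assumption: the client host name is the first field of each line.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        ip = ec2_host_to_ip(fields[0])
        if ip:
            hits[ip] += 1
    return hits
```

Calling count_ec2_hits(open("access.log")) (the file name is a placeholder) gives a tally you can sort to see which addresses are worth blocking first.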

- jim

7:19 pm on Feb 21, 2009 (gmt 0)

5+ Year Member



This code finally stopped the amazonaws.com accesses to our site:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^(.*)$ - [F]
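
For readers who want to see which requests those conditions catch, here is a rough Python approximation of the rule set. It is not Apache itself: it just applies the same patterns to a dictionary with hypothetical keys (referer, remote_host, user_agent) and treats the conditions as one OR chain, with [NC] mapped to case-insensitive matching.

```python
import re

# (request field, pattern, case-insensitive?) mirroring the conditions above.
CONDITIONS = [
    ("referer", r"^http://.*amazonaws\.com", False),
    ("remote_host", r"^.*\.compute-1\.amazonaws\.com$", True),
    ("user_agent", r"AISearchBot", True),
    ("user_agent", r"woriobot", True),
    ("user_agent", r"heritrix", True),
    ("user_agent", r"NetSeer", True),
    ("user_agent", r"Nutch", True),
]


def is_forbidden(request):
    """Return True if any condition matches, i.e. the rule would answer 403."""
    for field, pattern, nocase in CONDITIONS:
        flags = re.IGNORECASE if nocase else 0
        if re.search(pattern, request.get(field, ""), flags):
            return True
    return False
```

For example, is_forbidden({"remote_host": "ec2-174-129-115-45.compute-1.amazonaws.com"}) returns True, while a request with an ordinary browser UA and an unrelated host returns False. Note that the referer pattern only covers http:// URLs and is case-sensitive, since it carries no [NC] flag.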

Thanks for your help, The Contractor! This thread should be useful for others encountering this issue, and all sites probably will eventually. Scammers love the cheap power of cloud computing.

- jim

10:43 pm on Feb 21, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Jim,

You're off to a good start with those, and you may also want to check the Search Engine Spider Identification [webmasterworld.com] forum for a lot more on AmazonAWS.