
Home / Forums Index / WebmasterWorld / Webmaster General
Forum Library, Charter, Moderators: phranque

Webmaster General Forum

    
Block amazonaws.com with .htaccess?
Can't block 'em
jmpreston




msg:3848990
 7:23 pm on Feb 13, 2009 (gmt 0)

I've tried the following, but I can't seem to block amazonaws.com visitors to my sites. This is in the .htaccess file in public_html, the root of my hosted server:

# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} amazonaws\.com [NC]
RewriteRule .* - [F]

<Files GET POST PUT>
order allow, deny
deny from .amazonaws.com
</files>

Any other ideas? It seems that scammers and such are using the Amazon cloud for cheap power to steal sites, etc.

- jim
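
For comparison, here is a hedged sketch of how the snippet above might be written for Apache 2.2 (the version is an assumption; the thread does not say which one the host runs). `<Files>` matches file names rather than request methods, `Order` takes its arguments with no space after the comma, and `Order Allow,Deny` rejects every request unless an `Allow` line matches. The rewrite rules also need `RewriteEngine On`, and `HTTP_REFERER` holds only the page that linked to the request, not the visitor's host, so it will rarely match a crawler running on amazonaws.com.

```apache
# Sketch only, assuming Apache 2.2 with mod_rewrite and mod_authz_host,
# and a host that permits these directives in .htaccess.
RewriteEngine On
RewriteCond %{HTTP_REFERER} amazonaws\.com [NC]
RewriteRule .* - [F]

# Allow lines are evaluated first, then Deny lines;
# a request that matches both is denied.
Order Allow,Deny
Allow from all
# A partial domain name makes Apache do a reverse DNS lookup on each client.
Deny from .amazonaws.com
```

If the restriction really should apply only to certain request methods, the container for that is `<Limit GET POST PUT> ... </Limit>`, not `<Files>`.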

 

The Contractor




msg:3849007
 7:40 pm on Feb 13, 2009 (gmt 0)

Looking at my blocked log, they are coming in using the following UAs:

AISearchBot
woriobot
heritrix
NetSeer
Nutch

I have all of them blocked via .htaccess anyway.

jmpreston




msg:3849106
 9:34 pm on Feb 13, 2009 (gmt 0)

Hmmm, don't know what UA is or what to do with those :-)

The Contractor




msg:3849181
 1:57 am on Feb 14, 2009 (gmt 0)

RewriteEngine On

RewriteCond %{HTTP_REFERER} amazonaws\.com [OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^.*$ - [F]

jmpreston




msg:3849213
 3:26 am on Feb 14, 2009 (gmt 0)

Ahh, User Agent = UA. I'm studying this stuff, but it is far from coming together for me. Still, I understand what you've given me and I pasted it into my .htaccess. THANKS! I'll get back to you in a couple of days on how it worked.

- jim

jmpreston




msg:3851669
 6:32 pm on Feb 17, 2009 (gmt 0)

I put the .htaccess file in the main root of the hosted service, in /www, and in /www/mysite and no results. I'm working with a programmer but he seems a bit light on the subject.

The Contractor




msg:3852434
 5:11 pm on Feb 18, 2009 (gmt 0)

Make sure your hosting company supports .htaccess and allows overrides (the AllowOverride directive in the httpd.conf file).
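
As a rough illustration of what the host would need in its server configuration: mod_rewrite directives in .htaccess require the FileInfo override, and Order/Allow/Deny require Limit on Apache 2.2. With `AllowOverride None`, the .htaccess file is ignored entirely. The directory path below is a placeholder, not something taken from this thread.

```apache
# httpd.conf (server side, not .htaccess). Sketch assuming Apache 2.2.
# "/home/example/public_html" is a hypothetical path.
<Directory "/home/example/public_html">
    # FileInfo permits RewriteEngine/RewriteCond/RewriteRule in .htaccess;
    # Limit permits Order/Allow/Deny.
    AllowOverride FileInfo Limit
    # Per-directory rewrites also need symlink following enabled.
    Options +FollowSymLinks
</Directory>
```

On shared hosting only the provider can change this, so it is a question to put to their support.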

jmpreston




msg:3852449
 5:21 pm on Feb 18, 2009 (gmt 0)

I started wondering about that this morning because we are getting nowhere. Thanks for the comment, because it helps me explain the problem to them.

The Contractor




msg:3852457
 5:31 pm on Feb 18, 2009 (gmt 0)

An easy way to test whether your blocking by User Agent is working: use Firefox, install the "User Agent Switcher" plugin, add one of the UA strings you are trying to block, and visit your site.

jmpreston




msg:3853315
 6:35 pm on Feb 19, 2009 (gmt 0)

My programmer tried your suggestion. It seems that the amazonaws apps are faking browsers. I'm really clueless, but we can't seem to stop them. He found amazonaws IPs, so we are trying that now.

Maybe 30% of my page views are now from sites like ec-2-174-129-115-45.compute-1.amazonaws.com and there are dozens of these addresses in the log.

- jim
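
Blocking by address could look something like the sketch below. The single address is only inferred from the hostname quoted above and is used purely as an illustration; the addresses actually worth blocking would have to come from your own logs. The syntax assumes Apache 2.2.

```apache
# Sketch assuming Apache 2.2 access control (mod_authz_host).
Order Allow,Deny
Allow from all
# Illustrative address read off the hostname in the log entry above.
Deny from 174.129.115.45
# A whole network can be denied in CIDR form ("Deny from a.b.c.d/nn")
# once the real range has been confirmed.
```

Unlike a hostname-based `Deny from .amazonaws.com`, an address match needs no DNS lookup per request.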

jmpreston




msg:3854907
 7:19 pm on Feb 21, 2009 (gmt 0)

This code finally stopped the amazonaws.com accesses to our site:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^(.*)$ - [F]

Thanks for your help, The Contractor! This thread should be useful for others encountering this issue, and all sites probably will eventually. Scammers love the cheap power of cloud computing.

- jim

caribguy




msg:3855015
 10:43 pm on Feb 21, 2009 (gmt 0)

Jim,

You're off to a good start with those. You may also want to check the Search Engine Spider Identification [webmasterworld.com] forum for a lot more on AmazonAWS.
