The Contractor

msg:3849007 | 7:40 pm on Feb 13, 2009 (gmt 0) |
Looking at my blocked log, they are coming in using the following UAs: AISearchBot, woriobot, heritrix, NetSeer, Nutch. I have them all blocked via .htaccess anyway.
|
jmpreston

msg:3849106 | 9:34 pm on Feb 13, 2009 (gmt 0) |
Hmmm, don't know what UA is or what to do with those :-)
|
The Contractor

msg:3849181 | 1:57 am on Feb 14, 2009 (gmt 0) |
RewriteEngine On
RewriteCond %{HTTP_REFERER} amazonaws\.com [OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
# No [OR] on the last condition: a trailing OR lets the rule fire for every request
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^.*$ - [F]
|
jmpreston

msg:3849213 | 3:26 am on Feb 14, 2009 (gmt 0) |
Ahh, User Agent = UA. I'm studying this stuff but it is far from coming together for me. Still, I understand what you've given me and I pasted it into my .htaccess. THANKS! I'll get back to you on how it worked in a couple of days. - jim
|
jmpreston

msg:3851669 | 6:32 pm on Feb 17, 2009 (gmt 0) |
I put the .htaccess file in the main root of the hosted service, in /www, and in /www/mysite, with no results. I'm working with a programmer but he seems a bit light on the subject.
|
The Contractor

msg:3852434 | 5:11 pm on Feb 18, 2009 (gmt 0) |
Make sure your hosting company supports .htaccess files and allows overrides in its httpd.conf file.
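For anyone asking their host about this, here is a rough sketch of what the relevant part of httpd.conf might look like. The directory path is a placeholder I made up, and a real host's config will differ; the point is only that AllowOverride has to permit FileInfo (or All) for rewrite rules in .htaccess to be read at all.

```apache
# httpd.conf (server-level config, normally edited by the host, not the site owner)
# "/path/to/www/mysite" is a hypothetical path used for illustration only.
<Directory "/path/to/www/mysite">
    # FileInfo is the override class that covers mod_rewrite directives in .htaccess.
    # With "AllowOverride None" the .htaccess file is ignored entirely.
    AllowOverride FileInfo
    # mod_rewrite in a per-directory context needs FollowSymLinks (or SymLinksIfOwnerMatch)
    Options +FollowSymLinks
</Directory>
```

If the host has AllowOverride set to None for your directory, no amount of editing .htaccess will have any effect, which would match the "no results" symptom.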
|
jmpreston

msg:3852449 | 5:21 pm on Feb 18, 2009 (gmt 0) |
I started wondering that this morning because we are getting nowhere. Thanks for the comment because that helps me explain it to them.
|
The Contractor

msg:3852457 | 5:31 pm on Feb 18, 2009 (gmt 0) |
An easy way to test whether your blocking by User Agent is working: use Firefox, install the "User Agent Switcher" plugin, add one of the UA strings you are trying to block, and visit your site. If the block works, you should get a 403 Forbidden page.
|
jmpreston

msg:3853315 | 6:35 pm on Feb 19, 2009 (gmt 0) |
My programmer tried your suggestion. It seems that the amazonaws apps are faking browser User Agents, so blocking by UA alone doesn't catch them. I'm really clueless but we can't seem to stop them. He found amazonaws IPs so we are trying that now. Maybe 30% of my page views are now from hosts like ec-2-174-129-115-45.compute-1.amazonaws.com and there are dozens of these addresses in the log. - jim
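For readers wondering what blocking by IP could look like, here is a minimal sketch. It is an assumption about the approach, not the rule actually tried here. The single address is read off the hostname quoted above; substitute the addresses from your own log.

```apache
RewriteEngine On
# 174.129.115.45 is taken from the hostname quoted in the post above.
# It is an example only; use the addresses that show up in your own log.
RewriteCond %{REMOTE_ADDR} ^174\.129\.115\.45$
# Return 403 Forbidden for any URL requested from that address
RewriteRule ^.*$ - [F]
```

With dozens of addresses in the log, listing them one by one gets tedious, which is one reason matching on the hostname can be more practical.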
|
jmpreston

msg:3854907 | 7:19 pm on Feb 21, 2009 (gmt 0) |
This code finally stopped the amazonaws.com accesses to our site:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "AISearchBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "woriobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "NetSeer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nutch" [NC]
RewriteRule ^(.*)$ - [F]
Thanks for your help The Contractor! This thread should be useful for others encountering this issue, and all sites probably will eventually. Scammers love the cheap power of cloud computing. - jim
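As a side note for anyone adapting this, the five User Agent conditions can probably be folded into one alternation. This is just a sketch of an equivalent form, not something tested on this site:

```apache
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} \.compute-1\.amazonaws\.com$ [NC,OR]
# One case-insensitive condition covering all five bot names from the list above
RewriteCond %{HTTP_USER_AGENT} (AISearchBot|woriobot|heritrix|NetSeer|Nutch) [NC]
# The last condition carries no [OR], so the rule fires only when one of the conditions matched
RewriteRule ^.*$ - [F]
```

Adding another bot then means adding one name to the alternation rather than a whole new line.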
|
caribguy

msg:3855015 | 10:43 pm on Feb 21, 2009 (gmt 0) |
Jim, You're off to a good start with those, and you may also want to check the Search Engine Spider Identification [webmasterworld.com] forum for a lot more on AmazonAWS
|