Forum Moderators: coopster & phranque

annoying bots

disallowing bots

         

dave_h

6:56 pm on Mar 4, 2006 (gmt 0)

10+ Year Member



Hi

I guess the old thread was closed so I hope it's not a problem to repost this info.

Re: .htaccess below...

Can I use it just as is in my directory?

I am trying to ban bots from indexing images there.

Also, can I remove the robots.txt I currently have there, or is it OK to let them both reside in the same directory?

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Thanks
Dave

Lord Majestic

7:01 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, can I remove the robots.txt I currently have there, or is it OK to let them both reside in the same directory?

It is a good idea to keep robots.txt in place AND to let ANY IP or USER-AGENT retrieve it. You have nothing to lose, and it can save you the resources you would otherwise spend filtering bots on the fly, because the bots that are willing to comply with your robots.txt will be able to do so.
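To illustrate what a compliant bot does with that file, here is a minimal Python sketch using the standard library's urllib.robotparser. The /images/ path, the ExampleBot name and the example.com URLs are assumptions for illustration, not details from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every bot to stay out of /images/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
# parse() takes the file's lines; a real bot would fetch them over HTTP first.
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url):
    """Return True if the parsed robots.txt lets this user-agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/images/photo.jpg",
                "http://www.example.com/index.html"):
        print(url, may_fetch("ExampleBot", url))
```

A bot that runs a check like this never requests the image URLs at all, so the .htaccess rules only have to deal with the bots that ignore the file.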

DamonHD

7:14 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

I block bad bots by:

1) Screening out any whose IP addresses appear in (e.g.) the Spamhaus xbl-sbl list of compromised/spammer machines. (About 10% of my bandwidth saved immediately.)

2) Observing their behaviour and eventually squeezing the badly-behaved ones out. (This also catches generally-good bots such as Google when they are having a bad-hair day and go mad.)

I have very few hard-wired rules, and I would not trust anything in the UA string, as it is trivial to fake.
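One way to picture point 2 is to count each client's recent requests and refuse those that exceed a budget. The following is only a Python sketch under assumed thresholds (60 requests per 60 seconds); the RequestWatcher class is hypothetical and is not necessarily how the setup described in this post works.

```python
import time
from collections import defaultdict, deque


class RequestWatcher:
    """Sliding-window request counter keyed by client IP (hypothetical helper)."""

    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests      # assumed budget per window
        self.window_seconds = window_seconds  # assumed window length
        self._hits = defaultdict(deque)       # ip -> timestamps of allowed requests

    def allow(self, client_ip, now=None):
        """Return True if this request fits the client's budget, else False."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over budget: the caller can answer 403 or 503
        hits.append(now)  # only allowed requests are recorded
        return True
```

Because the decision depends on what a client does rather than on what its UA string claims, a bot cannot get round it by lying about its name.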

Rgds

Damon

perl_diver

8:07 pm on Mar 4, 2006 (gmt 0)

10+ Year Member



Maybe your other thread was closed because this is the Perl forum and this is not a Perl question. You should probably be asking this question in the Apache web server forum.

DamonHD

8:32 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

OK, but the OP can fix this in their Perl code rather than at the Apache level, and it's dead easy (at least checking the DNS BLs is).
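The DNS BL check comes down to a single DNS lookup: reverse the octets of the client's IPv4 address, append the blocklist zone, and see whether the name resolves. A minimal sketch follows, written in Python rather than Perl purely for illustration. The zone name zen.spamhaus.org and the function names are assumptions, not details from this thread, and treating any 127.0.0.x answer as "listed" is a simplification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org", resolve=socket.gethostbyname):
    """Return True if the blocklist answers with a 127.0.0.x address.

    `resolve` is injectable so the logic can be exercised without network access.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name: not listed (or the lookup itself failed)
    return answer.startswith("127.0.0.")


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used here only as a placeholder.
    print(dnsbl_query_name("192.0.2.1"))
```

In a CGI script the address to test would typically come from the REMOTE_ADDR environment variable, and caching the answers avoids a DNS round trip on every request.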

Rgds

Damon