
Search Engine Spider and User Agent Identification Forum

    
mass banning spiders
I am considering blocking all spiders...
han solo · msg:402347 · 9:50 pm on Nov 15, 2000 (gmt 0)

Does anyone have any experience with mass banning or blocking spiders that serve no purpose on the server other than to eat bandwidth?

I've been having trouble with load balancing on my new system, and I've been thinking: DIIbot and others like it don't feed any search engine, so why should I let them have anything?

Any comments, shared stories, or lies :) would be greatly appreciated. Thanks all!

Cheers,
Han_Solo

 

PeteU · msg:402348 · 2:21 am on Nov 16, 2000 (gmt 0)

The most efficient way to block unwanted spiders is to put the following lines in your /etc/httpd/conf/access.conf file:

<Directory />
    Order allow,deny
    Allow from all
    Deny from xxx.xxx.xxx.xxx
    Deny from xxx.xxx.xxx.
</Directory>

Replace the x's with the IP numbers of the offenders; notice how the trailing dot in the second Deny blocks a whole class C range. The <Directory /> wrapper is needed because Order/Allow/Deny are only valid in a directory context. This blocks them server-wide, for every domain hosted on the machine.

Another good way of doing it is to add these lines to the VirtualHost declarations in your /etc/httpd/conf/httpd.conf file:

RewriteEngine On
# RewriteCond patterns are unanchored regexes, so a bare
# substring matches anywhere in the User-Agent header
RewriteCond %{HTTP_USER_AGENT} DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} BandwidthWaster [OR]
RewriteCond %{HTTP_USER_AGENT} SomeUselessSpider
# return 403 Forbidden for any request that matched
RewriteRule .* - [F]

This works on a per-domain basis and is also very efficient.
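
If mod_rewrite isn't compiled into your Apache build, mod_setenvif can do the same user-agent test. A minimal sketch, assuming the standard mod_setenvif and mod_access modules are loaded (the bot names are just the examples from above):

SetEnvIfNoCase User-Agent "DIIbot" bad_bot
SetEnvIfNoCase User-Agent "BandwidthWaster" bad_bot
SetEnvIfNoCase User-Agent "SomeUselessSpider" bad_bot

<Directory />
    Order allow,deny
    Allow from all
    # refuse any request that raised the bad_bot flag
    Deny from env=bad_bot
</Directory>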

Both of the above methods can also be used in your .htaccess file. That takes somewhat more server resources, but it's still very helpful if you don't have admin access to the server; see the sketch below.
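
For the .htaccess case, here's a minimal sketch of the same rules, assuming your server's AllowOverride settings permit Limit and FileInfo and Options FollowSymLinks is on (the IP and bot name are the placeholders from above):

# per-directory IP ban (no <Directory> wrapper in .htaccess)
Order allow,deny
Allow from all
Deny from xxx.xxx.xxx.xxx

# per-directory user-agent ban via mod_rewrite
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} DIIbot
RewriteRule .* - [F]

You can check that a rule is working by requesting a page with a spoofed user-agent, e.g. curl -I -A "DIIbot" http://www.example.com/ (example.com standing in for your own host) should come back 403 Forbidden.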

han solo · msg:402349 · 2:28 pm on Nov 16, 2000 (gmt 0)

Thanks Pete. I actually didn't need any help with the technical aspect of it; I was thinking more along the lines of potential repercussions for my pages and clients.

Have you had any experience with this? Although I do appreciate the clear explanation of how to work with it. That was Apache mod_rewrite, right?

Do spam harvesters and the like tend to just try harder to get into a system after they're blocked, figuring they must have found something good? If anyone else cares to share their thoughts, I'd appreciate it. Thanks!

Cheers,
Han Solo
