With regard to spiders and rogue bots, and controlling their access to my charity websites...
Given that rogue spiders often ignore robots.txt, what methods can be used (e.g. .htaccess statements) to restrict access to the whole site to a very small number of recognised, legitimate search engines?
I'm not worried about commercial SEO requirements; my "search" requirements are very limited. People searching on the obvious terms relating to the name of the site/charity, or on specific popular page titles, should find us, and that already works fine with our current keyword, title and content strategy. A listing in Open Directory/dmoz.org, plus sensible keywords, content and page title meta tags, seems to satisfy most of my ranking requirements.
But I would like to save a bit of bandwidth and improve security with a robust, blunderbuss ban on all spiders except the ones I choose - probably Google, Yahoo Slurp, Bing, and one or two others.
I already run bot traps via .htaccess which result in some rogue bots collecting automatic IP address bans for themselves - but that does make for a very big .htaccess file as the list grows.
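To give an idea of the current setup, the trap simply appends deny lines, so the file ends up looking roughly like this (the addresses below are placeholders rather than real offenders, and I'm assuming the old Apache 2.2 allow/deny syntax):

# IP bans accumulated by the bot trap (placeholder addresses)
order allow,deny
allow from all
deny from 203.0.113.45
deny from 198.51.100.0/24
# ...one more "deny from" line for every trapped bot, which is why the file keeps growing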
Does anyone have suggestions for appropriate global-ban .htaccess statements, and for the specific allow statements to let my chosen engines through?
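What I have in mind is something along these lines, assuming mod_rewrite is available - but it's only a rough, untested sketch, and the user-agent strings are my own guesses rather than anything authoritative:

RewriteEngine On
# Let the engines I actually want straight through (substrings are guesses)
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Slurp|bingbot|msnbot) [NC]
# Refuse anything else that identifies itself as a crawler of some kind
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider|scraper) [NC]
RewriteRule .* - [F]

The idea being that ordinary browsers pass because they don't match the crawler pattern, the named engines pass because of the first condition, and anything else calling itself a bot/crawler/spider gets a 403. Whether that is actually a sensible way to build a whitelist, I don't know - hence the question.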
And a list of mainstream web search engines which it would be sensible to allow?
Many thanks to any who can help.