SetEnvIf User-Agent ^Mozilla/4\.0.* let_me_in
SetEnvIf User-Agent ^Mozilla/5\.0.* let_me_in
<Directory /docroot>
Order Deny,Allow
Deny from all
Allow from env=let_me_in
</Directory>
Is this possible, or do I need to list every possible User-Agent for Windows, Mac, and Linux? I really don't need any search engines at all, not even the good ones. Is there any way to block EVERYTHING except a true browser? Does anybody have a whitelist-only .htaccess like this that denies everything but a real browser? Thanks for any help; I'm turning gray trying to find a better way to say no to all bots and only allow verified browsers.
Don't waste time blocking bots, OPT-IN bots and control your content
blacklisting is a no-win endless game
Some of us are already chatting in that thread [webmasterworld.com] right now so feel free to join in and compare notes and strategies!
that post from incrediBILL
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass
#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well
<Files *>
Order Deny, Allow
Deny from all
Allow from env=good_pass
</Files>
is exactly what I needed. Just this alone should deny every bot with a non-matching User-Agent, I would assume, making the constantly growing bad-bot rewrite list unnecessary, like
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
etc...
etc...
This can still be spoofed by the unlazy, but it sounds like a really good start. It would also be good to block known bad IP ranges to catch bots that spoof the User-Agent. Thanks a lot; this sounds better than what I'm doing now and will allow a much smaller .htaccess file. I'll keep watching your ideas. Thanks for the input.
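A minimal sketch of combining the whitelist with an IP block, assuming Apache 2.2-style access control (mod_authz_host, or mod_access_compat on 2.4). The range 192.0.2.0/24 is only a documentation placeholder, not a real bad-bot range. With Order Allow,Deny, anything not explicitly allowed is denied by default, and a Deny match overrides an Allow match, so a blocked range stays blocked even when the User-Agent passes:

```apache
# Mark requests whose User-Agent starts with Mozilla or Opera
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

<Files *>
    # Allow rules are evaluated first, then Deny rules; the default is deny
    Order Allow,Deny
    Allow from env=good_pass
    # Placeholder range (TEST-NET-1); substitute the ranges you actually want blocked
    Deny from 192.0.2.0/24
</Files>
```

Note that keeping Order Deny,Allow here would not work as intended, because under that ordering a matching Allow wins over a matching Deny.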
My .htaccess file already has, in this order,
a hotlinking rewrite block
then
custom ErrorDocument directives
then
I attempted to place the above script
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass
<Files *>
Order Deny, Allow
Deny from all
Allow from env=good_pass
</Files>
and the server rejected it and gave me an internal server error, so maybe my ordering of these statements is wrong? But I do know that the block I pasted it in to replace, namely
SetEnvIfNoCase User-Agent "^Alexibot" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
Etc
Etc
<Limit GET POST PUT HEAD>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>
which works fine, so I thought maybe if I replace
<Files *>
Order Deny, Allow
Deny from all
Allow from env=good_pass
</Files>
from incrediBILL's script with
<Limit GET POST PUT HEAD>
order deny,allow
deny from all
allow from env=good_pass
</Limit>
it will work the same? It does work :) with no internal error, but I'm not an expert at .htaccess, and the big question is: is <Limit GET POST PUT HEAD> as effective as <Files *>, or am I leaving holes by using <Limit GET POST PUT HEAD> instead of <Files *> as incrediBILL did?
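A likely cause of the internal error, assuming Apache 2.2-style access control: the Order directive takes a single argument, and its keywords may be separated only by a comma with no whitespace. "Order Deny, Allow" is therefore read as two arguments and rejected, while "order deny,allow" in the <Limit> version parses fine. A sketch of the <Files *> version with only that change:

```apache
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

<Files *>
    # No space after the comma: "Deny, Allow" is parsed as two arguments
    Order Deny,Allow
    Deny from all
    Allow from env=good_pass
</Files>
```

On the coverage question: <Limit GET POST PUT HEAD> applies the restriction only to the listed methods, so a request using any other method is not covered by it, whereas <Files *> applies regardless of method. The server error log should name the exact directive that was rejected.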
Another question: couldn't this be tightened even further by narrowing
BrowserMatchNoCase ^Mozilla good_pass (which allows any User-Agent that starts with Mozilla)
down to
BrowserMatchNoCase ^Mozilla/[4-5]\.0 good_pass (which only allows Mozilla and IE User-Agents; I'm not sure if my syntax is correct)
The second would be even better, since it narrows things down to Mozilla 4.0 or 5.0 User-Agents. Let me know what you think. This wouldn't be great for everybody, but since our site is really only tested with IE or Mozilla, it would be ideal for us.
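For what it's worth, [4-5] is a valid character class and matches the same thing as [45]. A sketch of the narrowed rule, keeping the good_pass name from above; the dot is escaped so that it matches a literal period:

```apache
# Matches User-Agents beginning with "Mozilla/4.0" or "Mozilla/5.0"
BrowserMatchNoCase ^Mozilla/[45]\.0 good_pass
```

With only this line in place, a User-Agent beginning with "Opera" no longer passes unless the separate BrowserMatchNoCase ^Opera line is kept.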