Forum Moderators: phranque


Why not skip blacklisting Bots and Whitelist Browsers Only?


stressedoutStaff

3:05 pm on Mar 20, 2006 (gmt 0)

10+ Year Member



I want to whitelist only. It seems to be madness keeping up with a blacklist in .htaccess; it would be much easier to just whitelist.

SetEnvIf User-Agent "^Mozilla/4\.0" let_me_in
SetEnvIf User-Agent "^Mozilla/5\.0" let_me_in

<Directory /docroot>
Order Deny,Allow
Deny from all
Allow from env=let_me_in
</Directory>

Is this possible, or do I need to list every possible User-Agent for Windows, Mac, and Linux? I really don't need any search engines at all, not even the good ones. Is there any way to block EVERYTHING except a true browser? Does anybody have a whitelist-only .htaccess like this that denies everything but a real browser? Thanks for any help; I'm turning gray trying to find a better way to say no to all bots and only allow verified browsers.
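(One note on the snippet above: a <Directory> container is not allowed inside .htaccess itself; it belongs in the main server config. A minimal sketch of the same idea that is legal in a per-directory .htaccess, assuming Apache 2.2-style access control, might look like this:)

```apache
# Flag anything whose User-Agent begins with Mozilla/4.0 or Mozilla/5.0.
# The leading ^ anchors the match at the start of the string, so anything
# may follow the version number; no explicit wildcard is needed.
SetEnvIf User-Agent "^Mozilla/[45]\.0" let_me_in

<Files *>
Order Deny,Allow
Deny from all
Allow from env=let_me_in
</Files>
```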

bcolflesh

3:15 pm on Mar 20, 2006 (gmt 0)

Pfui

3:22 pm on Mar 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



GMTA (great minds think alike:) incrediBILL started a similar thread in this forum last week:

Don't waste time blocking bots, OPT-IN bots and control your content
blacklisting is a no-win endless game

Some of us are already chatting in that thread [webmasterworld.com] right now so feel free to join in and compare notes and strategies!

stressedoutStaff

4:38 pm on Mar 20, 2006 (gmt 0)

10+ Year Member



Thanks bcolflesh and pfui

That post from incrediBILL,

BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well
<Files *>
Order Deny, Allow
Deny from all
Allow from env=good_pass
</Files>

is exactly what I needed. Just this alone should deny every bot with a non-matching Mozilla User-Agent, I would assume, making the constant bad-bot rewrite rules unnecessary, like:
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
etc...
etc...

This can still be spoofed by the unlazy, but it sounds like a really good start. It would also be good to block known bad IP ranges to reduce spoofing of the User-Agent. Thanks a lot, this sounds better than what I'm doing now, and it will allow a much smaller .htaccess file. I'll keep watching your ideas; thanks for the input.
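(Combining the two ideas in this post, a User-Agent whitelist plus a deny list of known-bad IP ranges, could be sketched as below, assuming Apache 2.2-style access control. The range 192.0.2.0/24 is a placeholder from the documentation address space, not a real bot network:)

```apache
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

<Files *>
# With Order Allow,Deny a request must match at least one Allow line AND
# no Deny line, so a blacklisted IP is refused even if its User-Agent
# matches the whitelist.
Order Allow,Deny
Allow from env=good_pass
Deny from 192.0.2.0/24
</Files>
```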

stressedoutStaff

6:41 am on Mar 21, 2006 (gmt 0)

10+ Year Member



Well, I tested the above script by incrediBILL today and got an Internal Server Error. This could be my own problem, since I'm not a .htaccess expert, and since my .htaccess is quite detailed, maybe I have the order in which these statements need to be listed wrong.

My .htaccess file already has, in this order:
a hotlinking rewrite block,
then
custom ErrorDocument directives,
then
I attempted to place the above script:

BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass
<Files *>
Order Deny, Allow
Deny from all
Allow from env=good_pass
</Files>

and the server rejected it and gave me an internal error, so maybe my ordering of these statements is wrong? But I do know that the block I pasted it in to replace,
SetEnvIfNoCase User-Agent "^Alexibot" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
Etc
Etc
<Limit GET POST PUT HEAD>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>

works fine. So I thought: maybe if I replace

<Files *>
Order Deny, Allow
Deny from all
Allow from env=good_pass
</Files>

from incrediBILL's script with

<Limit GET POST PUT HEAD>
order deny,allow
deny from all
allow from env=good_pass
</Limit>

it will work the same. And it does work :) with no internal error! But I'm not an expert at .htaccess, and the big question is: is <Limit GET POST PUT HEAD> as effective as <Files *>, or am I leaving holes by using <Limit GET POST PUT HEAD> instead of the <Files *> that incrediBILL used?
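(For what it's worth, the internal error in the pasted block was most likely the space in "Order Deny, Allow": the Order directive takes a single comma-separated argument with no spaces. And yes, <Limit GET POST PUT HEAD> does leave holes: the enclosed directives apply only to the four listed methods, so any other request method (OPTIONS, DELETE, PROPFIND, and so on) bypasses them, whereas <Files *> applies to every method. A corrected version of the quoted block, assuming Apache 2.2-style access control:)

```apache
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

<Files *>
# Note: no space after the comma. "Deny, Allow" would be parsed as two
# arguments and trigger "Order takes one argument", hence the 500 error.
Order Deny,Allow
Deny from all
Allow from env=good_pass
</Files>
```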

Another question: couldn't this be refined even tighter by narrowing

BrowserMatchNoCase ^Mozilla good_pass (Which allows any browser that starts with Mozilla)

down to

BrowserMatchNoCase ^Mozilla/[4-5]\.0 good_pass (which only allows Mozilla and IE User-Agents; I'm not sure if my syntax is correct)

The second would be even better, narrowing it down to Mozilla/4.0 or 5.0 User-Agents. Let me know what you think. This wouldn't be great for everybody, but since our site is really only tested with IE or Mozilla, it would be ideal for us.
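(The narrowed pattern is syntactically fine; [4-5] in a character class is the same as [45]. Note, though, that it drops anything not starting with Mozilla/4.0 or Mozilla/5.0, including classic Opera, whose default User-Agent at the time typically began with "Opera/". A minimal sketch of the tightened whitelist:)

```apache
# Allow only User-Agents beginning with "Mozilla/4.0" or "Mozilla/5.0";
# the earlier ^Opera line is intentionally dropped here.
BrowserMatchNoCase "^Mozilla/[45]\.0" good_pass

<Files *>
Order Deny,Allow
Deny from all
Allow from env=good_pass
</Files>
```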

Pfui

5:47 pm on Mar 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



stressedoutStaff, it's really, really difficult for people to help (ditto for lurkers trying to follow along) when you're talking about one thread's code/comments in a totally separate thread -- that's why bcolflesh and I referred you to the other thread in the first place. :) So please just reply/follow up in incrediBILL's original thread. Thanks!

stressedoutStaff

5:35 am on Mar 22, 2006 (gmt 0)

10+ Year Member



Thanks Pfui,

I'll ask over at incrediBILL's original thread.

larryhatch

5:39 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This may sound dumb (or may just be dumb):

If you whitelist a group of 'good' robots etc., and only those,
aren't you in effect blacklisting everyone else?
... I mean, like legitimate organic traffic etc.?