All those ancient, overlong bot lists do is slow your server down while it processes them.
Does anyone have a decent whitelist or know of one on this site?
Many others prefer white-listing over black-listing entirely.
Every time I go to take a closer look at whitelisting instructions, it turns out to involve a robot identifying itself upfront,
The theory is to DENY ALL, and then make exceptions for the visitors you choose.
How do you distinguish between a browser and a robot? You can't.
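For anyone who wants to see the deny-all-then-allow idea spelled out, here is a minimal sketch in Python rather than htaccess. The pattern list and the function name `is_allowed` are made up for illustration; this is not a vetted whitelist, just the shape of the logic.

```python
import re

# Hypothetical allow-list. These patterns are illustrative assumptions,
# not a recommended or complete whitelist.
ALLOWED_UA_PATTERNS = [
    re.compile(r"^Mozilla/5\.0 \(.*\) AppleWebKit/.* Chrome/"),
    re.compile(r"^Mozilla/5\.0 \(.*\) Gecko/.* Firefox/"),
    re.compile(r"Googlebot/"),
]


def is_allowed(user_agent):
    """Default-deny: allow only when the UA matches an allow-list pattern."""
    if not user_agent:
        # A missing or empty User-Agent never matches, so it is denied.
        return False
    return any(p.search(user_agent) for p in ALLOWED_UA_PATTERNS)


if __name__ == "__main__":
    print(is_allowed("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
                     "(KHTML, like Gecko) Chrome/19.0.1068.0 Safari/536.3"))
    print(is_allowed("libwww-perl/6.0"))
```

Anything not on the list is refused without needing its own rule, which is the whole appeal. The catch is the one raised in this thread: a robot that sends a browser-looking UA passes this check unchanged.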
Fourth, filter by behavior, header content, loads CSS, js, images, etc.
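A rough sketch of what "filter by header content and whether it loads CSS, js, images" could look like, in Python. The header names follow the CGI-style `HTTP_*` keys; the set of "expected" headers, the asset labels, and both function names are assumptions for illustration, not a tested rule set.

```python
# Headers a mainstream browser normally sends. Treating their absence as
# suspicious is an assumption for this sketch, not a guaranteed signal.
EXPECTED_HEADERS = (
    "HTTP_ACCEPT",
    "HTTP_ACCEPT_ENCODING",
    "HTTP_ACCEPT_LANGUAGE",
    "HTTP_USER_AGENT",
)

# Hypothetical labels for the kinds of supporting files a visitor fetched.
ASSET_KINDS = {"css", "js", "img"}


def missing_browser_headers(env):
    """Return the expected header names that are absent or empty in env."""
    return [name for name in EXPECTED_HEADERS if not env.get(name)]


def looks_like_browser(env, fetched_assets):
    """True when all expected headers are present and the visitor also
    requested at least one supporting asset after the page itself."""
    has_headers = not missing_browser_headers(env)
    loaded_assets = bool(set(fetched_assets) & ASSET_KINDS)
    return has_headers and loaded_assets
```

Note that the asset test can only run after the page has already been served, so it is the after-the-fact kind of check; the header test is the only part that can act on the first request.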
The question was, how can your htaccess (acting as bouncer) identify a new robot that it hasn't met before?
I say make everyone do a CAPTCHA - LOL
I've never comprehended the use of testing headers myself!
Using an after-the-fact "bot catcher" doesn't work in these cases. By the time I and/or my filters/programs have caught an IP and locked it out, the bot has moved on to a new IP, switched to another legit UA, etc. I end up adding all this processing overhead with diminishing results... and still see 30%+ of my bandwidth going out to them.
Thank you!
REMOTE_ADDR{'101.80.225.183'}
SERVER_DATE{'Thu Mar 15 07:50:00 2012'}
HTTP_REFERER{'http://www.example.com/example.htm'}
HTTP_USER_AGENT{'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1068.0 Safari/536.3'}
REMOTE_ADDR{'101.80.225.183'}
SERVER_DATE{'Thu Mar 15 07:50:00 2012'}
HTTP_ACCEPT_CHARSET{'ISO-8859-1,utf-8;q=0.7,*;q=0.3'}
HTTP_ACCEPT_ENCODING{'gzip,deflate,sdch'}
HTTP_ACCEPT_LANGUAGE{'zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.4'}
HTTP_CONNECTION{'keep-alive'}
HTTP_HOST{'www.example.com'}
HTTP_REFERER{'http://www.example.com/example.htm'}
HTTP_USER_AGENT{'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1068.0 Safari/536.3'}
REMOTE_ADDR{'101.80.225.183'}
REMOTE_HOST{'101.80.225.183'}
SERVER_DATE{'Thu Mar 15 07:50:00 2012'}
JS_date{'Thu Mar 15 2012 22:50:01 GMT+0800 (%u4E2D%u56FD%u6807%u51C6%u65F6%u95F4)'}
JS_appCodeName{'Mozilla'}
JS_appName{'Netscape'}
JS_appVersion{'5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1068.0 Safari/536.3'}
JS_platform{'Win32'}
JS_product{'Gecko'}
JS_productSub{'20030107'}
JS_userAgent{'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1068.0 Safari/536.3'}
JS_vendor{'Google Inc.'}
JS_onLine{'true'}
JS_language{'en-US'}
JS_cookieEnabled{'true'}
JS_javaEnabled{'true'}
JS_plugins{'Remoting Viewer, Native Client, Chrome PDF Viewer, Shockwave Flash, Microsoft® DRM, Microsoft® DRM, Windows Media Player Plug-in Dynamic Link Library, Google Update, PowerEnter Plug-in for SPDB, Kingsoft@Firefox ActiveX Comm'}
JS_webkitHidden{'true'}
JS_webkitVisibilityState{'hidden'}
JS_domain{'www.example.com'}
JS_referer{'http%3A//www.google.com/url%3Fsa%3Dt%26rct%3Dj%26q%3D%26esrc%3Ds%26source%3Dweb%26cd%3D6%26ved%3D0CE0QFjAF%26url%3D
http%253A%252F%252Fwww.example.com
%252Fexample.htm%26ei%3DdQFiT53pMI2UiAef9InjBQ%26usg
%3DAFQjCNGyC2cBy9CH9KQWztPxA0fu9xh4Tg'}
JS_historyLength{'1'}
JS_topLocation{'http%3A//www.example.com/example.htm'}
JS_colorDepth{'32'}
JS_pixelDepth{'32'}
JS_availHeight{'770'}
JS_availWidth{'1280'}
JS_height{'800'}
JS_width{'1280'}
JS_innerHeight{'709'}
JS_innerWidth{'1280'}
JS_locationbar{'true'}
JS_menubar{'true'}
JS_personalbar{'true'}
JS_scrollbars{'true'}
JS_statusbar{'true'}
JS_toolbar{'true'}
HTTP_ACCEPT{'*/*'}
HTTP_ACCEPT_CHARSET{'ISO-8859-1,utf-8;q=0.7,*;q=0.3'}
HTTP_ACCEPT_ENCODING{'gzip,deflate,sdch'}
HTTP_ACCEPT_LANGUAGE{'zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.4'}
HTTP_CONNECTION{'keep-alive'}
HTTP_HOST{'www.example.com'}
HTTP_REFERER{'http://www.example.com/example.htm'}
HTTP_USER_AGENT{'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1068.0 Safari/536.3'}
[edited by: incrediBILL at 8:40 pm (utc) on Mar 25, 2012]
[edit reason] broken down JS_referer because of length [/edit]
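A dump in the `NAME{'value'}` layout shown above can be cross-checked mechanically. The Python sketch below parses that layout and compares the UA sent in the HTTP header with the one reported by JavaScript. The function names are hypothetical, the parser assumes one field per line, and a match only shows the two values agree, not that the visitor is human.

```python
import re

# One field per line, in the form NAME{'value'} as in the dump above.
LINE_RE = re.compile(r"^(\w+)\{'(.*)'\}$")


def parse_dump(text):
    """Parse NAME{'value'} lines into a dict.

    Lines that do not fit the pattern (for example a value wrapped over
    several lines) are skipped. A repeated name keeps its last value.
    """
    record = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            record[match.group(1)] = match.group(2)
    return record


def ua_consistent(record):
    """True when both UA fields are present and identical."""
    http_ua = record.get("HTTP_USER_AGENT")
    js_ua = record.get("JS_userAgent")
    return http_ua is not None and http_ua == js_ua
```

A record with no `JS_*` fields at all fails the check, which only tells you that JavaScript never ran or never reported back.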