Updating bot ban list & cleaning out obsolete entries


wilderness - 6:16 pm on Dec 21, 2009 (gmt 0)


I tried this method to condense my bad bot list and I found it actually increased my response times. I thought that maybe if I started a line with a fixed string before the regular expression it would be more efficient. For example:
RewriteCond %{HTTP_USER_AGENT} ^webpage(widget|downloader|scrapper|harvester) [NC,OR])

This line has some real issues.
1) WHY the trailing parenthesis after [NC,OR]?

2) Your RewriteCond essentially reads as follows:
User Agent BEGINS with the word webpage and is followed immediately (no space in between) by ANY of the words enclosed in parentheses.
2a) EX: one UA would read (at the beginning) webpagewidget, which is VERY unlikely to be what you believe you're attempting to catch.

2b) No idea what you're attempting to catch with that line.
Perhaps you could expand on what exactly you're attempting to do, maybe even provide an existing UA that fits these criteria?
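
To make point 2 concrete, here is a rough sketch in Python of how that pattern behaves. It assumes Python's re module matches this simple pattern the same way Apache's regex engine does, with [NC] treated as case-insensitive matching and the stray trailing parenthesis left out. The function name and the sample UA strings are made up for illustration; they are not real bots.

```python
import re

# Rough Python equivalent of the quoted condition: ^ anchors the match at the
# start of the User-Agent string, and [NC] is approximated by re.IGNORECASE.
PATTERN = re.compile(r"^webpage(widget|downloader|scrapper|harvester)", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern would match this User-Agent string."""
    return PATTERN.search(user_agent) is not None


# Hypothetical User-Agent strings, invented purely for illustration.
samples = [
    "webpagewidget/1.0",
    "WebPageDownloader",
    "webpage downloader",
    "Mozilla/5.0 (compatible; webpagewidget)",
]

for ua in samples:
    print(f"{ua!r}: {is_blocked(ua)}")
```

Only the first two samples match. The third fails because of the space after webpage, and the fourth fails because the UA does not BEGIN with webpage.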

Don


Thread source: http://www.webmasterworld.com/search_engine_spiders/4046696.htm