regarding user-agent string in .htaccess "ban" list

accidentally blocks valid visitors

stapel

5:45 pm on Jun 11, 2006 (gmt 0)

This post is "FYI":

Many of us have "ban" lists within our .htaccess files. These lists return a 403 "Forbidden" response when user-agents such as FrontPage or Web Copier are aimed at our sites, generally by site-scrapers looking to plagiarise our content.

In the course of a recent correspondence, I discovered that people surfing with a newer version of Firefox on a Linux box were receiving this "failed" response and were thus unable to view my pages.

The particular user-agent string contained the following:

    Gecko/20060608 Ubuntu/dapper-security Firefox/

Within my user-agent "ban" list, I had the following:

    RewriteCond %{HTTP_USER_AGENT} ^.*DA.*$ [NC,OR]

This line is meant to block the "DA" download utility, of which the known versions [psychedelix.com] are "DA 3.5", "DA 4.0", "DA 5.0", and "DA 7.0". But since the above line has the "no case" flag (the "NC" near the end of the line), and since the ".*" on either side of "DA" matches any run of characters, the pattern matches any user-agent containing "da" anywhere, in either case. The "da" in the "Ubuntu/dapper-security" string therefore accidentally flagged this innocent surfer as "bad".

To correct this, I removed the "no case" designator and added an escaped space after the "DA":

    RewriteCond %{HTTP_USER_AGENT} ^.*DA\ .*$ [OR]

The correspondent has confirmed that he now has access to the site. And the download utility should still be stopped.
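
For context, here is a minimal sketch of how a corrected line like this might sit inside a complete ban list. It assumes mod_rewrite is enabled and the rules live in .htaccess. The "FrontPage" pattern is only an illustrative second entry, not a vetted one. Every condition except the last carries [OR], and the closing RewriteRule sends the 403 via the [F] flag.

```apache
# Illustrative skeleton only; the FrontPage line is a placeholder entry.
RewriteEngine On

# Case-sensitive: "DA" followed by a space, anywhere in the user-agent
RewriteCond %{HTTP_USER_AGENT} ^.*DA\ .*$ [OR]

# Final condition in the chain: no [OR] here
RewriteCond %{HTTP_USER_AGENT} FrontPage

# If any condition above matched, answer with 403 Forbidden
RewriteRule .* - [F]
```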

I hope this helps someone.

Eliz.

jdMorgan

7:17 pm on Jun 11, 2006 (gmt 0)

This can be made even more specific (less dangerous to the innocent) and more efficient by changing it to:

RewriteCond %{HTTP_USER_AGENT} DA\ [0-9] [OR]

thus requiring at least one digit following the space, and eliminating the unnecessary leading and trailing ".*" patterns, which do nothing but waste CPU time.
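
As a rough illustration of what the tightened pattern does and does not catch, here it is as a one-condition rule. The closing RewriteRule and the dropped [OR] are assumptions made so the fragment stands alone; in a longer list the [OR] stays on every condition but the last.

```apache
RewriteEngine On

# Case-sensitive and unanchored: needs the literal text "DA", one space, then a digit.
#   "DA 5.0"                                          -> matches (blocked)
#   "Gecko/20060608 Ubuntu/dapper-security Firefox/"  -> no match (allowed)
RewriteCond %{HTTP_USER_AGENT} DA\ [0-9]
RewriteRule .* - [F]
```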

Jim

Pfui

4:40 am on Jun 17, 2006 (gmt 0)

Good problem to red-flag, Elizabeth! And a head-bangingly tricky one to ID in the normal course of debugging.

I ran into the same thing when the NC'd "DA" matched a Hostname (I forget which). Ditto the iffy agent "EI" and visitors using a specialized BoEIng UA. Here are those workarounds:

SetEnvIf User-Agent "^DA" no_way 
SetEnvIf User-Agent "^EI" no_way

If fewer chars = more efficient code, I win! :)

But seeing as how Jim tends to find my code snippets eminently tweak-worthy, I defer to his expertise -- and thank God for it.
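
On their own, the SetEnvIf lines above only set the no_way variable; something else has to act on it. A minimal sketch of one way to do that follows, assuming the classic Order/Allow/Deny access directives. The Apache 2.4 form in the comments is a later equivalent and not something from this thread. SetEnvIf is case-sensitive (SetEnvIfNoCase is the case-insensitive variant), so "^DA" will not trip on a lowercase "da".

```apache
# Flag requests whose user-agent starts with "DA" or "EI" (case-sensitive)
SetEnvIf User-Agent "^DA" no_way
SetEnvIf User-Agent "^EI" no_way

# Apache 2.2 and earlier: allow everyone, then deny flagged requests
Order Allow,Deny
Allow from all
Deny from env=no_way

# Apache 2.4 and later, roughly equivalent:
# <RequireAll>
#     Require all granted
#     Require not env no_way
# </RequireAll>
```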

jdMorgan

4:37 pm on Jun 17, 2006 (gmt 0)

Nah -- both of you are doing quite well with my favorite modules these days. If UAs beginning with DA are always 'bad,' then you do indeed 'win' with the shorter start-anchored pattern. ;)

Similarly, I found no comment that could improve this recent thread [webmasterworld.com], and so left Stapel's concise and correct answer un-embellished.

The more you do this mod_rewrite/mod_access/regular-expressions stuff, the easier it gets -- except of course that you then get involved with ever-more-complex applications. But anyone with the least inclination toward logic and programming can do it -- nothing special about me, and I certainly don't want this forum to be all by or about me! Frankly, I can use all the help I can get here.

Thanks, and
carry on,

Jim