Forum Moderators: phranque

Problem about blocking webzip.

         

skpippen

4:13 am on Jan 3, 2006 (gmt 0)

10+ Year Member



I read the thread on this forum:
[webmasterworld.com...]
and added the following lines to my .htaccess file:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} webzip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} teleport [NC]
RewriteRule ^(.*) - [F]

But when I test it with webzip 6.0, it doesn't work. I checked my access log and found that HTTP_USER_AGENT was detected as "Mozilla/4.0 (compatible; MSIE 6.0; Win32)". I think this is the reason.

Does anyone know how to resolve this problem?

Thanks.

jdMorgan

4:41 am on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, go ahead and add that user-agent to your list. It's not a valid browser user-agent, so there's not much danger of blocking a real visitor as long as you use an exact compare:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Win32\)$ [OR]
RewriteCond %{HTTP_USER_AGENT} webzip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} teleport [NC]
RewriteRule ^(.*) - [F]
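
As a rough way to check the logic off-server, here is a minimal sketch that uses Python's re module as a stand-in for mod_rewrite's regex engine (an assumption; the two engines agree on simple patterns like these). The function name is_blocked and the abbreviated list of name-based patterns are hypothetical, and the "real" MSIE string is only a sample of what a genuine IE 6 typically sends.

```python
import re

# Exact compare for the disguised user-agent (same pattern as the RewriteCond above).
EXACT = (r"^Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Win32\)$", 0)

# Name-based conditions (abbreviated); [NC] is modeled with re.IGNORECASE.
BY_NAME = [
    (r"webzip", re.IGNORECASE),
    (r"flashget", re.IGNORECASE),
]

def is_blocked(user_agent, conditions):
    """Return True if any condition matches, like RewriteConds chained with [OR]."""
    return any(re.search(pattern, user_agent, flags) for pattern, flags in conditions)

disguised = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
real_msie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(is_blocked(disguised, BY_NAME))            # False: no tool name in the string
print(is_blocked(disguised, [EXACT] + BY_NAME))  # True: the exact compare catches it
print(is_blocked(real_msie, [EXACT] + BY_NAME))  # False: a genuine IE 6 string passes
```

The first result is why the original rules never fired: the name-based conditions can only match if the tool identifies itself.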

Another alternative if you're running php would be to add AlexK's most recent version of xlcus' bad-bot script [webmasterworld.com].

Jim

Pfui

5:50 am on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did Apache 2.X (or Ralf Engelschall?) change how UA spaces are marked and/or semi-colons escaped in mod_rewrite? Given the preceding browser, I'd write it as this under 1.3.X:

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0.\(compatible\;.MSIE.6\.0\;.Win32\) [OR] 

Here's hoping one version's regex is akin to the other's because the above is the only mod_rewrite language I understand -- and three cheers, it works!

jdMorgan

6:44 am on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Literal spaces need to be escaped as "\ " (backslash-space) and semi-colons need not be escaped at all. An unescaped period (a.k.a. dot or full stop) means "match any single character."

Any character that has a special meaning as a regex token needs to be escaped, such as ".", "(" and ")" in this example. The complete list for regex in mod_rewrite would be: $ % ^ * ( ) + { } [ ] | \ . ? In addition, spaces need to be escaped because mod_rewrite treats them as delimiters.

I don't see anything in Ralf's documentation that contradicts this (?); it's not even mentioned, as far as I can see. And most of my servers are still Apache 1.3.x.

It doesn't usually hurt anything to escape a character that doesn't need to be escaped, and a space will match the "." pattern, which is why your pattern works.

The pattern I show above is correct, maximally-specific, and optimized, AFAIK.
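
To make the difference between the two patterns concrete, here is a small sketch, again assuming Python's re module behaves like mod_rewrite's regex engine for these simple cases. The variable names and the "variant" string are made up for illustration.

```python
import re

ua = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"

# Escaped spaces, unescaped semi-colons, anchored at both ends.
exact = r"^Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Win32\)$"
# Dots in place of spaces, escaped semi-colons, no end anchor.
loose = r"^Mozilla/4\.0.\(compatible\;.MSIE.6\.0\;.Win32\)"

# Both patterns match the user-agent from the log.
both_match = bool(re.search(exact, ua)) and bool(re.search(loose, ua))

# A made-up string with underscores instead of spaces, plus trailing text.
variant = "Mozilla/4.0_(compatible;_MSIE_6.0;_Win32) extra"
exact_on_variant = bool(re.search(exact, variant))
loose_on_variant = bool(re.search(loose, variant))

print(both_match)        # True
print(exact_on_variant)  # False: spaces and the end anchor are required
print(loose_on_variant)  # True: "." accepts any character, and nothing anchors the end
```

In practice the looser pattern is unlikely to catch anything it shouldn't, but the anchored one says exactly what is meant.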

Jim

skpippen

3:50 pm on Jan 3, 2006 (gmt 0)

10+ Year Member



Thank you very much. It works well now.

But I have no idea about all the other bad browsers, so I'm considering blocking them by allowing only good browsers instead of using a large list of bad ones.

What do you think about this idea?

jdMorgan

8:06 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You will need about a year to collect all browsers' user-agent names, and then analyze these logs to find the 'patterns' among good user-agents. It is a very difficult project, and if you make an error, you will block legitimate users.

And of course, new valid user-agent names appear every week, so there is never an end to this project.
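
A toy sketch of what can go wrong, written in Python purely for illustration: the three "allowed" patterns and the function name is_allowed are hypothetical, and the user-agent strings are only samples of what these browsers typically send.

```python
import re

# A deliberately short, hypothetical allow-list of "good browser" patterns.
ALLOWED = [
    r"^Mozilla/4\.0 \(compatible; MSIE [5-7]\.\d+; Windows ",
    r"^Mozilla/5\.0 \(.*\) Gecko/\d+ Firefox/",
    r"^Opera/\d",
]

def is_allowed(user_agent):
    """Return True only if the user-agent matches one of the allow-list patterns."""
    return any(re.search(pattern, user_agent) for pattern in ALLOWED)

msie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
firefox = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5"
konqueror = "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.3 (like Gecko)"
disguised = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"

print(is_allowed(msie))       # True
print(is_allowed(firefox))    # True
print(is_allowed(konqueror))  # False: a legitimate browser the list forgot
print(is_allowed(disguised))  # False: the disguised string is rejected as intended
```

The disguised string is rejected, but so is a real visitor whose browser was never added to the list. Any tool that copies a genuine browser string in full would still get through.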

Jim

Pfui

8:11 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim, I wish I could remember where I learned the 'escape language' I've been using for (mumbles) years. Part of me thinks it hails from my perl CGI tweaking days, coupled with an ongoing fondness for grepping server logs, but I simply don't recall. Even though it's not completely proper, I must confess I'm glad it still does what I need it to do.

Shoot. Given the vagaries of mod_rewrite's code and effects, and the near-daily increases in abusive bots and scrapers, I'm feeling darn near whelmed playing catch-up. I already rewrite ALL non-Mozilla UAs but for robots.txt-respecting bots -- and still the bad Mozilla.* variations and cloaks come. There's GOT to be a better way....

Ah, well. That's enough meandering musing for one post, sorry. We now return you to your regularly scheduled Q&A:)