Mozilla/4.0 (compatible;)

regexp to block

         

Hobbs

8:04 pm on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lately I am seeing heavy scraping activity from this user agent: just Mozilla/4.0 (compatible;) and nothing else. There are many of these hits, from different networks and countries. They are not stripped proxy or firewall hits; they show a scraper bot pattern, and none of them sends a referrer.

aa.bb.cc.dd - - [04/Jul/2007:10:23:46 -0400] "GET /path/to/a/page/ HTTP/1.1" 200 38413 "-" "Mozilla/4.0 (compatible;)"

Now, how do I write a regexp to block it in my .htaccess?

I've worked this one out:
SetEnvIfNoCase User-Agent "^Mozilla/4\.0\ \(compatible\;\)$" bad_bot

Please confirm that the above will not block anyone else, and that this is the correct syntax to block just the Mozilla/4.0 (compatible;) user agent.
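One way to sanity-check the pattern before deploying it is to run it against a few sample user-agent strings. The sketch below is only an approximation: it assumes Python's `re` module treats this particular pattern the same way Apache's regex engine does, and it uses `re.IGNORECASE` to stand in for the NoCase part of SetEnvIfNoCase. The helper name `is_bad_bot` and the sample strings are made up for illustration.

```python
import re

# Same pattern as in the SetEnvIfNoCase line above.
# re.IGNORECASE approximates the case-insensitive (NoCase) matching.
BAD_BOT = re.compile(r"^Mozilla/4\.0\ \(compatible\;\)$", re.IGNORECASE)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if the user agent is exactly the bare string (any letter case)."""
    return BAD_BOT.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from real logs.
    samples = [
        "Mozilla/4.0 (compatible;)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Mozilla/4.0 (compatible)",
    ]
    for ua in samples:
        print(f"{ua!r}: {'blocked' if is_bad_bot(ua) else 'allowed'}")
```

Because the pattern is anchored with ^ and $, a longer string such as a full MSIE user agent should not match; only the bare string does. Note that this checks the regexp only, not whether a legitimate client happens to send that exact bare string.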

Hobbs

10:20 am on Jul 5, 2007 (gmt 0)


Just in case someone lands on this unanswered thread: I've taken the plunge, and so far this regexp appears to work as intended.

The question remains whether there are other legitimate uses for just Mozilla/4.0 (compatible;) as a user agent, but I will find that out in time.
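For anyone wanting to look for possible legitimate users of the bare string, a rough sketch of a log check follows. It assumes the access log uses the combined format shown in the sample line earlier in the thread; the function name `bare_ua_hits` and the log file name are hypothetical. It counts hits per (host, referrer) pair, on the assumption that hosts sending a real referrer or making only a handful of requests are the ones worth a closer look.

```python
import re
from collections import Counter

# Combined log format, as in the sample line quoted earlier in the thread.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

BARE_UA = "Mozilla/4.0 (compatible;)"


def bare_ua_hits(lines):
    """Count requests that use the bare user agent, keyed by (host, referrer)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("agent") == BARE_UA:
            hits[(match.group("host"), match.group("referrer"))] += 1
    return hits


if __name__ == "__main__":
    # "access.log" is a placeholder; point this at the real log file.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for (host, referrer), count in bare_ua_hits(log).most_common():
            print(f"{count:6d}  {host}  referrer={referrer}")
```

This only reports what is in the log; deciding whether a given host is a scraper or a real visitor is still a judgment call.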

jdMorgan

3:48 pm on Jul 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the past, I've had problems with both Looksmart and Verizon SuperPages using that User-agent. I'm not sure if they ever do this anymore, though...

Jim