Here is part of the request in my access log.
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB5; InfoPath.2; .NET CLR 2.0.50727; OfficeLiveConnector.1.3; OfficeLivePatch.0.0)"
The requests are coming from multiple IPs.
I tried blocking this with a redirect in my .htaccess. I copied the code from an old WW thread.
Did I make a mistake by not using 'InfoPath.1' and 'InfoPath.2', or am I taking the wrong approach?
Any suggestions that I can understand will be appreciated.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
...
RewriteCond %{HTTP_USER_AGENT} ^InfoPath [OR]
...
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ ['URL'...] [L,R]
[edited by: jdMorgan at 12:52 pm (utc) on Feb. 25, 2009]
[edit reason] Fixed formatting and trimmed [/edit]
The problem is that your regular expression requires the UA to *start* with "InfoPath" (that is what the leading ^ anchor means), but this string starts with "Mozilla", so the condition never matches. If you need help with regular expressions, see the tutorial cited in our Forum Charter.
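To illustrate the anchoring point, here is a minimal sketch of the same condition with the ^ removed, so the pattern can match anywhere in the user-agent string. It is an illustration under assumptions, not a recommended rule: it answers with 403 Forbidden via [F] rather than the redirect used in the original post, and it assumes mod_rewrite is enabled and allowed in .htaccess. Note also that the InfoPath token is added to Internet Explorer's user-agent by Microsoft Office, so a rule like this would probably block real visitors as well.

```apache
RewriteEngine On

# No leading ^ here, so "InfoPath" may appear anywhere in the User-Agent.
# This matches "InfoPath.1", "InfoPath.2", and so on.
RewriteCond %{HTTP_USER_AGENT} InfoPath

# Assumption for this sketch: refuse the request with 403 Forbidden
# instead of redirecting it as the original rule did.
RewriteRule .* - [F]
```

By contrast, ^InfoPath only matches a user-agent whose very first characters are "InfoPath", which is why the posted rule never fired for a string beginning with "Mozilla/4.0".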
If you still think these are scrapers, then look into checking *all* the request headers they send, including the ones that don't show up in your standard log files. Luckily, many scrapers make mistakes when spoofing user-agents, which makes it easy to block some of them.
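As a rough sketch of what testing other request headers can look like, mod_rewrite can read any request header with the %{HTTP:header-name} syntax. The example below is hypothetical: it assumes the unwanted requests claim to be MSIE but send no Accept header, which real browsers normally do send. That tell is an assumption for illustration only, and you would need to confirm what these requests actually send before using anything like it.

```apache
RewriteEngine On

# The request claims to come from Internet Explorer...
RewriteCond %{HTTP_USER_AGENT} MSIE

# ...but the Accept header is missing or empty (hypothetical tell;
# verify against the real requests first).
RewriteCond %{HTTP:Accept} ^$

# Both conditions must be true (no [OR] flag), then refuse with 403.
RewriteRule .* - [F]
```

To see headers that the standard log does not record, the server's log format has to be extended, for example with %{Accept}i in a LogFormat directive. That directive belongs in the server or virtual-host configuration and cannot be set from .htaccess, so on shared hosting it may not be available to you.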
Jim
I see what you mean. I'm certainly inept when it comes to regular expressions.
"If you still think these are scrapers, then look into checking *all* the request headers they send, including the ones that don't show up in your standard log files. Luckily, many scrapers make mistakes when spoofing user-agents, which makes it easy to block some of them."
I haven't a clue how to even start looking into these 2 issues .. especially #2. Can you give me a "jump start"?
Jim