Forum Moderators: Ocean10000 & incrediBILL & phranque

Is this the right way to block Mozilla/5.0 Jorgee?

     
4:30 pm on Nov 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts: 2264
votes: 0


It doesn't seem to be working :(

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ Jorgee [NC]
RewriteRule !^robots\.txt$ - [F]
</IfModule>


TIA
6:49 pm on Nov 24, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14323
votes: 562


It looks right. I'm assuming you've snipped other stuff for posting purposes, and this is not the entire mod_rewrite section of your htaccess. What, exactly, does “doesn’t seem to be working” mean? Either it works or it doesn't; what do logs say?

Tangential comments:
-- the <IfModule> envelope isn't needed. Either you've got mod_rewrite, or you haven't. If you use a CMS, you may need the envelope just for the CMS-specific section of your htaccess, because the program looks for it. Otherwise, no need.
-- are there other, non-Mozilla Jorgees that you are willing to admit? Surely not. And then all you need in the HTTP_USER_AGENT box is "Jorgee" alone--no quotation marks--which in turn means you don't need to deal with the escaped space. (Don't know about anyone else, but this always makes me uneasy; edit the wrong thing and everything blows up.)
-- don't say [NC] unless you really need to support every possible casing. If you mean [Jj]orgee, say so. Otherwise, learn what exact casing it uses, and say that.
-- it's a bit wasteful to have the !^robots\.txt element on every single access-control rule. More efficient if you start out with a pair of lines like
RewriteRule ^robots\.txt - [L]
RewriteRule ^forbidden\.html - [L]
substituting the exact name and filepath of your custom 403 page. This goes before all access-control rules, overriding the usual “most severe - to - least severe” ordering.
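
A minimal sketch of how that opening pair might sit in front of the block rule. The ErrorDocument line and the file name forbidden.html are assumptions for illustration; substitute whatever your own 403 page is called.

```apache
# Assumed custom 403 page; the name is a placeholder.
ErrorDocument 403 /forbidden.html

RewriteEngine On

# Exemptions: stop processing for files that everyone may fetch,
# including the 403 page itself, so a blocked visitor can still be
# served the error document.
RewriteRule ^robots\.txt$ - [L]
RewriteRule ^forbidden\.html$ - [L]

# Access control: no per-rule robots.txt exclusion needed any more.
RewriteCond %{HTTP_USER_AGENT} Jorgee
RewriteRule ^ - [F]
```

With the exemptions up front, each later access-control rule can use a bare ^ as its pattern.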
7:44 pm on Nov 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts: 2264
votes: 0


Thanks, Lucy.

>> I'm assuming you've snipped other stuff for posting purposes,
Yes, I did snip the prior lines (lines ending in [NC,OR]).

What do I mean by "not working"? Well, the Wordfence plugin in WordPress shows me a "liveview" of the traffic hitting my site, and in there ... I've got Mozilla/5.0 Jorgee! Wordfence does block the bot, but if the htaccess were working, I'm assuming the WordPress code wouldn't be executed and Jorgee wouldn't feature in the Wordfence logs.

To address the point about CMS: as above, I'm using WP, and WP is in the root folder. The htaccess I'm talking about is also in root. But, yes, there's already a CMS-related entry in the htaccess, added by the W3 caching plugin:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP:Accept-Encoding} gzip
etc


I had added the entire code from my OP just below that. It would appear from your post that this is not correct. How about if I remove the existing code and place the below at the top of my htaccess?


RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(jorgee).* [NC]
RewriteRule ^(.*)$ - [L,R=403]
7:48 pm on Nov 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts: 2264
votes: 0


Oh, if I am to remove the NC, would the below be correct?

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(Jj)orgee.*
RewriteRule ^(.*)$ - [L,R=403]
8:53 pm on Nov 24, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14323
votes: 562


^.*(Jj)orgee.*
The (parentheses) are a mistake--in fact a mistake that would prevent the rule from executing as intended--because they mean "look for ‘Jjorgee’ and capture the ‘Jj’ part".
The ^.* is completely superfluous and may add a few picoseconds to execution time. The trailing .* is also not needed at all. Without anchors it is simply
[Jj]orgee
but, again, surely it's one or the other? It's quite rare for robots to change their casing.

the Wordfence plugin
I don't personally speak WordPress, and our resident WP expert is currently incommunicada, but have a closer look at the documentation for the plugin. Does it track requests or only successful requests?

Edit: Although [L,R=403] is not technically wrong--that is, it won't create unintended consequences--it's completely unnecessary. All you need is [F] which carries an implied [L]. The [F] flag is a shortcut for 403, as [G] is a shortcut for 410.

All access-control rules need to go before the WP section that involves the /index.php rewrite. Otherwise they will never execute at all.
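
As a rough sketch of that placement, assuming the stock WordPress rewrite block (sections generated by your plugins may look different), the access-control rule sits above the "# BEGIN WordPress" marker:

```apache
RewriteEngine On

# Access control: evaluated before WordPress takes over the request.
# Exact casing "Jorgee" is assumed here.
RewriteCond %{HTTP_USER_AGENT} Jorgee
RewriteRule ^ - [F]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

Anything placed below the final [L] rule is skipped for every request that WordPress rewrites to /index.php, which is why a block rule down there appears to do nothing.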
9:12 pm on Nov 24, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts: 2264
votes: 0


>>All access-control rules need to go before the WP section that involves the /index.php rewrite. Otherwise they will never execute at all
Aha! That was possibly the problem earlier as my code was definitely AFTER that /index.php rewrite!

So, implementing the above I get:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} [Jj]orgee
RewriteRule ^ - [F]


Looks okay?

And thanks again for all your help.
10:11 pm on Nov 24, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11121
votes: 111


lucy24's point was that the [NC] is typically unnecessary and you should specify the case used in the user agent string.

use this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Jorgee
RewriteRule ^ - [F]


unless you are also getting requests from "jorgee"...

[edited by: phranque at 10:20 pm (utc) on Nov 24, 2017]

10:11 pm on Nov 24, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14323
votes: 562


Should be OK now.

For future reference, the overall order of RewriteRules--you may not have all of these--goes

0. Exemptions, such as robots.txt or forbidden.html
1. Access control ([F] flag)
2. Permanently gone ([G] flag)
3. External redirects ([R=301] or [R] flag), where “external” doesn't necessarily mean “some other site”; it just means “tell the visitor to make a different request”
4. Internal rewrites ([L] flag alone), including rules that belong to a CMS
5. Rules with no terminating flag, such as those that only set cookies ([CO]) or environment variables ([E])

And then, within each of those groups, list the rules from most-specific to most-general.
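
A sketch of that ordering in one htaccess file. Apart from the Jorgee rule, every path, host, script and variable name below is a hypothetical placeholder, not something taken from this thread.

```apache
RewriteEngine On

# 0. Exemptions
RewriteRule ^robots\.txt$ - [L]
RewriteRule ^forbidden\.html$ - [L]

# 1. Access control
RewriteCond %{HTTP_USER_AGENT} Jorgee
RewriteRule ^ - [F]

# 2. Permanently gone (hypothetical path)
RewriteRule ^old-promo\.html$ - [G]

# 3. External redirect (hypothetical path and host)
RewriteRule ^old-name\.html$ https://www.example.com/new-name.html [R=301,L]

# 4. Internal rewrite (hypothetical script; CMS rules would go here)
RewriteRule ^products/([0-9]+)$ /product.php?id=$1 [L]

# 5. Non-terminating rule: only sets a hypothetical environment variable
RewriteRule \.pdf$ - [E=IS_PDF:1]
```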

Good luck!
2:49 am on Nov 26, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:7921
votes: 562


^^^ umpteen thousands of RTFM words reduced to commonsense. Kudos!
 
