How do I redirect requests ending in %5C


diondeville

9:04 pm on Oct 25, 2011 (gmt 0)

10+ Year Member



A few bots keep sticking %5C at the end of URL requests which show as 404 errors in my access logs. I've tried redirecting them with the following rewrite rules to remove the %5C and send the bots/clients to the correct URL for the requested page:


RewriteBase /
RewriteCond %{QUERY_STRING} ^(.*)(\\|%5C)$ [NC,OR]
RewriteCond %{THE_REQUEST} ^(.*)(\\|%5C)$ [NC]
RewriteRule .* $1 [R,L]


The above rewrite rules don't seem to do anything. Please can you suggest a workable solution.

g1smd

10:17 pm on Oct 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



$1 will always be blank. You're not capturing anything for it. %1 has a value, but it isn't one that will at all be useful.

If it had done something it would have been a 302 redirect. You need a 301.

You'd also still be promoting duplicate content. The redirect target should contain protocol and domain name.

Additionally, both patterns with (.*) are incorrect. Never use .* at the beginning or in the middle of a RegEx pattern. Heck. That's the fourth time today that I have typed that in an answer.

Since the redirect target doesn't clear the query string or supply a replacement query string, any request matched by the QUERY_STRING condition would also end up in an infinite redirect loop.

You have some of the basics right. Correct all of the above faults and you'll be close to something that will be usable.
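
To illustrate the query string point only, here is a rough sketch rather than a finished rule. It assumes the site lives at http://www.example.com (a placeholder) and that the stray backslash is the only percent-encoded or backslash character in the query string. QUERY_STRING is not %-decoded, so the pattern checks for both the raw and the encoded form. Supplying a query string in the target replaces the original one, which is what stops the loop.

```
RewriteEngine On
# Capture the query string up to the trailing backslash or %5C.
# [^\\%]* stops at the first backslash or percent sign, so nothing has to backtrack.
RewriteCond %{QUERY_STRING} ^([^\\%]*)(\\|%5C)+$ [NC]
# %1 is the cleaned query string; the "?" in the target replaces the original one.
# www.example.com is a placeholder host, not taken from this thread.
RewriteRule ^ http://www.example.com%{REQUEST_URI}?%1 [R=301,L]
```

If %1 comes out empty, the target ends in a bare "?", which mod_rewrite treats as "no query string".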

diondeville

10:44 pm on Oct 25, 2011 (gmt 0)

10+ Year Member



My understanding is that (.*) captures everything up to the specifically stated characters that follow it and that $1 returns the contents of the first set of capturing parentheses. I have seen % used before but I assumed % was used when the rewrite rule was written on the same line as the rewrite condition. I do understand (.*) is hungry in RegEx terms and it creates a loophole for hackers to utilize but my rewrite rules block more illegal access attempts when I use it than when I don't. I think I need to read a little more about rewrite rules.

Going on your answer, I should change my rules to something similar to:

RewriteBase /
RewriteCond %{QUERY_STRING} (\\|%5C)$ [NC,OR]
RewriteCond %{THE_REQUEST} (\\|%5C)$ [NC]
RewriteRule .* [R=301,L]


But how do I reintroduce everything in the URL up to the captured component (\\|%5C) and how do I implement "The redirect target should contain protocol and domain name."?

lucy24

10:49 pm on Oct 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{THE_REQUEST} ^(.*)(\\|%5C)$ [NC]

This one will always fail, because {THE_REQUEST} means the whole package:

GET /{blahblah} HTTP/1.1

They're not sticking a backslash at the end of the HTTP/1.1\ are they? You probably just mean {REQUEST_URI}.

I'm tempted to say that a robot which appends a gratuitous backslash is probably a stupid robot that you don't need anyway ;)

Do your pages actually have query strings? If they don't, you don't need that Condition.

Overlapping:
"My understanding is that (.*) captures everything up to the specifically stated characters that follow it and that $1 returns the contents of the first set of capturing parentheses."

In .htaccess, $1 $2 etc. refer to captures within the Rule while %1 %2 etc. refer to captures within the Conditions.

(.*) does what you want, but it does it in a time-and-energy-consuming way. Regular Expressions are greedy by default, so (.*) means "capture the whole thing". The engine then has to backtrack and retry the pattern until it finds a match that leaves your requested text outside the capture. Never use .* if it can be replaced by something more specific.

RewriteRule .* [R=301,L]

This is flat-out wrong and will probably result in a 500 error because you need two pieces: a pattern and a target.
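
To make the $1 versus %1 distinction concrete, here are two alternative sketches of the same redirect (use one or the other, not both). They assume the request path contains no backslash other than the trailing one, and the targets are kept path-only for brevity. In .htaccess the Rule pattern sees the decoded path without its leading slash, so a trailing %5C arrives as a literal backslash.

```
# Alternative 1: the capture is in the Rule pattern, so the target uses $1.
RewriteRule ^([^\\]+)\\$ /$1 [R=301,L]

# Alternative 2: the capture is in the Condition, so the target uses %1.
# REQUEST_URI keeps its leading slash, so %1 already starts with "/".
RewriteCond %{REQUEST_URI} ^([^\\]+)\\$
RewriteRule ^ %1 [R=301,L]
```

In both cases [^\\]+ means "anything that is not a backslash", so the engine can stop at the backslash rather than capturing everything and backing up. Each rule also has the two pieces it needs: a pattern and a target.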

diondeville

11:03 pm on Oct 25, 2011 (gmt 0)

10+ Year Member



Thanks, Lucy24, I was wondering about the meanings of the different back-references. I'll try with REQUEST_URI and see what happens.

That's a good point about query strings. It's a WordPress site so bots shouldn't be calling query strings and if they are I should ban them outright.

I tend to agree with your idea of banning bots that stick backslashes at the end of URLs but I've opted to redirect them just in case they're not bad bots.

I can't be the only one who's getting this problem. Many of my sites see similar calls to correct URLs but with a backslash or %5C tagged onto the end. Is it something you see often?

Edited to add:

I understand why "RewriteRule .* [R=301,L]" is wrong. I wrote it in a rush. Thanks.

diondeville

11:14 pm on Oct 25, 2011 (gmt 0)

10+ Year Member



Hungry or not, this works

RewriteBase /
RewriteCond %{REQUEST_URI} ^(.*)(\\|%5C)$ [NC]
RewriteRule .* %1 [R=301,L]


Thank you :)

g1smd

11:39 pm on Oct 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But if you care anything at all about site loading times, leaving rules open to hackers, and various other things, you'd strive to get the code 100% right rather than use something that is half-baked and barely does the job for valid requests and at the same time creates some "interesting issues" for certain non-valid requests.

How do you add the protocol and domain name to the target? Type your protocol and your domain before the %1 or the $1.

RewriteBase / is not needed here, because the redirect target is not a relative path.
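
As a rough sketch of where that ends up, assuming the site answers at http://www.example.com (substitute your own protocol and domain) and that the only backslash in the path is the trailing one:

```
RewriteEngine On
# The .htaccess Rule pattern sees the decoded path without the leading slash,
# so a request for /some-page%5C is matched here as "some-page\".
# [^\\]* replaces the leading (.*), and \\+ also mops up repeated trailing backslashes.
# www.example.com is a placeholder; it is not taken from this thread.
RewriteRule ^([^\\]*)\\+$ http://www.example.com/$1 [R=301,L]
```

This goes before the WordPress rewrite block so the redirect fires before the request is handed to index.php.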

diondeville

11:54 pm on Oct 25, 2011 (gmt 0)

10+ Year Member



I will get the directives as close to 100% as I can, but in the meantime, until my knowledge is greater, the above will suffice. I am open to your better knowledge if you care to show me how you would achieve the desired result.

g1smd

12:03 am on Oct 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We've got 80 000 previous threads here in the Apache forum. Some 60 000 of those are answering the same sixty basic questions roughly one thousand times each. There are more code examples than you could shake a stick at if you're prepared to read and learn.