Htaccess rewrite to replace %3F with ?

9:56 pm on Aug 3, 2011 (gmt 0)




The problem:

Some scrapers and bots copy search engine results into their own sites. The links in those results are URL-encoded, so the ? that starts a query string becomes %3F. A link to your site that contains %3F results in a 404 error, because the server treats the encoded ? as part of the path and looks for a file or directory whose name contains a literal ?, which does not exist. Google Webmaster Tools then reports these as errors!

My fix, after many false leads:

You have to use THE_REQUEST to catch the string:

After your:

RewriteEngine on


Add:

# replace %3F with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*)\%3[Ff](.*)\ HTTP/ [NC]
RewriteRule \.*$ http://www.mysite.com/%1?%2 [R=301,L]


My skills with Apache rewrites are limited, but it worked for me. I hope the solution helps others. :)

PS: I tried posting this elsewhere, but their 'overprotective' system wouldn't accept it!
10:11 pm on Aug 3, 2011 (gmt 0)

g1smd



This isn't a rewrite. This is a redirect, even though it uses a RewriteRule. Confusing, isn't it?

A (.*) pattern will match as much as it can, up to the entire
GET /somefile?somequery=somevalue HTTP/1.1
request line. By using two (.*) patterns you are saying that you want to match "everything" into the first (.*) and then match "everything" again in the second one. The first (.*) initially swallows the rest of the line, and the parser then has to make a large number of "back off and retry" trial matches until it discovers that you only want "some" of the input in the first backreference and "some" of it in the second.

Replace the two (.*) patterns with something more specific, such as
([^\%]+)
and
([^\ ]+)


If you specify [Ff] you don't need the [NC] flag to allow "any case", as you have already covered both cases by using [Ff]. You could instead use a plain F in the pattern together with [NC], of course.

A redirect such as this should be one of the very first in your list of redirects.
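
For reference, here is a sketch of how the rule might look with the suggestions above applied: the two (.*) groups swapped for the negated character classes, and the redundant [NC] flag dropped. The hostname is the placeholder from the original post, and the bare ^ in the RewriteRule is a simplification that does not come from the thread (like the original pattern, it matches any request). Treat it as an untested illustration, not a drop-in fix.

```apache
RewriteEngine on

# Redirect requests where the query-string "?" arrived encoded as %3F.
# Keep this ahead of the other redirects in the file.
# %1 = path before %3F (contains no percent signs)
# %2 = everything after %3F, up to the space before HTTP/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\%]+)\%3[Ff]([^\ ]+)\ HTTP/
# "^" matches every request (assumption: a simplification of the original pattern).
RewriteRule ^ http://www.mysite.com/%1?%2 [R=301,L]
```

One limitation of this form: because ([^\%]+) stops at the first percent sign, a path that has another encoded character (such as %20) before the %3F will not match the condition.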
10:16 pm on Aug 3, 2011 (gmt 0)

lucy24



[Ff] with [NC] is redundant ;)
10:30 pm on Aug 3, 2011 (gmt 0)



Thanks, I realised it was a redirect, I just didn't bother stating that.

Thanks for pointing out the NC, I missed that.

And thanks for pointing out the .* issue; your suggested replacement works great.