
Apache Web Server Forum

Search and replace in an Apache .htaccess RewriteRule

 10:25 am on Nov 30, 2012 (gmt 0)

How do I use .htaccess to replace %3F and %3D with ? and = in all upcoming links whose URLs contain them?



and to get:




 7:19 pm on Nov 30, 2012 (gmt 0)

1. What have you tried so far?

2. And, possibly more important: where's the percent-encoding coming from? Normally it shouldn't be necessary to do anything about query strings. Have you got some %26 in there too, or is it always a single query?


 11:45 am on Dec 4, 2012 (gmt 0)

I think Google "auto-replaces" ? and = in my links.

When I test with "Fetch as Google" in Google Webmaster Tools, I paste in the URL with ? and =, but a few seconds later Google returns "Page not found", and I see that it has "converted" the original URL with ? and = into a URL with %3F and %3D.


 8:43 pm on Dec 4, 2012 (gmt 0)

It's not google, it's the internet as a whole. All "special" characters are percent-encoded in transit.
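For a concrete picture of the escaping itself, here is a small Python illustration using the standard library's `urllib.parse.quote`. It assumes the `?full=1` suffix from this thread is being encoded as ordinary data rather than left alone as URL delimiters; in that case `?` and `=` come out as exactly the `%3F` and `%3D` seen in the fetch results.

```python
from urllib.parse import quote, unquote

# The query string from this thread, as it should appear in a link.
query_part = "?full=1"

# Encode it as plain data: "?" and "=" are not in quote()'s default
# safe set (which is just "/"), so both get percent-escaped.
encoded = quote(query_part)
print(encoded)           # %3Ffull%3D1

# Decoding restores the original characters.
print(unquote(encoded))  # ?full=1
```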

3. Do the percent encodings show up in referers or in the primary request?

4. Do the pages, with normally formatted query strings, exist? Not via "fetch as googlebot" but if you paste them in directly.


 9:59 pm on Dec 10, 2012 (gmt 0)

The original URLs look like this one:


The URL appears on this page (when you click the zoom icon at the top of the image):


but Google translates that original URL into a URL with %3F and %3D.


Can I somehow set up .htaccess to "auto-replace" all incoming encoded URLs so they are correct, with ? and =?


 10:59 pm on Dec 10, 2012 (gmt 0)

Backtrack here, because now I see the problem. It's interpreting your query string as part of the URL. Or, in the alternative, your site is coded so the part beginning in ? isn't getting interpreted as a query.
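The difference between a real query string and an escaped one can be shown with Python's `urllib.parse.urlsplit`. This is a minimal sketch; the URLs below are hypothetical stand-ins, since the real ones were removed from the thread. With the escaped form, the query comes back empty and `full=1` is glued onto the path, so anything that checks for a `full` parameter never sees it.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URLs modeled on the thread (the real ones were removed).
good = "http://www.example.com/gallery/something.html?full=1"
bad = "http://www.example.com/gallery/something.html%3Ffull%3D1"

for url in (good, bad):
    parts = urlsplit(url)
    # urlsplit does not decode, so an escaped "?" is not treated as the
    # start of the query: "full=1" stays inside the path.
    print(parts.path, "| query:", repr(parts.query), "|", parse_qs(parts.query))
```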

What's the "real" file format behind the extensionless URL?

A moderator will come along presently to change your domain name to example dot com. But the underlying problem will still be visible. Meanwhile I have been to the site and confirmed that the unescaped version works, the escaped version doesn't,* and the queryless version has a link to the troublemaking version.

I assume you have lots and lots of these and the same problem occurs everywhere. Is it always just one question mark and just one equals sign? If so, the fix is trivial. But I want to get at the underlying issue.


 2:56 pm on Dec 11, 2012 (gmt 0)

Yes, it is always one ? and one = at the end of the URL.

That parameter at the end of the URL means "if the parameter ?full=1 is sent at the end of the URL, show the page with the full-size image".


 12:03 am on Dec 12, 2012 (gmt 0)

Well, "parameter" is the key word, because I get the impression it isn't being read as a parameter (Query String in htaccess-speak).

What is your "real" page?
is not the name of an actual file on your actual server. The extension .html would come at the very end of any "real" filename. All the intervening / are directories and I really doubt you have a directory called something.html.

:: insert boilerplate about directory paths and the part of the URL up through "example.com" ::

So something is already being rewritten. You can't simply add another RewriteRule without knowing what the existing rules are and what they do. Otherwise it would be a simple matter of

RewriteRule ^([^%]*)%3F([^%]*)%3D([^%]*)$ http://www.example.com$1?$2=$3 [R=301,L,NE]
or even
RewriteRule ^([^%]*)%3Ffull%3D(\d+)$ http://www.example.com$1?full=$2 [R=301,L,NE]

and I can tell you right now that neither of those will work.
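The regex part of the first rule can at least be sanity-checked outside Apache. Below is a minimal Python sketch that applies the same pattern and builds the same redirect target, assuming it is handed the raw, still-encoded path; the `redirect_target` helper and the sample path are hypothetical. The last call hints at one more reason such a rule can fail: as far as I can tell from the mod_rewrite documentation, RewriteRule patterns are matched against the %-decoded URL-path, so `%3F` has already become a literal `?` before the pattern runs (the undecoded request line is still available in `%{THE_REQUEST}`).

```python
import re
from typing import Optional
from urllib.parse import unquote

# Same pattern as the first RewriteRule above.
ENCODED_QUERY = re.compile(r"^([^%]*)%3F([^%]*)%3D([^%]*)$")

def redirect_target(raw_path: str, host: str = "http://www.example.com") -> Optional[str]:
    """Hypothetical helper: return the 301 target, or None if no match."""
    m = ENCODED_QUERY.match(raw_path)
    if m is None:
        return None
    path, key, value = m.groups()
    # Mirrors the substitution http://www.example.com$1?$2=$3
    # (here the path keeps its leading slash).
    return f"{host}{path}?{key}={value}"

raw = "/gallery/something.html%3Ffull%3D1"
print(redirect_target(raw))           # http://www.example.com/gallery/something.html?full=1
print(redirect_target(unquote(raw)))  # None: %3F has already become "?"
```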


 5:32 am on Dec 12, 2012 (gmt 0)

I'm not sure what the OP meant by "upcoming links" but I'm going to assume incoming.

Scraper bots are the main cause of these types of Googlebot queries, IMO: they malform the link back to your site (if they link at all), and CMSes like WordPress flub the 404 or 301. I see Googlebot requests for URLs ending in the above more often than I'd like, especially in WordPress, but in every CMS.

rel=canonical tags tell Google which version you want indexed; it's a start.


 8:54 pm on Dec 12, 2012 (gmt 0)

I found a way to "solve" it:

I changed the way the parameter is sent in links, using WordPress "endpoints": I now add the word "full-size" to the end of the permalink, so the ? and = are gone from the URL.
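The endpoint approach moves the flag out of the query string and into the path, so there is nothing left for anyone to escape. Here is a minimal Python sketch of that idea; it is not WordPress's own code (in WordPress itself this is registered in PHP, for example with `add_rewrite_endpoint`). The `full-size` segment comes from the post, while the `parse_permalink` helper and the URL shape are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote

ENDPOINT = "full-size"  # the word appended to the permalink

def parse_permalink(path: str) -> Tuple[str, bool]:
    """Hypothetical helper: split off a trailing endpoint segment.

    Returns the base permalink and whether the full-size view was requested.
    """
    segments = [s for s in path.split("/") if s]
    full = bool(segments) and segments[-1] == ENDPOINT
    if full:
        segments = segments[:-1]
    base = "/" + "".join(s + "/" for s in segments)
    return base, full

url_path = "/gallery/something/full-size/"
print(parse_permalink(url_path))    # ('/gallery/something/', True)
# Percent-encoding leaves this path unchanged: it has no ? or = in it.
print(quote(url_path) == url_path)  # True
```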

Thank you guys for your time!

