


Search and replace in apache htaccess a RewriteRule

   
10:25 am on Nov 30, 2012 (gmt 0)

5+ Year Member



How do I use .htaccess to replace %3F and %3D with ? and = in all upcoming links whose URLs contain them?

For example:

http://mysite.com/some-perma-links.html/some-image%3Ffull%3D1


and to get:

http://mysite.com/some-perma-links.html/some-image?full=1
7:19 pm on Nov 30, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



1. What have you tried so far?

2. And possibly more important: where's the percent-encoding coming from? Normally it shouldn't be necessary to do anything about query strings. Have you got some %26 in there too, or is it always a single query?
11:45 am on Dec 4, 2012 (gmt 0)

5+ Year Member



I think that Google "auto-replaces" the ? and = in my links.

When I test with "Fetch as Google" in Google Webmaster Tools, I paste in the URL with ? and =, but a few seconds later, when Google returns the result, it says "Page not found", and I see that it has "converted" the original URL with ? and = into a URL with %3F and %3D.
8:43 pm on Dec 4, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



It's not google, it's the internet as a whole. All "special" characters are percent-encoded in transit.

3. Do the percent encodings show up in referers or in the primary request?

4. Do the pages, with normally formatted query strings, exist? Not via "fetch as googlebot" but if you paste them in directly.
9:59 pm on Dec 10, 2012 (gmt 0)

5+ Year Member



The original URLs look like this one:

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04?full=1


The link to that URL is on this page (click the zoom icon at the top of the image):

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04



but Google translates that original URL into a URL with %3F and %3D:

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04%3Ffull%3D1


Can I somehow set .htaccess to "auto-replace" all incoming encoded URLs so that they are correct, with ? and =?
10:59 pm on Dec 10, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Backtrack here, because now I see the problem. It's interpreting your query string as part of the URL. Or, in the alternative, your site is coded so the part beginning in ? isn't getting interpreted as a query.

What's the "real" file format behind the extensionless URL?

A moderator will come along presently to change your domain name to example dot com. But the underlying problem will still be visible. Meanwhile I have been to the site and confirmed that the unescaped version works, the escaped version doesn't, and the queryless version has a link to the troublemaking version.

I assume you have lots and lots of these and the same problem occurs everywhere. Is it always just one question mark and just one equals sign? If so, the fix is trivial. But I want to get at the underlying issue.
2:56 pm on Dec 11, 2012 (gmt 0)

5+ Year Member



Yes, it is always one ? and one = at the end of the URL.

The parameter at the end of the URL means: if ?full=1 is sent at the end of the URL, show the page with the full-size image.
12:03 am on Dec 12, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Well, "parameter" is the key word, because I get the impression it isn't being read as a parameter (Query String in htaccess-speak).

What is your "real" page?
http://example.com/some-perma-links.html/some-image
is not the name of an actual file on your actual server. The extension .html would come at the very end of any "real" filename. All the intervening / are directories and I really doubt you have a directory called something.html.

:: insert boilerplate about directory paths and the part of the URL up through "example.com" ::

So something is already being rewritten. You can't simply add another RewriteRule without knowing what the existing rules are and what they do. Otherwise it would be a simple matter of

RewriteRule ^([^%]*)%3F([^%]*)%3D([^%]*)$ http://www.example.com$1?$2=$3 [R=301,L,NE]
or even
RewriteRule ^([^%]*)%3Ffull%3D(\d+)$ http://www.example.com$1?full=$2 [R=301,L,NE]

and I can tell you right now that neither of those will work.
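
One likely reason, going by the mod_rewrite documentation: the RewriteRule pattern is matched against the URL-path after it has been percent-decoded, so a literal %3F never reaches the pattern. Below is a minimal, untested sketch that inspects the raw request line through %{THE_REQUEST} instead. It assumes the parameter is always full=<digits>, that the block sits above whatever existing rules already rewrite these URLs, and that nothing else on the site depends on the encoded form. None of that has been confirmed for this site.

```apache
RewriteEngine On

# Assumption: this block is placed before the site's existing rewrite rules.
# THE_REQUEST is the raw request line, e.g.
#   GET /some-perma-links.html/some-image%3Ffull%3D1 HTTP/1.1
# so the percent-encoded characters are still visible in it.
# %1 = path up to the encoded "?", %2 = the digits after "full%3D".
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*)%3Ffull%3D(\d+)\s [NC]

# Redirect to the same path with a real query string.
# NE keeps any other percent-encoding in %1 from being escaped a second time.
RewriteRule ^ /%1?full=%2 [R=301,L,NE]
```

With a rule along these lines, a request for /some-perma-links.html/some-image%3Ffull%3D1 should be answered with a 301 to /some-perma-links.html/some-image?full=1. Check it with a header checker before relying on it, since the existing rules may still interfere.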
5:32 am on Dec 12, 2012 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I'm not sure what the OP meant by "upcoming links" but I'm going to assume incoming.

Scraper bots are the main cause of these types of Googlebot queries, imo: they malform the link back to your site (if they link at all), and CMSs like WordPress flub the 404 or 301. I see Googlebot requests for URLs ending in the above more than I'd like, in WordPress especially but in all CMSs.

rel=canonical tags tell Google which version you want indexed; it's a start.
8:54 pm on Dec 12, 2012 (gmt 0)

5+ Year Member



I found a way to "solve" it:

I changed the way parameters are sent via links to use WordPress "endpoints", so now the word "full-size" is added to the end of the permalink and the ? and = are gone from the URL.

Thanks, guys, for your time!
 
