Search and replace in apache htaccess a RewriteRule

     
10:25 am on Nov 30, 2012 (gmt 0)

New User

5+ Year Member

joined:Aug 25, 2008
posts: 28
votes: 0


How can I use .htaccess to rewrite all upcoming links whose URLs contain %3F and %3D, replacing them with ? and =?

For example:

http://mysite.com/some-perma-links.html/some-image%3Ffull%3D1


and to get:

http://mysite.com/some-perma-links.html/some-image?full=1
7:19 pm on Nov 30, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


1. What have you tried so far?

2. And possibly more important: Where's the percent-encoding coming from? Normally it shouldn't be necessary to do anything about query strings. Have you got some %26 in there too, or is it always a single query?
11:45 am on Dec 4, 2012 (gmt 0)

New User

5+ Year Member

joined:Aug 25, 2008
posts: 28
votes: 0


I think that Google "auto-replaces" the ? and = in my links.

When I test with "Fetch as Google" in Google Webmaster Tools, I paste in the URL with ? and =, but a few seconds later, when Google returns the result, it says "Page not found" and I see that it has "converted" the original URL with ? and = into a URL with %3F and %3D.
8:43 pm on Dec 4, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


It's not google, it's the internet as a whole. All "special" characters are percent-encoded in transit.

3. Do the percent encodings show up in referers or in the primary request?

4. Do the pages, with normally formatted query strings, exist? Not via "fetch as googlebot" but if you paste them in directly.
9:59 pm on Dec 10, 2012 (gmt 0)

New User

5+ Year Member

joined:Aug 25, 2008
posts: 28
votes: 0


The original URLs look like this one:

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04?full=1


The URL exists on this page (when you click the zoom icon at the top of the image):

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04



but Google translates that original URL into a URL with %3F and %3D:

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04%3Ffull%3D1


Can I somehow set .htaccess to "auto-replace" all incoming encoded URLs so that they are correct, with ? and =?
10:59 pm on Dec 10, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


Backtrack here, because now I see the problem. It's interpreting your query string as part of the URL. Or, in the alternative, your site is coded so the part beginning in ? isn't getting interpreted as a query.

What's the "real" file format behind the extensionless URL?

A moderator will come along presently to change your domain name to example dot com. But the underlying problem will still be visible. Meanwhile I have been to the site and confirmed that the unescaped version works, the escaped version doesn't,* and the queryless version has a link to the troublemaking version.

I assume you have lots and lots of these and the same problem occurs everywhere. Is it always just one question mark and just one equals sign? If so, the fix is trivial. But I want to get at the underlying issue.
2:56 pm on Dec 11, 2012 (gmt 0)

New User

5+ Year Member

joined:Aug 25, 2008
posts: 28
votes: 0


Yes, it is always one ? and one = at the end of the URL.

That parameter at the end of the URL means: "if the parameter ?full=1 is sent at the end of the URL, show the page with the full-size image".
12:03 am on Dec 12, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12702
votes: 244


Well, "parameter" is the key word, because I get the impression it isn't being read as a parameter (Query String in htaccess-speak).

What is your "real" page?
http://example.com/some-perma-links.html/some-image
is not the name of an actual file on your actual server. The extension .html would come at the very end of any "real" filename. All the intervening / are directories and I really doubt you have a directory called something.html.

:: insert boilerplate about directory paths and the part of the URL up through "example.com" ::

So something is already being rewritten. You can't simply add another RewriteRule without knowing what the existing rules are and what they do. Otherwise it would be a simple matter of

RewriteRule ^([^%]*)%3F([^%]*)%3D([^%]*)$ http://www.example.com/$1?$2=$3 [R=301,L,NE]
or even
RewriteRule ^([^%]*)%3Ffull%3D(\d+)$ http://www.example.com/$1?full=$2 [R=301,L,NE]

and I can tell you right now that neither of those will work.
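
A further reason those rules would not match, independent of whatever rewriting already exists: mod_rewrite tests a RewriteRule pattern against the already percent-decoded URL-path, so a literal %3F never reaches the pattern. The raw, undecoded request line is available in %{THE_REQUEST}. What follows is only a minimal sketch under stated assumptions: www.example.com is a placeholder host, the parameter is always "full" with a numeric value, and the rule sits above the rewrite block the CMS already uses. It has not been tested against this site's existing rules.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, for example:
#   GET /some-perma-links.html/some-image%3Ffull%3D1 HTTP/1.1
# Capture the path before the encoded "?" (%3F) and the digits after the encoded "=" (%3D).
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*)%3[Ff]full%3[Dd](\d+)[\s?]

# %1 and %2 refer to the groups captured by the RewriteCond above.
# Redirect externally to the same path with a real query string.
RewriteRule ^ http://www.example.com/%1?full=%2 [R=301,L,NE]
```

The NE flag keeps any percent-escapes captured in %1 from being encoded a second time in the redirect target.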
5:32 am on Dec 12, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member

joined:Apr 14, 2010
posts:3169
votes: 0


I'm not sure what the OP meant by "upcoming links" but I'm going to assume incoming.

Scraper bots are the main cause of these types of Googlebot queries, imo: they malform the link back to your site (if they link at all), and CMSs like WordPress flub the 404 or 301. I see Googlebot requests for URLs ending in the above more than I'd like, in WordPress especially but in all CMSs.

rel=canonical tags tell Google which version you want indexed; it's a start.
8:54 pm on Dec 12, 2012 (gmt 0)

New User

5+ Year Member

joined:Aug 25, 2008
posts: 28
votes: 0


I found a way to "solve" it:

I changed the way parameters are sent in links by using WordPress "endpoints": I now append the word "full-size" to the end of the permalink, so the ? and = are removed from the URL.

Thanks, guys, for your time!