
Apache Web Server Forum

    
Search and replace in Apache .htaccess with a RewriteRule
nicolass




msg:4523852
 10:25 am on Nov 30, 2012 (gmt 0)

How do I replace, in .htaccess, all upcoming links that contain %3F and %3D in their URLs with ? and =

For example:

http://mysite.com/some-perma-links.html/some-image%3Ffull%3D1


and to get:

http://mysite.com/some-perma-links.html/some-image?full=1

 

lucy24




msg:4523941
 7:19 pm on Nov 30, 2012 (gmt 0)

1. What have you tried so far?

2. And possibly more important: where's the percent-encoding coming from? Normally it shouldn't be necessary to do anything about query strings. Have you got some %26 in there too, or is it always a single query?

nicolass




msg:4524929
 11:45 am on Dec 4, 2012 (gmt 0)

I think that Google "auto-replaces" ? and = in my links.

When I test with "Fetch as Google" in Google Webmaster Tools, I paste in the URL with ? and =, but after a few seconds, when Google returns the result, it says "Page not found" and I see that it has "converted" the original URL with ? and = into a URL with %3F and %3D.

lucy24




msg:4525088
 8:43 pm on Dec 4, 2012 (gmt 0)

It's not google, it's the internet as a whole. All "special" characters are percent-encoded in transit.

3. Do the percent encodings show up in referers or in the primary request?

4. Do the pages, with normally formatted query strings, exist? Not via "fetch as googlebot" but if you paste them in directly.
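
One way to answer question 3 is to log the raw request line next to the Referer header. This is only a sketch under assumptions: it needs access to the main server or virtual-host configuration (LogFormat and CustomLog are not allowed in .htaccess), and the log file name and the format nickname reqcheck are made up for illustration.

```apache
# %t = time, %r = first line of the request as received,
# %>s = final status, %{Referer}i = Referer request header.
LogFormat "%t \"%r\" %>s \"%{Referer}i\"" reqcheck

# Hypothetical log file; adjust the path to the server's layout.
CustomLog "logs/reqcheck_log" reqcheck
```

If %3F shows up in the quoted request line, the encoded URL is what was actually requested; if it only shows up in the Referer column, it is coming from the linking page.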

nicolass




msg:4526714
 9:59 pm on Dec 10, 2012 (gmt 0)

The original URLs look like this one:

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04?full=1


That URL is linked from this page (when you click the zoom icon at the top of the image):

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04



But Google translates that original URL into a URL with %3F and %3D:

http://fabzz.com/cheryl-cole-at-jingle-bell-ball-in-london.html/cheryl-cole-at-jingle-bell-ball-04%3Ffull%3D1


Can I somehow set up .htaccess to "auto-replace" all incoming encoded URLs so that they are correct, with ? and =?

lucy24




msg:4526737
 10:59 pm on Dec 10, 2012 (gmt 0)

Backtrack here, because now I see the problem. It's interpreting your query string as part of the URL. Or, in the alternative, your site is coded so the part beginning in ? isn't getting interpreted as a query.

What's the "real" file format behind the extensionless URL?

A moderator will come along presently to change your domain name to example dot com. But the underlying problem will still be visible. Meanwhile I have been to the site and confirmed that the unescaped version works, the escaped version doesn't,* and the queryless version has a link to the troublemaking version.

I assume you have lots and lots of these and the same problem occurs everywhere. Is it always just one question mark and just one equals sign? If so, the fix is trivial. But I want to get at the underlying issue.

nicolass




msg:4526986
 2:56 pm on Dec 11, 2012 (gmt 0)

Yes, it is always one ? and one = at the end of the URL.

The parameter at the end of the URL means: "if ?full=1 is sent at the end of the URL, show the page with the full-size image."

lucy24




msg:4527150
 12:03 am on Dec 12, 2012 (gmt 0)

Well, "parameter" is the key word, because I get the impression it isn't being read as a parameter (Query String in htaccess-speak).

What is your "real" page?
http://example.com/some-perma-links.html/some-image
is not the name of an actual file on your actual server. The extension .html would come at the very end of any "real" filename. All the intervening / are directories and I really doubt you have a directory called something.html.

:: insert boilerplate about directory paths and the part of the URL up through "example.com" ::

So something is already being rewritten. You can't simply add another RewriteRule without knowing what the existing rules are and what they do. Otherwise it would be a simple matter of

RewriteRule ^([^%]*)%3F([^%]*)%3D([^%]*)$ http://www.example.com$1?$2=$3 [R=301,L,NE]
or even
RewriteRule ^([^%]*)%3Ffull%3D(\d+)$ http://www.example.com$1?full=$2 [R=301,L,NE]

and I can tell you right now that neither of those will work.
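
A likely reason, offered as a sketch rather than a tested fix for this site: mod_rewrite matches a RewriteRule pattern against the already percent-decoded URL path, so an incoming %3F reaches the rule as a literal ? and a pattern looking for the text %3F never matches. The version below matches the decoded characters instead. It assumes the rules live in the .htaccess file at the document root, that the parameter is always full with a numeric value, and that the block sits above any existing CMS rewrite rules; none of that is confirmed in the thread.

```apache
RewriteEngine On

# The pattern sees the decoded path, so "%3Ffull%3D1" arrives as "?full=1".
# \? matches that literal question mark inside the path.
# In .htaccess the leading slash is stripped, so the target adds it back.
RewriteRule ^(.+)\?full=(\d+)$ /$1?full=$2 [R=301,L,NE]

# Alternative (same placement assumed): test the raw, undecoded request line.
# RewriteCond %{THE_REQUEST} \s/+(\S+?)%3Ffull%3D(\d+)\s [NC]
# RewriteRule ^ /%1?full=%2 [R=301,L,NE]
```

With this in place, a request for /some-perma-links.html/some-image%3Ffull%3D1 should be answered with a 301 to /some-perma-links.html/some-image?full=1, where the ? now starts a real query string.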

Sgt_Kickaxe




msg:4527227
 5:32 am on Dec 12, 2012 (gmt 0)

I'm not sure what the OP meant by "upcoming links" but I'm going to assume incoming.

Scraper bots are the main cause of these types of Googlebot queries, IMO: they malform the link back to your site (if they link at all), and CMS systems like WordPress flub the 404 or 301. I see Googlebot requests for URLs ending in the above more than I'd like, in WordPress especially but in all CMS systems.

rel=canonical tags tell Google which version you want indexed; it's a start.

nicolass




msg:4527499
 8:54 pm on Dec 12, 2012 (gmt 0)

I found a way to "solve" it:

I changed the way parameters are sent via links by using WordPress "endpoints", so now I add the word "full-size" to the end of the permalink, and ? and = are removed from the URL.

Thank you guys for your time!

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved