
Apache Web Server Forum

    
%3F and %3D embedded links code is causing site to run very slow
hottrout



 
Msg#: 4413102 posted 4:05 pm on Feb 1, 2012 (gmt 0)

I am currently getting Google Webmaster Tools errors caused by ? being replaced with %3F and = being replaced with %3D. I searched the forum and found this previous thread and answer:

[webmasterworld.com...]

The code is superb and works very well; however, it does seem to have a drastic effect on my website's speed. It increases page load times by an additional 5-6 seconds. My site runs on a dedicated physical server and I usually have excellent performance for the site.

Is there anything that would shed more light on this? Do you want me to post my entire htaccess file to help you diagnose?

 

lucy24




 
Msg#: 4413102 posted 12:03 am on Feb 2, 2012 (gmt 0)

5-6 SECONDS?! How long are your query strings? Or rather, how deep are the queries nested? The top layer should always come through as a literal ? while the others get converted. The question has come up more recently than 2010; I remember pawing through raw logs to find something with nested queries. (In my case, it was piwik passing along search-engine information.)

Normally a query means you're going to a php or similar page that deals with it. Feed the whole thing into the disencode function (don't remember its formal name but you know what I mean) before anything else.

You shouldn't have to involve htaccess at all unless you're converting incoming queries to pretty URLs. Is that what you're doing? If so, we need to figure out what order to do things in. 5-6 seconds is definitely over the top unless you have the world's slowest server.

g1smd




 
Msg#: 4413102 posted 12:12 am on Feb 2, 2012 (gmt 0)

If your RegEx patterns contain (.*) in any position other than the end, you have very inefficient code.
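As a rough illustration of that point, here is a sketch using hypothetical paths (print.html and print.php are not from the poster's site). A leading (.*) first swallows the whole path and then has to back off character by character, whereas a negated character class stops at the next slash. The two patterns are not strictly equivalent, since the second only accepts a single path segment, so treat them as alternatives rather than a pair to install together.

```apache
RewriteEngine On

# Alternative A (slower): the leading (.*) grabs everything, then backtracks
# until the literal "/print.html" at the end can match.
RewriteRule ^(.*)/print\.html$ /print.php?page=$1 [L]

# Alternative B (tighter): [^/]+ cannot run past a slash, so the regex engine
# has far fewer positions to retry. Matches one path segment only.
RewriteRule ^([^/]+)/print\.html$ /print.php?page=$1 [L]
```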

wilderness




 
Msg#: 4413102 posted 12:52 am on Feb 2, 2012 (gmt 0)

Is there anything that would shed more light on this?


You apparently skipped right over the portions of Jim's old thread reply in which he advised you to FIRST locate the issues that were causing these malformed multiple ?s.

Start with the URLs that your own website(s) output. Correct them first, and then correct the inbound links that resulted from your previously malformed links.

hottrout



 
Msg#: 4413102 posted 9:10 am on Feb 2, 2012 (gmt 0)

Actually I read all of the portions of Jim's old thread reply. I might not understand it entirely but I did read it.

The malformed links are not being created on my site. Google Webmaster Tools is reporting them as 404s coming in from other sites. For a reason outside of my control, their browser is seeing the link and replacing the ? and = with their percent-encoded equivalents.

Am I wrong in thinking that there is nothing I can do from my side? A full check confirms that my site produces the URLs correctly with the ? and =.

g1smd




 
Msg#: 4413102 posted 9:16 am on Feb 2, 2012 (gmt 0)

There is an alternative way of fixing this.

Rewrite (that's rewrite, not redirect) those requests and only those requests to a simple PHP script that uses preg_replace to fix up the URLs. The PHP script then sends the required HEADER directives for the 301 status and for the new location.

Unusually, the rewrite code will be placed before any canonical redirects. Necessarily, the code will be placed after any code that blocks malicious requests.
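The Apache side of that approach might look something like the sketch below. The script name /fix-encoded-url.php is hypothetical, and the sketch assumes the rules live in the document-root .htaccess. There is no R flag, so this is an internal rewrite; issuing the 301 is left to the script.

```apache
RewriteEngine On

# Only act on requests whose raw request line contains an encoded "?" (%3F).
RewriteCond %{THE_REQUEST} %3[Ff]
# Skip the fixer script itself so the rule cannot re-apply to its own target.
RewriteCond %{REQUEST_URI} !^/fix-encoded-url\.php$
# Internal rewrite to the (hypothetical) fixer script, which sends the redirect.
RewriteRule ^ /fix-encoded-url.php [L]
```

Because the condition tests THE_REQUEST, ordinary requests never reach the script, which keeps the extra work confined to the malformed URLs.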

hottrout



 
Msg#: 4413102 posted 3:59 pm on Feb 15, 2012 (gmt 0)

I left this for a few days due to other work requirements and came back to it with fresh eyes. Knowing that the URL that uses the ? and the = always has the same path, I constructed this code to try to capture the %3D and %3F codes from Google more easily and process them properly. Unfortunately the code does not produce any output; it does not seem to catch the incoming link. Could you please check my concept and let me know if it is even possible?

The incoming links from Google look like this:-

http://www.example.com/cars/index.php%3Ffolder%3dford
http://www.example.com/cars/index.php%3Ffolder%3dford/mustang
http://www.example.com/cars/index.php%3Ffolder%3dford/mustang/bigbore

they are created on the site and should actually look like this:-

http://www.example.com/cars/index.php?folder=ford
http://www.example.com/cars/index.php?folder=ford/mustang
http://www.example.com/cars/index.php?folder=ford/mustang/bigbore

The code I was working on looks like this:-

# Redirect to remove query string characters from folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\%3[fF]folder\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^/cars/index\.php$ http://www.example.com/cars/index.php?folder=%2 [R=301,L]

g1smd




 
Msg#: 4413102 posted 4:05 pm on Feb 15, 2012 (gmt 0)

Remove the leading / after ^ if the rule resides in .htaccess.

You might also need to remove the $ after .php too.
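With both changes applied, the rule would look roughly like this. It is a reconstruction, assuming the .htaccess file sits in the document root; the thread does not show the final version. Dropping the $ probably matters because the encoded ? arrives as part of the path, so the path that RewriteRule tests is likely to continue past index.php.

```apache
RewriteEngine On

# Redirect to remove query string characters from folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\%3[fF]folder\%3[dD]([^\ ]+)\ HTTP/
# No leading slash (per-directory context) and no $ after index\.php
RewriteRule ^cars/index\.php http://www.example.com/cars/index.php?folder=%2 [R=301,L]
```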

It's also time for you to start looking at using extensionless URLs that don't use parameters. :)

hottrout



 
Msg#: 4413102 posted 5:16 pm on Feb 15, 2012 (gmt 0)

Nice one. That fixed the code straight away and it works. Can I also ask if it is efficient? I am learning as I travel through the underworld of regex, but sometimes I find myself transfixed on a line of code and getting a headache to no avail.

g1smd




 
Msg#: 4413102 posted 5:21 pm on Feb 15, 2012 (gmt 0)

The initial problem is that slashes have a particular meaning in a URL and are not even a valid character in a query string name or query string value.

Overall, your URL structure itself is the problem. You'd have less of a long-term headache if you used folder-separated and hyphen-separated URLs with no parameters at all.

A set of internal rewrites then translates these new URL requests into the internal request pattern needed by your scripts.
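A minimal sketch of one such internal rewrite, assuming a hypothetical parameter-free URL scheme such as /cars/ford/mustang and the existing /cars/index.php?folder= script. The allowed characters are a guess and would need adjusting to the real folder names.

```apache
RewriteEngine On

# Public URL (hypothetical):  /cars/ford/mustang
# Internal request:           /cars/index.php?folder=ford/mustang
# Segments are lowercase letters, digits and hyphens, so "index.php" itself
# cannot match and the rule does not loop on its own output.
RewriteRule ^cars/([a-z0-9-]+(?:/[a-z0-9-]+)*)$ /cars/index.php?folder=$1 [L]
```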

hottrout



 
Msg#: 4413102 posted 5:36 pm on Feb 15, 2012 (gmt 0)

If I add additional folders later on and use the same script to process them, can I amend the code as follows to cope with any folder that uses the index.php?folder= expression?

# Redirect to remove query string characters from folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\%3[fF]folder\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ %1index.php?folder=%2 [R=301,L]

or is it this

# Redirect to remove query string characters from folder
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\%3[fF]folder\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%1index.php?folder=%2 [R=301,L]

lucy24




 
Msg#: 4413102 posted 11:57 pm on Feb 15, 2012 (gmt 0)

The second, if you're asking about the protocol-plus-domain part. You want all redirects to go to the identical form of the domain name.
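A hedged sketch of that second form, again assuming a document-root .htaccess. Two details differ from the rule as asked and are my assumptions rather than anything confirmed in the thread: the RewriteRule pattern allows folders before index.php and has no trailing $, in line with the earlier fix, and the folder prefix is captured as a whole, because a repeated group such as ([^/]+/)* keeps only its last repetition.

```apache
RewriteEngine On

# Redirect to remove query string characters from folder, for any folder depth.
# %1 = full folder prefix (e.g. "cars/"), %2 = value after folder%3D
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /((?:[^/]+/)*)index\.php\%3[fF]folder\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^(?:[^/]+/)*index\.php http://www.example.com/%1index.php?folder=%2 [R=301,L]
```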

Would it be enough to say

RewriteCond %{THE_REQUEST} %[0-9A-Fa-f]{2}

to grab anything that contains percent-encoded text?

System
redhat


 
Msg#: 4413102 posted 3:35 pm on May 18, 2012 (gmt 0)

The following 7 messages were cut out to new thread by incredibill. New thread at: apache/4455470.htm [webmasterworld.com]
All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved