Forum Moderators: Ocean10000 & incrediBILL & phranque

Wordpress Search Converts Spaces, Fails

   
2:25 pm on Sep 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't know if this is a WordPress or an Apache issue.

What is happening is that spaces entered in the search box are converted incorrectly. This creates an endless redirect loop that runs until Apache finally gives up.

Search: Hello

This works fine:

1.1.1.1 - - [13/Sep/2012:15:18:55 +0100] "GET /?s=hello&submit=Go HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:18:55 +0100] "GET /search/hello/ HTTP/1.1" 200 5164

Search: Hello World

This does not:

1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /?s=Hello+World&submit=Go HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /search/Hello+World/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%25252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%25252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2525252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%252525252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%25252525252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2525252525252525252BWorld/ HTTP/1.1" 301 20

and so on.
9:05 pm on Sep 15, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I would check your htaccess file for mod_rewrite directives to encode special characters.
10:01 pm on Sep 15, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Is it apache giving up, or your browser? Generally when there's a Redirect, the new request comes back a nanosecond later and the server acts as if it has never met the thing before in its life. It's the browser that has to step in and finally say "This is going nowhere fast".

%25 is fun isn't it. I meet it in doubly-encoded query strings in analytics: from %nn to %25nn. It stops being fun when wordpress goes berserk. And that's where I'd look. Not your own htaccess but the parts that came with wordpress.

The spaces themselves are fine: changing them to + signs is standard in queries. Look at any random search-engine result in your raw logs. But you're right; it should have stopped the first time. Try a few other special characters-- generally anything other than an alphanumeric, hyphen or lowline-- and see if the same thing happens to all of them. Heck, try an explicit percent sign :)
10:09 pm on Sep 15, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



it is standard to replace spaces with plus signs in search strings but this should only occur in the url in a query string parameter value.
in this case the plus signs are appearing in the path after the first redirect.
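
a quick way to see the difference is with python's standard urllib.parse. this is only a neutral illustration of the decoding rules, not WordPress code (WordPress is PHP), and the variable names are made up for the example. in a query string the plus decodes to a space; in a path it is just a plus:

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Query-string (form) decoding: "+" stands for a space.
params = parse_qs("s=Hello+World&submit=Go")
search_term = params["s"][0]

# Path decoding: "+" is an ordinary character and is left alone.
path_segment = unquote("Hello+World")

# Form-style decoding applied to the same text turns "+" into a space.
form_decoded = unquote_plus("Hello+World")

print(search_term)   # Hello World
print(path_segment)  # Hello+World
print(form_decoded)  # Hello World
```

so once "Hello+World" is moved from the query string into the path, anything that treats it as path text sees a literal plus sign rather than a space.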
2:50 am on Sep 16, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Search: Hello World

1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /?s=Hello+World&submit=Go HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /search/Hello+World/ HTTP/1.1" 301 20

If that's an accurate quote, the change to plus signs is automatic; it happens as soon as you hit Confirm or Find or whatever you do to say "Start searching".

The second step-- converting an "s=" query string to a URL in /search/{searchstring} --is presumably also WordPress. Note that the + sign is unchanged at this point.

The third step is where things go wonky because the code should "recognize" that you can't keep percent-encoding a percent sign.

There's a mod_rewrite flag meaning "don't encode" ([NE], noescape) but there isn't one for "do encode" since that's the default.

Quick detour to Firefox with LiveHeaders, followed by close study of raw logs, confirms that Apache-- acting on its own volition without instructions to the contrary-- doesn't mess with a + sign at all. But if a space is allowed to enter the query string before you start redirecting, then each successive redirect will balloon from %20 to %2520 to %252520 and so on. (Firefox appears to allow 40 tries, which strikes me as over-optimistic.)

So if you can intercept the one place where + turns into %2B, then all the subsequent %25 problems should disappear on their own.
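
Here is a minimal sketch of that ballooning, in Python rather than PHP so it can be run anywhere. It assumes the redirect code percent-encodes the path segment on every pass without checking whether it is already encoded. That is a guess at the cause, not something taken from WordPress, and the helper names `reencode` and `encode_once` are invented for the illustration.

```python
from urllib.parse import quote, unquote

def reencode(segment: str) -> str:
    """Percent-encode a path segment blindly, even if it is already encoded."""
    return quote(segment, safe="")

def encode_once(segment: str) -> str:
    """Decode first, then encode, so an already-encoded segment comes out unchanged."""
    return quote(unquote(segment), safe="")

# Blind re-encoding: each simulated redirect adds another "25".
segment = "Hello+World"
naive = []
for _ in range(3):
    segment = reencode(segment)
    naive.append(segment)
print(naive)
# ['Hello%2BWorld', 'Hello%252BWorld', 'Hello%25252BWorld']

# The same thing happens to a space: %20 becomes %2520.
print(reencode(reencode("Hello World")))  # Hello%2520World

# Decode-then-encode settles after the first pass.
stable = encode_once(encode_once("Hello+World"))
print(stable)  # Hello%2BWorld
```

The first list matches the /search/ lines in the log. With the decode-then-encode version the second pass produces the same URL as the first, so a redirect that compares "requested" against "canonical" would have nothing left to redirect.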
9:44 am on Sep 16, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



for one thing it's not a good practice to generate an infinite indexable url-space made up of arbitrary search strings.

more specifically, url paths are case-sensitive, so /search/Hello+World/, /search/hello+World/ and /search/Hello+world/ are all distinct urls even though they should return the same results. at the bare minimum i would suggest folding everything to lower case, as in /search/hello+world/
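
a rough sketch of that folding, in python for illustration only. the function names and the choice to write spaces as "+" in the path are assumptions for this example, not anything WordPress does out of the box:

```python
from urllib.parse import quote, unquote_plus

def canonical_search_path(raw_term: str) -> str:
    """Build one lower-case /search/ path for a search term (hypothetical helper)."""
    term = unquote_plus(raw_term)          # "+" and %20 both become spaces
    term = " ".join(term.lower().split())  # fold case, collapse repeated spaces
    # Encode once; a literal plus would become %2B, so "+" can safely mean a space.
    return "/search/" + quote(term, safe="").replace("%20", "+") + "/"

def needs_redirect(requested_path: str, raw_term: str) -> bool:
    """Redirect only when the requested path is not already the canonical one."""
    return requested_path != canonical_search_path(raw_term)

print(canonical_search_path("Hello+World"))   # /search/hello+world/
print(canonical_search_path("hello%20World")) # /search/hello+world/
print(needs_redirect("/search/hello+world/", "hello+world"))  # False
```

every case variant ends up at the same path, and a request that is already canonical is served instead of being redirected again.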


have you tried changing the search form to use the POST method to see if the problem disappears in that case?