Wordpress Search Converts Spaces, Fails

     
2:25 pm on Sep 13, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1182
votes: 5


I don't know if this is a WordPress or an Apache issue.

What is happening is that spaces entered in the search box are badly converted. An endless redirect loop is created until Apache finally gives up.

Search: Hello

This works fine:

1.1.1.1 - - [13/Sep/2012:15:18:55 +0100] "GET /?s=hello&submit=Go HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:18:55 +0100] "GET /search/hello/ HTTP/1.1" 200 5164

Search: Hello World

This does not:

1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /?s=Hello+World&submit=Go HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /search/Hello+World/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%25252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%25252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2525252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%252525252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%25252525252525252BWorld/ HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:04 +0100] "GET /search/Hello%2525252525252525252BWorld/ HTTP/1.1" 301 20

and so on.
9:05 pm on Sept 15, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10552
votes: 10


I would check your htaccess file for mod_rewrite directives that encode special characters.
10:01 pm on Sept 15, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13204
votes: 346


Is it apache giving up, or your browser? Generally when there's a Redirect, the new request comes back a nanosecond later and the server acts as if it has never met the thing before in its life. It's the browser that has to step in and finally say "This is going nowhere fast".

%25 is fun, isn't it? I meet it in doubly-encoded query strings in analytics: from %nn to %25nn. It stops being fun when WordPress goes berserk. And that's where I'd look. Not your own htaccess but the parts that came with WordPress.

The spaces themselves are fine: changing them to + signs is standard in queries. Look at any random search-engine result in your raw logs. But you're right; it should have stopped the first time. Try a few other special characters-- generally anything other than an alphanumeric, hyphen or lowline-- and see if the same thing happens to all of them. Heck, try an explicit percent sign :)
10:09 pm on Sept 15, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10552
votes: 10


it is standard to replace spaces with plus signs in search strings but this should only occur in the url in a query string parameter value.
in this case the plus signs are appearing in the path after the first redirect.
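
To make the query-versus-path distinction concrete, here is a small sketch using Python's standard urllib.parse. It only illustrates the general encoding rules; it is not what WordPress or Apache actually runs, and the variable names are made up for the example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

term = "Hello World"

# Form (query string) encoding: a space becomes "+".
query_value = quote_plus(term)

# Path segment encoding: a space becomes "%20"; "+" is never used for a space.
path_segment = quote(term, safe="")

# Decoding differs in the same way: "+" means a space only in form decoding.
from_query = unquote_plus("Hello+World")   # read as a query value
from_path = unquote("Hello+World")         # read as a path segment

print(query_value, path_segment, repr(from_query), repr(from_path))
```

Read as a path segment, "Hello+World" contains a literal plus sign, so anything that re-encodes the path will treat the + as a character to escape.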
2:50 am on Sept 16, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13204
votes: 346


Search: Hello World

1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /?s=Hello+World&submit=Go HTTP/1.1" 301 20
1.1.1.1 - - [13/Sep/2012:15:20:03 +0100] "GET /search/Hello+World/ HTTP/1.1" 301 20

If that's an accurate quote, the change to plus signs is automatic; it happens as soon as you hit Confirm or Find or whatever you do to say "Start searching".

The second step-- converting an "s=" query string to a URL in /search/{searchstring} --is presumably also WordPress. Note that the + sign is unchanged at this point.

The third step is where things go wonky because the code should "recognize" that you can't keep percent-encoding a percent sign.

There's a mod_rewrite flag meaning "don't encode" ([NE], noescape) but there isn't one for "do encode" since that's the default.

Quick detour to Firefox with LiveHeaders, followed by close study of raw logs, confirms that Apache-- acting on its own volition without instructions to the contrary-- doesn't mess with a + sign at all. But if a space is allowed to enter the query string before you start redirecting, then each successive redirect will balloon from %20 to %2520 to %252520 and so on. (Firefox appears to allow 40 tries, which strikes me as over-optimistic.)

So if you can intercept the one place where + turns into %2B, then all the subsequent %25 problems should disappear on their own.
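
The ballooning is easy to reproduce outside the server. The sketch below assumes each redirect simply percent-encodes the previous path segment again, which is what the logs suggest but not something confirmed from the WordPress or Apache code. The function names redirect_targets and encode_once are hypothetical.

```python
from urllib.parse import quote, unquote

def redirect_targets(segment, hops):
    """Percent-encode an already encoded path segment once per redirect hop."""
    seen = []
    for _ in range(hops):
        segment = quote(segment, safe="")  # "+" -> "%2B", "%" -> "%25"
        seen.append(segment)
    return seen

def encode_once(segment):
    """Decode first, then encode, so a second pass leaves the result unchanged."""
    return quote(unquote(segment), safe="")

plus_loop = redirect_targets("Hello+World", 3)
space_loop = redirect_targets("Hello World", 3)
stable = encode_once(encode_once("Hello+World"))

print(plus_loop)
print(space_loop)
print(stable)
```

The first list reproduces the %2B, %252B, %25252B sequence from the log, and the second the %20, %2520, %252520 sequence. Decoding before encoding makes the step repeatable, though it assumes the incoming segment was encoded exactly once; a search term containing a literal percent sign would need more care.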
9:44 am on Sept 16, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10552
votes: 10


for one thing it's not a good practice to generate an infinite indexable url-space made up of arbitrary search strings.

more specifically, url paths are case-sensitive, so /search/Hello+World/, /search/hello+World/ and /search/Hello+world/ are three distinct urls even though they should all return the same results. at the bare minimum i would suggest folding everything to lower case, as in /search/hello+world/
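
a rough sketch of that folding, assuming the search term arrives as a form-encoded query value and that the /search/{term}/ layout from the logs is kept. canonical_search_path is a hypothetical helper, not a WordPress function.

```python
from urllib.parse import quote_plus, unquote_plus

def canonical_search_path(raw_term):
    """Build one lower-case url for every case variant of a search term."""
    term = unquote_plus(raw_term)            # "Hello+World" -> "Hello World"
    term = " ".join(term.split()).lower()    # collapse whitespace, fold case
    return "/search/" + quote_plus(term) + "/"

variants = ["Hello+World", "hello+World", "Hello+world"]
paths = {canonical_search_path(v) for v in variants}
print(paths)
```

all three variants collapse to the single path /search/hello+world/, so a redirect to the canonical form only ever fires once.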


have you tried changing the search form to use the POST method to see if the problem disappears in that case?