Forum Moderators: phranque

spaces in query string

lucy24

8:28 pm on Sep 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Quick question, I hope:

I have to make a RewriteCond that looks at the query string. (Different thread.) These are search queries, so they will contain spaces. In the RewriteCond, do these get expressed as--

literal spaces (\ escaped)
+ signs (\ escaped)
%20

or all three? My raw logs include both + and %20. (Also the occasional three-byte Japanese variant, but I won't bother about those.) Just to be extra thoughtful, g### transmits the query exactly as typed, so there may even be more than one space.

(\ |\+|%20)*

covers all bases, but is it overkill? The asterisk is because this specific query may or may not contain a space at all; I've seen both.
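One way to sanity-check the alternation is to try it outside Apache. The Python sketch below is only an approximation: Python's re module is not the regex engine Apache uses, and the words "swag" and "rat" are placeholders wrapped around the pattern purely for illustration.

```python
import re

# The separator pattern from the post, placed between two placeholder words.
SEPARATOR = r"(\ |\+|%20)*"
PATTERN = re.compile("swag" + SEPARATOR + "rat")


def matches(query: str) -> bool:
    """Return True if the pattern accepts the whole query."""
    return PATTERN.fullmatch(query) is not None


if __name__ == "__main__":
    samples = ["swagrat", "swag+rat", "swag%20rat", "swag++rat", "swag rat", "swag-rat"]
    for sample in samples:
        print(sample, matches(sample))
```

Because of the asterisk, zero separators, one separator, or a mixed run of several are all accepted; anything else between the two words is rejected.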

phranque

4:50 am on Sep 9, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the pure space shouldn't be necessary since that will be percent-escaped.
handling the + depends on "where that query string has been" - you may also need to handle the percent-escaped + as i've seen that as well.

lucy24

7:02 am on Sep 9, 2011 (gmt 0)

%2[0B] then, with [NC] flag. Ouch. And "where that query string has been" is apparently Turkey via Google Translate, because that's the only place I found a %2B in my saved logs :) This particular Rewrite is only being applied to people from South Asia, but oh well.
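In Python terms the [NC] flag corresponds roughly to re.IGNORECASE. A minimal sketch, assuming the hex digit of the escape may arrive in either case (the function name is made up for illustration):

```python
import re

# %20 is an encoded space, %2B an encoded plus sign.
# IGNORECASE plays the role of [NC], so %2b is accepted as well.
ENCODED = re.compile(r"%2[0B]", re.IGNORECASE)


def is_encoded_space_or_plus(token: str) -> bool:
    """Return True if token is exactly %20 or %2B in either letter case."""
    return ENCODED.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["%20", "%2B", "%2b", "%2A", "+"]:
        print(token, is_encoded_space_or_plus(token))
```

Without the flag, a lowercase %2b would slip past the character class.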

g1smd

7:50 am on Sep 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The trick, if you can do it, is to send spaces as the + symbol.

lucy24

5:59 pm on Sep 9, 2011 (gmt 0)

"Dear Google, it would help me a great deal if you would be so kind as to..."

:)

I had no idea what all those plus signs in my logs meant until I read an article on Permitted Characters in URLs (don't remember where, but it was something you linked to once). I have yet to figure out which spaces go to + and which survive long enough to get encoded; my log-wrangling routine deals with both, so I end up with spaces.
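For anyone wanting to do the same clean-up, here is a rough sketch of that kind of log normalisation in Python. It is not the routine described above, just one plausible way to do it with the standard library; the function name is hypothetical.

```python
from urllib.parse import unquote_plus


def normalise_query(raw: str) -> str:
    """Decode a raw logged query and collapse runs of whitespace."""
    # unquote_plus turns "+" into a space first, then decodes %XX escapes,
    # so a percent-escaped plus (%2B) comes out as a literal "+".
    decoded = unquote_plus(raw)
    # split() with no arguments splits on any Unicode whitespace,
    # including the ideographic space (U+3000, three bytes in UTF-8).
    return " ".join(decoded.split())


if __name__ == "__main__":
    for raw in ["swag+rat", "swag%20rat", "swag++%20rat", "swag%2Brat"]:
        print(repr(raw), "->", repr(normalise_query(raw)))
```

Note that this treats %2B as a real plus sign rather than a space, which may or may not be what the searcher meant.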

Anyway, all I have to look at is incoming queries. ("This isn't the 'swag(\+|%2[0B])*rat' you're looking for, and incidentally you'd have better luck searching for suhagrat, but it's not my problem if you can't spell.")

:: now off to figure out how to address unrelated .php issue so piwik can get in while naughty robots remain outside ::

phranque

1:36 am on Sep 10, 2011 (gmt 0)

"Dear Google, ..."

haha, yeah, right - good luck with that!

"something you linked to once"

it was probably an ietf.org RFC document section about Reserved/Unsafe Characters in Uniform Resource Locators/Identifiers

lucy24

3:42 am on Oct 5, 2011 (gmt 0)

Postscript:

Woo Hoo, it's working. It helped when I got it into my head that the potential spaces aren't in the query string at all, they're in the referer. And then for a goodish while I didn't get any of the searches I was trawling for, which made me feel silly. But just recently I had two back-to-back.

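# Visitor arrived from a .in or .pk host
RewriteCond %{HTTP_REFERER} \.(in|pk)/
# Search parameter (q..., or "text") whose value contains swag, any run of + / %20 / %2B, then rat
RewriteCond %{HTTP_REFERER} \b(q\w*|text)(=|%3D)([%\w]*)swag(\+|%2[0B])*rat
RewriteRule swagrat\.html /paintings/refrats/swagrat_marriage.html [R=301,L]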


(The part in boldface means "Yes, yes, I know." :-P But it's just one page.)

Yes, that's a Redirect, not a Rewrite. I'm not trying to be nasty, just poking fun at the visitor. I mean, don't they read the search-engine snippet? Do words like "billabong" and "coolibah tree" not provide enough of a hint that this is not the swagrat they're looking for?

In case anyone wondered, "text" is the word Yandex uses for its searches, where most places have "q" or "question". Admittedly the chances of someone in South Asia using Yandex are pretty slim, but...
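The two conditions can be tried against sample referers outside Apache. The Python sketch below copies both patterns verbatim and requires both to match, as consecutive RewriteCond lines do. It is an approximation only: Python's re module is not Apache's regex engine, and the example.* URLs are invented for illustration, not taken from any log.

```python
import re

# The two RewriteCond patterns, unanchored as in mod_rewrite.
COUNTRY = re.compile(r"\.(in|pk)/")
QUERY = re.compile(r"\b(q\w*|text)(=|%3D)([%\w]*)swag(\+|%2[0B])*rat")


def would_redirect(referer: str) -> bool:
    """Return True if both conditions match the referer."""
    return bool(COUNTRY.search(referer)) and bool(QUERY.search(referer))


if __name__ == "__main__":
    samples = [
        "http://www.example.co.in/search?q=swag+rat",
        "http://www.example.com.pk/search?hl=en&q=cheap%20swag%2Brat",
        "http://www.example.com/search?q=swag+rat",
        "http://www.example.co.in/search?q=swag%2brat",
    ]
    for referer in samples:
        print(referer, would_redirect(referer))
```

The last sample uses a lowercase %2b. Since neither condition carries an [NC] flag, the sketch is case-sensitive too and lets that one through without a redirect.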

g1smd

7:39 am on Oct 5, 2011 (gmt 0)

"Admittedly the chances of someone in South Asia using Yandex are pretty slim, but..."
...you do have to cater for people on holiday who can't keep off the net, or who work abroad.