RewriteRule for "example.com/?searchstring"

How to catch the "?" in the search string?

         

claus

5:52 pm on Nov 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a small problem here. I wish to redirect a URL whose path is followed directly by a question mark, but I can't seem to write a RewriteRule that catches it. The URL is similar to this one:

[example.com...]

I wish to catch the "?" part, as there might be other combinations of characters after it that I don't want. In fact, I don't want that URL to return anything but "Gone". I've tried a few things already, and none of them work:

(1)

RewriteCond %{REQUEST_URI} ^/\? [NC]
RewriteRule .* - [G]

(2)

RewriteCond %{REQUEST_URI} \? [NC]
RewriteRule .* - [G]

(3)

RewriteCond %{REQUEST_URI} ^\? [NC]
RewriteRule .* - [G]

(4)

RewriteRule ^/\? - [G]

(5)

RewriteRule ^\? - [G]

(6)

RewriteRule \? - [G]

Any ideas? Pointers?

/claus


Note: There's nothing wrong with the above rules when used on any alphanumeric character [a-zA-Z0-9]. A question mark does not work (dots don't work either, for that matter).
Added:
Engelschall's advice on Extended Redirection [engelschall.com] does not apply here, as I still need to be able to match the question mark before I can send it to a script, and if I can do that I can also serve a 410.

jdMorgan

6:30 pm on Nov 19, 2003 (gmt 0)


claus,

Is it that Apache thinks you've got a query string after the '?' and is moving the rest of the string from %{REQUEST_URI} into %{QUERY_STRING}?

I've used 'RewriteCond %{QUERY_STRING} !^$' and 'RewriteCond %{QUERY_STRING} ^(<some string>)$' to detect query strings before, because in a per-directory .htaccess context, Apache strips it out of %{REQUEST_URI} and it's not directly accessible in RewriteRule. The '?' itself is never 'visible' unless you use %{THE_REQUEST}.
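The two approaches mentioned above could be sketched roughly as follows. This is only an illustration, assuming the rules sit in the .htaccess file of the document root and that only requests for the site root should be caught; the two blocks are alternatives, not meant to be used together.

```apache
RewriteEngine On

# Alternative 1: the root was requested with a non-empty query string.
# In per-directory context the path seen by RewriteRule for "/" is empty,
# so the pattern ^$ matches only the root request.
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^$ - [G]

# Alternative 2: inspect the raw request line, where the literal "?" is
# still visible. THE_REQUEST looks like: GET /?searchstring HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /\?
RewriteRule ^$ - [G]
```

One difference worth noting: a request for "/?" with nothing after the question mark has an empty %{QUERY_STRING}, so only the %{THE_REQUEST} variant catches it.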

Jim

claus

7:49 pm on Nov 19, 2003 (gmt 0)


Great, that was it. I had to combine it with REQUEST_URI because the query string is used in deeper directories. Anyway, this one did the trick:

------------------------------------------------------

RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/some-folder/ [NC]
RewriteRule .* - [G]

------------------------------------------------------

:)/claus


Oooops... don't try this at home. This URL is somehow in Google's index; that was why I wanted to 410 it.

But... serving Googlebot a 410 for [example.com...] is definitely not a good idea, as it will look as if the domain itself (without the search string) served the 410.

I can't redirect 301 or 302 to the domain without the search string, as then the rule loops. I can't really 403 forbid it either, as this will also be interpreted as being done at the root.

My only option seems to be to make my index page a script instead of a static page, or to redirect it internally to some appropriate location other than the index page.
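For what it's worth, a redirect loop like the one described above typically happens because mod_rewrite re-appends the original query string to the redirect target, so the redirected request matches the rule again. A minimal sketch of one way around that, assuming the rules live in the document root's .htaccess and that http://example.com/ is the canonical URL (the scheme is an assumption):

```apache
RewriteEngine On

# Only act on root requests that carry a query string.
RewriteCond %{QUERY_STRING} !^$
# The trailing "?" in the target discards the original query string,
# so the redirected request no longer satisfies the condition above.
RewriteRule ^$ http://example.com/? [R=301,L]
```

On Apache 2.4 and later the QSD flag has the same effect as the trailing "?". Whether a redirect is actually the right answer for a URL that is already indexed is a separate question.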

[edited by: jdMorgan at 12:58 am (utc) on Dec. 2, 2003]
[edit reason] Corrected missing spaces [/edit]

jdMorgan

8:35 pm on Nov 19, 2003 (gmt 0)


How about a two-step fix... 301 redirect it to a new page, let Google index that, and then 410 that page later?

Also watch out for returning 410 to an HTTP/1.0 client (like Googlebot); 410 is new for HTTP/1.1.
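If the HTTP/1.0 concern matters, one possible workaround is to send the older client a 404 and keep the 410 for everyone else. This is only a sketch, assuming Apache 2.2 or later (where the R flag accepts non-redirect status codes) and rules placed in the document root's .htaccess:

```apache
RewriteEngine On

# HTTP/1.0 clients: answer 404, which every client understands.
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{SERVER_PROTOCOL} ^HTTP/1\.0$
RewriteRule ^$ - [R=404,L]

# Everyone else: answer 410 Gone.
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^$ - [G]
```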

Nasty problem...

Jim

claus

9:17 pm on Nov 19, 2003 (gmt 0)


Well, I redirected internally to an HTML page I made with <meta robots noindex,follow>, and then I wrote to url-remove [at] google.com. I explained that I did not want my index page removed, only the URL with the search string.

Now I'm waiting, and I hope they can sort it out. I suppose they are busy with the Florida update at the moment, so I'm not expecting anything fast.
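The internal redirect described above might look something like this. It is a sketch rather than the exact rule used; the file name /noindex-page.html is hypothetical, and the rules are assumed to be in the document root's .htaccess:

```apache
RewriteEngine On

# Root request with a query string: serve a separate page internally.
# No R flag, so the browser's address bar and the response status are unchanged.
RewriteCond %{QUERY_STRING} !^$
# The trailing "?" drops the query string from the rewritten request.
RewriteRule ^$ /noindex-page.html? [L]
```

The target page would carry the robots meta tag, so crawlers fetching the URL with the search string see noindex while the plain index page is unaffected.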

/claus

claus

8:02 am on Nov 20, 2003 (gmt 0)


Just a follow-up: less than 12 hours after asking for URL removal, the wrong URL is now completely gone from the index. Of course, now I only have #1 and not #1 + #2, but as the other one was an error I still consider it better than before.

/claus