Whitespace in the green URL's

how to remove them with mod_rewrite

         

RonPK

5:34 pm on Sep 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google adds a space to long URLs that get displayed as green text on the results page - probably to force a nice line break if necessary. Some people seem to copy those green URLs and paste them into the address bar instead of simply clicking the link (maybe to prevent webmasters from seeing what brought them to the site?). This of course causes a 404 error on my site, which I'm trying to prevent :)

Does anybody know a valid RewriteRule for Apache's mod_rewrite that strips spaces out of URLs? The space always follows a /, so there is a nice pattern to match. I tried some rules myself, but after repeatedly being forced to kill those threads, I've given up :(

andreasfriedrich

2:33 pm on Sep 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The following rule set strips off spaces from the URL and forces an external redirect to the new URL.

RewriteRule "([^ ]*) +(.*)" $1$2 [N,E=AC_REWRITE:true]
RewriteCond %{ENV:AC_REWRITE} true
RewriteRule (.*) [%{HTTP_HOST}$1...] [R=permanent,L]

First RewriteRule
Match as many non-space characters as possible (([^ ]*), stored in $1), followed by one or more spaces ( +), followed by as many characters as possible ((.*), stored in $2). If the rule matches, we rewrite to $1$2, which is the URL without the first run of spaces, and store true in the environment variable AC_REWRITE. This process is repeated until the rule no longer matches (flag N).

RewriteCond
Look up the environment variable. The following RewriteRule is applied only if its pattern matches and AC_REWRITE is true.

Second RewriteRule
Force an external redirect if we removed any spaces.

This rule set should be at the beginning of your rewriting process since the flag N will restart rewriting with the first rewrite rule.
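To make the control flow easier to follow, here is a rough Python sketch of what the rule set does. It is only a model, not how mod_rewrite is implemented: the function names strip_spaces and handle are made up for illustration, the changed flag stands in for AC_REWRITE, and the http:// scheme in the redirect target is an assumption.

```python
import re

# Same pattern as the first RewriteRule: non-spaces, one or more spaces, the rest.
SPACE_RUN = re.compile(r"([^ ]*) +(.*)")


def strip_spaces(path):
    """Return (cleaned_path, changed), imitating the loop caused by flag N."""
    changed = False  # stands in for the AC_REWRITE environment variable
    while True:
        match = SPACE_RUN.match(path)
        if match is None:
            break  # no spaces left, the rule no longer matches
        path = match.group(1) + match.group(2)  # the $1$2 substitution
        changed = True
    return path, changed


def handle(host, path):
    """Return (status, location): one 301 if spaces were removed, else 200."""
    cleaned, changed = strip_spaces(path)
    if changed:
        # Assumption: the redirect target uses the http:// scheme.
        return 301, "http://" + host + cleaned
    return 200, None
```

With a request for "/docs/ long/ page.html" the loop runs twice, once per run of spaces, and a single redirect is issued at the end.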

meannate

8:09 pm on Sep 23, 2002 (gmt 0)

10+ Year Member



I've noticed the same thing in some of our referrals from Google.

Wouldn't there be a way to do this with PHP?

RonPK

8:31 pm on Sep 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



meannate: I first used a PHP solution:

// Redirect to the same URI with every "/ " replaced by "/"
if (strstr(rawurldecode($_SERVER['REQUEST_URI']), "/ ")) {
    header("Location: http://" . $_SERVER['SERVER_NAME'] . str_replace("/ ", "/", rawurldecode($_SERVER['REQUEST_URI'])));
    exit;
}

in my 404.php. This works fine. The only (very minor) disadvantage is that each of these hits is written to Apache's error log.
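For anyone who wants to check the logic outside PHP, the same idea can be sketched in Python. The function name redirect_target is hypothetical, and urllib.parse.unquote is used here as a stand-in for rawurldecode. The point to notice is that the request URI arrives percent-encoded, so it has to be decoded before looking for "/ ".

```python
from urllib.parse import unquote


def redirect_target(server_name, request_uri):
    """Return the Location URL if the decoded URI contains "/ ", else None."""
    decoded = unquote(request_uri)  # "%20" becomes " "
    if "/ " not in decoded:
        return None  # nothing to fix, let the normal 404 page handle it
    # Drop the space that follows a slash and rebuild the absolute URL.
    return "http://" + server_name + decoded.replace("/ ", "/")
```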

Haven't tried andreas' rules yet, will do very soon.

RonPK

9:38 pm on Sep 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Still haven't tried andreas' rules, but I found this solution on Usenet:

RewriteEngine on
RewriteRule ^(.*)/\ (.*)$ $1/$2 [N,R=permanent]

This works fine when used in httpd.conf, whereas andreas' rules seem good for .htaccess.

andreasfriedrich

12:06 pm on Sep 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteRule ^(.*)/\ (.*)$ $1/$2 [N,R=permanent]

While this approach is simpler (just one rule), it will cause an external redirect, i.e. extra traffic, after each substitution. My rules strip off all whitespace internally and then do one external redirect to get the URL right in the user's browser.
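The difference can be illustrated with a small Python model that counts redirects. It takes the description above at face value (one external redirect per substitution for the single rule), which is an assumption rather than something verified against Apache, and both function names are made up for the example.

```python
import re

SLASH_SPACE = re.compile(r"^(.*)/ (.*)$")  # pattern of the single-rule approach
ANY_SPACES = re.compile(r"([^ ]*) +(.*)")  # pattern of the two-rule approach


def redirects_per_substitution(path):
    """Model: every substitution costs one external redirect."""
    count = 0
    while True:
        match = SLASH_SPACE.match(path)
        if match is None:
            break
        path = match.group(1) + "/" + match.group(2)
        count += 1  # the browser has to come back with the new URL
    return path, count


def redirects_after_cleanup(path):
    """Model: strip all spaces internally, then redirect at most once."""
    changed = False
    while True:
        match = ANY_SPACES.match(path)
        if match is None:
            break
        path = match.group(1) + match.group(2)
        changed = True
    return path, (1 if changed else 0)
```

For a URL with two "/ " sequences, the first function reports two redirects and the second reports one, which is the extra traffic mentioned above.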

whereas andreas' rules seem good for .htaccess.

Actually, I tested my rules only in httpd.conf and not in a .htaccess file, although they should work there just as well.

Andreas

RonPK

9:40 pm on Sep 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Andreas: you're 100% right, and your rules work fine in my httpd.conf.

The access log shows one hit. The only (minor) disadvantage is that the URL in the browser's address bar keeps the ugly %20's.

Slade

10:28 pm on Sep 24, 2002 (gmt 0)

10+ Year Member



If you did an external rewrite ([R=permanent,L]), your users should be redirected to the real URL. Some browsers don't always redisplay that URL, even though it's been changed.

I have a simple domain.com to www.domain.com rule in effect on one of my sites, and right now (time of day, alignment of planets, etc.) it's redirecting properly in IE and Opera. Last night, it didn't (different PC, but that shouldn't matter?).

andreasfriedrich

11:00 pm on Sep 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you did an external rewrite

That's what the [R=permanent,L] does in the second RewriteRule in my post #2 [webmasterworld.com].

Andreas

jdMorgan

2:48 am on Sep 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



andreasfriedrich,

Thanks for that sophisticated little rewrite - I like the fact that it does the fix-up recursively, and then does the external redirect - very neat, very clean.

Jim