Whitespace in the green URL's

how to remove them with mod_rewrite

     
5:34 pm on Sep 22, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 22, 2002
posts:1751
votes: 0


Google adds a space to long URLs shown as green text on the results page, probably to force a clean line break where needed. Some people seem to copy those green URLs and paste them into the address bar instead of simply clicking the link (maybe to prevent webmasters from seeing what brought them to the site?). This of course causes a 404 error on my site, which I'm trying to prevent :)

Does anybody know a valid RewriteRule for Apache's mod_rewrite that strips spaces out of URLs? The space always follows a /, so there is a nice pattern to match. I tried some rules myself, but after repeatedly being forced to kill those threads, I've given up :(

2:33 pm on Sept 23, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


The following rule set strips off spaces from the URL and forces an external redirect to the new URL.

RewriteRule "([^ ]*) +(.*)" $1$2 [N,E=AC_REWRITE:true]
RewriteCond %{ENV:AC_REWRITE} true
RewriteRule (.*) [%{HTTP_HOST}$1...] [R=permanent,L]

First RewriteRule
Match as many characters as possible that are not spaces (([^ ]*), stored in $1), followed by one or more spaces ( +), followed by as many characters as possible ((.*), stored in $2). If the rule matches, we rewrite to $1$2, which is the URL without the first run of spaces, and store true in the environment variable AC_REWRITE. This process is repeated until the rule no longer matches (flag N).

RewriteCond
Look up the environment variable. The following RewriteRule is applied only if its pattern matches and AC_REWRITE is true.

Second RewriteRule
Force an external redirect in case we removed any spaces.

This rule set should be at the beginning of your rewriting process since the flag N will restart rewriting with the first rewrite rule.
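To make the loop easier to follow, here is a rough Python model of what the rule set is described as doing. It is only a sketch, not Apache's actual implementation: the function names strip_spaces and redirect_target are made up for illustration, and the changed flag merely stands in for the AC_REWRITE environment variable.

```python
import re

# Same pattern as the first RewriteRule: a run of non-space characters,
# then one or more spaces, then the rest of the path.
SPACE_RULE = re.compile(r"([^ ]*) +(.*)")


def strip_spaces(path):
    """Return (cleaned_path, changed), modelling the repeated [N] pass."""
    changed = False  # stands in for the AC_REWRITE environment variable
    match = SPACE_RULE.match(path)
    while match is not None:
        # Equivalent of substituting $1$2: drop the first run of spaces.
        path = match.group(1) + match.group(2)
        changed = True
        match = SPACE_RULE.match(path)
    return path, changed


def redirect_target(path):
    """Path for the single external redirect, or None if nothing changed."""
    cleaned, changed = strip_spaces(path)
    return cleaned if changed else None


if __name__ == "__main__":
    print(redirect_target("/docs/ guide/ intro.html"))
    print(redirect_target("/docs/guide/intro.html"))
```

Each pass removes only the first run of spaces, so a URL with several of them takes several passes, but the redirect decision is made once, after the loop has finished.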

8:09 pm on Sept 23, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 21, 2001
posts:115
votes: 0


I've noticed the same thing in some of our referrals from Google.

Wouldn't there be a way to do this with PHP?

8:31 pm on Sept 23, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 22, 2002
posts:1751
votes: 0


meannate: I first used a PHP-solution:


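// Google's inserted space always follows a slash, so look for "/ " in the decoded URI.
if (strstr(rawurldecode($_SERVER['REQUEST_URI']), "/ ")) {
    // Redirect to the same URI with each "/ " collapsed to "/".
    header("Location: http://" . $_SERVER['SERVER_NAME'] . str_replace("/ ", "/", rawurldecode($_SERVER['REQUEST_URI'])));
    exit;
}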

in my 404.php. This works fine. The only (very minor) disadvantage is that each of these hits is written to Apache's error log.

Haven't tried andreas' rules yet, will do very soon.

9:38 pm on Sept 23, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 22, 2002
posts:1751
votes: 0


Still haven't tried andreas' rules, but I found this solution on Usenet:

RewriteEngine on
RewriteRule ^(.*)/\ (.*)$ $1/$2 [N,R=permanent]

This works fine when used in httpd.conf, whereas andreas' rules seem good for .htaccess.

12:06 pm on Sept 24, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


RewriteRule ^(.*)/\ (.*)$ $1/$2 [N,R=permanent]

While this approach is simpler (just one rule), it causes an external redirect, i.e. extra traffic, after each substitution. My rules strip all the whitespace internally and then do a single external redirect to get the URL right in the user's browser.
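To put rough numbers on that difference, here is a small Python model that counts round trips for both approaches. It takes the description above at face value (one redirect per substitution for the single rule); how Apache really treats [N] combined with [R] is not verified here. The function names are made up for illustration.

```python
import re

# Pattern of the single-rule approach: one "/ " is removed per match.
SINGLE_RULE = re.compile(r"^(.*)/ (.*)$")
# Pattern of the two-step rule set: the first run of spaces is removed per pass.
INTERNAL_RULE = re.compile(r"([^ ]*) +(.*)")


def hops_redirect_per_substitution(path):
    """Model: every substitution is answered with an external redirect."""
    hops = 0
    match = SINGLE_RULE.match(path)
    while match is not None:
        path = match.group(1) + "/" + match.group(2)
        hops += 1  # the browser has to issue a new request each time
        match = SINGLE_RULE.match(path)
    return path, hops


def hops_internal_then_redirect(path):
    """Model: clean up internally, then redirect once if anything changed."""
    changed = False
    match = INTERNAL_RULE.match(path)
    while match is not None:
        path = match.group(1) + match.group(2)
        changed = True
        match = INTERNAL_RULE.match(path)
    return path, (1 if changed else 0)


if __name__ == "__main__":
    url = "/docs/ guide/ intro.html"
    print(hops_redirect_per_substitution(url))
    print(hops_internal_then_redirect(url))
```

Under this model a URL with two inserted spaces costs two redirects with the single rule and one with the two-step rule set; both end up at the same cleaned path.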

whereas andreas' rules seem good for .htaccess.

Actually, I tested my rules only in httpd.conf and not in a .htaccess file, although they should work there just as well.

Andreas

9:40 pm on Sept 24, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 22, 2002
posts:1751
votes: 0


Andreas: you're 100% right, and your rules work fine in my httpd.conf.

The access log shows one hit. The only (minor) disadvantage is that the URL in the browser's address bar keeps the ugly %20's.

10:28 pm on Sept 24, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:July 16, 2001
posts:545
votes: 0


If you did an external rewrite ([R=permanent,L]), your users should be redirected to the real URL. Some browsers don't always redisplay that URL, even though it's been changed.

I have a simple domain.com to www.domain.com rule in effect on one of my sites, and right now (time of day, alignment of planets, etc.) it's redirecting properly in IE and Opera. Last night it didn't (different PC, but that shouldn't matter?).

11:00 pm on Sept 24, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


If you did an external rewrite

That's what the [R=permanent,L] does in the second RewriteRule in my post #2 [webmasterworld.com].

Andreas

2:48 am on Sept 25, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


andreasfriedrich,

Thanks for that sophisticated little rewrite - I like the fact that it does the fix-up recursively, and then does the external redirect - very neat, very clean.

Jim