

Removing spaces from urls

%20

   
8:45 pm on Nov 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Heya,

There are a few sites that link to a lot of my content, but they must have some buggy code, because they keep inserting a space in the links.

For example, a url like 'test.com/thisisapage' will be linked to like 'test.com/thisisap age'

As a result, they generate a lot of 404s. Getting in touch with the people who own those sites doesn't seem to be possible.

I'm not sure if there's something I can put in my conf file, or maybe alternatively some php code in my 404 file, that removes the blank space and does a redirect?

Any advice would be appreciated!
2:21 am on Nov 13, 2010 (gmt 0)

10+ Year Member



madmatt69 --

Sorry this one took so long to get to. In your example, the URL has an actual space character, but depending on the browser requesting the link, it could get turned into a %20 or a +. Strictly speaking, %20 is the escape for a space in the path, and + only means a space in form-encoded query strings, but you may see either one in your logs.
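
As a rough illustration of the two encodings, the sketch below uses PHP's built-in rawurlencode() and urlencode(). The function name encode_both_ways() is made up for this example and is not part of the thread's solution.

```php
<?php
// Hypothetical helper: return both common encodings of the same string.
function encode_both_ways(string $segment): array
{
    return [
        // Percent-encoding as used in URL paths: a space becomes %20
        'path'  => rawurlencode($segment),
        // Form-style encoding as used in query strings: a space becomes +
        'query' => urlencode($segment),
    ];
}
```

Calling encode_both_ways('thisisap age') gives 'thisisap%20age' for the path form and 'thisisap+age' for the query form.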

If you can do this in PHP, the code is simple, e.g.:


<?php
$str = " fo o+bar%20fubar ";
$str = preg_replace('/(%20|\s|\+)/', '', $str);
// will print "foobarfubar"
echo $str;
?>


So early in your request chain, you would do this:


<?php
// get the path part of the request
// (note: REQUEST_URI also includes the query string, if there is one)
$str = $_SERVER['REQUEST_URI'];
// remove all the characters you don't want
$str = preg_replace('/(%20|\s|\+)/', '', $str);
// if anything changed
if ($str !== $_SERVER['REQUEST_URI']) {
    // return a 301 to the cleaned URL
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://example.com' . $str);
    // stop here so the rest of the script doesn't run after the redirect
    exit;
}
?>
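
Since REQUEST_URI can carry a query string, one possible variation is to clean only the path and leave the query alone. This is just a sketch under that assumption: strip_spaces_from_path() is a hypothetical name, it also catches multiply-encoded spaces such as %2520, and it deliberately leaves + untouched, because a + in the path is a literal plus sign.

```php
<?php
// Hypothetical helper: remove raw and percent-encoded spaces from the path
// of a request URI, leaving any query string exactly as it was.
function strip_spaces_from_path(string $requestUri): string
{
    // Split once on the first "?" so the query string is preserved as-is.
    $parts = explode('?', $requestUri, 2);
    // Remove %20, multiply-encoded forms like %2520, and raw whitespace.
    $path = preg_replace('/%(?:25)*20|\s/', '', $parts[0]);
    return isset($parts[1]) ? $path . '?' . $parts[1] : $path;
}
```

For example, strip_spaces_from_path('/thisisap%20age?q=a+b') returns '/thisisapage?q=a+b'. You would compare the result with the original REQUEST_URI and send the 301 only when they differ.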


It's far more difficult using RewriteRule in your Apache conf file, because mod_rewrite has no global search-and-replace: a single rule can only remove one space each time it is applied.

Tom
1:26 am on Nov 18, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yes, perhaps the best you can do using RewriteRules is to use the RewriteRule [N] (Next) flag in a three-rule set:

# Detect and remove the first space from the requested URL-path, then save it and re-start
RewriteCond %{ENV:CleanedURLpath} =""
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^%?#\ ]*)\%(25)*20([^\ ?#]*([?#][^\ ]*)?)\ HTTP/
RewriteRule ^. - [E=CleanedURLpath:%1%3,N]
#
# Detect and remove any subsequent space from the saved partially-corrected URL-path, then re-start
RewriteCond %{ENV:CleanedURLpath} ^([^%?#\ ]*)\%(25)*20([^\ ?#]*([?#].*)?)$
RewriteRule ^. - [E=CleanedURLpath:%1%3,N]
#
# Once we get here, all spaces have been removed from the
# URL-path. Invoke an external redirect if any were removed.
RewriteCond %{ENV:CleanedURLpath} ^(.+)$
RewriteRule ^. http://www.example.com/%1 [R=301,L]

The allowance for "25" preceding "20" is to handle multiply-encoded spaces, such as %2520, where the "%" of "%20" has itself been encoded as "%25".

This is a fairly expensive rule-set, since it takes one pass through the rules for each space removed. If you can use a more-specific pattern in the RewriteConds and RewriteRules, based on the "types" of the URLs that are being mis-linked and the nature of the linking errors, then I would recommend doing so.
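
To get a feel for the cost, here is a rough PHP simulation of the restart loop. It is not how mod_rewrite is implemented; remove_spaces_one_per_pass() is a hypothetical name, and the pattern is only a simplified version of the RewriteCond shape (prefix, encoded space, remainder).

```php
<?php
// Hypothetical simulation of the [N] loop: each pass removes only the
// first encoded space, so a path with k encoded spaces needs k passes.
function remove_spaces_one_per_pass(string $path): array
{
    $passes = 0;
    // Prefix without "%", then %20 (optionally multiply-encoded), then the rest.
    $pattern = '/^([^%?#\s]*)%(?:25)*20(.*)$/';
    while (preg_match($pattern, $path, $m) === 1) {
        // Join the text before and after the first encoded space.
        $path = $m[1] . $m[2];
        $passes++;
    }
    return [$path, $passes];
}
```

For the input 'this%20is%2520a%20page' this returns 'thisisapage' after three passes, one per encoded space.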

Jim
 
