Apache Web Server Forum

    
Removing spaces from urls
%20
madmatt69




msg:4228598
 8:45 pm on Nov 9, 2010 (gmt 0)

Heya,

There are a few sites that seem to link to lots of my content, but they must have some buggy code because they keep inserting a space in the links.

For example, a URL like 'test.com/thisisapage' ends up linked as 'test.com/thisisap age'

As a result, they generate a lot of 404s. Getting in touch with the people who own those sites doesn't seem to be possible.

I'm not sure if there's something I can put in my conf file, or alternatively some PHP code in my 404 file, that removes the blank space and does a redirect?

Any advice would be appreciated!

 

sublime1




msg:4229815
 2:21 am on Nov 13, 2010 (gmt 0)

madmatt69 --

Sorry this one took so long to get to. In your example, the URL has an actual space character, but I think, depending on the browser requesting the link, it could get turned into a %20 or a +. Strictly, %20 is the percent-encoding for a space, while + means a space only in form-encoded query strings, so it is worth handling both.
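A quick way to see how these forms are treated is to run them through PHP's two built-in decoders. This is only an illustrative sketch: the sample paths are made up, and the point is that rawurldecode() does plain percent-decoding while urldecode() additionally turns "+" into a space.

```php
<?php
// Hypothetical request paths, as different clients might send the same broken link
$variants = array('/thisisap%20age', '/thisisap+age', '/thisisap%2520age');

foreach ($variants as $raw) {
    // rawurldecode(): percent-decoding only, "+" is left alone
    // urldecode(): also converts "+" to a space (form-style decoding)
    printf("%-20s raw=[%s] form=[%s]\n", $raw, rawurldecode($raw), urldecode($raw));
}
```

Note that the doubly-encoded variant decodes to %20 rather than to a space, so a single decoding pass would not be enough to detect it.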

If you can do this in PHP, the code is simple, e.g.:


<?php
$str = " fo o+bar%20fubar ";
$str = preg_replace('/(%20|\s|\+)/', '', $str);
// will print "foobarfubar"
echo $str;
?>


So early in your request chain, you would do this:


<?php
// get the path part of the request
$str = $_SERVER['REQUEST_URI'];
// remove all the characters you don't want
$str = preg_replace('/(%20|\s|\+)/', '', $str);
// if anything changed
if ($str != $_SERVER['REQUEST_URI']) {
    // return a 301 and stop, so the rest of the page is not sent
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://example.com" . $str);
    exit;
}
?>
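One thing to watch: REQUEST_URI also contains the query string, where a + or %20 may be legitimate form data. Here is a rough variant that cleans only the path part. The helper name clean_request_uri is made up for illustration, and example.com stands in for your own host.

```php
<?php
// Hypothetical helper: strip spaces (literal, "+", "%20") from the path only,
// leaving any query string exactly as it was sent.
function clean_request_uri($uri)
{
    // Split at the first "?" so form data in the query string is not altered
    $parts = explode('?', $uri, 2);
    $parts[0] = preg_replace('/(%20|\s|\+)/', '', $parts[0]);
    return implode('?', $parts);
}

// Only act when a request URI is available (i.e. running under a web server)
if (isset($_SERVER['REQUEST_URI'])) {
    $clean = clean_request_uri($_SERVER['REQUEST_URI']);
    if ($clean !== $_SERVER['REQUEST_URI']) {
        // The third argument to header() sets the response code
        header('Location: http://example.com' . $clean, true, 301);
        exit;
    }
}
```

If any of your real URLs contain a literal + in the path, you would want to drop the \+ alternative from the pattern, since it would strip those too.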


It's far more difficult using RewriteRule in your Apache conf file, because mod_rewrite has no global search-and-replace: each pass of a rule can remove only one space.

Tom

jdMorgan




msg:4231729
 1:26 am on Nov 18, 2010 (gmt 0)

Yes, perhaps the best you can do using RewriteRules is to use the [N] (Next) flag in a three-rule set:

# Detect and remove the first space from the requested URL-path, then save it and re-start
RewriteCond %{ENV:CleanedURLpath} =""
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^%?#\ ]*)\%(25)*20([^\ ?#]*([?#][^\ ]*)?)\ HTTP/
RewriteRule ^. - [E=CleanedURLpath:%1%3,N]
#
# Detect and remove any subsequent space from the saved partially-corrected URL-path, then re-start
RewriteCond %{ENV:CleanedURLpath} ^([^%?#\ ]*)\%(25)*20([^\ ?#]*([?#].*)?)$
RewriteRule ^. - [E=CleanedURLpath:%1%3,N]
#
# Once we get here, all spaces have been removed from the
# URL-path. Invoke an external redirect if any were removed.
RewriteCond %{ENV:CleanedURLpath} ^(.+)$
RewriteRule ^. http://www.example.com/%1 [R=301,L]

The allowance for "25" preceding "20" is to handle multiply-encoded spaces, such as %2520, where the "%" of %20 has itself been encoded as %25.
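For anyone going the PHP route instead, the same "%(25)*20" idea can be sketched with preg_replace. The function name strip_encoded_spaces is hypothetical, and the sample strings are just the thread's example URL at different encoding depths.

```php
<?php
// Hypothetical helper: remove encoded spaces at any encoding depth
// (%20, %2520, %252520, ...) as well as literal whitespace.
function strip_encoded_spaces($path)
{
    // "%", then zero or more "25" pairs, then "20"
    return preg_replace('/%(25)*20|\s/', '', $path);
}

echo strip_encoded_spaces('thisisap%20age'), "\n";
echo strip_encoded_spaces('thisisap%2520age'), "\n";
echo strip_encoded_spaces('thisisap%252520age'), "\n";
```

All three calls should give thisisapage, because preg_replace replaces every match in a single call and so needs no restart loop.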

This is a fairly expensive rule-set. If the "types" of URLs that are being mis-linked and the nature of the linking errors allow a more-specific pattern in the RewriteConds and RewriteRules, then I would recommend using one.

Jim
