| Removing spaces from urls %20 |
madmatt69

msg:4228598 | 8:45 pm on Nov 9, 2010 (gmt 0) | Heya, There are a few sites that link to lots of my content, but they must have some buggy code, because they keep inserting a space into the links. For example, a URL like 'test.com/thisisapage' gets linked as 'test.com/thisisap age'. As a result, they generate a lot of 404s. Getting in touch with the people who own those sites doesn't seem to be possible. Is there something I can put in my conf file, or alternatively some PHP code in my 404 file, that removes the space and does a redirect? Any advice would be appreciated!
|
sublime1

msg:4229815 | 2:21 am on Nov 13, 2010 (gmt 0) | madmatt69 -- Sorry this one took so long to get to. In your example, the URL has an actual space character, but depending on the browser requesting the link, it could get turned into a %20 or a + (the + is only a standard escape for a space inside query strings, but it is cheap to handle as well). If you can do this in PHP, the code is simple, e.g.:
<?php
$str = " fo o+bar%20fubar ";
$str = preg_replace('/(%20|\s|\+)/', '', $str);
// will print "foobarfubar"
echo $str;
?>
So early in your request chain, you would do this:
<?php
// get the requested path (REQUEST_URI also includes any query string)
$str = $_SERVER['REQUEST_URI'];
// remove all the characters you don't want
$str = preg_replace('/(%20|\s|\+)/', '', $str);
// if anything changed
if ($str != $_SERVER['REQUEST_URI']) {
    // return a 301, then stop so nothing else is sent
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://example.com" . $str);
    exit;
}
?>
It's far more difficult using RewriteRule in your Apache conf file, because mod_rewrite has no global search-and-replace: a single rule can only remove one space per pass.
Tom
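One caveat with the PHP approach above is that REQUEST_URI includes the query string, where "+" and "%20" may be legitimate data. A minimal sketch of a variant that cleans only the path is shown here; the function name strip_spaces_from_path is hypothetical and not from the thread, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): clean only the path part of a
// request URI and leave the query string untouched.
function strip_spaces_from_path(string $requestUri): string
{
    // Split once at the first "?"; $query is null when there is no query string.
    $parts = explode('?', $requestUri, 2);
    $path  = $parts[0];
    $query = $parts[1] ?? null;

    // Same pattern as in the post: encoded spaces, literal whitespace, and "+".
    $path = preg_replace('/(%20|\s|\+)/', '', $path);

    return $query === null ? $path : $path . '?' . $query;
}

echo strip_spaces_from_path('/thisisap%20age?q=a+b'), "\n"; // /thisisapage?q=a+b
```

The result can be compared against $_SERVER['REQUEST_URI'] and redirected exactly as in the snippet above.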
|
jdMorgan

msg:4231729 | 1:26 am on Nov 18, 2010 (gmt 0) | Yes, perhaps the best you can do using mod_rewrite is to use the RewriteRule [N] (next) flag in a three-rule set:
# Detect and remove the first space from the requested URL-path, then save it and re-start
RewriteCond %{ENV:CleanedURLpath} =""
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^%?#\ ]*)\%(25)*20([^\ ?#]*([?#][^\ ]*)?)\ HTTP/
RewriteRule ^. - [E=CleanedURLpath:%1%3,N]
#
# Detect and remove any subsequent space from the saved partially-corrected URL-path, then re-start
RewriteCond %{ENV:CleanedURLpath} ^([^%?#\ ]*)\%(25)*20([^\ ?#]*([?#].*)?)$
RewriteRule ^. - [E=CleanedURLpath:%1%3,N]
#
# Once we get here, all spaces have been removed from the
# URL-path. Invoke an external redirect if any were removed.
RewriteCond %{ENV:CleanedURLpath} ^(.+)$
RewriteRule ^. http://www.example.com/%1 [R=301,L]
The allowance for "25" preceding "20" is to handle multiply-encoded spaces, where the "%" of "%20" has itself been encoded as "%25" (giving "%2520", and so on). This is a fairly expensive rule-set; if a more-specific pattern in the RewriteConds and RewriteRules is possible, based on the "types" of URLs that are being mis-linked and the nature of the linking errors, then I would recommend using one.
Jim
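To illustrate what "multiply-encoded" means here, the same "%(25)*20" pattern can be tried out in PHP. This is only a sketch for experimenting with the pattern, not part of the rule-set; the function name remove_encoded_spaces and the sample paths are made up for the example.

```php
<?php
// Hypothetical helper: remove singly or multiply encoded spaces from a string.
// "%2520" is "%20" whose "%" was itself encoded as "%25";
// "%252520" has been encoded once more again.
function remove_encoded_spaces(string $s): string
{
    // "%", then zero or more "25" pairs, then "20".
    return preg_replace('/%(25)*20/', '', $s);
}

$samples = ['thisisap%20age', 'thisisap%2520age', 'thisisap%252520age'];
foreach ($samples as $sample) {
    echo remove_encoded_spaces($sample), "\n"; // prints "thisisapage" each time
}
```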
|