
Apache Web Server Forum

    
htaccess URL rewrite with multiple hyphens and spaces
benlf




msg:4608673
 12:13 pm on Sep 10, 2013 (gmt 0)

Hi everyone, I've been wrestling with this for a while and it feels like I should be able to get my head around it, but I just can't.

I have some URLs that may have an indeterminate number of plus symbols or blank spaces, and I want to convert them into friendly URLs with hyphens.

So far these are the rules I have come up with, but they only change the first plus symbol (or space) and the rest of the characters are unaffected.

RewriteRule ^somepath/(.*)\+(.*)$ http://example.com/somepath/$1-$2 [R=301,L]

RewriteRule ^somepath/(.*)\s(.*)$ http://example.com/somepath/$1-$2 [R=301,L]


Examples of the URLs are:
http://example.com/somepath/here+are+some+words+going+here or
http://example.com/somepath/here are some words going here

I want to achieve:
http://example.com/somepath/here-are-some-words-going-here

I'm sure I'm missing something basic. Any help any of you could offer would be most appreciated!

Thanks :)

 

lucy24




msg:4608818
 10:11 pm on Sep 10, 2013 (gmt 0)

"an indeterminate number"

<begin g1smd impression>
Rewrite-- don't redirect-- to a php script that will perform all substitutions and will issue a 301 redirect to the new URL.
</end impression>

You are absolutely right to be redirecting. Literal spaces are a big problem, and plus signs don't belong in the path at all. Make sure your php script grabs all four forms:

  (literal space)
%20 (escaped space)
+ (literal plus)
%2B (escaped plus, case-insensitive)

If you were certain that there could never be more than some small number-- say, two or at most three-- you could make a package of RewriteRules. But "indeterminate" sounds ominous.
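
As a rough sketch of what such a package of RewriteRules might look like, assuming the rules live in the root .htaccess and using the example.com/somepath/ URLs from the first post. Each rule matches one exact number of separators (plus signs or spaces); the count of three is only an illustration, and leading or trailing separators are not handled.

```apache
RewriteEngine On

# Three separators: a+b+c+d becomes a-b-c-d
RewriteRule ^somepath/([^+\ ]+)[+\ ]+([^+\ ]+)[+\ ]+([^+\ ]+)[+\ ]+([^+\ ]+)$ http://example.com/somepath/$1-$2-$3-$4 [R=301,L]

# Two separators
RewriteRule ^somepath/([^+\ ]+)[+\ ]+([^+\ ]+)[+\ ]+([^+\ ]+)$ http://example.com/somepath/$1-$2-$3 [R=301,L]

# One separator
RewriteRule ^somepath/([^+\ ]+)[+\ ]+([^+\ ]+)$ http://example.com/somepath/$1-$2 [R=301,L]
```

Every extra separator needs another rule with one more capture group, and RewriteRule backreferences stop at $9, which is why this approach only suits a small, known maximum.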

Don't let the words "php script" scare you. Heck, I only speak about three words of php and _I_ could do this much. (That is: I couldn't post a fix for others to use. That would be irresponsible. But I'd be able to cobble something together for my own use.)

benlf




msg:4608964
 11:37 am on Sep 11, 2013 (gmt 0)

Hi lucy24, many thanks for your reply.

The URLs are being created by the tag "search" feature of a very old installation of Movable Type.

We are trying to create SEO-friendly URLs of popular tags that can be more easily shared.

I don't think that there would ever be more than 5 hyphens or spaces in any given URL, but all the code I have found only ever covered up to three instances, so I wanted to try and see a solution that would cover more!

There should be no instances of hyphens mixed with spaces, so that's not an issue.

I'm constrained by an old CMS and a boss who has particular ways he wants to do things.

I would be happier with the .htaccess solution because if I am only going to have to cover 10 possible combinations it just seems a little simpler than adding in yet another bit of code that lives elsewhere. For the record I am going to have to re-create this solution over a number of different folders on different sites, so a simple cut and paste one-size-fits-all solution has its own appeal! :)

JD_Toims




msg:4609022
 3:33 pm on Sep 11, 2013 (gmt 0)

Personally, I'd use PHP as Lucy24 suggested.

--> in the .htaccess

RewriteEngine on
RewriteRule [+\ ] /change-to-hyphens.php [L]

--> /change-to-hyphens.php

<?php

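// REQUEST_URI is still percent-encoded, so spaces normally arrive as %20
// and an escaped plus as %2B (str_ireplace also catches %2b).
$fixed_url=str_ireplace(array('%20','%2B','+',' '),'-',$_SERVER['REQUEST_URI']);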
header('Location: http://www.example.com'.$fixed_url,TRUE,301);
exit;

?>

benlf




msg:4609042
 4:50 pm on Sep 11, 2013 (gmt 0)

OK - I've tried the PHP solution you have suggested.

The only bit of the code I changed was the URL, to reflect the actual domain and the sub-directory where I have the htaccess file (in this format: http://example.com/somepath/ )

I CANNOT have this rule work at domain level as it could affect literally 1000s of legacy files and graphics that may have spaces or + symbols in them.

Anyhow - it's not working. It either breaks the page or has no effect, depending on where I put it in the htaccess file.

I suspect that's because it conflicts with some other htaccess rules I have (and need for an MT plugin), and/or I just haven't understood the instructions.

I would really prefer if someone could suggest an htaccess-only solution (along the lines of the rewrite rules I have above) as I feel more confident I could see what was going wrong if that didn't work.

It's frustrating as those rules are half working, only stopping where there is more than one + or " ". I just feel that if they were tweaked that would be the easiest way to go.

Thanks.

g1smd




msg:4609047
 5:41 pm on Sep 11, 2013 (gmt 0)

The "all .htaccess" solution will be much more complicated than the mixed .htaccess/PHP solution and has an even greater risk of clashing with other rules.

The rewrite discussed above should be near the beginning of the .htaccess file (after rules that block access, and before any redirects).

The rule should go in the root .htaccess file. You will need to adjust the Rule RegEx pattern (the [+\ ] bit) so that it matches only those requests that you do want to redirect.

You will also need a negative match condition preceding your non-www/www redirect such that it does not redirect internal requests for the fixer PHP script (otherwise this rule will expose the PHP script itself as a new URL back out on to the web).
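
A minimal sketch of that negative match condition, assuming the script is called /change-to-hyphens.php as in the earlier post, that www.example.com is the canonical hostname, and that somepath/ is the directory to be fixed. All three are placeholders to adjust.

```apache
RewriteEngine On

# Internal rewrite to the fixer script (no redirect issued here).
RewriteRule ^somepath/[^+\ ]*[+\ ] /change-to-hyphens.php [L]

# Hostname redirect. The second condition is the negative match:
# requests already rewritten to the fixer script are left alone,
# so the script's own URL is never sent back out to the browser.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/change-to-hyphens\.php$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Without the second condition, a non-www request containing a plus or space would be rewritten to the script on the first pass and then redirected to http://www.example.com/change-to-hyphens.php on the next pass through the file.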

The solution discussed here is three lines, maybe four, of .htaccess code and three lines of PHP.

Getting it to work correctly is merely a process of adjusting the Rule pattern to fit the specification of which URLs you actually want to redirect.

PHP is a much better processor of "replace this with that, multiple times" instructions. You really do not want to do this all in .htaccess.

lucy24




msg:4609090
 7:48 pm on Sep 11, 2013 (gmt 0)

"I CANNOT have this rule work at domain level as it could affect literally 1000s of legacy files and graphics that may have spaces or + symbols in them."

That's fine. You just need to constrain the original rule, in the main htaccess, to

RewriteCond %{THE_REQUEST} [+\ ]
RewriteRule ^directory/[^+\ ]*[+\ ] /fixup.php [L]

Use an opening anchor and then the exact directory path, up to the point where the relevant files are located. You can't use a <Directory> section in htaccess, so it has to be expressed as part of the path.

In most situations, adding a supplementary htaccess in the applicable directory is the easiest fix. For example, if you're setting different auto-index options in one area, or you want a different expiration period for files in one directory. But mod_rewrite is different. Unlike almost everything else in Apache, it isn't inherited by default. So the existence of mod_rewrite in one directory will wipe out the results of any mod_rewrite activity in higher-level directories. You can say
RewriteOptions inherit
but then the rules will work in a weird bottom-to-top order and frankly it isn't worth it. At least not unless the rules are 100% mutually exclusive, where nothing in the primary htaccess applies to the inner htaccess.
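
For illustration only, a sketch of what that inherited setup could look like, assuming a hypothetical /somepath/.htaccess and the /change-to-hyphens.php script name used earlier in the thread:

```apache
# /somepath/.htaccess (hypothetical location)
RewriteEngine On

# Pull in the rules from the parent .htaccess. They are applied
# AFTER the rules written in this file, not before them.
RewriteOptions inherit

# In a per-directory file the pattern is matched against the path
# relative to this directory, so there is no "somepath/" prefix here.
RewriteRule [+\ ] /change-to-hyphens.php [L]
```

The relative matching is easy to trip over: a pattern beginning with ^somepath/ that works in the root file will not match when pasted into this one.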

Final quirk: In general, RewriteRules that create redirects go before RewriteRules that are rewrites only. Here, the rule is superficially a rewrite-- but it will end up issuing a redirect. So you need to put it together with any other RewriteRules that create specific redirects. In other words, pretend that it has the [R=301] flag and locate it accordingly.
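
Put together, that ordering might look something like the following in the main htaccess. The private/ block, the old-page redirect and the index.php front controller are made-up placeholders; only /fixup.php and the directory/ prefix come from this thread.

```apache
RewriteEngine On

# 1. Rules that block access (placeholder)
RewriteRule ^private/ - [F]

# 2. Redirects. The fixer rule has no [R] flag, but the script it
#    hands off to answers with a 301, so it is grouped here.
RewriteRule ^directory/[^+\ ]*[+\ ] /fixup.php [L]
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# 3. Internal rewrites only (placeholder front controller)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /index.php [L]
```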

phranque




msg:4609166
 3:56 am on Sep 12, 2013 (gmt 0)

welcome to WebmasterWorld, benlf!


i would look at the solution provided by jdMorgan in the thread linked below and decide if you would prefer to maintain this among the MT-related directives and whatever else you have going on versus the simpler solution suggested above.
you're getting some really good advice here...


Removing spaces from urls:
http://www.webmasterworld.com/apache/4228596.htm?highlight=msg4231729
