mod rewrite to change spaces to hyphens, not underscores

         

sasori

11:44 pm on Aug 6, 2009 (gmt 0)

10+ Year Member



Hello,
I have a Smarty-based cart that uses a mod_rewrite trick to make clean URLs. While doing this it changes spaces to underscores; I'm trying to change it to make hyphens.
I got the PHP to write hyphenated links, however the mod_rewrite isn't 'getting it' and the URL fails.

I've tried a couple of things, but it didn't work. Here is the mod_rewrite:
RewriteRule ^ladies-([0-9a-zA-Z\_\-]*)\.htm([l]?)$ cart.php?p=product&product_code=$1 [L]
RewriteRule ^([0-9a-zA-Z\_\-]*)\.htm([l]?)$ cart.php?p=catalog&catalog_code=$1 [L]
RewriteRule ^pages/([0-9a-zA-Z\_\-]*)\.htm([l]?)$ cart.php?p=page&page_id=$1 [L]
------------------
Basically, I tried swapping the _ and - in each line; that didn't work.

Can you help?

Thanks!

jdMorgan

1:04 am on Aug 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> While doing this it changes spaces to underscores; I'm trying to change it to make hyphens.

Actually, it does not change these characters at all. It simply takes the client-requested URL-path matching the first parenthesized sub-pattern in the RewriteRule pattern and rewrites that path info into the "code=" query string value as it passes the request to cart.php.

I suspect you'll need to look at cart.php to see where it takes the GET value and parses it. It's likely that any character substitution is done somewhere after that, possibly using the preg_replace function.

Note that the "\.htm([l]?)" subpattern is unnecessarily complex, and can be equivalently written as "\.html?" Also, the underscores do not need to be escaped, only the hyphens. So "\_" can be written as just "_" within the character classes.
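One way to check the equivalence is to run both forms through PHP's preg_match, whose regex flavor behaves the same as mod_rewrite's for these simple patterns. This is only a minimal sketch; the sample file names are made up for illustration and are not from the real site.

```php
<?php
// Original and simplified forms of the catalog pattern.
$original   = '/^([0-9a-zA-Z\_\-]*)\.htm([l]?)$/';
$simplified = '/^([0-9a-zA-Z_\-]*)\.html?$/';

// Hypothetical sample requests (assumptions, not real URLs).
$samples = array('Tie_Dye_Hoody.htm', 'Tie-Dye-Hoody.html', 'page.htmll', 'page.php');

$results = array();
foreach ($samples as $s) {
    // Record whether each form of the pattern accepts the sample.
    $results[$s] = array(
        preg_match($original, $s) === 1,
        preg_match($simplified, $s) === 1,
    );
}
// For every sample, the two entries in $results[$s] should agree.
```

If the two columns ever disagree for a URL you care about, the simplification is not safe for that case and the original form should stay.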

In addition, you should be aware that allowing this pattern to accept both .html and .htm URL requests as it does means that you've got potential duplicate content issues -- the same content appearing at two different URLs. Best practices indicate that you should *not* allow the cart rewrite to accept both, but pick only one. Then add an external redirect rule to redirect all requests for the non-preferred file extension to the preferred extension. This prevents duplicate content problems from arising accidentally or through the malicious activity of others.

Jim

sasori

4:12 am on Aug 7, 2009 (gmt 0)

10+ Year Member



Thanks for the thorough explanation. Since the pages in the site are generated dynamically, it's also generating the links, so I figure you're right about the *.htm thing. I'll give it a try.

This is the php function:
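<?php
// Build a URL-safe version of $title: keep letters and digits, replace everything else.
$_title = "";
for ($i = 0; $i < strlen($title); $i++) {
    $c = $title[$i];
    if (preg_match("/[0-9a-zA-Z]/", $c)) { // preg_match stands in for the deprecated ereg
        $_title = $_title . $c;
    }
    else {
        $_title = $_title . "_";
        // trying to change _ to - in title
    }
    // collapse doubled underscores as they appear
    $_title = str_replace("__", "_", $_title);
}
?>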

As I understand it, the PHP function is set up to match how the htaccess is set up. Otherwise, it's going to generate links to pages that will not be there. ... now I'm confusing myself. However, that is exactly what happened when I replaced the _ with - in the PHP function.

I've found some other sites that have 'replace _ with -' htaccess rules, but I've been afraid to use them as I thought they might not be as thorough as what I currently have.

As you see it, is my htaccess going to work with links that have hyphens, like:

[ididitonline.com...]

Oddly, the hyphens after hoody are added by the PHP function and work fine. It's parsing the title of the product that seems to be 'stuck' with underscores.

Thanks

jdMorgan

2:51 pm on Aug 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your code will rewrite any requested URL starting with "ladies-", "pages/", or blank, and followed by letters, numbers, hyphens or underscores, and ending with ".htm" or ".html" to /cart.php, with the matched part of the URL copied into the appropriate name/value pair in the cart.php query string. This query string data will then likely be retrieved inside cart.php by using the "GET" variable.

As stated above, it is also very likely that after cart.php retrieves the GET parameters, you'll want to use preg_replace or str_replace to swap the hyphens and/or underscores to spaces as needed, so that the result is a string that the script will be able to find in your database.

So when you are building a link to put in your HTML page, you change spaces from the database into hyphens in the link. Then the visitor clicks on a hyphenated link and requests that URL from your server. Your RewriteRule takes that requested URL and changes it into the form needed to invoke cart.php. Cart.php then retrieves the GET parameters created by your mod_rewrite rule and looks them up in the database to retrieve the data needed to create the requested page. It is just before this lookup that you must change the hyphens back to spaces with str_replace or preg_replace.
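The round trip described above can be sketched in PHP as follows. The helper names title_to_slug and slug_to_title are hypothetical and do not come from the cart software, and the sketch assumes that titles in the database never contain hyphens of their own (otherwise the reverse step would turn those into spaces as well).

```php
<?php
// Hypothetical helpers; names are illustrative only.

// Link-building side: spaces from the database title become hyphens.
function title_to_slug($title) {
    return str_replace(' ', '-', $title);
}

// cart.php side: hyphens become spaces again just before the database lookup.
function slug_to_title($slug) {
    return str_replace('-', ' ', $slug);
}

$title = 'Tie Dye Hoody';          // sample value as it might be stored in the database
$slug  = title_to_slug($title);
$link  = $slug . '.html';          // what goes into the href on the HTML page

// In cart.php the slug would arrive through a GET parameter such as
// $_GET['catalog_code'], filled in by the RewriteRule.
$lookup = slug_to_title($slug);
```

Keeping both directions in one pair of functions makes it harder for the link generator and the lookup code to drift apart, which is the mismatch this thread is about.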

You *could* do this in your RewriteRule, but it is much, much, much less efficient, and only recommended for masochists who actually want to be forced into an early server upgrade. ;) This is because mod_rewrite has no internal 'looping' functions, and so can only replace one (or a very few) characters at a time; if more replacements are needed, then mod_rewrite has to be re-started from the top using the [N] flag. This can be *thousands* of times slower than using a script to do the replacement.

It's also possible you'd have to modify cart.php anyway, because mod_rewrite would encode the spaces as %20, and the script might have to decode them before accessing the database. So tweaking cart.php is really much easier and much, much more efficient.

Jim

sasori

11:50 pm on Aug 10, 2009 (gmt 0)

10+ Year Member



Thanks so much for the excellent explanation, jdMorgan.
I've successfully made some of the mods you've suggested.
I think I also found the culprit:
$parts = explode("-", $product_code);

This would explain why ladies-Tie_Dye_Hoody-461-120.html
works but Ladies-Tie-Dye-Hoody-461-120.html doesn't. (spaces and Hyphens get converted in the function)

So, now the trick is figuring out how to parse the script 'backwards'; I figure I can pull the two numbers by counting from right to left (later, I want to use the actual section name in the URL, not its number).

For the time being, I'll swap the hyphen for an underscore in the explode.

Again, Thanks!

jdMorgan

2:55 am on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or use preg_replace to simply replace all hyphens with underscores before you do the explode (or vice-versa).
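As a rough sketch of that idea: normalize the delimiter first, then explode and take the two numbers from the right-hand end. The sample value is the file name from this thread minus the "ladies-" prefix and the extension, which is what the RewriteRule would pass along; what the two numbers stand for (section and product id, say) is an assumption.

```php
<?php
// Sample product_code as the RewriteRule would deliver it.
$product_code = 'Tie-Dye-Hoody-461-120';

// Replace every hyphen with an underscore so only one delimiter is left.
$normalized = preg_replace('/-/', '_', $product_code);

$parts = explode('_', $normalized);

// Read the two trailing numbers from right to left.
$last   = array_pop($parts);
$second = array_pop($parts);

// Whatever remains is the product title, one word per element.
$name = implode(' ', $parts);
```

The same code also handles the underscore form (Tie_Dye_Hoody-461-120), since both delimiters end up as underscores before the split.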

Parsing out the numbers appears to be quite easy: Look for (\d+<hyphen>\d+)<period>html$ as the match, and then extract the two \d+ matches.
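In PHP this could look roughly like the sketch below. It uses two capture groups instead of the single group written above so that each number lands in its own slot; that is a small variation for convenience. The file name is the one from this thread. Inside cart.php the extension has already been stripped by the RewriteRule, so there the pattern would end in $ without the \.html part.

```php
<?php
$file = 'Ladies-Tie-Dye-Hoody-461-120.html';

$first  = null;
$second = null;

// Match "digits, hyphen, digits" immediately before the .html extension.
if (preg_match('/(\d+)-(\d+)\.html$/', $file, $m)) {
    $first  = $m[1];
    $second = $m[2];
}
```

Because the pattern is anchored at the end of the string, hyphens in the title part do not interfere with picking out the numbers.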

Jim