Forum Moderators: phranque

mod rewrite take out spaces?

         

brisctt

3:51 pm on Sep 22, 2011 (gmt 0)

10+ Year Member



I'm new to mod rewrite, but have figured out so far how to do what I need. One of the last things I'm looking to do is change any spaces in the URL to dashes. Any help would be greatly appreciated.

For example:
[domainname.com...]
[domainname.com...]

Here's what I have so far, but not sure where to proceed from here:
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /manufacturer\.php\?id=([^\ ]+)\ HTTP/
RewriteRule ^manufacturer\.php$ [domainname.com...] [R=301,L]
#
# Internally rewrite search engine friendly static URL to dynamic filepath and query
RewriteRule ^parts/([^/]*)/?$ /manufacturer.php?id=$1 [L]

g1smd

7:13 pm on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use example.com in the forum to suppress URL auto-linking.

I would suggest rewriting (that's rewrite, not redirect) URL requests with a space to a special PHP script that replaces the spaces with hyphens and then sends the 301 status and the Location header back to the browser.

The special rewrite will need to go ahead of all of your other rules, even before any redirects.

lucy24

8:49 pm on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^parts/([^/]*)/?$ /manufacturer.php?id=$1 [L]

If you're on shared hosting, try this rule on a "dummy" directory before you continue. Some hosts change / to /index.html or equivalent before the request ever reaches htaccess, so rules ending in / will always fail. (I'm speaking from direct personal experience.)

I'm not positive that ([^/]*)/? is what you meant. If there are no non-slash characters after "parts/" then there will definitely not be a second slash, unless you are in malformed URL territory. And if you're capturing a potential filename, do you want to capture the extension?

g1smd

9:34 pm on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Using /? means that two different URLs will be able to serve the same content. Do not promote Duplicate Content on your site. Allow only the URL without slash to serve content. Redirect requests "with slash" to the URL "without slash".

The * in ([^/]*)/ allows a request for example.com// with a double slash to be valid. You do not want that. You need ([^/.]+)$ or similar.
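
A minimal sketch of that advice, assuming the /parts/ URLs and the manufacturer.php script used elsewhere in this thread, www.example.com as the canonical hostname, and an .htaccess file in the document root. The first rule redirects the trailing-slash form to the slashless URL; the second rewrites only the slashless form internally.

```apache
RewriteEngine On

# Externally redirect /parts/Name/ to /parts/Name so only one URL serves the content
RewriteRule ^parts/([^/.]+)/$ http://www.example.com/parts/$1 [R=301,L]

# Internally rewrite the slashless URL to the script.
# [^/.]+ needs at least one character and allows neither slashes nor dots,
# so /parts// and /parts/Name.html do not match.
RewriteRule ^parts/([^/.]+)$ /manufacturer.php?id=$1 [L]
```

The redirect is listed first so that a request with a trailing slash is answered with a 301 before any internal rewrite is considered.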

[edited by: engine at 7:41 am (utc) on Sep 23, 2011]
[edit reason] fixed code [/edit]

brisctt

12:47 am on Sep 23, 2011 (gmt 0)

10+ Year Member



Using /? means that two different URLs will be able to serve the same content. Do not promote Duplicate Content on your site. Allow only the URL without slash to serve content. Redirect requests "with slash" to the URL "without slash".

The * in ([^/]*)/ allows a request for example.com// with a double slash to be valid. You do not want that. You need ^([^/.]+)$ or similar.


Not sure I totally understand. Are you referring to this line:
RewriteRule ^parts/([^/]*)/?$ /manufacturer.php?id=$1 [L]

should be more like:
RewriteRule ^parts/^([^/.]+)$ /manufacturer.php?id=$1 [L]

I've tried that, but get a page not found error.

lucy24

2:40 am on Sep 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've tried that, but get a page not found error.

And no wonder, because the ^ in the middle of the pattern can never match there.

In RegEx, the caret has special meaning in two contexts. Outside brackets it is an anchor meaning "starts with...", so it only belongs at the very beginning of an expression. As the first character inside square brackets, it means "does not include". Inside brackets but not in first position, or escaped as \^, it is a literal ^.

Since you are rewriting rather than redirecting, and not capturing the /, at least two different places might end up showing the content of /manufacturer.php.

Btw, if you're taking the entire query string and dumping it onto the visible url, I'm not sure how search-engine-friendly it's going to end up. How long are the queries?

g1smd

6:51 am on Sep 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The ^ was my typo.

brisctt

3:38 pm on Sep 23, 2011 (gmt 0)

10+ Year Member



And no wonder, because the ^ in the middle of the pattern can never match there.

In RegEx, the caret has special meaning in two contexts. Outside brackets it is an anchor meaning "starts with...", so it only belongs at the very beginning of an expression. As the first character inside square brackets, it means "does not include". Inside brackets but not in first position, or escaped as \^, it is a literal ^.

Since you are rewriting rather than redirecting, and not capturing the /, at least two different places might end up showing the content of /manufacturer.php.

Btw, if you're taking the entire query string and dumping it onto the visible url, I'm not sure how search-engine-friendly it's going to end up. How long are the queries?


I guess I'm totally lost now. With what I have so far, I'm able to get the results that I want with the urls.

OLD URL: http://www.example.com/manufacturer.php?id=PartManufacturer
NEW URL: http://www.example.com/parts/PartManufacturer/

Are you saying that what I have is not good because the same page could potentially show up at more than one URL, such as:
http://www.example.com/parts/PartManufacturer//

And I am still very lost on my initial question of wanting to take spaces out and add hyphens, such as:
http://www.example.com/manufacturer.php?id=Part Manufacturer
http://www.example.com/parts/Part-Manufacturer

g1smd

6:44 pm on Sep 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The code is almost right. Delete the second of three carets.

lucy24

7:40 pm on Sep 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The last time the spaces-to-hyphens question came up, the advice was to reroute to a php page that fixes things up. (Or was it massive case changing? Same principle, anyway.)

If your query strings contain only a modest number of spaces or hyphens, you can do it in mod_rewrite by looking at %{QUERY_STRING}, breaking it into ([^\ ]+)\ (.+) and rewriting or redirecting to %1-%2. Note that any and all literal spaces have to be escaped. This is specific to htaccess, not to Regular Expressions in general.
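
A rough sketch of that split for the single-space case, assuming the id parameter and the /parts/ URL from earlier in the thread. It tests for the forms a space usually takes in a query string as it reaches the server (+ or %20) rather than a literal space; that choice, and the www.example.com hostname, are assumptions rather than anything confirmed for this site.

```apache
# One space (sent as + or %20) between two space-free parts of the id value.
# %1 and %2 are the captures from the RewriteCond; the trailing ? drops
# the original query string from the redirect target.
RewriteCond %{QUERY_STRING} ^id=([^&+%]+)(?:\+|%20)([^&+%]+)$
RewriteRule ^manufacturer\.php$ http://www.example.com/parts/%1-%2? [R=301,L]
```

Each additional space needs another condition and rule pair with one more capture, which is why this approach only suits values with a small, known number of spaces.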

At this point it will help to repeat your original question using example.com so we can see exactly what the "before" and "after" are supposed to look like. Or the "before", "during" and "after" if you're doing the redirect-to-rewrite two-step. A final complication is that spaces in query strings may have been changed in transit to plusses-- which always have to be escaped \+ in RegEx. And either one may have been encoded, leaving you with four possibilities to look out for: literal space, plus, %20 and %2B.

And if something has been double-encoded-- I find this periodically in my logs-- you first have to change any %25 back to plain %.

Which is why you try very hard to avoid using literal spaces ;)

g1smd

7:45 pm on Sep 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We'll get to that soon enough. There are multiple things going on in this thread.

brisctt

1:43 am on Sep 24, 2011 (gmt 0)

10+ Year Member



The code is almost right. Delete the second of three carets.


Ok, I think I figured out what you were referring to. I made that change, as well as the change in the redirect.

OLD URL:
http://www.example.com/manufacturer.php?id=ManufacturerName
NEW URL:
http://www.example.com/parts/ManufacturerName

This is what I'm using:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /manufacturer\.php\?id=([^\ ]+)\ HTTP/
RewriteRule ^manufacturer\.php$ http://www.example.com/parts/%1? [R=301,L]

RewriteRule ^parts/([^/.]+)$ /manufacturer.php?id=$1 [L]

Does this look right?

lucy24

6:32 am on Sep 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As written:

Redirect requests arriving from outside, taking the query string and making it into the end of the new URL.
And then
Rewrite this new url back into its original form, so the user gets the page they originally asked for but without a messy query string in the address bar.

If that's your intention, all is well, with two limitations:

Right now you're only grabbing the "nice" queries that don't happen to contain spaces. I assume this is intentional. (Never put off until tomorrow what you can do equally well the day after tomorrow.)

The rule assumes that the address element "parts/{one name, no further directories, no extension}" never occurs anywhere but as a result of the preceding redirect. If this is right, all is well. If it is potentially not right, you're up the ###. :)

Hm. I don't know if you need to deal with human users who willfully and maliciously type "www.example.com/parts/gibberish" into their address bar. I suppose your php has something to deal with invalid queries.

g1smd

7:06 am on Sep 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Rewording for clarity.

Redirect URL requests with parameters arriving from outside, taking the query string value (not the name) and appending it to the end of the new URL.
Browser then makes new URL request.
Rewrite this new URL request such that it is internally mapped to a server filepath and file with appended parameters to serve the content.

The user gets the page they originally asked for but without a messy query string in the address bar. Links on the site should be amended to point to the URL form that does not have a query string.

g1smd

5:59 pm on Sep 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ahead of your rules for normal requests, you'll add the special rules for requests with spaces, and we'll get to that soon enough.

brisctt

11:58 pm on Sep 24, 2011 (gmt 0)

10+ Year Member



Rewording for clarity.

Redirect URL requests with parameters arriving from outside, taking the query string value (not the name) and appending it to the end of the new URL.
Browser then makes new URL request.
Rewrite this new URL request such that it is internally mapped to a server filepath and file with appended parameters to serve the content.

The user gets the page they originally asked for but without a messy query string in the address bar. Links on the site should be amended to point to the URL form that does not have a query string.


Yes, I see that rewrite would be correct, not redirect.

As far as the links on the site being amended to point to the url form without the query string, is that a must do? Are there consequences for not changing the links?

lucy24

2:19 am on Sep 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, I see that rewrite would be correct, not redirect.

You're doing both. First you redirect so the user sees the nice non-threatening address in their browser. Then you rewrite to show them whatever you need to show them.

Your internal links should always point to the URL that the user is really going to. (Universal guideline, not specific to this question.) You of all people know the correct form of the address ;) so there's no reason not to use it. Otherwise you're just creating unneeded redirects.


Rewording for clarity.

For a given definition of "clarity" :-P

g1smd

6:49 am on Sep 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



True, but using terms like "go to" and "location" without clarifying that we are talking about a URL used "out there" on the web or a server filepath used "here" inside the server is always going to be problematical.

I especially have a problem with the term "real location" used on its own. The real location of something on the web is its URL. The real location of something inside a server is its filepath. They are not at all the same thing, the server software merely "associates" them. When you ask for URL X the content is served from filepath Z.

At least 99% of all the explanations about mod_rewrite that I have ever read outside WebmasterWorld explain how it works "exactly backwards". Mod_rewrite does not "make" new URLs. Mod_rewrite cannot change the links on the page.

YOU change the links on the pages of your site to refer to the URL form that you want users to "see" and "use". Mod_rewrite processing kicks in only after that link is clicked and the HTTP request is sent to the server.

Mod_rewrite can do at least two different things with a request depending on how the RewriteRule was coded.
  • Mod_rewrite can return a 3xx status code and a pointer to a new location. This is a redirect. The browser will need to make a new HTTP request in order to get the content.
  • Mod_rewrite can silently fetch content from inside a server and send it back to the requester. This is a rewrite if the content is within the same server, and a proxy if the content is inside some other server.
  • Mod_rewrite can also return a 4xx status code and block access to the requested resource.

    It's the Swiss army knife of server configuration tools.
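
The three behaviours in the list above can be sketched as one rule each. The old-page, new-page and private/ paths are hypothetical names chosen for illustration; the middle rule is the one already used in this thread.

```apache
RewriteEngine On

# Redirect: the R flag returns a 3xx status and a Location header,
# and the browser then makes a second request for the new URL.
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]

# Rewrite: content is fetched from a different filepath inside the server;
# the URL in the address bar does not change.
RewriteRule ^parts/([^/.]+)$ /manufacturer.php?id=$1 [L]

# Block: the F flag returns 403 Forbidden; "-" means no substitution.
RewriteRule ^private/ - [F]
```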
    lucy24

    7:42 am on Sep 25, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    You forgot the fourth hand :) If you do the htaccess equivalent of misplacing a comma, mod_rewrite can return a perfectly serviceable 500 error.

    At least 99% of all the explanations about mod_rewrite that I have ever read outside WebmasterWorld explain how it works "exactly backwards".

    If you do say so yourself ;)


    In an obscure corner of an obscure page, I note that such-and-such text on another site is "spectacularly unidiomatic and only marginally grammatical." Punch line: "I know this for a fact, because I contributed it myself." Or words to that effect.

    g1smd

    9:07 am on Sep 25, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Oh, jdMorgan's war cry: "Error 500? Great! Only 499 to go."

    brisctt

    4:12 pm on Sep 25, 2011 (gmt 0)

    10+ Year Member



    My original question about the spaces. Is that done in the htaccess file with a str_replace or something?

    lucy24

    8:47 pm on Sep 25, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    No, it's done via a detour to a php script. There is a recent post that roughs out the wording, but I can't find it. The script has to grab all requests coming in from outside your site, both the ones with query strings and the ones in prettified form, because you never know what people will bookmark.

    If you could be positive that there are never more than, say, three spaces, and that they always arrive as plain spaces, it could be done right in the htaccess. But you have to allow for all of these, possibly more than once:

    \ (literal space)
    \+ (space in query string changed to plus in transit)
    %20 (encoded space)
    %2B (encoded plus)

    ... and %25-anything in case there was a glitch and the percent sign itself got re-encoded. Oh, and within htaccess-- or at least within mod_rewrite-- that same percent sign % has to be \% escaped so forms like %2B aren't interpreted as "the second capture from the Condition, followed by a literal B". (My search for the relevant post was not wholly in vain, because I had forgotten this detail. Whew.)

    brisctt

    9:06 pm on Sep 25, 2011 (gmt 0)

    10+ Year Member



    If you could be positive that there are never more than, say, three spaces, and that they always arrive as plain spaces, it could be done right in the htaccess.


    No, there would definitely not be more than 3 spaces. I actually don't foresee more than 2.

    g1smd

    9:24 pm on Sep 25, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    At the top of the .htaccess file add a
    RewriteRule
    that detects requests with spaces and then rewrites (that's rewrite, not redirect) those requests to a special PHP script.

    The PHP script will "fix" the URL in every way that it is at fault, then send two HTTP headers back to the browser: one with the 301 status, the other with the new URL, including protocol and domain.

    It's also very efficient to do it this way. There's one rule in htaccess that matches requests with spaces. So, for requests without spaces there's just one rule to ignore and move on to the next.

    If you put all of the fix-the-spaces code in the .htaccess file everything will run slower for all requests. For requests without spaces there would be a big chunk of code that has to be processed and the action then skipped. I wouldn't do it that way.
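
A sketch of the detection step described above, assuming a hypothetical script named fix-url.php in the document root that cleans up the URL and sends the 301 itself. The script name and the restriction of the query-string test to manufacturer.php are assumptions made for illustration.

```apache
RewriteEngine On

# Space in the URL-path: RewriteRule tests the decoded path,
# so a %20 in the request shows up here as a real space.
RewriteRule \s /fix-url.php [L]

# Space in the query string: this part is not decoded, so test for %20 and +.
RewriteCond %{QUERY_STRING} (?:%20|\+)
RewriteRule ^manufacturer\.php$ /fix-url.php [L]
```

Both rules are internal rewrites, so the original query string is passed through unchanged and the originally requested URL should still be available to the script through the server's request URI variable. Requests without spaces fail the cheap tests and fall straight through to the normal rules.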

    brisctt

    2:07 pm on Sep 26, 2011 (gmt 0)

    10+ Year Member



    This is what I've been able to get together so far. It does seem to work somewhat, but I notice it must break the MySQL query on the pages, so something must be wrong.

    RewriteCond %{THE_REQUEST} (\s|%20)
    RewriteRule ^([^\s%20]+)(?:\s|%20)+([^\s%20]+)((?:\s|%20)+.*)$ $1-$2$3 [N,DPI]
    RewriteRule ^([^\s%20]+)(?:\s|%20)+(.*)$ /$1-$2 [L,R=301,DPI]

    g1smd

    3:23 pm on Sep 26, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Redirect target must include protocol and domain.

    ^([^\s%20]+)
    means "not a space, not a % character, not the digit 2 or the digit 0". Not exactly what you intended.

    I find that it is much easier to rewrite these requests to a PHP script then do various preg_replace, str_replace, etc, functions on the URL in there, and let the PHP script issue the redirect response.

    brisctt

    4:50 pm on Sep 26, 2011 (gmt 0)

    10+ Year Member



    Can you give me an example of what that would look like?

    g1smd

    4:58 pm on Sep 26, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Check recent threads because I've posted stuff about doing it that way at least 3 or 4 times so far this month.

    [google.com...]

    brisctt

    1:06 am on Sep 27, 2011 (gmt 0)

    10+ Year Member



    Ok, I found a few posts and think I have a direction. Is this looking close?

    RewriteRule ^([^/.]+/)(.*) manufacturer-fix.php?id=$1 [L]

    lucy24

    3:09 am on Sep 27, 2011 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    If your intention is to capture the first directory including its closing slash, throw away everything else including the original query string if any, and rewrite to your php page, you've nailed it. :)