

Apache Web Server Forum

    
rewrite one bit of url into something new
glimbeek




msg:4177137
 7:12 am on Jul 27, 2010 (gmt 0)

Hi,

just to be sure:
I want to rewrite everything with http://www.example.com/blog/
to:
http://www.example.com/news/

To achieve this I'm using the following:
RewriteRule ^blog/(.*)$ http://www.example.com/news/$1 [R=301,L]

It seems to be working, am I doing this correctly or might I be forgetting something important?

In addition, I want to add a trailing slash to most URLs.
I'll try to explain:
http://www.example.com/blog/ has a trailing slash
http://www.example.com/blog/something1/ has a trailing slash
http://www.example.com/blog/something1/something2/ has a trailing slash
but http://www.example.com/blog/something1/something2/the-article doesn't have a trailing slash. All the articles need a trailing slash. But something1 and something2 are only placeholders; the actual values differ most of the time.

Is there an easy way to achieve this without having to create a rule for every possible value of something1 and something2?

I found something in the following post: [webmasterworld.com...]

# Externally redirect to add a trailing slash if no filetype
# or trailing slash is present on the requested URL-path
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/news/$1 [R=301,L]

But does that do what I want (I changed it slightly) AKA add the slash?

I would still need to combine the two, right?

George

 

jdMorgan




msg:4177339
 3:42 pm on Jul 27, 2010 (gmt 0)

The answer could be simple, it could be hard, or it could be impossible (leaving only the choice to redirect each URL one at a time).

Sort your "blog" URLs into two groups -- those that should be redirected (in this case, to add a slash), and those that should not. Look at the URLs in these two groups very carefully, and using only the characteristics that you see in the text of the URLs themselves, try to find an easy way to distinguish between the two groups.

Look at the "words" -- like "blog" and the actual values of "something1" and "something2." Look at the path-depth of the URLs -- how many slashes are used in the URLs in each group. Look at the character-sets used in the path-parts of each group. Some or all of these characteristics (and others not listed here) may be useful in telling the two groups apart.

For starters, in this specific case, your add-a-slash rule should only apply if:
  • The requested URL-path starts with "blog/".
  • The requested URL-path does not end with a slash.
  • The requested URL-path contains four "levels" /blog(1)/something1(2)/something2(3)/the-article(4)
  • The requested URL-path does not resolve to a physically-existing static file.

    In addition, this might be helpful:
  • The requested URL-path (apparently) does not end with a recognized filetype.
  • The requested URL-path (apparently) ends with a string containing hyphens.

    However, one must be careful with these last two, because you might conceivably publish an article that could look like it had a filetype in the title -- "How to improve your robots.txt" or "Trouble at youtube.com" for example, and there's also the possibility that you might publish an article with a one-word title. You don't want to "lay traps" for yourself by using "potentially-ambiguous" characteristics.
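
    As a rough illustration of how the first four characteristics in the list could map onto mod_rewrite directives, here is a minimal sketch. It assumes the rules live in an .htaccess file in the document root (so the pattern has no leading slash) and that www.example.com is the canonical hostname; neither assumption is confirmed beyond the examples in this thread.

    ```apache
    RewriteEngine On

    # Characteristic 4: skip the redirect when the request maps to an
    # existing file on disk (assumes per-directory/.htaccess context,
    # where REQUEST_FILENAME holds the full filesystem path)
    RewriteCond %{REQUEST_FILENAME} !-f

    # Characteristics 1-3: starts with "blog/", has exactly four path
    # levels, and the last character is not a slash
    RewriteRule ^(blog/[^/]+/[^/]+/[^/]+)$ http://www.example.com/$1/ [R=301,L]
    ```

    The two "might be helpful" characteristics are deliberately left out of this sketch, for the reasons given above.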

    Do you see any other easily-identifiable characteristics that would be useful in making the do-redirect/don't-redirect decision when looking at your two groups of URLs?

    This "pattern recognition" is exactly what mod_rewrite does when "looking at" a requested URL-path. It matches the requested URL-path against the characteristics described by the regular-expressions pattern(s) to decide whether or not to rewrite or redirect the request. So going through this exercise will help you to understand mod_rewrite, in addition to pointing the way to a practical solution to your current problem. It may also give you insight into ways to 'design' better (more easily-distinguishable) URLs in the future if that is not the case now.

    Jim

    glimbeek




    msg:4177785
     10:17 am on Jul 28, 2010 (gmt 0)

    Hi Jim,

    Thank you for your reply.
    The 'only' URLs that need a slash at the end are the individual article pages.

    I'll try to explain:
    It's a Joomla website located in the folder /blog/
    In this Joomla site, there are 7 sections.
    Depending on the sections there are several categories.
    All categories have 1 or more article(s) (some have a few hundred articles).

    The blog folder obviously has a slash at the end of the URL (it's a folder).
    The section URLs have a slash as well.
    So do the category URLs.
    This leaves me with just the "articles" that need a slash.

    - They are all "clean" URLs, because we use an SEF extension. The only exception is the "search" option, but those URLs aren't indexed. Since the redirect requires that the URL start with blog/ and have 4 "levels", this won't cause a problem in our new setup.
    - There are no physically existing files within the sections or categories.
    - There are no query strings in any of the URLs (except for the search URLs, but those aren't 4 "levels" deep).


    Taking the above and your reply into account, I came up with this, which seems to work.

    # Externally redirect to add a trailing slash if no filetype
    # or trailing slash is present on the requested URL-path
    # If the URL starts with "blog/"
    # If the URL contains four "levels": blog(1)/something1(2)/something2(3)/the-article(4)
    RewriteCond %{REQUEST_URI} ^/blog/(.*)$
    RewriteCond $1 !^([^/]+/)*[^/.]+$
    RewriteRule ^/*(.+/)?([^.]*[^/])$ [%{HTTP_HOST}...] [L,R=301]

    However, it works too "well": it adds a slash to any URL that doesn't have one. It lacks the check that the URL being redirected is actually a 4th-"level" URL. I'm at a loss on how to create a RewriteCond for this. Could you point me in the right direction?

    With kind regards,

    George

    jdMorgan




    msg:4178305
     2:24 am on Jul 29, 2010 (gmt 0)

    Looks like you did it the hard way...

    Really, look into regular expressions... well worth the time. Very powerful. Used and supported in all major high-level programming and scripting languages today. Basic kit.

    # Redirect to add a slash if the requested URL starts with "blog/",
    # includes four levels, and does not end with a filetype or a slash
    RewriteRule ^(blog/[^/]+/[^/]+/[^/.]+)$ http://www.example.com/$1/ [R=301,L]

    All of the required attributes are specified in that one regex pattern. Its only 'weakness' is that the test for "filetype" here is 'fairly weak.' It considers any period (full stop) character in the final path-part to imply a filetype. If a better test is needed, then a RewriteCond could be added to look for "period, then three letters" or "period, then two to four letters or three letters followed by a digit" -- or even to look for a list of specific filetypes. It depends on how 'strong' of a pattern you really need.
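
    For illustration, a stronger filetype test along those lines might look roughly like the sketch below. It assumes the same .htaccess-in-document-root context as the rule above, and the extension list is made up for the example rather than taken from the actual site.

    ```apache
    # Sketch only: allow periods in the final path-part, and use a
    # RewriteCond to skip requests that end in a listed filetype.
    # The extension list is illustrative; adjust it to the site.
    RewriteCond %{REQUEST_URI} !\.(css|js|gif|jpe?g|png|ico|txt|xml|html?|php)$ [NC]
    RewriteRule ^(blog/[^/]+/[^/]+/[^/]+)$ http://www.example.com/$1/ [R=301,L]

    # Looser alternative for the condition: "period, then two to four
    # letters, or three letters followed by a digit"
    # RewriteCond %{REQUEST_URI} !\.([a-z]{2,4}|[a-z]{3}[0-9])$ [NC]
    ```

    With the explicit list, an article whose URL ends in "youtube.com" would still get its slash, while one ending in "robots.txt" would be skipped. The looser alternative would skip both, so the choice depends on which mistakes are tolerable.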

    Jim

    glimbeek




    msg:4178385
     6:59 am on Jul 29, 2010 (gmt 0)

    Thanks Jim,

    I did a step by step approach.
    First sort the blog
    Then the slash
    Then the levels.

    Tackle one problem at a time. I didn't doubt it could be done more easily, as you've shown me; I'm just not that proficient with regex patterns.

    After the above, I wanted to add the "function" to rewrite all the blog/ URLs to nieuws/ URLs, following my step-by-step approach.

    Of course I could do this afterwards, but that would leave me with a double redirect for ALL the URLs (around a thousand).

    I read up on: [webmasterworld.com...]
    And I looked at some examples, but I'm clueless on how I can change blog/ into nieuws/ using your example. I tried playing around with the $1 $2 %1 etc, but nothing seems to work.

    Cheers for helping me out. I'm learning more and more :)

    With kind regards,

    George

    glimbeek




    msg:4178411
     8:08 am on Jul 29, 2010 (gmt 0)

    Ok,

    after taking another careful read and some trial and error testing, I came up with this:

    # Redirect to add a slash if the requested URL starts with "blog/",
    # includes four levels, and does not end with a filetype or a slash.
    # Also rewrite the "blog/" to "nieuws/".
    RewriteRule ^(blog/([^/]+/[^/]+/[^/.]+))$ http://www.example.com/nieuws/$2/ [R=301,L]

    Next step...

    All the other URLs within blog/, and blog/ itself, need to be redirected to nieuws/. In other words, blog/ needs to change into nieuws/.

    Putting the above code first, can I use the following code:
    RewriteRule ^blog/(.*) http://www.example.com/nieuws/$1 [R=301,L]

    Or will this cause unforeseen results? Using (.*) seems a bit dangerous, as it catches everything... Can I make it more robust? Or even better, is there a way to combine the 2?

    jdMorgan




    msg:4179526
     12:56 am on Jul 31, 2010 (gmt 0)

    If you want to redirect "everything, anything, or nothing" following "blog/" then that's the correct pattern. Only you know your URL-set well enough to decide that...
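
    Put together, the two-rule arrangement discussed in this thread could be sketched as follows. This assumes an .htaccess file in the document root and the hostname www.example.com; the order matters, because the more specific article rule has to be tested before the catch-all.

    ```apache
    RewriteEngine On

    # 1) Four-level article URLs with no filetype and no trailing slash:
    #    move to nieuws/ and add the slash in a single redirect
    RewriteRule ^blog/([^/]+/[^/]+/[^/.]+)$ http://www.example.com/nieuws/$1/ [R=301,L]

    # 2) Everything else under blog/, including blog/ itself
    RewriteRule ^blog/(.*)$ http://www.example.com/nieuws/$1 [R=301,L]
    ```

    Each blog/ request matches at most one of these rules, so no URL should need two redirects. Note that the second pattern does not match "blog" without its trailing slash.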

    Jim

    glimbeek




    msg:4180466
     8:10 am on Aug 2, 2010 (gmt 0)

    Yeah, I want to redirect everything in "blog/" if it's not picked up by the add-slash-and-redirect-to-"nieuws/" rule.

    Cheers for your help Jim. Very useful as always :)

    With kind regards,

    George


    © Webmaster World 1996-2014 all rights reserved