
Apache Web Server Forum

    
redirecting unknown directory depth to new URL
batface




msg:4599009
 7:07 am on Aug 3, 2013 (gmt 0)

I'm not sure how I can isolate the last directory in the old URL to do a 301 to the new URL for a site that has moved to Magento. The scenario is:

www.example.com/level1/level2/level3/level4/ to
www.example.com/level4.html

I know I can use something like ([^/]+/)+ to capture all the directories, but my problem is that I need to capture the last directory to use as the filename in the new URL. This is complicated because the directory structure might be 2 to 6 levels deep before the final level I need to capture.

I'm thinking about stuff like a lookahead for the end of the string, but I think I am overcomplicating it and making a mess.

Can anyone help point me in the right direction please?

 

g1smd




msg:4599018
 7:41 am on Aug 3, 2013 (gmt 0)

What's wrong with this?

RewriteCond %{THE_REQUEST} /([a-z]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1.html [R=301,L]


Do be aware that a request for e.g. http://www.example.com/common/files/css/ will be redirected to http://www.example.com/css.html and will then presumably serve a 404.

It might be a good idea to restrict the list of folder requests that are redirected.
Is the first folder level always the same, or on a short list of names?
Is the list of folder names (first level only) to NOT redirect, a very short list?
A RewriteCond can be added to refine the list of valid redirects using a list or a negative-match list by "first level".
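
A minimal sketch of both refinements, reusing the rule above. The folder names (products, categories, common, images, media) are placeholders for illustration only; the real lists would have to come from the old site's structure. The first-level groups are written as non-capturing (?:...) so that %1 clearly refers to the folder name captured from THE_REQUEST.

```apache
# Use ONE of the two options below, not both.

# Option A: allow list. Redirect only when the first folder is a known old
# category. "products" and "categories" are hypothetical names.
RewriteCond %{REQUEST_URI} ^/(?:products|categories)/
RewriteCond %{THE_REQUEST} /([a-z]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1.html [R=301,L]

# Option B: negative-match list. Redirect everything except requests whose
# first folder is on a short "do not touch" list (hypothetical names again).
RewriteCond %{REQUEST_URI} !^/(?:common|images|media)/
RewriteCond %{THE_REQUEST} /([a-z]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1.html [R=301,L]
```

With Option B, a request for /common/files/css/ is left alone instead of being redirected to /css.html.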

batface




msg:4599032
 8:45 am on Aug 3, 2013 (gmt 0)

ok, so am I right in saying the RewriteCond looks for the final part of the URL no matter what it is - directory, filename or query string?

In this case /([a-z]+)/ picks that last directory. So if my directory name is /level-3/ I would be better with /([^/]+)/ ?

I want to remove the .html extension so that should get resolved.

Mainly, there is a reasonable batch of possible top-level directories which give all the possible main categories, so I need to work out exactly what they are, as well as finding out that list of directories not to capture. The clear one that I want to capture when accessing individual products is simply /products/.

lucy24




msg:4599046
 9:24 am on Aug 3, 2013 (gmt 0)

If you can pin it down to specific names, use them.

That's assuming you don't inadvertently have two widely separated directories with the same name, as in

/dir1/dir2/products/
/dir3/dir4/dir5/products/

Tip: Do not use a generic-capture rule if your target directory is called, say, /images/ :)

:: wondering why g1 passed up the chance to give a sales pitch for extensionless filename URLs ::

batface




msg:4599048
 9:51 am on Aug 3, 2013 (gmt 0)

Sorry, I may not have been clear. /products/ is an example of a leading top-level directory which I now need to capture, so I don't have an issue like
/dir1/dir2/products/
/dir3/dir4/dir5/products/

batface




msg:4599050
 10:27 am on Aug 3, 2013 (gmt 0)

Probably being a numpty but is this correct?

www.example.com/products/level2/level3/level4/ to
www.example.com/level4/

RewriteCond %{REQUEST_URI} ^/products/
RewriteCond %{THE_REQUEST} /([^/]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1 [R=301,L]

and anything not /products/

RewriteCond %{REQUEST_URI} !^/products/

g1smd




msg:4599051
 10:29 am on Aug 3, 2013 (gmt 0)

msg:4599050 appears to be incomplete.

Use the "owner edit" button to fill in the rest of the code.

Please present both rulesets in full with all of the conditions and the rule.

[edited by: g1smd at 10:57 am (utc) on Aug 3, 2013]

g1smd




msg:4599052
 10:34 am on Aug 3, 2013 (gmt 0)

msg:4599032 - In this case /([a-z]+)/ picks that last directory. So if my directory name is /level-3/ I would be better with /([^/]+)/ ?

I'm not sure you are; consider the case where a URL request has a space in it. I used [a-z] in the Condition as an example. On the real site I would likely use [a-z0-9-] or whatever real and valid URLs actually contain.
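
As a sketch of that stricter pattern, assuming the old folder names contain only lowercase letters, digits and hyphens (something to verify against the real URLs), and using the extensionless target from msg:4599050:

```apache
# Accept only lowercase letters, digits and hyphens in the final folder name.
# /level-3/ matches; a final folder containing any other character (an
# underscore, an encoded space such as %20, an uppercase letter) does not.
RewriteCond %{THE_REQUEST} /([a-z0-9-]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1 [R=301,L]
```

By contrast, [^/]+ accepts anything that is not a slash, so malformed or junk requests would also be redirected.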

msg:4599046 - wondering why g1 passed up the chance to give a sales pitch for extensionless filename URLs

My original post did actually have a line or two about that, but I removed it. There's a potential problem that requests for example.com/some/other/thing are redirected to example.com/some/other/thing/ and then on to example.com/thing which you might not want. Additionally, if that request is somehow redirected to example.com/thing/ there's the potential for a redirect loop. I guess that it's best to not "go extensionless" until those types of issues and clashes have been fully catered for.
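
One possible guard against the loop case, offered only as a sketch: require at least one parent folder in front of the captured folder, so that a request for /thing/ on its own is never redirected to /thing. The anchored pattern and the [a-z0-9-] character class are assumptions, not something confirmed for this site, and this does not address the first issue (the two-step chain from /some/other/thing).

```apache
# Match the request line from its start: method, space, then the path.
# (?:[^/?\ ]+/)+ demands one or more parent folders before the captured one,
# so "GET /thing/ HTTP/1.1" does not match and is not redirected.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(?:[^/?\ ]+/)+([a-z0-9-]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1 [R=301,L]
```

A side effect of this pattern is that requests carrying a query string do not match the condition at all, so they are not redirected by this rule.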

msg:4599050 - www.example.com/products/level2/level3/level4/ to www.example.com/level4/

The target URL in the example now ends with a slash. URLs ending with a slash are reserved for "folders". If this is the URL for a "page" it should not end with a slash. The example code redirects to a URL without a trailing slash. I surmise the code is right and the example contains a typo?

[edited by: g1smd at 11:29 am (utc) on Aug 3, 2013]

batface




msg:4599053
 10:46 am on Aug 3, 2013 (gmt 0)

msg:4599050 appears to be incomplete.

Use the "owner edit" button to fill in the rest of the code.


There is no more; I just listed the condition line to check if I am right.

g1smd




msg:4599058
 10:54 am on Aug 3, 2013 (gmt 0)

Please fill in the Rule that follows that Condition; otherwise I am confused as to exactly what you are trying to do.

Surely the Rule for "products" (or some of the other Conditions preceding the Rule) will be different to the Rule (and/or Conditions) for "not products" otherwise there's no point in having two Rules?

I'd like to see both rulesets in full. Remember, you're working on only one set of code; I'm following code in three threads here, one somewhere else, as well as doing my own stuff. The easier you can make it for the reader to follow, the more answers you'll get. :)

batface




msg:4599059
 11:00 am on Aug 3, 2013 (gmt 0)

I surmise the code is right and the example contains a typo?


It does, and thanks that's great! :-)

Hopefully I can work out all the permutations of what should and shouldn't be included and the exceptions that are there.

batface




msg:4599062
 11:04 am on Aug 3, 2013 (gmt 0)

Please fill in the Rule that follows that Condition; otherwise I am confused as to exactly what you are trying to do.


I just meant as an example if I didn't want to capture /products/ I would use:

RewriteCond %{REQUEST_URI} !^/products/
RewriteCond %{THE_REQUEST} /([^/]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1 [R=301,L]

Perhaps I should have chosen a different folder name to avoid confusion:

RewriteCond %{REQUEST_URI} !^/exclude/
RewriteCond %{THE_REQUEST} /([^/]+)/\ HTTP/
RewriteRule /$ http://www.example.com/%1 [R=301,L]

g1smd




msg:4599063
 11:12 am on Aug 3, 2013 (gmt 0)

Yes, that is exactly right if you want to redirect requests for old URLs that end with a trailing slash. The target URL is on www irrespective of whether www or non-www was requested. The target URL is extensionless.

However, I'm not so sure I would use [^/]+ as the pattern. I would use [a-z0-9-]+ for the pattern.

There is one small problem to consider. A request for example.com/product/foo/bar/?name=/junk/ will be redirected to www.example.com/junk and whether you want to take extra steps to prevent that is up to you.
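
If that does need preventing, one option (a sketch, not tested on this site) is to anchor the condition to the start of the request line and allow only path characters before the captured folder, with the query string matched separately after it. The exact pattern below is an assumption built on the [a-z0-9-]+ suggestion above.

```apache
# The path part may not contain "?" or a space, so slashes inside the query
# string can no longer supply the captured name. The optional (?:\?[^\ ]*)?
# swallows any query string that follows the final folder.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(?:[^/?\ ]+/)*([a-z0-9-]+)/(?:\?[^\ ]*)?\ HTTP/
RewriteRule /$ http://www.example.com/%1 [R=301,L]
```

For the request above, this would capture bar rather than junk. The original query string is still appended to the target by default; ending the target with a ? (or using the QSD flag on Apache 2.4) would drop it.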


Thanks for clarifying the question. :) Clear wording and examples are crucial.

I have on many occasions misread or misunderstood what was said and provided the right answer to the wrong question and I have seen several other people here do that too.

batface




msg:4599068
 11:22 am on Aug 3, 2013 (gmt 0)

Thanks for clarifying the question. :) Clear wording and examples are crucial.


I've worked 10 years in technical support and that has been my gripe! :)
haha, they say mechanics are usually bad drivers - I should learn something from that :)

I don't get on here often but I just wanted to say I really appreciate the help I get from you guys when I do.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved