How to strip all (1 to 7) folders with ModRewrite

jtata

11:41 pm on Jul 24, 2010 (gmt 0)

10+ Year Member



Hello,

I have a question about mod_rewrite and am looking for a solution.

Situation:

A website already has a structure with 0 to 7 folders, and URLs look like domainname.com/first-folder/second-folder/third-folder/etc/actualpagename.html. The folder names and the number of folders vary throughout the site.

Problem:

How can we strip ALL folders from the URL, regardless of their names and how many there are, so that every page ends up with a URL like domainname.com/actualpage.html?

I was trying a solution along these lines:

RewriteRule ^dirA/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)\.(.htm|.html)$ [domainname.com...] [QSA,L,R=301]

As a result, nothing happens :-/

*****

Can somebody help us achieve the desired result?

g1smd

11:57 pm on Jul 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



URLs are defined in links, so it is your links on your pages that have to change.

Once a user clicks one of those links, and the request is sent to the server, Mod_Rewrite accepts those requests and the RewriteRule maps the request to the right path inside the server to deliver the content.

There must be a "clue" in the URL that allows Mod_Rewrite to work out which folder the content is going to be served from.

Only when you have the rewrite working can you proceed to installing a redirect that redirects any stray requests for the old URLs to the new URL for that content.

Additionally, the multiple (.*) patterns in the redirect will kill your server, needing thousands of trial matches for each request. You will need a much more efficient pattern.
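A quick way to see why the original attempt did nothing is to try its pattern outside Apache. The sketch below uses Python's `re` module, which handles these particular constructs (groups, `.*`, alternation, anchors) the same way as the PCRE library mod_rewrite relies on; it only tests matching, not Apache's substitution or flags, and the sample paths are hypothetical. Besides the performance issue, it shows that the unescaped dots in `(.htm|.html)` stop the pattern from ever matching a normal `.html` URL, and that six `(.*)` groups demand a minimum folder depth.

```python
import re

# Pattern from the original question (substitution omitted). Inside the group,
# "." is an unescaped metacharacter, so "(.htm|.html)" means "any one character
# followed by htm or html", which never lines up with a normal ".html" ending.
original = re.compile(r"^dirA/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)\.(.htm|.html)$")

# Same pattern with the extension group written as presumably intended.
fixed_ext = re.compile(r"^dirA/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)\.(htm|html)$")

# Hypothetical request paths, as seen in .htaccess context (no leading slash).
deep = "dirA/a/b/c/d/e/page.html"
shallow = "dirA/a/page.html"

print(original.match(deep))             # None
print(fixed_ext.match(deep).groups())   # ('a', 'b', 'c', 'd', 'e', 'page', 'html')
print(fixed_ext.match(shallow))         # None: six groups need five more slashes
```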

jtata

12:10 am on Jul 25, 2010 (gmt 0)

10+ Year Member



So do we actually have to list each and every folder and subfolder and write a redirect rule for each one? That looks like a lot of work.

Is it possible to specify that ALL folders and subfolders under a specific directory, let's say dirA, are removed from the URL when the actual page is shown?

jdMorgan

1:20 am on Jul 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you remove the "directories" from the URL, how will the server know what page to serve?

On the other hand, if the various and sundry directories have no significance in serving a page, then how do you intend to prevent massive duplicate-content (same page appearing at many different URLs) problems?

We're asking these questions because the plan looks quite easy to do, but also quite inadvisable (if you care about usability or search engine ranking and traffic). It's the second time I've said this today, but we try not to provide the right answer to the wrong question...

It might be wise to back up and ask, "What, overall in the scheme of your site, are you trying to accomplish with this plan?"

If you answer in terms of URLs and the corresponding files, scripts, and/or pages of content, and the significance/meaning/origin of these "directory levels," that will provide the most direct path to understanding and a proper solution.

Jim

jtata

2:06 am on Jul 25, 2010 (gmt 0)

10+ Year Member



Well, the situation is a bit different.

We used a CMS that had an option to remove the hierarchy from URLs. Now the CMS has dropped that option, and all internal CMS directories (the hierarchy) are displayed in the actual URL. We organized the pages inside the CMS in a way that was comfortable and understandable for us. Let's say we put product A within the hierarchy as:

categoryA/subcategoryB/subsubcategoryC/actualproduct.html

With the hierarchy removed from the displayed URL, we showed domainname.com/actualproduct.html, or domainname.com/categoryA.html (with a list of related products), or domainname.com/subsubcategoryC (with a list of related products). All this made our site a flat website.

Now that the "do not show hierarchy" function has been removed, we get heavy URLs with lots of subfolders.

The CMS does not allow duplicated page names.

By starting the pattern with a defined first directory like "^dirA/", we hope to restrict the rule to a specific section of the overall site structure, where we can be sure that removing folders will not harm other sites' sections (we have multiple domains under one CMS).

By ending the pattern with the actual page name and .html/.htm, we hope to tell the system that this is the page to show, not a folder to remove.

A flat site is better for SEO. Yes, here comes the flat vs. hierarchical website question. Until now we have chosen flat, and we were very confused when this option was removed... so we are looking for a solution.

jtata

2:12 am on Jul 25, 2010 (gmt 0)

10+ Year Member



We hope that a rewrite rule can stand in for the removed CMS feature.

jdMorgan

2:30 am on Jul 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In that case, it seems you need to map an HTTP request for the URL example.com/actualproduct.html to the server-internal filepath /categoryA/subcategoryB/subsubcategoryC/actualproduct.html

So, the only way for the server to know what server filepath corresponds to the URL example.com/actualproduct.html is to search all possible /category/subcategory/subsubcategory directories.

Frankly, I'd suggest you map *all* URLs ending in .html to a script, which can open the database, look up the 'old' URL/filepath, and then call the CMS (as a wrapper) with the 'converted' URL. This is a bit more work than you want to try to do using only server directives in .htaccess.
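The lookup step of such a script can be sketched as follows. This is a minimal Python illustration, not the CMS's API: `build_index` and `resolve` are hypothetical names, the hard-coded page list stands in for the database query, and the hand-off to the CMS is left out. It relies on the point made earlier in the thread that the CMS does not allow duplicate page names, and refuses to build the index if that ever fails.

```python
# Hypothetical sketch: in practice the page list would come from the CMS
# database rather than a hard-coded list.
def build_index(paths):
    """Map each page's file name to its full hierarchical CMS path."""
    index = {}
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if name in index:
            # The flat scheme only works if page names are unique.
            raise ValueError(f"duplicate page name: {name}")
        index[name] = path
    return index


def resolve(request_path, index):
    """Return the CMS path for a flat URL like '/actualproduct.html', or None."""
    name = request_path.lstrip("/")
    if "/" in name:
        return None  # still hierarchical, so not a flat URL
    return index.get(name)


pages = [
    "categoryA/subcategoryB/subsubcategoryC/actualproduct.html",
    "categoryA/otherproduct.html",
]
index = build_index(pages)
print(resolve("/actualproduct.html", index))
# categoryA/subcategoryB/subsubcategoryC/actualproduct.html
print(resolve("/missing.html", index))  # None
```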

If you've already got something like that in place, and the example.com/categoryA/subcategoryB/subsubcategoryC/actualproduct.html URLs exist *only* as links from third-party sites (and not from your own site), then stripping *all* path-info from the requested URL and redirecting is simple. But it's only simple if you do want to redirect *all* .html URLs. If there are any exceptions, then those must be incorporated into the code.

RewriteRule ^([^/]+/)+([^.]+\.html?)$ http://www.example.com/$2 [R=301,L]

That's all it takes, as long as you already have other working RewriteRules in your .htaccess file. If not, you will need either both of the following directives, or only the second one; only testing will tell you:

Options +FollowSymLinks
RewriteEngine on

Rule order: Place all external redirects first, in order from most-specific patterns and conditions (one or only a few URL requests affected) to least-specific patterns and conditions (many requested URLs affected), followed by all internal rewrites, again in order from most- to least-specific. End all RewriteRules with an [L] flag unless you know why you should not do so. This avoids multiple/chained redirects and avoids 'exposing' server-internal filepaths as URLs to HTTP clients.

Jim
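To check which requests that rule catches, here is a rough emulation with Python's `re` module (regex matching only; Apache's flags are not modelled, and the paths are made up). Because `+` requires at least one folder, an already-flat URL such as actualproduct.html does not match, which is what keeps the redirect from looping.

```python
import re

# jdMorgan's pattern: one or more "folder/" segments, then a dot-free page
# name ending in .htm or .html. Capture group 2 corresponds to $2.
rule = re.compile(r"^([^/]+/)+([^.]+\.html?)$")


def redirect_target(path):
    """Return the URL the rule would redirect to, or None if it does not match.

    'path' is what the rule sees in .htaccess context: no leading slash.
    """
    m = rule.match(path)
    return f"http://www.example.com/{m.group(2)}" if m else None


for p in [
    "categoryA/subcategoryB/subsubcategoryC/actualproduct.html",
    "categoryA/actualproduct.htm",
    "actualproduct.html",   # already flat: no folder, so no match
    "categoryA/logo.png",   # not .htm/.html: no match
]:
    print(p, "->", redirect_target(p))
```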

jtata

3:33 am on Jul 25, 2010 (gmt 0)

10+ Year Member



We put in the suggested rule

RewriteRule ^([^/]+/)+([^.]+\.html?)$ http://www.example.com/$2 [R=301,L]

and added a specific folder in front to limit it to the required part of the structure, like:

RewriteRule ^dirA/([^/]+/)+([^.]+\.html?)$ http://www.example.com/$2 [R=301,L]

This works like a charm, but it does not redirect when dirA is the only folder, as in example.com/dirA/actualpage.html.

So we added an extra line, and the full rewrite looks like:

RewriteRule ^dirA/([^/]+/)+([^.]+\.html?)$ http://www.example.com/$2 [R=301,L]
RewriteRule ^dirA/(.*)$ http://www.example.com/$1 [R=301,L]

It looks like it's working.

Just to be sure:
- Will the first line work no matter how many subfolders exist?
- And will ^dirA/ apply ONLY to this particular folder and everything in it?
- Do you think it's a good idea to add the full URL before ^dirA/ for security? We already have some rules defining URL rewrites like:

RewriteCond %{HTTP_HOST} ^domain\.co\.uk$ [NC]
RewriteRule ^(.*) [domain.co.uk...] [QSA,L,R=301]

or

RewriteCond %{HTTP_HOST} ^(www.)?domain\.info$
RewriteRule ^(.*) [domain.com...] [QSA,L,R=301]

jtata

3:35 am on Jul 25, 2010 (gmt 0)

10+ Year Member



Also, do you think RewriteRule ^dirA/(.*)$ http://www.example.com/$1 [R=301,L] is free of mistakes? Maybe we should incorporate the .html?

RewriteRule ^dirA/(.html?)$ http://www.example.com/$1 [R=301,L]
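Both versions can be checked the same way outside Apache. In this rough Python emulation (hypothetical paths; only the regex matching is reproduced), the catch-all `^dirA/(.*)$` also fires for non-page requests such as images and strips only `dirA/`, keeping any deeper folders. In the proposed `^dirA/(.html?)$`, the unescaped `.` means "any one character", so it captures only a four- or five-character name like `xhtml` and never matches a normal page URL.

```python
import re

# The two dirA rules discussed in the post above (substitutions omitted).
catch_all = re.compile(r"^dirA/(.*)$")
proposed = re.compile(r"^dirA/(.html?)$")


def captured(rule, path):
    """Return what $1 would hold for this request path, or None."""
    m = rule.match(path)
    return m.group(1) if m else None


# Hypothetical request paths (no leading slash, as in .htaccess).
print(captured(catch_all, "dirA/actualpage.html"))  # actualpage.html
print(captured(catch_all, "dirA/images/logo.png"))  # images/logo.png
print(captured(proposed, "dirA/actualpage.html"))   # None
print(captured(proposed, "dirA/xhtml"))             # xhtml
```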

jdMorgan

4:01 am on Jul 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You only need one rule. Change the quantifier "+" to "*" as shown:

RewriteRule ^dirA/([^/]+/)*([^.]+\.html?)$ http://www.example.com/$2 [R=301,L]

Adding the RewriteCond checking HTTP_HOST has nothing to do with security, so I do not understand that question.

Jim
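The effect of changing `+` to `*` can be checked with the same kind of rough Python emulation (regex matching only; the paths are hypothetical, and it assumes the .htaccess file sits in the document root so that `^dirA/` refers to /dirA/). It also bears on the earlier questions: the repeated group handles any number of subfolders, and requests that do not start with dirA/ are left alone.

```python
import re

# Jim's single rule: "*" allows zero or more folders after dirA/.
rule = re.compile(r"^dirA/([^/]+/)*([^.]+\.html?)$")


def redirect_target(path):
    """Return the URL the rule would redirect to, or None if it does not match."""
    m = rule.match(path)
    return f"http://www.example.com/{m.group(2)}" if m else None


for p in [
    "dirA/actualpage.html",              # no subfolder: now matches
    "dirA/a/b/c/d/e/f/actualpage.html",  # dirA plus six subfolders
    "dirB/a/actualpage.html",            # outside dirA: untouched
]:
    print(p, "->", redirect_target(p))
```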

jtata

5:49 pm on Jul 26, 2010 (gmt 0)

10+ Year Member



Thank you Jim!

The rewrite works as we expected.

And you are right: the redirect works for external links, but not for the links the CMS itself generates internally.

jtata