
Apache Web Server Forum

    
htaccess rewrite for cms. need some expertise
mihomes




msg:4509901
 4:40 pm on Oct 19, 2012 (gmt 0)

So I am trying out a cms platform to add into an existing site for articles.

The following htaccess is given for its subdir (www.example.com/news/) :


RewriteEngine On
#RewriteBase /news
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1 [L]


Basically this says if a file exists then go to it, else direct it to index.php. This allows the generated pages to show SEO-friendly URLs like www.example.com/news/article1/ or www.example.com/news/category/article1/

Two things I want to accomplish :

1 - I am losing my non-www to www functionality given in my main site's htaccess, which is:


RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


So basically, anything in the news folder (the cms) allows both www and non-www urls.

2 - I have doubts about this, and the more I think about the code I think it is in the actual script rather than this htaccess, but is there any way possible to have actual articles as name.php? In the example above all categories and actual articles are shown as folders for the url. So something like www.example.com/category/article1.php instead of www.example.com/category/article1/

 

lucy24




msg:4509988
 8:57 pm on Oct 19, 2012 (gmt 0)

Basically this says if a file exists then go to it, else direct it to index.php.

Not quite.

It says: if a file exists (-f) at the URL requested, then serve content from that location.

If there is no directory at the URL requested, then serve content from index.php.

It does not say what happens if there is no file (!-f) at the URL requested. What happens if someone asks for a nonexistent image? Or, conversely, what happens if there really is a directory there? (Possibly there aren't any, so the situation wouldn't arise.)
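
The two gaps described above can be made explicit in the rules themselves. The following is only a sketch, assuming the file lives in /news/.htaccess, that the CMS still expects the requested path in the category parameter, and that static assets use the listed extensions (the extension list is an assumption, not something given in the thread):

```apache
RewriteEngine On

# Requests that map to a real file or a real directory are served as-is.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]

# Assumption: static assets use these extensions. A missing image then
# gets an ordinary 404 instead of being handed to the CMS.
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|css|js)$ [NC]
RewriteRule ^(.*)$ index.php?category=$1 [L]
```

With this layout, anything that is neither a real file, a real directory, nor an asset-style URL is treated as a virtual CMS path.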

I am losing my non-www to www functionality given in my main site's htaccess which is

(correctly worded redirect)

BUT
if this comes after the Rewrites you quoted above, you have got yourself into a potentially disastrous mess.

I have doubts about this, and the more I think about the code I think it is in the actual script rather than this htaccess, but is there any way possible to have actual articles as name.php? In the example above all categories and actual articles are shown as folders for the url. So something like www.example.com/category/article1.php instead of www.example.com/category/article1/

Does each pseudo-folder contain only one article? Then you might argue that its name should be

www.example.com/category/article1

with no trailing slash. But you are welcome to let the user think each article is in its own little folder. They're all made-up anyway.

In any case you need to put your 301 redirects before the rewrites. This should have no effect on the final handling of the cms area, so long as it's carefully written, but will ensure that everyone starts out on the same metaphorical page.

Does the CMS serve content from
www.example.com/index.php?lots-of-stuff-here

or from

www.example.com/news/index.php et cetera

?

I'd expect to see the word /news/ somewhere in those rules. But not in the RewriteBase, which you have properly commented out.

mihomes




msg:4509997
 9:32 pm on Oct 19, 2012 (gmt 0)

The actual cms is in the directory example.com/news/. The current format is example.com/news/category/article/ or example.com/news/category/sub-category/article/ depending on whether the article is in a sub-cat or not.

It appears having an actual file name such as article.php for the articles rather than a directory is going to cause a lot of work. At this point in time I am keeping it as is.

As for the 301 redirect from non-www to www I would still like to implement this. To answer your question, I think, index.php is the template for the cms. So, anything related to the template is pushed to that file and that file only. So I should be able to perform the redirect on it - correct?

g1smd




msg:4510005
 10:02 pm on Oct 19, 2012 (gmt 0)

Redirects work on URL requests coming from outside, not on files inside the server.

htaccess cannot and does not "make" URLs.

That said, you are better off moving your code to the root htaccess file, especially any code that performs any sort of redirect.

Make sure that none of your rules use the Redirect or RedirectMatch directives, convert all of them to use the RewriteRule directive.
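
As an illustration of that conversion, here is a sketch for a root .htaccess. The paths old-page.html and new-page.html are hypothetical and do not come from this thread:

```apache
# mod_alias form, shown commented out:
# Redirect 301 /old-page.html http://www.example.com/new-page.html

# Roughly equivalent mod_rewrite form. In .htaccess the pattern is matched
# against the path without its leading slash.
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]
```

Keeping every redirect in mod_rewrite means the order in which the rules run is the order in which they are written in the file.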

mihomes




msg:4510007
 10:26 pm on Oct 19, 2012 (gmt 0)

Redirects work on URL requests coming from outside, not on files inside the server.

htaccess cannot and does not "make" URLs.


I know this - I don't think you understood the situation. Essentially I just want to make sure my non-www to www redirect works in conjunction with this folder.

As it sits, you can enter in both non-www and www, but only for this /news/ directory. I want this directory to be forced to www like the rest of the site.

g1smd




msg:4510008
 10:40 pm on Oct 19, 2012 (gmt 0)

I understand perfectly. Simply add the non-www to www redirect code to the root htaccess file of the site it should apply to, and add it after any more-specific redirects and ahead of any internal rewrites.
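
That ordering can be sketched as a root .htaccess skeleton. The path old-article.html is hypothetical, and the sketch assumes no competing rewrite rules in /news/.htaccess:

```apache
RewriteEngine On

# 1. Most specific external redirects first (hypothetical example path).
RewriteRule ^old-article\.html$ http://www.example.com/news/ [R=301,L]

# 2. General canonical-host redirect.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. Internal rewrites last. No R flag, so the visitor's URL is unchanged.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^news/(.+)$ /news/index.php?category=$1 [L]
```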

mihomes




msg:4510057
 2:10 am on Oct 20, 2012 (gmt 0)

Well I already have :

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


in the site's root .htaccess. You're saying I should enter in the :

RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1 [L]


before this? If that is the case I'm going to need some help applying the above to that specific folder location... in this example /news/

I guess what I don't understand is why wouldn't the same principle work in an .htaccess in the /news/ location? Yeah, the redirect portion would be repetitive since it's already in the root, but it should still work, right? So far it has not.

lucy24




msg:4510091
 5:44 am on Oct 20, 2012 (gmt 0)

Nooo.

The CMS rewrites need to come AFTER all redirects. For htaccess purposes, "after" can mean either physically later in the same file-- or in a second htaccess in a deeper directory. (Caution! This only works because everything is happening in the same module, mod_rewrite in this case. Processing goes from outermost to innermost. Or from the top down, depending on how you visualize it.)

Since the domain-name redirect has no other restrictions, I don't understand why the CMS rewrites are preventing it from working. From your original post I got the impression the rewrites come before the redirect, which would explain it. But now you're saying the redirect comes first?

You don't need to have identical commands in a root htaccess and a subdirectory htaccess. Anything headed for the subdirectory first has to pass through your root htaccess, in the same way that anything headed for your domain has to start with the config file. A request can't leapfrog straight to its final destination.

Unless -- and this is a big horrible Unless -- unless the CMS has also added a line in your primary htaccess, something like

RewriteRule /news/ - [L]

meaning that it would skip the whole mod_rewrite for the root-level htaccess, and proceed directly to the inner directory.

Building onto an htaccess that was created and installed by a CMS can be much harder than developing your personal htaccess from scratch.
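
One detail that may be relevant to the pass-through question: according to the mod_rewrite documentation, rules in a subdirectory's .htaccess are not merged with the parent directory's rules by default. Once /news/.htaccess contains its own mod_rewrite directives, the root rules are skipped for requests under /news/ unless RewriteOptions Inherit is set. Whether that is the cause in this thread is not confirmed. The following is a sketch of /news/.htaccess under that assumption, with the host redirect repeated ahead of the CMS rules. It uses %{REQUEST_URI} so that the /news/ prefix, which is stripped from the pattern in per-directory context, survives the redirect:

```apache
RewriteEngine On

# Canonical-host redirect, repeated here on the assumption that the
# root rules are not inherited by this directory.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^ http://www.example.com%{REQUEST_URI} [R=301,L]

# CMS rules as supplied.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1 [L]
```

Copying the root rule unchanged into /news/.htaccess would redirect to http://www.example.com/$1 without the /news/ prefix, because $1 only holds the part of the path below /news/.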

mihomes




msg:4510102
 7:40 am on Oct 20, 2012 (gmt 0)

This is my root htaccess redirect :


RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


This is the htaccess given by the cms which is located in www.example.com/news/, the location of all its files and the index.php which is the template file which displays everything :


<IfModule mod_php4.c>
php_value session.use_trans_sid 0
</IfModule>
<IfModule mod_security.c>
SecFilterEngine Off
SecFilterScanPOST Off
</IfModule>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1 [L]


Given the above, the non-www to www redirect works fine EXCEPT in the /news/* location where I can type either www or non-www... it is not forced (redirected) to the www version.

Thoughts?

You guys sure are making this tough... doesn't seem like it, but then again I am horrible with the htaccess conditions.

lucy24




msg:4510114
 8:31 am on Oct 20, 2012 (gmt 0)

Well, I'm stumped. I cannot for the life of me figure out what's intercepting those /news/ requests and preventing them from getting redirected.

Oh. Wait. Rewind.

I've assumed all along that www.example.com/news is not just your CMS directory for URL purposes, but is also a real directory that's physically located inside your root directory. Is it? Or is the physical setup something like

/users/yourname/example.com
side by side with
/users/yourname/CMS-directory

?

Sometimes you can't tell, because hosts will alias things all over the place. It's intended to be helpful. But your error logs give the real, physical path. So if you have any doubts, go request some nonexistent files in both places-- CMS and regular domain-- and see what you can glean from the error logs.

Admittedly this is scraping the barrel. But g1 lives in a different time zone than I do, so all is not lost :)

mihomes




msg:4510182
 3:24 pm on Oct 20, 2012 (gmt 0)

www.example.com/news/ is a physical location and has a few directories and other files for the cms... the main file being the www.example.com/news/index.php template file which outputs every file for the cms. This is where the htaccess for it comes into play with the file exist and directory not exist because those categories and articles it creates as folders out of the /news/ are not physical (they are mapped to sql database in some manner).

I did find that <base href="www.example.com/news/"/> is set on this index.php (the template file) and appears to help populate the correct links on the page, however, if I go to the non-www version it of course becomes <base href="example.com/news/"/>

The 'site' link used in that base is created by :

function site() {
$host = 'http://'.$_SERVER['HTTP_HOST'];
$directory = dirname($_SERVER['SCRIPT_NAME']);
$website = $directory == '/' ? $host.'/' : $host.$directory.'/';
return $website;
}


in a separate php file that is php included in the index.php template file.

Now, if I change that above function to always return http://www.example.com/news/ then that forces all links in the pages (the template file) to be www since that is the base. Problem is I can still manually type an address in the URL without the www and it will show up (no redirect), yet since the base is www all links on the page are correct.

If there is a way to force that index.php to always be www version then that would solve everything, but for whatever reason that is not happening with the code in the htaccess.

g1smd




msg:4510202
 6:06 pm on Oct 20, 2012 (gmt 0)

In the root htaccess:

RewriteEngine On

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^news/([^/.]+)$ /news/index.php?category=$1 [L]


The above assumes there are no more slashes than the one each immediately before and after "news" and that the friendly URL also contains no periods.

^news/([^/.]+)$ could be ^news/(([^/]+/)*[^/.]+)$ for multiple folder levels.



In the /news/ htaccess:

php_value session.use_trans_sid 0
SecFilterEngine Off
SecFilterScanPOST Off

mihomes




msg:4510245
 8:51 pm on Oct 20, 2012 (gmt 0)

This kind of works... the redirect works fine, articles 404... categories and sub-categories show fine though.

Main problem is now the URLs show in the format :

http://www.example.com/news/index.php?category=cat-example/subcat-example/

instead of

http://www.example.com/news/cat-example/subcat-example/

g1smd




msg:4510256
 9:36 pm on Oct 20, 2012 (gmt 0)

Use the Live HTTP Headers extension for Firefox to investigate the request and the responses.

You appear to have an unwanted redirect that kicks in after the internal rewrite has occurred. This exposes the internal filepath back out on to the web as a new URL.

That's usually caused by having rules in the wrong order, or mixing directives from mod_rewrite and from mod_alias within the same site configuration.

Articles should not have a trailing slash. A URL with a trailing slash denotes a folder or the index page in a folder. Pages should not end with a slash.

mihomes




msg:4510284
 11:28 pm on Oct 20, 2012 (gmt 0)

Articles do end with a trailing slash... all categories, sub-categories, and articles are set up virtually in a folder hierarchy with everything going through the /snews/index.php template. At some point in the code these virtual folders link to the actual location of the content for it in the sql database so that the correct content is output - this should be what that category= specifies.

Using the default setup (without the changes you noted above), live headers shows exactly how it is supposed to be (www.example.com/cat/subcat/article/). Using the changes you noted above, with either version of the rewrite rule, the request in live headers shows the same, except a 404 is returned for everything other than the /news/ location.

g1smd




msg:4510287
 11:38 pm on Oct 20, 2012 (gmt 0)

Use the Live HTTP Headers extension for Firefox to investigate URL requests that redirect.

Alter the RegEx pattern in the internal rewrite to allow a trailing slash in the requested URL. Be aware that your URL structure now breaks the HTTP specs and that you'll need to be especially careful with all rules that internally rewrite. The rules must match only virtual folders in URLs and not requests for any real folders.
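
A sketch of the root rule with the trailing slash allowed. It assumes the CMS wants the slash kept in the category value (as in the index.php?category=cat-example/subcat-example/ URL quoted earlier) and that virtual path segments never contain a period:

```apache
# Skip anything that is a real file or a real folder under /news/.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# One or more virtual segments, with an optional trailing slash kept in $1.
RewriteRule ^news/((?:[^/.]+/)*[^/.]+/?)$ /news/index.php?category=$1 [L]
```

Under these assumptions, a request for /news/cat/subcat/article/ is passed on as category=cat/subcat/article/, while /news/index.php and /news/ itself do not match the pattern.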

mihomes




msg:4510292
 12:33 am on Oct 21, 2012 (gmt 0)

There is no redirect occurring... everything is handled by the single index.php template file. The ?category= gives the file the information on what page it should produce from the database as well as the link structure.

www.example.com/news/category1/subcategory/article/ is not a real location and is handled by the /news/index.php just as every other page is. When using live headers it appears just as a normal page request, no redirect, just like a physical page/location would.

In its current state... if I were to go to example.com/news/ it would redirect to www.example.com/news/ (because of my site's root htaccess, which does this, and I'm assuming because there is a physical index.php file in this location).

Now, if I type example.com/news/category1/etc./ it does not redirect and just shows without the www.

Based off the /news/ htaccess... if the file exists then serve it. If the request is not a directory then pass it to the index.php in /news/. To me it sounds like I just need to make sure that when the bottom rewrite happens it is always the www version. Maybe it's not possible the way this cms system was written?

RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?category=$1 [L]

lucy24




msg:4510305
 1:31 am on Oct 21, 2012 (gmt 0)

In its current state... if I were to go to example.com/news/ it would redirect to www.example.com/news/ (because of my site's root htaccess, which does this, and I'm assuming because there is a physical index.php file in this location).

Now, if I type example.com/news/category1/etc./ it does not redirect and just shows without the www.

That's the part that's giving me trouble. Is there another line in the htaccess that you haven't quoted? I ask this with hesitation, because usually we have the opposite problem: people dumping the entire htaccess-- or CSS or what have you-- and hoping someone else can identify the relevant bits.

See, the with/without www redirect is supposed to be unconditional:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

No testing whether the page exists or not. Rewrites don't matter. The presence or absence of "index.php" doesn't matter. Look at the domain name and that's it.

The only pages that are exempt from this rule are the ones that have met an [L] somewhere earlier in htaccess. Or an L-equivalent such as F or G, but those don't concern us here.

The only [L] flags that should occur earlier than the www redirect are other, more specific redirects in [R=301,L]. And those redirects will always include the correct domain name at the front. And, finally, any redirected file will eventually loop all the way through the whole htaccess again.

mihomes




msg:4510342
 8:17 am on Oct 21, 2012 (gmt 0)

Nothing else in the htaccess for root or the /news/ directory than what I gave above.

At this point I have no idea. I have posted on the cms's forum as well in hopes of an answer, but nothing yet.

I really cannot figure this one out, but it obviously has to do with the way the cms code is written since what has been done is not working... well it is, but just the news' base location.

lucy24




msg:4510358
 10:06 am on Oct 21, 2012 (gmt 0)

:: continuing to grasp at straws ::

Does your host have any involvement in the CMS? If you got it through the host, there may be stuff in the config file that grabs requests before your htaccess ever sees them.

:: shudder ::

You just know the explanation is going to turn out to be something mortifyingly obvious that everyone overlooked, don't you :(

mihomes




msg:4510458
 7:39 pm on Oct 21, 2012 (gmt 0)

No, it is not through the host. Yes, I'm sure it probably will be something simple.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved