mod rewrite/.htaccess file opinions

could anyone suggest improvements on this?

         

Matthew1980

7:08 pm on Jun 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all,

First off, I know hardly anything about .htaccess.

This code is in the root folder of one of my projects. I'm wondering if this is as good as it can be, or if there is a way of refining it or making it more efficient:-

RewriteRule ^(home|public|about|contact|release).html$ index.php?p=$1
RewriteRule ^release/script/(.*)/index.html index.php?p=release&script=$1 [nc]
RewriteRule ^index.html$ index.php



# URL rewrite rules
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^(.*)$ index.php [QSA,L]
</IfModule>



Thanks for any hints. This works fine as it is; I'm just wondering if it could be made better.

Cheers,
MRb

g1smd

8:32 pm on Jun 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The .* in the middle of the URL to match should be replaced by a "faster" pattern like ([^/]+) or similar.

Add [L] to every rule.

Think about using "extensionless" URLs.

Don't use "index.html" in published URLs.

The [NC] flag opens up your site to Duplicate Content problems.

Make sure you set up a non-www to www canonicalisation rule.

In the rewrite, the -f and -d "exists" checks will hammer the server hard drive to death. Add a preceding RewriteCond with a negative match that directly excludes extensions for images, CSS and JS, and anything else that will never be handled by the script.

If the URLs for pages on your site were "extensionless" there would be no need for the -f and -d "exists" checks at all.
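
A minimal, untested sketch of how the suggestions above might look when applied to the rules from the first post. The list of static file extensions is an assumption and would need adjusting to whatever the site actually serves; the .html URLs are kept as they were, so the "extensionless" and index.html points are not addressed here.

```apache
RewriteEngine On

# Named pages: the literal dot is escaped and [L] ends processing on a match.
RewriteRule ^(home|public|about|contact|release)\.html$ index.php?p=$1 [L]

# ([^/]+) matches exactly one path segment, so it cannot run across slashes
# the way (.*) can. The [NC] flag is dropped and the pattern is end-anchored.
RewriteRule ^release/script/([^/]+)/index\.html$ index.php?p=release&script=$1 [L]

RewriteRule ^index\.html$ index.php [L]

# Catch-all. The first condition fails straight away for static files
# (assumed extension list), so the filesystem checks below it are skipped.
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|ico|css|js)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^(.*)$ index.php [QSA,L]
```

RewriteCond lines are ANDed and evaluated in order, which is why the cheap pattern test goes first and the -f, -d and -l checks last.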

Matthew1980

8:46 pm on Jun 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi there g1smd,

Thanks for the input. I'm assuming that the (.*) is a wildcard, or means "everything". When you say pattern, I guess this is similar to a regexp pattern, in which case I could specify only numbers & letters, or have I missed the point?

> Think about using "extensionless" URLs.

I think you mean lose the index.html there; other than that I'm not too sure. What benefit is there in removing the index.html or index.php from a URL? I'm curious as to why that's a 'good thing'.

[NC][L] Is there a reference place that lists the different tags & how they can be used?

> Make sure you set up a non-www to www canonicalisation rule.

I'll be honest - I have just googled "canonicalisation" and it sort of makes sense, but again not sure on how to 'code in' for that.

I have tried adding the [L] instruction to every rule but the site stops working, but I'm sure that's down to the way the rules are structured/written.


[EDIT:]
This is further down in the .htaccess file, is this what you were referring to?
# URL rewrite rules
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^(.*)$ index.php [QSA,L]
</IfModule>

Honestly, I am a complete noob to this, I understand some of this but not all of it :)

Cheers,
MRb

Matthew1980

7:45 pm on Jul 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all,

My mistake, I just realised that I had already posted the code in the first post (referring to the EDIT)

Could anyone point me to a decent reference place (apart from the apache site)? Could someone also tell me why it would be preferred to use extensionless URLs? To that end, what happens if I have a genuine folder that I wanted to have 'URL' access to: how would I stop that from being interpreted as a directive for a page? I ask because if a page is not existing or not set, I redirect the user back to the main homepage. I don't use the standard error pages that servers provide; maybe that's bad practice, but that's how I'm set up.

Thanks,
MRb

jdMorgan

5:44 am on Jul 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> what happens if I have a genuine folder that I wanted to have 'URL' access to, how would I stop that from being interpreted as a directive for a page

Folder URLs end with slashes, and page URLs don't... A very simple thing to test for in mod_rewrite.
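
As a rough illustration of that test (a sketch only, not a drop-in rule): the pattern below accepts a single path segment with no slash and no dot, so folder URLs ending in "/" and requests for real files with extensions never match it. The `p` parameter is taken from the first post; the extensionless page names such as /about are hypothetical, and the sketch assumes no page name collides with a real folder name.

```apache
RewriteEngine On

# Matches extensionless page URLs such as /about or /contact.
# [^/.]+ allows neither "/" nor ".", so "images/" and "style.css"
# are left alone, as is index.php itself.
RewriteRule ^([^/.]+)$ index.php?p=$1 [L]
```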

> because if a page is not existing or not set, I redirect the user back to the main homepage, I don't use the standard error pages that servers provide, maybe that's bad practice, but that's how I'm set up.

This is SEO suicide... really.
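
For comparison, a hedged sketch of serving a custom error page while keeping the 404 status. The path /errors/404.html is hypothetical. Note that with a catch-all rewrite to index.php, as in the first post, the script itself would have to send the 404 status for pages it does not recognise.

```apache
# A local path keeps the original 404 status in the response.
# A full URL (http://...) here would make Apache send a redirect instead,
# and the client would never see the 404.
ErrorDocument 404 /errors/404.html
```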

I strongly suggest that you spend about a week using Google to do site searches in this forum... All of your questions have been addressed previously, and to very deep levels, not reproducible in a single response.

"Extensionless URL"
"Link to index"
"404 redirect to home page"
"Speed up WordPress .htaccess"
"Duplicate content Get it right or perish"

These WebmasterWorld searches will help avert several pending disasters that you are now facing.

The short version may be to simply repeat what's been posted above:

Do not link to "index.whatever," link to "/". You will thank us when you replace HTML with PHP, or replace PHP with whatever comes along next. Remember that .html and .php are *filetypes* and that URLs are not files or filenames; they are only *associated* with filenames by the action of the server.

Do not allow more than one single unique URL to return the same content. Otherwise, you have a duplicate-content problem looming. If even a single character in a URL changes, that makes it a new and unique URL. So example.com/index.php and example.com/ are different URLs, and example.com/ and www.example.com/ are different URLs. If the four possible "www and index" variations all return the same content, then that means that you have created three additional competitors for your "real" home page URL's ranking. I would think that actual on-line competitors would be enough, without competing against yourself(!)

Now throw in FQDN-formatted hostnames, appended port numbers, incorrectly-applied [NC] flags, bogus query strings (e.g. http://example.com.:80/index.php?some-random-string-here versus http://www.example.com/) -- the number of possible duplicate-content URLs is practically infinite -- if you allow it to happen.
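
A minimal sketch of the kind of canonicalisation redirects this implies, assuming the canonical hostname is www.example.com over plain http and that the rules live in the root .htaccess. It is untested here. External redirects like these would normally go before any internal rewrites to index.php.

```apache
RewriteEngine On

# Redirect direct client requests for index.php or index.html (in any folder)
# to the bare folder URL. THE_REQUEST holds the original request line, so the
# site's own internal rewrites to index.php do not trigger this rule.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.(php|html)[?\s]
RewriteRule ^(([^/]+/)*)index\.(php|html)$ http://www.example.com/$1 [R=301,L]

# Redirect every hostname that is not exactly www.example.com:
# non-www, trailing-dot (FQDN) and appended-port variants all end up here.
# An empty Host header (HTTP/1.0 clients) is left alone.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Neither rule strips bogus query strings; those are passed along unchanged by the redirect and would need separate handling.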

When g1smd said "pattern" above, he was indeed referring to the regular-expressions patterns used in RewriteCond and RewriteRule directives.

The best reference for Apache directives is at apache.org. There are lots of books and other Web sites available as well, but be warned that many of them contain incorrect information. Therefore, I suggest that you use them only as aids to understand the original documentation at apache.org.

Your post count here is certainly non-trivial, so I'd say that it is time to get serious about your server configuration. It is the foundation upon which your site is built, and no matter how nice the site, it can all come crashing down if the foundation is poorly-constructed. Research first, then code later.

Jim

Matthew1980

9:38 pm on Jul 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi there jdMorgan,

Thank you for taking the time to explain all that to me. I had no idea about a few of the things you mention there, so I shall have to research this more deeply than I first thought. I have been sitting with the same .htaccess file for a while thinking it was fine (admittedly cobbled together from lots of other sources). I have spent ages making sure that my projects are well coded & secure, but I guess I missed the most important part.

Again, thanks for the pointers and advice.

Cheers,
MRb