generic mod rewrite required

whatson

10:01 am on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What RewriteRule would be required to remove the .php extension from a URL?

e.g. mysite.com/about.php will become mysite.com/about

But I want it to be generic, so any other URL like mysite.com/contact.php will become mysite.com/contact.

Is this possible?

incrediBILL

1:07 pm on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You mean something like...

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !\.php$
RewriteRule ^(.*)$ $1.php [L]

g1smd

5:34 pm on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mod_rewrite cannot alter URLs. You alter the URL by changing the links on your page.

Once that has been done, you install a rewrite to intercept those requests and internally rewrite them to the place where the content really resides.

Don't use (.*) as the pattern. You're never going to rewrite requests for images or stylesheets, so make the pattern more precise, matching only extensionless requests.

The -f and -d tests are very very inefficient. They are not needed at all if you make the RegEx pattern match only extensionless URLs.

The target will be /$1.php with a leading slash. Don't omit the slash; otherwise you leave your server open to hackers using path injection methods.
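
A minimal sketch of the rule described in this post, not a tested drop-in. It assumes the rule sits in an .htaccess file in the document root (so the matched path has no leading slash) and that every extensionless URL maps to a .php file of the same name:

```apache
RewriteEngine On

# Internally rewrite extensionless requests such as /about to /about.php.
# [^.]+ cannot match a period, so requests for .css, .js or image files
# never match this rule and need no -f / -d test.
RewriteRule ^([^.]+)$ /$1.php [L]
```

This is an internal rewrite only: the visitor's address bar keeps showing /about, which is why the links on the pages have to be changed to the extensionless form first.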

incrediBILL

5:40 pm on Jun 12, 2012 (gmt 0)

g1smd wrote: "The -f and -d tests are very very inefficient."


They aren't that inefficient, but they are obviously slower than not making an OS request. They just make an OS call to test for a file, and the directory data used by both -f and -d is typically cached, so it's really pretty quick.

Seriously, you make it sound like -f and -d are such hogs and they really aren't.

Considering the sloppy PHP scripts that get executed at the end of the request, which open tons of files, way more than those Apache commands, it's trivial by comparison.

g1smd

5:56 pm on Jun 12, 2012 (gmt 0)

No. The -f and -d tests are hugely inefficient having to make two calls to the hard drive to check that files are not there for requests that were never going to be rewritten anyway. These checks literally beat server hard drives to death, as each "page" might result in several dozen such checks, the vast majority of them for requests that were never going to be rewritten anyway (images, stylesheets, js files, etc).

It should be one of the highest priorities in htaccess coding to eliminate such checks completely. In this case it is simple to do. Use a RegEx pattern that matches requests for pages but does not match requests for images, stylesheets or js and other files.
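
One detail that bears on this disagreement: mod_rewrite tests the RewriteRule pattern first and evaluates the RewriteCond lines only for requests that already matched it. So a narrow pattern also limits how often any filesystem test runs, if one is kept at all. A hedged sketch, assuming an .htaccess file in the document root; the existence check on the target file is an illustrative addition, not something either poster proposed:

```apache
RewriteEngine On

# Evaluated only for requests that matched the rule pattern below, i.e.
# URL-paths with no period. $1 is the backreference from that pattern.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([^.]+)$ /$1.php [L]
```

With this arrangement, requests for images, stylesheets and scripts fail the pattern and trigger no -f test. Dropping the RewriteCond line gives the pattern-only version argued for in this post.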

incrediBILL

7:56 pm on Jun 12, 2012 (gmt 0)

g1smd wrote: "These checks literally beat server hard drives to death"


Don't even go there.

Lots of high volume sites use -f and -d and they're quite snappy.

While I respect your vast knowledge of Apache, I used to work in OS internals for a hard disk company and had to profile the performance of OS calls as part of my job, particularly directory performance. They do NOT beat the drive to death, as that data is CACHED in memory the majority of the time, like I previously stated, making it quite quick.

This argument is like the one web hosts always give my customers: that having a PHP script pre-process page requests will bring the server down, versus using htaccess, when the site is WordPress, which is 100% PHP in the first place. Duh.

g1smd

8:08 pm on Jun 12, 2012 (gmt 0)

There's absolutely no need to go look on the hard drive (twice!) to check that requests that were never going to be rewritten do actually match up with a real file on the hard drive when a tighter RegEx pattern could have made that decision in a very small fraction of that time. Jim made this point over and over again for many years in this forum. A pattern looking only for extensionless URLs will operate vastly faster than any other process.

lucy24

11:13 pm on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Rewrite pattern is

^([^.]+)$ $1.php [L]

This rule is only invoked when you meet a URL that contains no literal periods, which is to say pages. There is then no need to check whether the file exists, because "blahblah" and "blahblah.php" will end up getting the same 404. Putting the -f in a RewriteCond will not absolve the server from checking all over again when it gets to the point of actually serving up the page.

Before this Rewrite, there is a Redirect that goes something like

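# THE_REQUEST is the whole request line (e.g. "GET /about.php HTTP/1.1"), so test for ".php" followed by "?" or whitespace rather than end-of-string
RewriteCond %{THE_REQUEST} \.php[?\s]
RewriteRule ([^.]+)\.php$ http://www.example.com/$1 [R=301,L]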

(Thrown together from memory, so do not cut & paste!)

g1smd

12:45 am on Jun 13, 2012 (gmt 0)

The target (of the rewrite) will be /$1.php with a leading slash. Don't omit the slash; otherwise you leave your server open to hackers using path injection methods.

I'd prefer a start anchor on that RewriteRule pattern for the redirect. :)

lucy24

1:10 am on Jun 13, 2012 (gmt 0)

g1smd wrote: "I'd prefer a start anchor on that RewriteRule pattern"

Y'know, I thought about it and decided it couldn't matter unless your URL (excluding domain name) contains more than one period, and if it does, it's more likely to come after the "php", as in ".php.zip". Although that's not why I conversely decided there has to be a closing anchor. I was thinking of "php2" or similar.

Then again, thinking along the lines of "it can't make any difference UNLESS..." may or may not be safe when composing RewriteRules. (Something vaguely analogous to "It's impossible to make things foolproof because fools are so ### ingenious.") Like escaping literal periods: Sooner or later, someone will come along and request "widgetshtml".

incrediBILL

3:17 am on Jun 13, 2012 (gmt 0)

g1smd wrote: "There's absolutely no need to go look on the hard drive"


Exactly.

That's why the OS caches directory data and doesn't hit the hard drive twice: once in the worst case, and probably not even that if the data was already cached, which is highly likely.

whatson

4:05 am on Jun 13, 2012 (gmt 0)

Ok thanks - so are you saying that if I do this, then anything.php should become anything/ not anything?

lucy24

6:59 am on Jun 13, 2012 (gmt 0)

No, absolutely not. Trailing slash = directory. No slash = file. So "anything.php" would become "anything".

But as long as we're on directories: make sure that names in the form "blahblah/index.php" never show up anywhere. It's simply "blahblah/". (Not "blahblah/index" ;)) You don't need to make a RewriteRule to fetch content from index.php; it happens automatically unless you've got wonky config settings.

So, come to think of it, you need a redirect for requests ending in "index.php" before the generic ".php" redirect.
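
A rough sketch of what such an index.php redirect might look like, placed ahead of the generic ".php" redirect. It assumes an .htaccess file in the document root and reuses the www.example.com hostname from earlier in the thread; treat the patterns as illustrative rather than tested:

```apache
# Redirect direct client requests for index.php to the bare directory URL.
# THE_REQUEST holds the original request line, so the server's own internal
# DirectoryIndex lookup of index.php does not trigger this rule again.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.php[?\s]
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
```

Here $1 is the directory part of the path (empty for the site root), so /blahblah/index.php would be sent to /blahblah/ and /index.php to /.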

incrediBILL wrote: "That's why the OS caches directory data and doesn't hit the hard drive twice"

How did hard drives get mixed into this? The server, unlike your computer, caches nothing and has no memory.

incrediBILL

8:21 am on Jun 13, 2012 (gmt 0)

lucy24 wrote: "How did hard drives get mixed into this? The server, unlike your computer, caches nothing and has no memory."


We got into this because G1 said something misleading, and you're wrong as well, though I suspect you're messing with me. I'm too annoyed by the topic to sort it out.

Argue all you want; those 2 flags are a bit slower than other options in certain circumstances, but not to the point that we should tell everyone to avoid them as if they're going to bring the server down, which is 100% nonsensical FUD.

To be specific, -f and -d become killers when they're used on directories with thousands of entries, but so does every other disk operation at that point, and performance overall will suffer, not just Apache's. Even in those circumstances it's more of a linear processing problem than a disk issue, since the entries are still in CACHE!

Then again, I have 30K images in a single directory and they get served pretty fast all day long so what do I know?

Oh yeah, it's directory cache that makes them fast.

Now move along.