Forum Moderators: phranque


File URI rewrite - .htaccess rules suddenly causing server error

mod-rewrite, .htaccess, URL-rewrite

         

chankirtan

4:27 pm on Feb 25, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Can anyone please help identify the flaw or incompatibility in this .htaccess code that is causing a server error?

In the process of changing over from a static to a dynamic website, I had to keep some static files with the .html extension, while others were replaced with .php versions. Furthermore, I wanted visitors to be able to access any file by file name alone, without the extension.

OBJECTIVE: To cause the browser to return file-name.php, if it exists; else return file-name.html - whether the visitor has typed the url as any one of the following:

1. 'http://mydomain.com/file-name'
2. 'http://mydomain.com/file-name.html'
3. 'http://mydomain.com/file-name.php'

The following code - implemented some years ago, and I've lost track of the source - has worked perfectly all this while, until today.

# BACKWARD COMPATIBILITY RULESET
# FOR REWRITING FILE URI TO file.php IF EXISTS
Options Indexes +FollowSymLinks +MultiViews
Options +ExecCGI
RewriteEngine on
RewriteBase /
# parse out basename, but remember the fact
RewriteRule ^(.*).html$ $1 [C,E=WasHTML:yes]
# rewrite to document.php if exists
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*)$ $1.php [S=1]
# else reverse the previous basename cutout
RewriteCond %{ENV:WasHTML} ^yes$
RewriteRule ^(.*)$ $1.html


All of a sudden, this block of code is causing server error. Website is on shared hosting, and I do not have shell access. Server was recently upgraded, but right now my account is running on PHP 5.3.28, Apache 2.4.7. Even after the server maintenance last week, everything was still okay even up to yesterday. Only today it has ceased to work. I renamed the .htaccess file and created a fresh file, pasted in fresh code, but sure enough, this bit of code is causing the server to choke. Can anyone point out what might be the problem?

chankirtan

6:37 pm on Feb 25, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Just wish to add that the rule set above was originally published at [httpd.apache.org...]. It was originally written by Ralf S. Engelschall <rse@apache.org> in December 1997 and is applicable to Apache httpd v 2.0. The guide has been updated for Apache httpd v 2.4 at [httpd.apache.org...] and the rule set under the same category is now different:

# backward compatibility ruleset for
# rewriting document.html to document.php
# when and only when document.php exists
<Directory /var/www/htdocs>
RewriteEngine on
RewriteBase /var/www/htdocs

RewriteCond $1.php -f
RewriteCond $1.html !-f
RewriteRule ^(.*).html$ $1.php
</Directory>


But it does not appear to address the same conditions. And how would these rules be placed in the .htaccess file? Are the Directory tags essential?

lucy24

10:31 pm on Feb 25, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: tangent ::
Options Indexes +FollowSymLinks +MultiViews

Nooooo! Please say that was a typo, or your server will explode.

:: detour to Apache docs ::
Mixing Options with a + or - with those without is not valid syntax, and is likely to cause unexpected results.
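As far as I know, the Apache 2.4 documentation is stricter on this point than the 2.2 wording quoted above: a line that mixes signed and unsigned options is rejected outright, and in an .htaccess file that surfaces as a 500 error. That would fit a site that broke after a move to 2.4.7, though only the server error log can confirm it. A minimal sketch of a consistent line, assuming the host allows Options overrides at all; switching MultiViews off is an assumption on my part, not something taken from the original file:

```apache
# Every keyword carries a sign, so the line is valid under both 2.2 and 2.4.
# -MultiViews is an assumption: it stops mod_negotiation from resolving
# /file-name to file-name.php or file-name.html on its own, independently
# of the rewrite rules.
Options +Indexes +FollowSymLinks +ExecCGI -MultiViews
```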

Now then:

You CANNOT USE <Directory> sections in htaccess. This is because htaccess is, itself, a directory-specific file. ("But, but, but you can nest <Directory> sections!" Yes, well, tough.) The two alternatives are:

A. make a supplementary htaccess and put it in the directory it applies to. This is best for basics like Index options or directory-specific expiration headers.

B. for mod_rewrite, let the pattern of each rule show the directory, like
^paintings/rats/blahblah


RewriteEngine on
RewriteBase /

You don't actually need a RewriteBase, because the target of each RewriteRule will begin in a / slash.

At this point I deleted several paragraphs of reply, because I think we need to backtrack and explain in English what you're trying to do. It looks as if the object is to serve the same content to all requests, regardless of original extension: effectively

RewriteRule blahblah(\.html|\.php)? valid-content-here [L]


That's not duplicate but triplicate content.

Conditions involving -f should be considered an absolute last resort, since it means the server has to physically look for the file before it can do anything else. Whenever possible, try to name the actual files involved. How big is the site? How many pages were originally html and are still static files? It may be easier just to rename all physical files-- this can be done globally, surely. And then redirect everyone to extensionless

RewriteRule ^([^.]+)\.(html|php)$ http://www.example.com/$1 [R=301,L]

Personally I don't approve of extensionless URLs, but that's just age ;)
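As a rough sketch of what "naming the actual files" could look like in place of a -f test, assuming a small site where the list is short enough to maintain by hand. The page names here are hypothetical and stand in for whatever the real files are called:

```apache
RewriteEngine on

# Pages that now exist only as .php (hypothetical names)
RewriteRule ^(about|contact|history)$ /$1.php [L]

# Pages that are still static .html (hypothetical names)
RewriteRule ^(gallery|links)$ /$1.html [L]
```

The server never has to look at the disk to decide which rule applies, which is the point of avoiding -f. The cost is that the lists have to be kept in step with the files.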

chankirtan

12:23 am on Feb 26, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you, lucy24. Yes, you've understood my objective to serve the same content to all requests, regardless of original extension, more or less. That is to say, where foo.php exists, return that rather than a foo.html; otherwise, if no foo.php, return the foo.html; and also return http://example.com/foo.

Simply renaming the actual files involved could work with this particular website, but not with the other - which is much older and has thousands of files. Right now, that website is still running on Apache 2.2.24, but any day now...

May I ask, if I apply your rewrite rule, how should I enter it in the .htaccess file? Does it need also "RewriteEngine on" and "RewriteBase /" and the Options?

[edited by: phranque at 12:42 am (utc) on Feb 26, 2014]
[edit reason] Please Use example.com [webmasterworld.com] [/edit]

lucy24

2:39 am on Feb 26, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The directive

RewriteEngine on


needs to go in each separate htaccess that uses mod_rewrite, because it is not inherited-- unlike most things in Apache. This, in turn, means that you should bend over backward to avoid having RewriteRules in more than one htaccess along the same path.

The reason you don't need a RewriteBase --although it does no real harm --is that the RewriteBase is only used when the target of a RewriteRule begins in a naked directory name (no http://example.com OR / for "root"). This will never happen, because your targets always will start in / or full protocol-plus-domain.
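A small sketch of the difference, with hypothetical page names. Only the second rule has a target that starts with a bare name, so only there does RewriteBase come into play:

```apache
RewriteEngine on

# Target begins with "/": RewriteBase plays no part.
RewriteRule ^old-name$ /new-name.php [L]

# Relative target: in .htaccess the per-directory prefix is added back
# after substitution, and RewriteBase sets what that prefix is.
RewriteBase /
RewriteRule ^other-name$ other-name.php [L]
```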

Please abandon the idea of serving the same content to different requests. It will work up until the day that someone types in the wrong extension-- or, more to the point, until the day a search engine happens to see what happens if they request ".html" instead of the familiar ".php". That's when you suddenly get duplicate or triplicate content.

So you've got three things:
-- pages that started as html and still are, so URL = physical file
-- pages that started as html and were changed to php
-- pages (existing or future) that have always been php

Is this a live site? Have you already started issuing redirects?

How many pages of the first two types have you got? Continuing html pages, and formerly-html-now-php? More accurately, how many URLs are involved? And how many of those URLs translate to "the entire contents of such-and-such directory"?

There's more than one approach, but solutions depend on those two questions: What's the current redirect/rewrite situation, and how many URLs are involved?

chankirtan

7:09 am on Feb 26, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi lucy24. The larger website has since been redeveloped as a Wordpress multisite, and whole directories were redirected where possible; however, thousands of static files remain that were not converted, and their extensions are a mix of .html and .php.

There's no duplicate content; where foo.html was updated to foo.php, the original foo.html was deleted from server, but there were just too many files and links all over the place to update each and every one. We need to allow for old links calling for foo.html to resolve to foo.php when it exists; else serve the existing foo.html.

g1smd

7:23 am on Feb 26, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A crucial question here is what do the WordPress URLs look like? Are these .html or extensionless?

Where foo.html was updated to foo.php, were some, all or none of the links updated to point to the new URL and was a redirect from .html to .php installed?

If file /foo.php can be accessed as www.example.com/foo.php and as www.example.com/foo.html you do have a duplicate content issue. I'll assume you do have a non-www to www redirect in place.

You have multiple issues to solve:
- making sure each piece of content has a single canonical URL
- redirecting non-canonical URL requests to the canonical form
- for any given URL request, deciding whether to fulfil from WordPress, a .php file or a .html file.

My strategy would be to use extensionless URLs within Wordpress, redirect all .php and .html requests to extensionless, redirect non-www to www, then use the -f test to see if the request matches a physical .php file and serve it, use the -f test to see if the request matches a physical .html file and serve it, and if neither match, internally rewrite the request to be handled by WordPress. The disadvantage of this method is that some internal navigation clicks would lead to a redirect. It would be a good idea to build a sitemap listing only extensionless URLs.

An alternative strategy would again see extensionless URLs for WordPress and a redirect to extensionless for .php and .html URL requests but this time only for those that do not resolve to a physical .php or .html file. The remainder of the site would continue using .php and .html URLs. For those requests internal rewrites would use the -f test to select the correct .php or .html file to serve. This latter strategy does not fully fix the existing duplicate content issues where multiple URLs can access the same content.

You have to analyse this from a URL viewpoint as that is all that matters to searchengines and browsers. They cannot see, nor do they care, how the content is organised internally.

lucy24

9:45 am on Feb 26, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's no duplicate content; where foo.html was updated to foo.php, the original foo.html was deleted from server

I think you may have misunderstood what "duplicate content" is. It doesn't mean parallel versions of the same physical file; it means two or more URLs leading to the same content.

user asks for
pagename
receives
pagename.php

user asks for
pagename.html
receives
pagename.php

That's duplicate content: "pagename" and "pagename.html" lead to the same material.

user asks for
otherpage.php
receives
otherpage.html

user asks for
otherpage.html
receives
otherpage.html

Also duplicate content: "otherpage.php" and "otherpage.html" lead to the same material.

chankirtan

2:39 pm on Feb 26, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you both so very much. Yes, I had misunderstood about duplicate content from the viewpoint of search engines and browsers. Redirect www to non-www is in place (prefer http://example.com rather than http://www.example.com). The WordPress site is using extensionless URLs. The only problem I have is with the old website whose file extensions are a mix of .html and .php. Please can you help me with a rule set that will allow me to redirect all .php & .html requests to extensionless, and use the -f test to verify that the request matches a physical .php file or .html file and serve it, or if neither match, redirect to the 404?

g1smd

6:43 pm on Feb 26, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If neither match there should not be a redirect.
That would erroneously return 301 or 302 status.

# Redirect all .php and .html requests to extensionless

# Redirect www to non-www

# Use the -f test to see if the request matches a physical .php file and serve it

# Use the -f test to see if the request matches a physical .html file and serve it

# Continue with existing WordPress rewrite

lucy24

7:58 pm on Feb 26, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if neither match, redirect to the 404

I hope that's another terminology problem. You don't "redirect" to a 404. You serve a 404 response, accompanied by your custom 404 page. The form is

ErrorDocument 404 /my404page.html


with leading slash but no protocol-plus-domain.

If you've got a huge mixed number of files, you may really have to use the -f test, ugh. There are two parts.

First redirect to extensionless. If your filepaths contain no literal periods (hostname doesn't count) you can use the simple form:

RewriteRule ^([^.]+/)\.(html|php) http://www.example.in/$1 [R=301,L]


Then rewrite. Again, this is when you have no literal . in filepaths:

RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^([^.]+)$ /$1.php [L]


and

RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^([^.]+)$ /$1.html [L]


Lord! What a mess. But I think that's simplest. If your filepaths do contain literal periods-- it's perfectly legal, but do try to avoid it-- your rules will be more complicated. But make sure they're written so the body of the rule only matches requests for pages. You don't want the server having to evaluate conditions on every single request for a stylesheet or favicon.

lucy24

1:46 am on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops, forgot an essential detail, so it's a good thing we are on opposite sides of the planet. The rule for the redirect needs a Condition looking at {THE_REQUEST} so you don't go around in circles. It will look something like

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+/\.(html|php)


You can leave off the capturing parentheses, keeping only the ones for grouping. Note that literal spaces have to be escaped. (This is much more crucial than escaping literal periods, because an unescaped space can crash the server. Or, er, make all requests return a 500.)

Final note: The two rules for rewriting to html and php can go in either order, since they're independent of each other. So start with the one that will happen more often; there's no other factor.

chankirtan

2:01 am on Feb 27, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks again. I'm taking your advice. Wish me luck! Will be back if I run into trouble.

g1smd

6:53 am on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



See my comments in the preceding post regarding rule order.

Your www to non-www redirect will need to go after the php and html to extensionless redirect and before the internal rewrites.

chankirtan

7:34 am on Feb 27, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Ok. Yikes, my .htaccess file is a mess right now. Let me work on it and get back to you with what I've got. It looks like one terrible tangle of electric wires you might see in Old Delhi. : (

g1smd

7:55 am on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add a blank line after every Rule for clarity.

Make sure each rule has a comment describing what it does.

To make it easier to refer to particular rules it is also useful to number them.

chankirtan

1:59 pm on Feb 27, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi, I'm back. : ) And thank you for the added notes. Yes, I hope you are not disappointed that I sometimes take so long to get back to you. The time difference is one thing, and having to do some other stuff is another. But here I am, trying to make sense of this. Please advise me:

A. Are the following .htaccess rulesets okay?

B. Is the order correct so far?


1.
#PHP and HTML extensionless redirect
RewriteCond %{THE_REQUEST} ^[.]{3,9}\ /[^.]+/\.(html|php)
RewriteRule ^([^.]+/)\.(html|php)$ http://www.example.com/$1 [R=301,L]



Note: Lucy24, you originally suggested:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+/\.(html|php)

But some file names include numerals or characters like the hyphen, so I've substituted [.] for [A-Z]. Is that okay? And what is the function of the quantifier {3,9}? I don't understand it.


2.
# Canonical redirect
# Redirect from www.example.com to example.com
RewriteCond %{HTTP_HOST} !^(example\.com)$
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]



3.
# backward compatibility ruleset for
# rewriting document.html to document.php
# when and only when document.php exists
RewriteCond $1.php -f
RewriteCond $1.html !-f
RewriteRule ^(.*).html$ $1.php [L]


Note: It seems to me - and I may well be wrong (again!) - that this code (from httpd.apache.org) should work as well or better than two separate rules. But is it written correctly here for .htaccess? Because up at httpd.apache.org, it's enclosed in <Directory> tags. You had suggested the following:

RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^([^.]+)$ /$.php [L]
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^([^.]+)$ /$.html [L]


Do your rulesets work differently or serve a different purpose than #3 above?

C. Where do I put "RewriteEngine On"? The live .htaccess file has at least 3 different rulesets that start with "RewriteEngine On"! Crazy, but miraculously the website is still working.

D. And where and how frequently should I put "Options +Indexes +FollowSymLinks +Multiviews +ExecCGI"?

E. You said I don't really need "RewriteBase /", but I want to doublecheck with you. The website is organized with WordPress at root, and practically all old physical files inside folders, with the exception of two or three files in root. Thus the path to most - but not all - of the files is http://example.com/quux/foo.html or http://example.com/quux/foo.php.

F. I have a lot of .htaccess rulesets. Some for security. WordPress Multisite. 301 page redirects. User agent redirects. WordPress Supercache. Where should I put these rules in relation to those?

g1smd

2:38 pm on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteEngine and Options go once at the very beginning.
These should be followed by rules that block access, i.e. those with [F] flag, then rules that redirect, ordered from most specific to most general.

In rule 1, ^[A-Z]{3,9} is correct. This matches the GET or POST part of the http request.

In rule 1, both [^.]+/\. are incorrect. See my previous examples.
I much prefer (([^/]+/)*[^/.]+)\.(php¦html)
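To answer the earlier question about the quantifier, here is the condition taken apart piece by piece, as a sketch. The hostname is the example.com placeholder used in this thread. Note that the quantifier takes a comma: written with a hyphen, the braces would be read as literal text and the condition would never match. File names containing digits or hyphens are covered by the [^/.]+ part, not by [A-Z], which only has to match the request method.

```apache
# %{THE_REQUEST} is the raw request line, e.g.  GET /quux/foo.html HTTP/1.1
#
#   ^[A-Z]{3,9}   the method: 3 to 9 capital letters (GET, POST, OPTIONS ...)
#   \             (backslash + space) the literal space after the method
#   /             the root slash that starts the path
#   ([^/]+/)*     zero or more folder levels, each ending in a slash
#   [^/.]+        the file name: no slashes, no periods
#   \.(php|html)  a literal period, then one of the two extensions
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*[^/.]+\.(php|html)
RewriteRule ^(([^/]+/)*[^/.]+)\.(php|html)$ http://example.com/$1 [R=301,L]
```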

Rule 2 should be the final rule that redirects as it is the most general of all the redirects.

After that come rules that rewrite, e.g. Rule 3 and the WordPress rewrite.

Your rule 3 is for serving content from .php files that are requested with a .html URL. The replacement code serves content from .php or .html files that are requested as extensionless URLs. Rule 1 enforces extensionless requests.

On the final two rules, the backreferences should be numbered as per the earlier examples: /$1.php

The WordPress internal rewrite must be at least after Rule 2, and depending on how it is coded perhaps after Rule 3.

chankirtan

4:41 pm on Feb 27, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you, g1smd. Now I understand the difference between the backwards compatible ruleset and the rulesets for serving content from .php and .html files that have been requested as extensionless files. Thanks also for explaining about the rule ^[A-Z]{3,9}.

This is what I have so far for the extensionless redirect, canonical redirect, and serving content from .php and .html files.

For Ruleset 1, which will it be?

# 1 per Lucy24
# PHP and HTML extensionless redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+/\.(html|php)
RewriteRule ^([^.]+/)\.(html|php)$ http://www.example.com/$1 [R=301,L]


# 1 per g1smd
# PHP and HTML extensionless redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ (([^/]+/)*[^/.]+)\.(php¦html)
RewriteRule ^([^.]+/)\.(html|php)$ http://www.example.com/$1 [R=301,L]



# 2
# Canonical redirect
# Redirect from www.example.com to example.com
RewriteCond %{HTTP_HOST} !^(example\.com)$
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]


# 3
# serve content from .php or .html files requested as extensionless files
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^([^.]+)$ /$1.php [L]
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^([^.]+)$ /$1.html [L]



Do I need to put "Options +Indexes +FollowSymLinks +Multiviews +ExecCGI" somewhere?

And where should the WordPress Multisite ruleset go?

g1smd

5:28 pm on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The new pattern is needed in both places in my version of Rule 1. You have only changed it in one place and accidentally deleted a required slash in the process.

Also a fix is needed for non-www target hostname and a correction to the pipe character.

[edited by: g1smd at 6:30 pm (utc) on Feb 27, 2014]

chankirtan

5:44 pm on Feb 27, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



So this?

# 1 per g1smd
# PHP and HTML extensionless redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ (([^/]+/)*[^/.]+)\.(php¦html)
RewriteRule ^(([^/]+/)*[^/.]+)\.(html|php)$ http://www.example.com/$1 [R=301,L]

g1smd

6:27 pm on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# .php and .html to non-www extensionless redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*[^/.]+\.(php|html)
RewriteRule ^(([^/]+/)*[^/.]+)\.(html|php)$ http://example.com/$1 [R=301,L]

chankirtan

6:33 pm on Feb 27, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Oh, right! I didn't even look at the non-www bit of it. : o Thank you very much. Have a good day. I'm off to the Land of Nod.

g1smd

8:49 pm on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Every character is important. Just one change could result in completely different functionality.

lucy24

10:15 pm on Feb 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To clarify:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.(html|php)
...
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*[^/.]+)\.(php¦html)


There are two differences between these versions. One is that g1 and I use different operating systems and therefore end up with different "pipe" characters. Your actual htaccess file will use the one on your keyboard (probably shift-backslash).

The other difference involves [^.] vs. [^/]. "Any character other than a period" vs. "any character other than a slash". If you never use literal periods in directory names, either one will work. If there is a possibility of literal periods, use [^/] for the first part, saving [^./] for the filename.

It now occurs to me that any element looking at %{THE_REQUEST} should say [^/\ ] or [^./\ ] since the request itself contains literal spaces. Otherwise the RegEx
[A-Z]{3,9}\ /([^/]+/)*[^./]+\.
would initially run on past the filename, all the way to the slash in "HTTP/1.1", when matching a request like this:
GET /filename.html HTTP/1.1
requiring even more backtracking.

Note that the space has to be escaped even inside grouping brackets.

That "HTTP/1.1" is also the reason you can't just say
RewriteCond %{THE_REQUEST} \.
and be done with it. The request always contains at least one literal period.
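A sketch of the tightened condition described above, with the space excluded from both character classes. The hostname is the thread's example.com placeholder. The RewriteRule pattern is left as it was, on the assumption that the real file paths contain no spaces.

```apache
# Request line:  GET /filename.html HTTP/1.1
# With [^/]+ the folder group can first swallow "filename.html HTTP/"
# and must then back off. Excluding the space rules that out.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/\ ]+/)*[^./\ ]+\.(html|php)
RewriteRule ^(([^/]+/)*[^./]+)\.(html|php)$ http://example.com/$1 [R=301,L]
```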


[mod's note]typo fixed in the code at the top of this post which will change the meaning of a few subsequent posts discussing the typo[/mod's note]

[edited by: phranque at 4:28 pm (utc) on Feb 28, 2014]
[edit reason] fixed typo [/edit]

chankirtan

6:58 am on Feb 28, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy, are you saying I should use the following instead?

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*[^/.\ ]+)\.(php|html)
RewriteRule ^(([^/]+/)*[^./\ ]+)\.(php|html)$ http://example.com/$1 [R=301,L]

Lucy & g1smd - thank you both for all your help. Please point me to resources where I can learn more about regex for mod rewrite. The tutorials and cheatsheets I've looked at are helpful, but I'm struggling to wrap my head around some of the expressions.

For instance, why does the rewrite rule not say ^(([^/]+/)*[^/.]+)$ http://example.com/$1 [R=301,L] instead of ^(([^/]+/)*[^/.]+)\.(php¦html)$ http://example.com/$1 [R=301,L]? The way I read it, it looks like the extension will be tagged on. Isn't everything within ^ and $ tagged onto the $1 ?

Also please tell me whether I also need to put "Options +Indexes +FollowSymLinks +Multiviews +ExecCGI"? Where?

[edited by: chankirtan at 7:10 am (utc) on Feb 28, 2014]

[edited by: phranque at 9:42 am (utc) on Mar 3, 2014]
[edit reason] fixed the first line of code [/edit]

g1smd

7:07 am on Feb 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy. Both of the examples you quoted in message 4649849 have typos.

The first has no space for a filename. It requires requests to have a period directly after the final folder slash.

The second is missing the initial root slash, has a redundant set of brackets and the wrong pipe character - all of which I had fixed in my earlier post (4649801).

g1smd

7:25 am on Feb 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Everything inside the first set of ( and ) becomes $1.

The RegEx pattern matches a request that includes a .php or .html extension. The rule target is an almost identical URL but without the extension, issued as a redirect.

The Condition makes sure the ruleset matches only external requests for .html or .php and avoids an infinite loop.

RewriteEngine and Options go once at the very beginning.

The WordPress internal rewrite must be at least after Rule 2, and depending on how it is coded perhaps after Rule 3.
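Pulling the pieces of this thread together, the whole arrangement might look something like the sketch below. It assumes no literal periods in file names, the non-www example.com host used above, and a WordPress block that is pasted in unchanged at the end. The -MultiViews setting is an assumption of this sketch and was not settled in the thread. It has not been tested against the OP's server, so treat it as a starting point only.

```apache
# Options and RewriteEngine appear once, at the top.
# -MultiViews is an assumption, so that only the rules below map
# extensionless URLs to files.
Options +FollowSymLinks -MultiViews
RewriteEngine on

# (Rules that block access, i.e. those with the [F] flag, would go here.)

# 1. Redirect external requests for .php or .html to extensionless, non-www.
#    THE_REQUEST is the original request line, so an internal rewrite
#    further down cannot trigger this rule again.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/\ ]+/)*[^/.\ ]+\.(php|html)
RewriteRule ^(([^/]+/)*[^/.]+)\.(php|html)$ http://example.com/$1 [R=301,L]

# 2. Canonical host: the most general redirect, so it is the last redirect.
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# 3a. Extensionless request with a matching physical .php file: serve it.
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]

# 3b. Otherwise, a matching physical .html file: serve that.
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.html [L]

# 4. The existing WordPress rewrite block follows here, unchanged.
#    Requests that match nothing above fall through to it, or to a 404.
```

Rules 3a and 3b only match paths without a period, so requests for stylesheets, images and the like skip the -f tests entirely.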

[edited by: g1smd at 8:03 am (utc) on Feb 28, 2014]

g1smd

9:20 am on Feb 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy. For root filename requests you are correct about my pattern initially matching all the way through to the slash in the HTTP/1.1 part of the request.

In this case, the remainder of the pattern clearly does not match as a filename and extension and therefore it is quickly established the * evaluates as "zero times".

Reprocessing of the request begins from after the root slash and is tested against only the "filename" part of the RegEx pattern, i.e. matching [^/.]+\.(php|html).

This occurs for root folder requests and occurs in a similar manner after the "last" folder level when processing a request with multiple folder levels.

The pattern matches "too much" but does it only once per request.

This pattern allows periods in folder names, and one period immediately before the extension.

The pattern is open ended after the extension as the OP hasn't indicated if any URL requests have attached parameters and I wanted to keep the code simple. Ordinarily the pattern would continue with provision, or not, for parameters and end with HTTP/ or similar.

That additional code would be vital if the pattern had to match only requests with attached query string parameters or match only requests without attached query string parameters. In the current more general case as posed in this thread the extra code has been omitted.

chankirtan

10:20 am on Feb 28, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi g1smd, URL parameters are way over my head, and I can't think where they might be employed for the .php and .html files on the old website. Certainly for image files like .gif, .png, .jpg, but the redirection is only for .php and .html URLs.

Thank you for explaining the rewrite rule. I have yet to test the rulesets, and I won't be able to for the next two days, because I have other stuff on the stove, so please don't be annoyed if you don't hear back from me right away. But I do promise to post results. : )