Forum Moderators: phranque


Is this htaccess file crashing my site?

         

leebow

12:41 pm on Oct 5, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi All,

I've pieced this together, and I'm not sure whether it's the reason my Apache server keeps overloading and needing a restart...

# Don't list files or folders
Options -Indexes
#
# Don't show server details
ServerSignature Off
#
RewriteEngine On
RewriteBase /
#
# Add WWW
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*) http://www\.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
#
# add trailing slash if missing
RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ $1/ [R=301,L]
#
# Allow Access to real files or folders
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
#
# Everything else goes to loader
RewriteRule ^(.*)$ loader.php?t=$1 [L,QSA]
#
# CACHE EVERYTHING...
<IfModule mod_expires.c>
# Enable expirations
ExpiresActive On
# Default directive
ExpiresDefault "access plus 1 month"
# Html
ExpiresByType text/html "access plus 1 month"
# My favicon
ExpiresByType image/x-icon "access plus 1 year"
# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
# CSS
ExpiresByType text/css "access plus 1 month"
# Javascript
ExpiresByType application/javascript "access plus 1 year"
# Music
ExpiresByType audio/mp3 "access plus 1 year"
</IfModule>


It's a really popular site (about 1 million page views a day).
Up to now it's been static HTML pages, and we are trying to move to a template system, so I don't know if the server just can't cope with everything being routed to loader.php by the rewrite rules.


Thanks so much!

lucy24

5:04 pm on Oct 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can't say I much care for the RewriteRules, though they can't necessarily be blamed for the server getting tired.
# Add WWW
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*) http://www\.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
#
# add trailing slash if missing
RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ $1/ [R=301,L]
#
# Allow Access to real files or folders
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
#
# Everything else goes to loader
RewriteRule ^(.*)$ loader.php?t=$1 [L,QSA]


Rules 1 and 2 are in the wrong order. Domain-name-canonicalization goes last among all external redirects. All other redirects require the complete protocol-and-hostname, which is conspicuously absent from Rule 2 (the one that should be Rule 1):
RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ http://www.example.com/$1/ [R=301,L]
It may or may not be preferable to express the pattern as
^([^.]+[^./])$
to simplify the capture.
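Since mod_rewrite and PHP's preg_match() both use PCRE, the two candidate patterns can be compared side by side. The sketch below is illustrative only: the sample paths are made up, and it assumes .htaccess context, where the leading slash is already gone by the time the pattern is tested.

```php
<?php
// Compare the original trailing-slash pattern with the simplified one.
// In .htaccess context the leading "/" is already stripped, so the
// sample paths start without it.
$patterns = [
    'original'   => '~^(([a-z0-9\-]+/)*[a-z0-9\-]+)$~',
    'simplified' => '~^([^.]+[^./])$~',
];

// Hypothetical sample paths.
$paths = ['my-page', 'section/my-page', 'my-page/', 'style.css', 'My_Page', 'a'];

foreach ($paths as $path) {
    foreach ($patterns as $name => $regex) {
        // A match means the rule would redirect to the same path plus "/".
        $result = preg_match($regex, $path) === 1 ? "redirect to /$path/" : 'no match';
        printf("%-16s %-11s %s\n", $path, $name, $result);
    }
}
```

On these samples the two forms agree for ordinary lowercase paths but differ at the edges: the simplified pattern also redirects uppercase letters and underscores, yet needs at least two characters, so a one-letter path such as a is left alone. Whether that matters depends on the site's real URLs.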

Rule 1 (which should be Rule 2) is optimally
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC] 
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
No point in saying REQUEST_URI if you've already captured it anyway. Periods in the target don't need to be escaped. Nothing will break; it just isn't needed. (And speaking of not needed: You don't need a RewriteBase. It's safer to put a / at the front of each Rewrite target. This is only relevant for internal rewrites; external redirects look the same either way.)

But--ahem! cough-cough!--shouldn't you be thinking about moving to HTTPS? The only substantial change to the htaccess will be adding one more RewriteCond, with [OR], to the rule above.
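A rough PHP model of the decision this rule makes once the HTTPS condition is added. It assumes the canonical origin is https://www.example.com and that the extra [OR] condition tests %{HTTPS}; canonical_redirect() is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper mirroring:
//   RewriteCond %{HTTPS} !on [OR]
//   RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
//   RewriteRule (.*) https://www.example.com/$1 [R=301,L]
// $path is what (.*) captures: the URL path without its leading slash.
function canonical_redirect(string $host, bool $isHttps, string $path): ?string
{
    // The "?" makes an empty Host header acceptable, so a request with no
    // Host at all (possible with old HTTP/1.0 clients) isn't redirected
    // over and over. The "i" modifier plays the role of [NC].
    $hostIsCanonical = preg_match('~^(www\.example\.com)?$~i', $host) === 1;

    if (!$isHttps || !$hostIsCanonical) {   // either condition triggers ([OR])
        return 'https://www.example.com/' . $path;
    }
    return null;                            // already canonical: no redirect
}
```

For example, canonical_redirect('example.com', true, 'my-page/') gives https://www.example.com/my-page/, while a request already on https://www.example.com gives null, meaning no redirect.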

You're also missing a rule 1.5, which handles requests ending in /index.html (plus /index.php or whatever extension you actually use). That's assuming your site includes real, physical directories with real, physical index files. If it doesn't, the -d test (below) is also not needed.

Now, about that loader.php. The -f and -d tests are pretty server-intensive. For most sites in most circumstances, all supporting files--images, stylesheets and so on--are real files that really exist. So it shouldn't be necessary to check for their existence. Moreover, you've got URLs with mandatory final / which, by itself, means that the request will never involve a real file. You only need to check whether it's a real directory. One possibility is
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ /loader.php?t=$1 [L,QSA]
This should be immediately preceded by

RewriteRule ^loader\.php - [L]
Since you already know this file physically exists, and there will be a lot of internal requests for it, get it out of the way as soon as possible.
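To see how the loader exclusion and the [^.] pattern divide the traffic, here is a rough simulation of those rules in order. It is not Apache itself: $isDir stands in for the -d test, the paths are hypothetical, and in real per-directory context mod_rewrite runs through .htaccess again after the internal rewrite, which is the pass the ^loader\.php rule cuts short.

```php
<?php
// Rough simulation of, in this order:
//   RewriteRule ^loader\.php - [L]
//   RewriteCond %{REQUEST_FILENAME} !-d
//   RewriteRule ^([^.]+)$ /loader.php?t=$1 [L,QSA]
// $path has no leading slash (.htaccess context); $isDir fakes the -d test.
function route(string $path, bool $isDir): string
{
    if (preg_match('~^loader\.php~', $path) === 1) {
        return 'pass through';                 // "-" means: leave the URL alone
    }
    if (!$isDir && preg_match('~^([^.]+)$~', $path, $m) === 1) {
        return '/loader.php?t=' . $m[1];       // no dot anywhere: a page URL
    }
    return 'pass through';                     // has a dot, or is a real directory
}

foreach (['my-page/', 'images/logo.png', 'loader.php', 'old-section/'] as $p) {
    printf("%-18s => %s\n", $p, route($p, $p === 'old-section/'));
}
```

Anything containing a dot (images, stylesheets, scripts) passes straight through without a filesystem check, so only the -d test remains for page URLs.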

Edit: Explanatory comments are a good idea. But personally I think the bare # lines used as spacers are counterproductive; they just make it harder to eyeball where one ruleset ends and another begins. Blank lines in htaccess/config (unlike, say, robots.txt) are ignored by the server, so you can use them liberally.

And a final by-the-way: It's your own server. Why on earth is all this in htaccess? It should be in the config file in a <Directory> section reserved for site files.

leebow

6:24 pm on Oct 5, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Lucy - can I just say - thank you for such a fantastic reply! :) It's helped so much!

As you can probably tell, I've no idea what I'm doing when it comes to htaccess. I didn't even know about the alternative of using the config file. I will have to read about that, and hope the people running the dedicated server can help!

(I'm hoping to move the site to https next year. It's just another item on the list of the never ending jobs.)

About Rule 1.5: the site eventually won't have any real directories or index files, as I move from a static site with index.php files to the new template site. But until it's fully moved, people still need to be able to reach the pages that aren't yet in the template system, so RewriteCond %{REQUEST_FILENAME} !-d could possibly be removed in the future.

Without the RewriteCond %{REQUEST_FILENAME} !-f -- if a user does try to access a file that doesn't exist - can I include a link to custom 404 page - or will it just be picked up by the loader.php - and the 404 will be shown through the template engine?

The perfect htaccess file for my site? :) :) :)

Options -Indexes
ServerSignature Off
RewriteEngine On

RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ http://www.example.com/$1/ [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ /loader.php?t=$1 [L,QSA]

RewriteRule ^loader\.php - [L]

# Custom 404 - just in case someone tries to go to file that doesn't exist - example.com/notreallythere.png
ErrorDocument 404 /404.php


Thanks so much again! Really really appreciate it :)

(How did you highlight code? I've been through the posting guidelines and help section. Edit: Oh, the site has done it itself this time! I wonder why it didn't in my original post!)

keyplyr

7:28 pm on Oct 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@leebow - it is highly unlikely the rewrite rules have anything to do with the server overloading and needing restarting.

But loader.php is a prime candidate. Read through your host's documentation and see if there is anything you can do to speed up PHP processes, possibly upgrading PHP versions and/or switching to FastCGI (PHP-FPM).

You can also set caching and compression for various file types and applications to relieve server load and improve download time for your visitors.

And oh yeah... as lucy24 asked - why are you using htaccess when you have admin rights to the server?

lucy24

8:22 pm on Oct 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I guess the temporary answer is: do it all in htaccess until you're certain it works right. After that, you can move it to config and set AllowOverride None, so the server doesn't keep looking for htaccess on every request. (Fun fact: it isn't necessarily the existence of htaccess that slows things down. It's the possibility of htaccess.)

if a user does try to access a file that doesn't exist - can I include a link to custom 404 page - or will it just be picked up by the loader.php - and the 404 will be shown through the template engine?
Under your new CMS, all user requests will be for files that don't exist. That's the point of your loader.php. And paradoxically that's why you may not need the -f test: you already know it will always test positive, assuming the body of the rule has a pattern that will only match pages, not supporting files, and that you've previously excluded requests for loader.php. (General guideline: never put something in a RewriteCond that can go in the pattern of a RewriteRule. That's a guideline, not a law.)

Now, within loader.php there will be something that checks whether the request is for a legitimate URL--which is not the same thing as a physically existing file. If loader.php is unable to build a page with the requested URL, it serves up the 404 response accompanied by the 404 page. (Those are two different things.) If you are accustomed to dealing with hard-coded html, you will need to wrap your brain around a new idea: The response code that the server sends out, as reflected in your server access logs, is not necessarily the response code that the user receives. Thanks to the CMS, page requests in server logs will always show up as 200. (Or 301 if they used the wrong hostname. Or, of course, 403 if they are blocked outright, or 410 if you later remove pages.)
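A minimal sketch of that check, assuming page templates live in a pages/ directory as one .php file per URL (a hypothetical layout; the code later in this thread uses a slightly different one). resolve_page() is a made-up helper name. Besides the existence test, it only accepts lowercase letters, digits, hyphens and inner slashes, so a crafted t value cannot point outside pages/.

```php
<?php
// Hypothetical helper: map the "t" value from the rewrite (e.g. "my-page/")
// to a template file, or return null if no such page exists.
// Assumed layout: one file per page, pages/<slug>.php; the root is pages/index.php.
function resolve_page(string $t, string $pagesDir): ?string
{
    $slug = trim($t, '/');
    if ($slug === '') {
        $slug = 'index';
    }
    // Letters, digits, hyphens and inner slashes only, so "../" can't sneak in.
    if (preg_match('~^[a-z0-9-]+(?:/[a-z0-9-]+)*$~', $slug) !== 1) {
        return null;
    }
    $file = $pagesDir . '/' . $slug . '.php';
    return is_file($file) ? $file : null;
}

// In loader.php this might be used roughly like so:
//   $page = resolve_page($_GET['t'] ?? '', __DIR__ . '/pages');
//   if ($page === null) {
//       http_response_code(404);        // the 404 response...
//       include __DIR__ . '/404.php';   // ...and the 404 page shown with it
//       exit;
//   }
```

Keeping the lookup in a small function makes it easy to decide on the 404 before any template output has been sent.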

robzilla

9:05 pm on Oct 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have simple template engines and you have complex template engines. Even a simple engine will significantly increase the strain on your server compared to the static HTML files, simply because the PHP parser is called upon and some code is executed. With a complex engine and a million pageviews per day, you need to make sure your hardware (and software configuration) is up to the task. One way I like to quickly assess the extra processing required is to open the Network tab in the Developer Tools in Chrome (or another browser), load a page powered by the template engine, and note the Time to First Byte (TTFB). Then do the same for a static file like an image or robots.txt. Generally speaking, the bigger the difference, the fewer requests your server will be able to handle simultaneously. A PHP profiler can help you identify bottlenecks.
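Browser TTFB includes network time. As a complement, the sketch below times only the PHP page build and reports it in a Server-Timing header, which Chrome's DevTools show in the request's Timing tab. timed_build() and the app metric name are hypothetical; hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical wrapper: time the page build and report it to the browser.
function timed_build(callable $build): array
{
    $start = hrtime(true);                      // monotonic clock, nanoseconds
    $html  = $build();
    $ms    = (hrtime(true) - $start) / 1e6;     // nanoseconds -> milliseconds

    if (!headers_sent()) {
        // Appears under "Server Timing" in Chrome DevTools' Timing tab.
        header(sprintf('Server-Timing: app;dur=%.1f', $ms));
    }
    return [$html, $ms];
}

// Usage, with a stand-in for the real template build:
[$html, $ms] = timed_build(function (): string {
    return '<title>My Page 1</title>';
});
echo $html;
```

Comparing that figure for a template page with the TTFB of a static file gives a rough idea of how much of the difference is PHP work.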

I haven't used Apache in a long time. If your PHP setup doesn't come with an opcode cache enabled out of the box (OPcache has shipped with PHP since 5.5), make sure you install one.

phranque

9:25 pm on Oct 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



robzilla makes an excellent point.
If your content is cacheable and caching is enabled, that can significantly reduce the load on your server.

robzilla

9:41 pm on Oct 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thankfully, opcode can always be cached :-) But an additional in-memory key/value cache like APCu or Redis can also make a huge difference; it just requires a bit more set-up than an opcode cache, which is essentially plug-and-play.

leebow

8:41 am on Oct 6, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Guys - this advice has been amazing!

I really appreciate that you've all taken the time to reply. You've given me ideas I've never heard of, and it sounds like exactly what I need to do.

I will check the TTFB, and opcode, and all the other advice, and use your fantastic htaccess, Lucy24 (until I transfer it to the config) :-)

I tried to keep my template engine as simple as possible; it's basically this:


page1:

$page = array('title' => 'My Page 1','heading' => 'Welcome!');

Template:

<title><?php echo $page['title']; ?></title>

Then - the main parts of the loader:

$url = filter_var($_SERVER['REQUEST_URI'], FILTER_SANITIZE_URL);
// All the code to get just the part I need - which becomes $filename
// Is page cached:
if (file_exists('cache/' . $fileName . '.txt')) {
    include_once('cache/' . $fileName . '.txt');
    die();
}
//If not.. build page...
include('pages/' . $filename . '.php');
ob_start();
include "template.php";
$contents = ob_get_contents();
ob_end_clean();

$fp = fopen(('cache/' . $fileName . '.txt'), 'w');
fwrite($fp, $contents);
fclose($fp);
echo $contents;


And that's basically it.

I know this has gone past htaccess support in the Apache web server forum now, so I'll have to post in the PHP forum. But again, thanks to you Apache guys for your help! :)

robzilla

12:14 pm on Oct 6, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's pretty good, but the file_exists() check is fairly expensive to run on every request. A memory cache like APCu is perfect for this. You could use $fileName as the cache key and the contents of the cached file as the value, and then use apcu_exists() and apcu_fetch() the same way you use file_exists() and include_once(), and apcu_store() in place of fwrite(), but without the expensive disk access. Of course, you'd need enough memory to potentially store all your cached pages, but as little as 100 MB of RAM can hold 20,000 pages of 5 KB each.
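A sketch of that swap, written as a cache-aside helper. It uses apcu_fetch() with its success flag, which does the lookup in one call instead of apcu_exists() followed by apcu_fetch(), and apcu_store() with a time-to-live. render_cached() and the one-hour TTL are assumptions for illustration. If APCu isn't loaded or enabled (on the command line it is off unless apc.enable_cli is set), the helper simply builds the page every time.

```php
<?php
// Hypothetical cache-aside helper around APCu.
function render_cached(string $key, callable $build, int $ttl = 3600): string
{
    $useApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($useApcu) {
        $html = apcu_fetch($key, $hit);   // $hit is set to true on a cache hit
        if ($hit) {
            return $html;
        }
    }

    $html = $build();                     // cache miss: build the page
    if ($useApcu) {
        apcu_store($key, $html, $ttl);    // keep it for $ttl seconds
    }
    return $html;
}

echo render_cached('my-page', function (): string {
    ob_start();
    echo '<title>My Page 1</title>';      // stand-in for include "template.php"
    return ob_get_clean();
});
```

The single fetch also avoids a small race where an entry expires between the exists check and the fetch.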

As for Apache requiring a restart, there's probably valuable info in the Apache server (not access) logs.

leebow

4:25 pm on Oct 6, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks again robzilla -- I will look into this now!

The server we are paying for has 32 GB of ECC DDR4 RAM. The site has 468 pages of about 12 KB each, so based on your example we should be fine!

And if the memory does flush the cache for whatever reason, it would just fall back to building the page and recreating the cache :)

Thanks again for the help!


<?php
$url = filter_var($_SERVER['REQUEST_URI'], FILTER_SANITIZE_URL);
// All the code to get just the part I need - which becomes $filename
// Is page cached:
if (apcu_exists($fileName)) {
    echo apcu_fetch($fileName);
    die();
}
//If not.. build page...
include('pages/' . $filename . '.php');
ob_start();
include "template.php";
$contents = ob_get_contents();
ob_end_clean();

apcu_store($fileName,$contents);

echo $contents;
?>



Also, I'm looking at how to use php-opcache, as it's enabled by default and I wouldn't have to ask my host to add APCu.

Thanks again

lucy24

6:10 pm on Oct 6, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



include('pages/' . $filename . '.php');
What happens if filename.php doesn't exist? Seems like you should start your buffering earlier, so you can detour to the 404 track if necessary. Otherwise bad requests will lead to empty pages--or pages that contain nothing but what's in the template.

$filename = $fileName ?

leebow

6:27 pm on Oct 6, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Sorry, I do have code to check that; I just thought I'd keep the example simple :-)

I've used:

if (!file_exists('pages/' . $fileName . '/index.php')) {
    header("HTTP/1.0 404 Not Found");
    include_once("404.php");
    die();
} else {
    // The page-loading part shown above
}


I'm really struggling to decide about the advice by robzilla now.

Do I ask for apcu to be installed on my server - and cache pages like explained above.

OR -- do I remove the in-built caching part - and just let opcache cache everything - so basically I would have:


<?php
$url = filter_var($_SERVER['REQUEST_URI'], FILTER_SANITIZE_URL);
// All the code to get just the part I need - which becomes $filename

if (!file_exists('pages/' . $fileName . '/index.php')) {
    header("HTTP/1.0 404 Not Found");
    include_once("404.php");
    die();
} else {
    // Otherwise, build the page...
    include('pages/' . $filename . '.php');
    ob_start();
    include "template.php";
    $contents = ob_get_contents();
    ob_end_clean();
    echo $contents;
}
?>


Does OPcache remove the need to cache $contents at all? From what I understand, everything will be cached.

(I'm even thinking now about staying with html pages - and putting up with nightmare of managing them! All this is getting complicated!)

lucy24

9:23 pm on Oct 6, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If it gets too complicated, you may prefer to spin off specific questions to the PHP subforum. It will be seen by the same people, more or less, but it keeps things tidier.

robzilla

9:27 pm on Oct 6, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Before your server knows what to do when you run a script, a parser program will compile your PHP code into a set of instructions called bytecode. By default, this happens on the fly, anew every time a script is run. An OPcode cache can take that resulting bytecode and store it in a memory cache. As a result, so long as the script itself doesn't change, the compilation process can be skipped entirely, and the bytecode can "simply" be executed, saving the processing power required to parse the PHP code. That's all the OPcode cache does, and that's why it's basically plug-and-play. I see you already have it enabled, great!

An OPcode cache doesn't interact with your PHP code. To implement an in-memory cache that can replace your file-based cache, i.e. one that can store simple key-value pairs, you would need a separate caching module like APCu, which is the branched-off user cache of the original APC module that included both an opcode cache (now made superfluous in recent versions of PHP) and a user cache. Memcached is another example, but I personally like the simplicity of APCu and its integration into PHP. It also comes with a basic APCu INFO script that gives you data about memory usage (so you can easily tell if you've assigned ample memory to the cache) and allows you to search and delete cache entries. If you can get your host to install APCu, refer to the APCu manual [php.net] for all available functions.
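To see which of these the server actually has, a throwaway diagnostic script like this can be requested through the browser. Run it via the web server rather than the command line, since the CLI has its own switches (opcache.enable_cli, apc.enable_cli) that are often off, and remove it afterwards; the labels are arbitrary.

```php
<?php
// Report whether OPcache and APCu are loaded and switched on.
$report = [
    'OPcache loaded'  => extension_loaded('Zend OPcache'),
    'OPcache enabled' => function_exists('opcache_get_status') && opcache_get_status(false) !== false,
    'APCu loaded'     => extension_loaded('apcu'),
    'APCu enabled'    => function_exists('apcu_enabled') && apcu_enabled(),
];

header('Content-Type: text/plain');
foreach ($report as $label => $ok) {
    printf("%-16s %s\n", $label, $ok ? 'yes' : 'no');
}
```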

leebow

7:37 am on Oct 7, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Oh - that makes perfect sense - thank you again. :-)
So APCu is really a must to cut down the reads and writes with fopen, and OPcache caches the compiled code.

Just going back to my htaccess again - I know there is a specific folder (called “files”) which contains php pages - that will never need to be rewritten by my loader.php script - have I excluded this folder correctly?

It doesn’t need to force www or a forward slash; the files are just used by the template engine.


Options -Indexes
ServerSignature Off
RewriteEngine On

RewriteRule ^(files)($|/) - [L]
RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ http://www.example.com/$1/ [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ /loader.php?t=$1 [L,QSA]

RewriteRule ^loader\.php - [L]


Thanks so much again everyone for all this help :)

lucy24

6:28 pm on Oct 7, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^(files)($|/) - [L]
This can be simplified to
RewriteRule ^files - [L]
No capture, no closing anchor. That is, I assume you don't concurrently have a directory called, say, /filesand/ or similar, which is not to be protected. If you want to protect yourself against this possibility, you can always put a word boundary:
RewriteRule ^files\b - [L]
Unless you've got a directory called /files-and-more/ (a hyphen counts as a word boundary) in which case I wash my hands of you ;)

leebow

6:56 pm on Oct 7, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Ohhh Lucy24 - I’m sorry but you’ve gone over my head with one RewriteRule ! :-)

I have: /files/sub-folders/more/

But everything inside of files doesn’t need to be checked.

lucy24

8:59 pm on Oct 7, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ordinarily
^files
will do. The lack of a closing anchor means it doesn't matter what comes after "files". The form
^files($|/)
is only necessary if you happen to have something whose name begins with "files" other than the /files/ directory and its contents. If you were cutting-and-pasting and aren't actually sure what it means: the form
files($|/)
means "the next thing after "files" is either a directory slash or nothing". (In the case of "nothing", since /files/ is a real physical directory, mod_dir will step in later to append the slash.)

No matter what, you do not need parentheses around "files". There exist circumstances where you'd put parentheses around a single non-optional element that is always the same--but they are rare.
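The three forms can be compared with preg_match(), which accepts the same PCRE syntax as mod_rewrite. The sample directory names other than files/ are hypothetical, taken from the examples in the replies above.

```php
<?php
// Which paths each "files" pattern would exempt from the later rules.
// Paths have no leading slash, as in .htaccess context.
$forms = [
    '^files'      => '~^files~',
    '^files\b'    => '~^files\b~',
    '^files($|/)' => '~^files($|/)~',
];
$paths = ['files', 'files/sub-folders/more/', 'filesand/', 'files-and-more/'];

foreach ($forms as $label => $regex) {
    foreach ($paths as $path) {
        printf("%-12s %-24s %s\n", $label, $path, preg_match($regex, $path) ? 'exempt' : '-');
    }
}
```

For the layout described here (/files/sub-folders/more/), all three exempt everything inside /files/. The differences only show with lookalike names: ^files also exempts /filesand/ and /files-and-more/, ^files\b still exempts /files-and-more/, and only ^files($|/) limits itself to /files and its contents.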

leebow

1:25 pm on Oct 13, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks so much again!

The new htaccess file seems to be working great! The site is running nice :-)

There is one problem. I removed this line, since most files are REAL:

RewriteCond %{REQUEST_FILENAME} !-f 


That has saved a LOT of rewrites on the site; as you said, all images are real, all JS files are real, etc.

The index.php files are no longer real - so if a user goes to this:

example.com/my-page/ <-- the site works great :-)

If they go to example.com/my-page/index.php <-- This fails as the file doesn't exist

I don't really want them going to the index.php page - but I don't know if search engines want to try and access the .php file (even though it doesn't exist)

Is there a way of modifying this line - so to say... allow access to all files on the site (As they are probably real) - but index.php isn't real?

RewriteCond %{REQUEST_FILENAME} !-f 



Thanks again!

phranque

2:32 pm on Oct 13, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There is one problem - When I removed this line as most files are REAL:

RewriteCond %{REQUEST_FILENAME} !-f

the purpose of this conditional directive is to prevent the following RewriteRule from firing when files that actually exist are requested - such as documents, scripts, media files, css, etc

If they go to example.com/my-page/index.php <-- This fails as the file doesn't exist

I don't really want them going to the index.php page - but I don't know if search engines want to try and access the .php file (even though it doesn't exist)

you need to add an index directory document canonicalization redirect before the hostname canonicalization redirect:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
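THE_REQUEST holds the request line exactly as the client sent it (for example GET /index.php HTTP/1.1), and internal rewrites don't change it, which is why this rule won't fire when the server itself maps / to index.php. The pattern can be tried against a few hypothetical request lines:

```php
<?php
// The RewriteCond pattern from the reply, tried against sample request lines.
// "\ " is an escaped (literal) space, as in the htaccess version.
$pattern = '~^[A-Z]{3,9}\ /index\.php\ HTTP/~';

$requestLines = [
    'GET /index.php HTTP/1.1',         // explicit request for the root index.php
    'GET / HTTP/1.1',                  // bare root
    'GET /index.php?page=2 HTTP/1.1',  // query string present
    'GET /my-page/index.php HTTP/1.1', // a subdirectory index
];

foreach ($requestLines as $line) {
    printf("%-34s %s\n", $line, preg_match($pattern, $line) ? 'redirect to /' : 'no redirect');
}
```

As written, it only covers the root index.php and skips requests carrying a query string; the more general rule in the next reply handles subdirectories.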

lucy24

8:25 pm on Oct 13, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The index.php files are no longer real

Is someone actually making these requests, or is this purely something that came up in your own experimentation?

I can't remember if I said earlier that in any case you'll need a redirect for requests ending in index.html--which can easily be generalized to "index\.(php|html)" or even "index\.\w+"--because search engines can and will ask for blahblah/index.html any time they've found a blahblah/ with final slash.

Since these requests are fairly rare, it may be easier on your server to make a RewriteCond whose sole purpose is to capture:
RewriteCond %{REQUEST_URI} ^/((?:\w+/)*)index\.\w+
RewriteRule index\.(?:html|php)$ https://www.example.com/%1 [R=301,NS,L]
Ordinarily this will be your second-to-last redirect, immediately before the domain-name-canonicalization rule (the one that handles www and/or https). Replace \w with [\w-] if you use hyphens in URLs. If the index.php files never, ever really exist, I guess you don't need the [NS] flag, but it does no harm.
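The same logic as a PHP sketch, useful for trying URLs before editing the live file. It tests the RewriteRule pattern first and then the RewriteCond, the order mod_rewrite itself uses, and puts [\w-] in the directory part for hyphenated URLs. index_redirect() is a hypothetical name, and the [NS] flag has no equivalent here.

```php
<?php
// Hypothetical helper mirroring:
//   RewriteCond %{REQUEST_URI} ^/((?:[\w-]+/)*)index\.\w+
//   RewriteRule index\.(?:html|php)$ https://www.example.com/%1 [R=301,NS,L]
function index_redirect(string $requestUri): ?string
{
    // mod_rewrite tests the RewriteRule pattern first...
    if (preg_match('~index\.(?:html|php)$~', $requestUri) !== 1) {
        return null;
    }
    // ...then the RewriteCond, whose group becomes %1 (the directory part).
    if (preg_match('~^/((?:[\w-]+/)*)index\.\w+~', $requestUri, $m) !== 1) {
        return null;
    }
    return 'https://www.example.com/' . $m[1];
}

foreach (['/index.html', '/my-page/index.php', '/a-b/c/index.php', '/my-page/'] as $uri) {
    var_dump(index_redirect($uri));
}
```

With plain \w instead of [\w-], /a-b/c/index.php would fail the condition and not be redirected.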

leebow

8:19 am on Oct 14, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks again for the help Lucy24 :)

That unfortunately made it so my visitors HAD to have index.php at the end of the urls - unless I put it in the wrong place!

All my site links are just example.com/directory/, and that's how I'd like visitors to keep using them. Being able to access index.php was just for search engines.

So example.com/directory/ and example.com/directory/index.php both needed to be routed through my /loader.php?t=$1 [L,QSA]

Options -Indexes
ServerSignature Off
RewriteEngine On

RewriteRule ^files - [L]
RewriteRule ^loader\.php - [L]

RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ http://www.example.com/$1/ [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_URI} ^/((?:\w+/)*)index\.[\w-]
RewriteRule index\.(?:php)$ http://www.example.com/%1 [R=301,NS,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ /loader.php?t=$1 [L,QSA]

phranque

12:56 pm on Oct 14, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That unfortunately made it so my visitors HAD to have index.php at the end of the urls - unless I put it in the wrong place!

All my site links are just example.com/directory/ and that's how id like the visitors still to use them. Being able to access the index.php was just for search engines.

So example.com/directory/ and example.com/directory/index.php both needed to be routed through my /loader.php?t=$1 [L,QSA]

do you have a "physical" subdirectory of the document root directory named /directory/?
if so this would explain why that path isn't being passed to your loader script.
you might try removing this line:
RewriteCond %{REQUEST_FILENAME} !-d

however if there are any other trailing slash url paths that shouldn't get rewritten to loader.php you need a different solution.

phranque

1:16 pm on Oct 14, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteRule index\.(?:php)$ http://www.example.com/%1 [R=301,NS,L]

you should use the version lucy24 suggested:
RewriteRule index\.(?:html|php)$ https://www.example.com/%1 [R=301,NS,L]

leebow

5:16 pm on Oct 14, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks phranque. I tried to change lucy24's version, as my pages never were "index.html", so if suddenly you could access the pages at index.html AND index.php, Google would probably have me for duplicate content :)

I need to be able to let a user access this: example.com/sub-folder/ and the search engine access this: example.com/sub-folder/index.php (and only .php - my site also contains-hyphens-for-more-htaccess-fun!)

BOTH do NOT actually exist on the server - and need to be picked up by the loader.php script.

I've also got another problem now with this line:

RewriteCond %{REQUEST_FILENAME} !-d


It works great in letting people get to real directories, but it's stopping loader.php from serving pages at the site root: e.g. example.com/ is forbidden because of Options -Indexes, and example.com/index.php doesn't work as I haven't found a way to make the index.php file work yet :-)

I've started reading the guide to htaccess - as I'd love to learn this myself. It's amazing how these rules and just RewriteRule and RewriteCond can make so many changes to the way people can access your site.

lucy24

6:52 pm on Oct 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC] 
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_URI} ^/((?:\w+/)*)index\.\w
RewriteRule index\.(?:php)$ http://www.example.com/%1 [R=301,NS,L]
These two rules are in the wrong order. First the index redirect, then the domain name. So it would be
RewriteCond %{REQUEST_URI} ^/((?:\w+/)*)index\.\w
RewriteRule index\.(?:php|html)$ http://www.example.com/%1 [R=301,NS,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
When I said to replace \w with [\w-], I should have specified that it only applies to the rest of the URL (the directory names), not the extension, because obviously you don't have extensions beginning with a hyphen.

To clarify again: You've never used index.html, so those files will never exist. The only reason for saying anything at all about "index.html" is that search engines will request it--and if you didn't have the redirect, who knows where the request would end up? (Possibly with a 404, which serves them right, but why put your server to the work?)

but it's stopping the loader.php get pages to the site root
Oh, oops, you'll need to change
^([^.]+)$
to
^([^.]*)$
with asterisk instead of plus, so the rewrite also applies to the root. I don't think I realized that loader.php handles everything; a lot of times there could be a hard-coded root page and then the CMS takes over for the interior pages.
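The effect of that one-character change shows up with an empty path, which is what a request for the root looks like to a pattern in .htaccess, since the leading slash is stripped:

```php
<?php
// The root arrives as an empty string in .htaccess context.
$plus = '~^([^.]+)$~';   // one or more non-dot characters
$star = '~^([^.]*)$~';   // zero or more, so the empty root also matches

foreach (['', 'my-page/', 'logo.png'] as $path) {
    printf("%-12s +: %d  *: %d\n", var_export($path, true), preg_match($plus, $path), preg_match($star, $path));
}
```

The regex is only part of the picture, though: the root is itself a real directory, so if the rule is still preceded by RewriteCond %{REQUEST_FILENAME} !-d, that condition can still stop the rewrite for example.com/.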

Matter of fact, hard-coding the root might not be a bad idea, since that's where most of your requests will come in, but this depends on how the site is coded overall. Otherwise, look into caching so the server doesn't have to keep making a fresh root page every two seconds.*


* Number pulled out of a hat. 30x60x24/2 = about 20,000 human visitors a day.

leebow

8:32 pm on Oct 14, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



You know what... you’re probably right about keeping index static and a real page - it would make it much easier on the server.

I will try loader.php though, to see how it goes, as it will let us start having “new features” and “new this week” etc., which might help visitors find more of our site’s content.

Thanks again for all the help! :-)

I can’t wait to try the new htaccess - I feel like driving to the office now to give it a go. Lol

leebow

1:04 pm on Oct 16, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Everything almost works now :) Users can access with or without index.php

The only thing not working is - the ^([^.]*)$ still doesn't allow the root index.php file to be removed. You just get a forbidden error - probably because of the Options -Indexes

Options -Indexes
ServerSignature Off
RewriteEngine On

RewriteRule ^files - [L]
RewriteRule ^(([a-z0-9\-]+/)*[a-z0-9\-]+)$ http://www.example.com/$1/ [R=301,L]

RewriteCond %{REQUEST_URI} ^/((?:\w+/)*)index\.\w
RewriteRule index\.(?:php|html)$ http://www.example.com/%1 [R=301,NS,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]*)$ /loader.php?t=$1 [L,QSA]

RewriteRule ^loader\.php - [L]


I know I keep saying it -- but THANK YOU SO MUCH for this help! I'd never have been able to work this out. Thank you! :-)

lucy24

10:32 pm on Oct 16, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Options -Indexes
This line tells the server not to generate an automatic index for directories that do not contain an index.html file (or index.php, or whatever else you have specified in your DirectoryIndex line). It has nothing to do with allowing access to index files that physically exist. Users who request the bare directory will get a 403 response if and only if there is no index.html (index.php) in the directory and you don't have a rewrite such as loader.php that deals with the request in some other way.

What happens right now if you request
example.com/index.php
? Look at three things: your browser's address bar, the content of the visible page ... and your raw access logs. Error logs may or may not be necessary. Access logs ought to show two consecutive requests: first "GET /index.php" with response 301, and then "GET /" with response 200.

RewriteRule ^loader\.php - [L]

This line should go before all the redirects, right after the one for ^files. In fact you could consolidate them if you wanted to:
RewriteRule ^(files|loader\.php) - [L]
But keep an eye on your logs to make sure you never get an explicit request for /loader.php. If you do, you'll need a rule for it. This is one of the many situations where you don't need to deal with an issue unless and until it actually arises.
This 34-message thread spans 2 pages.