Apache Web Server Forum

    
Rewrite rule to organize files into subfolders
ocon

5+ Year Member



 
Msg#: 4604277 posted 2:35 am on Aug 23, 2013 (gmt 0)

With the awesome help I've received on these forums I've been able to piece together these lines of code in my .htaccess file:

RewriteEngine On
RewriteBase /

<FilesMatch "\.html\.gz$">
ForceType text/html
</FilesMatch>
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [L]

RewriteRule ^name/(([a-z0-9]+-)*[a-z0-9]+){1,255}/$ /cache/$1.html [QSA,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^cache/(([a-z0-9]+-)*[a-z0-9]+){1,255}\.html$ /scripts/createPage.php?name=$1 [QSA,L]


What the above code does is attempt to serve page /name/page-name/ from my cache (/cache/page-name.html). If a cached version doesn't exist it will load up a script to generate it for the user as well as save the .html and .html.gz file into the cache for future use.

This, combined with code that serves pre-gzipped content if available, is greatly speeding up my site and creating a marked reduction in bandwidth, but I'm still trying to improve the script further.

Right now this setup dumps tens of thousands of pages into the /cache/ folder on my site. My understanding is that this is a bad design (can somebody confirm this?) and that I should organize these files into subfolders, which is what I would like to do:

/cache/page-name.html => /cache/p/a/g/page-name.html
/cache/a-different-page.html => /cache/a/-/d/a-different-page.html

In order to do this just the following lines would need to be modified:

RewriteRule ^name/(([a-z0-9]+-)*[a-z0-9]+){1,255}/$ /cache/$1.html [QSA,L]
RewriteRule ^cache/(([a-z0-9]+-)*[a-z0-9]+){1,255}\.html$ /scripts/createPage.php?name=$1 [QSA,L]


I'm hoping I can leave the regex untouched and that there might be a trick I can do to the $1 part. If so I would love to know. If not, what approach to modifying the regex would you advise?

Thank you very much!

 

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4604277 posted 4:45 am on Aug 23, 2013 (gmt 0)

Hey, I recognize this one.

(([a-z0-9]+-)*[a-z0-9]+){1,255}

I hope you appreciate that {1,255} means 1-255 iterations of the package in the outer parentheses; it doesn't mean the total number of characters in the string. (Go back to the other thread to refresh your memory.) Further, with this particular pattern, multiple iterations are exactly the same as one iteration:

abc-def-ghi-jkl + mno-pqr-stu + vw-xyz
=
abc-def-ghi-jklmno-pqr-stuvw-xyz
=
([a-z0-9]+-){6}[a-z0-9]+
=
([a-z0-9]+-)*[a-z0-9]+

AND... since this entire RegEx is meant to identify the path part of an URL, I really think it's safe to exclude the possibility that it would ever be longer than 255 characters! (I've seen query strings that could circle the earth-- petfinder dot org comes to mind --but not paths.)
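A quick way to see the point about {1,255} is to run the pattern outside Apache. The sketch below uses Python's re module, which treats these particular constructs the same way as the PCRE engine behind mod_rewrite. The helper name is_valid_name and the choice to check the length separately are illustrative assumptions, not code from this thread.

```python
import re

# The thread's original pattern: {1,255} repeats the whole group.
GROUP_REPEATED = re.compile(r"(([a-z0-9]+-)*[a-z0-9]+){1,255}")
# The same shape without the outer quantifier.
SINGLE = re.compile(r"([a-z0-9]+-)*[a-z0-9]+")


def is_valid_name(name, max_len=255):
    """Hypothetical helper: check the length separately from the characters."""
    return len(name) <= max_len and SINGLE.fullmatch(name) is not None


long_name = "a" * 300
# The braces do not cap the length: one iteration already swallows 300 characters.
print(GROUP_REPEATED.fullmatch(long_name) is not None)  # True
print(is_valid_name(long_name))                          # False
print(is_valid_name("abc-def-ghi"))                      # True
print(is_valid_name("ab--cd"))                           # False
```

Keeping the count and the character check apart also avoids nesting one quantifier inside another, which can make a failed match slow on long input.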

Right now this setup dumps tens of thousands of pages into the /cache/ folder on my site. My understanding is that this is a bad design

There are two unrelated issues. One is physical organization; the other is URL.

On the physical side, having tens of thousands of files in a single directory would make most people insane. But this only applies if you're talking about real, physical files. If it's all done with smoke and m-- Whoops! I meant parameters and dynamic pages-- then it really doesn't matter. Cached pages are a special case because even if you never lay eyes on them yourself, your server may still be creaking at the seams trying to plow through directories with several googol entries. Subdividing the directories may help the server.

On the URL side, forms like

www.example.com/dirname/any-of-tens-of-thousands-of-possibilities-here

are not likely to result in "friendly" URLs. But here you're just looking at physical files and physical directories, right?

ocon

5+ Year Member



 
Msg#: 4604277 posted 7:45 am on Aug 23, 2013 (gmt 0)

Yes, Lucy, that piece of regex was the most recent addition, the rest was piecemealed over a long time.

The 255 length corresponds to a MySQL database column that is saved as a varchar(255). Although I rarely see anything even approaching 20 characters, I'm using 255 as a hard cutoff because a name could potentially and validly be up to 255 characters long. Besides this RewriteRule, this regex is also used in my createPage.php script as an initial and fast way to ensure that it only processes valid requests before it even connects to the database.

My database has about 40,000 rows in it, which times 2 (.html and .html.gz files) means there could potentially be 80,000 files physically stored in that directory (and many more when the database gets larger). I once had a folder on my server with 19,000 files stored directly in it, just stored and not even accessed, and it was very unresponsive. I'm definitely trying to avoid that.

I'm perfectly fine with www.example.com/dirname/any-of-tens-of-thousands-of-possibilities-here, I'm just looking at splitting up the physical files and directories right now and I think /cache/a/n/y/any-of-tens-of-thousands-of-possibilities-here.html might be the best approach for this, I'm just not the best at the regex.

I do think just three folders deep would be sufficient. Unless my math is wrong 80,000/37/37/37 = approximately 2 files in each of the deepest folders (assuming regular distribution which I know won't be the case).
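The arithmetic above can be checked with a few lines. This is only a sketch under the post's own assumption of an even spread; the 37 comes from the character set [a-z0-9-] (26 letters, 10 digits and the hyphen), and the function name is made up for illustration.

```python
# 26 letters + 10 digits + the hyphen = 37 possible directory names per level.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def average_files_per_leaf(total_files, depth, alphabet_size=len(ALPHABET)):
    """Average files per deepest folder, assuming a perfectly even spread."""
    return total_files / alphabet_size ** depth


print(len(ALPHABET))                                  # 37
print(37 ** 3)                                        # 50653
print(round(average_files_per_leaf(80_000, 3), 2))    # 1.58
print(round(average_files_per_leaf(80_000, 2), 2))    # 58.44
```

Real names will cluster on common prefixes, so the busiest folders will hold far more than the average.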

Although it's not even close, this is the furthest I can get with the regex right now:

RewriteRule ^name/(([a-z0-9]+-)*[a-z0-9]+){1,255}/$ /cache/$2/$3/$4/$1.html [QSA,L]
RewriteRule ^cache/([a-z0-9])/([a-z0-9-]*)/?/([a-z0-9-]*)/?(([a-z0-9]+-)*[a-z0-9]+){1,255}\.html$ /scripts/createPage.php?name=$4 [QSA,L]


It doesn't even work for long file names, let alone the possibility that the file might be less than three characters long.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4604277 posted 8:27 am on Aug 23, 2013 (gmt 0)

How early in the filename might hyphens occur? You don't want to have a directory named "-". It may be technically legal, but, well, ugh.

Pattern (here shortcutting to \w on the assumption that we can just ignore lowlines, otherwise replace each with [a-z0-9]):

^(\w)(\w)(\w)((?:\w+-)*\w+)
>>
$1/$2/$3/$1$2$3$4
(leaving off the extensions because you already know how to handle those)

That leaves only those filenames which
(1) have a hyphen in 4th place or sooner
and/or
(2) have fewer than four characters
We'll get back to those if necessary, but only if they can actually occur.
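To see which names the pattern above actually covers, it can be tried in Python's re module. This is a sketch, not Apache itself: the trailing $ is added here so the whole name is tested, re.ASCII keeps \w to [A-Za-z0-9_], and the function name shard is invented for the example.

```python
import re

# The pattern from the post, anchored at both ends for a whole-name test.
PATTERN = re.compile(r"^(\w)(\w)(\w)((?:\w+-)*\w+)$", re.ASCII)


def shard(name):
    """Return 'a/b/c/name', or None when the pattern does not apply."""
    m = PATTERN.match(name)
    if m is None:
        return None
    # Same idea as the $1/$2/$3/$1$2$3$4 substitution.
    return m.expand(r"\1/\2/\3/\1\2\3\4")


for name in ["page-name", "abcd", "abc", "a-different-page", "abc-def"]:
    print(name, "->", shard(name))
# page-name -> p/a/g/page-name
# abcd -> a/b/c/abcd
# abc -> None
# a-different-page -> None
# abc-def -> None
```

Note that abc-def also falls through: after the three single characters, the rest of the name has to start with a word character, so a hyphen in fourth place does not match either.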

RewriteRule ^name/(([a-z0-9]+-)*[a-z0-9]+){1,255}/$ /cache/$2/$3/$4/$1.html [QSA,L]

I don't understand where $3 and $4 come from. Even $2 may or may not exist. A further complication is that it might be either the first or the last iteration of [a-z0-9]+- depending on RegEx engine.

Get it working for long well-behaved URLs and we can come back to the shorter or trickier ones-- those that have a hyphen in or before 4th place, and/or have no more than three characters total. But, again, only if they can actually occur within the code you're using to generate page names.

ocon

5+ Year Member



 
Msg#: 4604277 posted 10:18 am on Aug 23, 2013 (gmt 0)

A hyphen could easily be the second or third character and make for a hyphen-named directory. I don't have a strong issue with that; other than being unsightly, is there any other concern? It would be buried in the cache folders anyway and would never be directly accessed in the URL (thanks to other recently worked out .htaccess lines omitted from this thread for simplicity).

In the last post I added some additional sets of parentheses in an attempt to isolate these values separately and use them in the folder path, with $1 reserved for the whole file name. I only managed to work out where to put the sets in the second rewrite rule's regex, but I added the references to the second part of both rules because I thought they would eventually need to go there.

You have to forgive me, {1,255} not meaning what I thought didn't register with me until shortly after my last message. I took a short nap and actually had a bad dream about that. Would this give me what I would want:
(([a-z0-9]+-)*[a-z0-9]+){,128} (except it would allow up to 256 characters)?

Should I split up my two lines into four lines, adding to each a more rigorous regex check and then a simpler rewriterule if the regex check passes? I'm not sure exactly how to write the conditions though.

RewriteCond %{REQUEST_FILENAME} (([a-z0-9]+-)*[a-z0-9]+){,128}
RewriteRule ^name/(\w)(\w)(\w)((?:\w+-)*\w+)/$ /cache/$1/$2/$3/$1$2$3$4.html [QSA,L]

RewriteCond %{REQUEST_FILENAME} (([a-z0-9]+-)*[a-z0-9]+){,128}
RewriteRule ^cache/(\w)(\w)(\w)((?:\w+-)*\w+)\.html$ /scripts/createPage.php?name=$1$2$3$4 [QSA,L]


I'm not too sure what
((?:\w+-)*\w+) means.

leaving off the extensions because you already know how to handle those


That's a pretty bold assumption. ;)

Thank you for your help so far, I greatly appreciate both it and your patience.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4604277 posted 11:03 am on Aug 23, 2013 (gmt 0)

(([a-z0-9]+-)*[a-z0-9]+) - allows an unlimited number of characters.

(([a-z0-9]+-)*[a-z0-9]+){,128} - allows an unlimited number of characters, repeated up to 128 times.
lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4604277 posted 11:25 am on Aug 23, 2013 (gmt 0)

Overlapping g1, because I type VERY slowly.
Would this give me what I would want: (([a-z0-9]+-)*[a-z0-9]+){,128} (except it would allow up to 256 characters)?

Any quantifier-- whether it's * or + or ? or an exact number in braces-- applies to whatever comes immediately before it. In this case, that's the whole parenthesized group.

[abc] = any one character a or b or c
(abc) = the three-character sequence abc


More general RegEx stuff:

\w = [A-Za-z0-9_] (technically a lot more than a-z, but this doesn't apply to your URLs)

If your actual filenames are made up only of [a-z0-9] then \w will cover everything. You can't say \w if you need to exclude filenames that contain capital letters or _ lowlines.

If you have parentheses enclosing other parentheses, the captures are numbered according to where the opening parenthesis was:
((aa)(bb)(cc))
$1 = aabbcc
$2 = aa
$3 = bb
$4 = cc
So if you're making directory names by cutting out part of the full filename, that full filename will always be $1.

?: means non-capturing. Useful when you have a lot of things in parentheses, because parentheses serve two different purposes, capturing and grouping.
((aa)(?:bb)?(cc))
$1 = aabbcc
$2 = aa
$3 = cc
bb is still captured as part of $1; it just doesn't get a number of its own.
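The numbering rules above are easy to confirm by running the same toy patterns. The sketch uses Python's re module, which numbers groups by opening parenthesis exactly as described; the variable names are only for illustration.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
nested = re.fullmatch(r"((aa)(bb)(cc))", "aabbcc")
print(nested.groups())          # ('aabbcc', 'aa', 'bb', 'cc')

# (?:...) groups without capturing, so cc moves up to number 3.
non_capturing = re.fullmatch(r"((aa)(?:bb)?(cc))", "aabbcc")
print(non_capturing.groups())   # ('aabbcc', 'aa', 'cc')

# The optional bb can be absent and the numbering stays the same.
without_bb = re.fullmatch(r"((aa)(?:bb)?(cc))", "aacc")
print(without_bb.groups())      # ('aacc', 'aa', 'cc')
```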

name/(\w)(\w)(\w)((?:\w+-)*\w+)/$ /cache/$1/$2/$3/$1$2$3$4.html

I think it works better intuitively if you shift the parentheses so it's
name/((\w)(\w)(\w)(?:\w+-)*\w+)/$ /cache/$2/$3/$4/$1.html
letting $1 be the full filename including the bits you pulled off to make directory names. But I don't know if there's any measurable difference in server load, so use whichever form you're comfortable with.

Is there any conceivable circumstance where you could get a request for a filename longer than 256 characters? Would your server's filesystem even allow it? Otherwise, it seems like just one more in the long list of errors you don't need to code for unless they really happen. Sort of the "Don't put beans up your noses" of mod_rewrite.

If you do need to prevent requests from coming in for overlong filenames, that's
RewriteCond %{REQUEST_URI} ^.{1,256}$

As g1 or someone like him pointed out in the other thread, it's going to be simpler to separate the counting RegEx from the character-validating RegEx. Counting starts at one, not zero, so {255} is 255 whatever-you're-counting, not 256.


Now then...

You're doing two different things and it may help to keep them separate.

#1 Sort the vast number of cached files into different physical directories
#2 Translate requests into filepaths.

We've dealt with vanilla filenames that start with four or more alphanumerics. Personally I'd deal with, uh, early-onset hyphens by leaving them out of the directory names:

^((\w)-?(\w)-?(\w)-?(?:\w+-)*\w+)$
>>
$2/$3/$4/$1
This is where it's handy to have that one overall capture. The directories themselves don't mean anything, right? It's just to keep the server from having to root through ten thousand files each time.

Will it confuse your record-keeping if some directories contain both files and subdirectories?
^((\w)-?(\w)-?(\w))$
>>
$2/$3/$4/$1.html

^((\w)-?(\w))$
>>
$2/$3/$1.html
...and that's all, with subdirectory /a/b/ containing a handful of loose files (would it be 72? jamais vu is beginning to set in...) in addition to any third-level subdirectories.

Similarly
^(\w)$
>>
$1/$1.html
where "a.html" is lying loose in the /a/ directory alongside /a/b/ and /a/c/ and /a/1/ and so on.
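Putting the four patterns above into one ordered list shows where each kind of name would land. This is a sketch in Python's re module rather than mod_rewrite: re.ASCII keeps \w to [A-Za-z0-9_], .html is appended in every case, and the names RULES and cache_path are invented for the example.

```python
import re

# Tried in order, longest names first, mirroring the order of the patterns above.
RULES = [
    (re.compile(r"^((\w)-?(\w)-?(\w)-?(?:\w+-)*\w+)$", re.ASCII), r"\2/\3/\4/\1.html"),
    (re.compile(r"^((\w)-?(\w)-?(\w))$", re.ASCII), r"\2/\3/\4/\1.html"),
    (re.compile(r"^((\w)-?(\w))$", re.ASCII), r"\2/\3/\1.html"),
    (re.compile(r"^(\w)$", re.ASCII), r"\1/\1.html"),
]


def cache_path(name):
    """Return the sharded path for a name, or None if no rule applies."""
    for pattern, template in RULES:
        m = pattern.match(name)
        if m:
            return m.expand(template)
    return None


for name in ["a-different-page", "page-name", "abc", "a-b", "ab", "a", "-abc"]:
    print(name, "->", cache_path(name))
# a-different-page -> a/d/i/a-different-page.html
# page-name -> p/a/g/page-name.html
# abc -> a/b/c/abc.html
# a-b -> a/b/a-b.html
# ab -> a/b/ab.html
# a -> a/a.html
# -abc -> None
```

Because the hyphens are skipped when picking directory letters, a-b and ab share the folder a/b/ without colliding, since the full name is kept as the filename.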

Will these supershort filenames actually occur?

[edited by: lucy24 at 11:34 am (utc) on Aug 23, 2013]

ocon

5+ Year Member



 
Msg#: 4604277 posted 11:33 am on Aug 23, 2013 (gmt 0)

Thanks g1smd, I clearly need to refresh on the regex and have gone back to the original thread to get it correct before continuing with this one.

ocon

5+ Year Member



 
Msg#: 4604277 posted 4:56 pm on Aug 23, 2013 (gmt 0)

I've fixed the regex and incorporated those changes into the rewrite rules and conditions, and now I'm hoping to proceed with my original problem.

I've converted:
RewriteRule ^name/(([a-z0-9]+-)*[a-z0-9]+){1,255}/$ /cache/$1.html [QSA,L] 
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^cache/(([a-z0-9]+-)*[a-z0-9]+){1,255}\.html$ /scripts/createPage.php?name=$1 [QSA,L]

Into:
RewriteCond %{REQUEST_URI} ^/name/[a-z0-9-]{3,255}/$ 
RewriteCond %{REQUEST_URI} !^/name/(-.*|.*--.*|.*-)/$
RewriteRule ^name/((.)(.)(.).*)/$ /cache/$2/$3/$4/$1.html [QSA,L]
RewriteCond %{REQUEST_URI} ^/name/[a-z0-9-]{1,2}/$
RewriteCond %{REQUEST_URI} !^/name/(-.*|.*--.*|.*-)/$
RewriteRule ^name/((.).*)/$ /cache/$2/$1.html [QSA,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^cache/(.+)\.html$ /scripts/createPage.php?name=$1 [QSA,L]

The code above works and I'm very happy with it, though I still have some concerns:
  • Lines 1-2, and as currently written lines 4-5 contain the regex split up on multiple lines. If possible I'd like to cut down on this bloated code and ideally make this into a single line.
  • Lines 1-3 are very similar to lines 4-6 with only a slight modification to handle file names between one and two characters long. If possible I'd appreciate feedback on how I could condense it down. I would love if the script could automatically decide how many subfolders to place itself into up to X-subfolders deep, such as 3-subfolders.
  • Line 8 no longer has any meaningful filename validation. If possible I'd like to incorporate this back in.
  • Line 8 passes all the subfolders into the $name variable (createPage.php?name=p/a/g/page-name). I'd like to change this so it only passes the last 'subfolder' name (createPage.php?name=page-name) but am having a difficult time with this as the files can now be various levels deep.
Again, I am very thankful for all the help and patience I have received thus far. Thank you!
lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4604277 posted 10:09 pm on Aug 23, 2013 (gmt 0)

You don't need all this:

!^/name/(-.*|.*--.*|.*-)/$


Get rid of the anchors, the /name/ and all the .* pieces. You're not capturing the full request; you just need to check whether the forms occur at all.

I forgot that the overall form is /name/blahblah so anywhere I had ^ you instead need / representing the beginning of the part you're testing. And similarly replace $ with \. (literal period) for the end of the test string.

But if you can shift all of this to the php side, you can start by stripping away the /name/ and the .html leaving only your test string.

This probably all belongs in a single thread, but oh well.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4604277 posted 12:03 am on Aug 24, 2013 (gmt 0)

The pattern

!^/name/(-.*|.*--.*|.*-)/$

simplifies to

!(^/name/-|--|-/$)

and parses very much faster.
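One way to gain confidence in a simplification like this is to run both patterns over the same sample requests and compare the answers. The sketch below does that in Python's re module; the sample URIs are made up, and it checks agreement only, not speed.

```python
import re

ORIGINAL = re.compile(r"^/name/(-.*|.*--.*|.*-)/$")
SIMPLIFIED = re.compile(r"(^/name/-|--|-/$)")

# Hypothetical requests: valid names plus each kind of misplaced hyphen.
samples = [
    "/name/page-name/",
    "/name/-page/",
    "/name/page-/",
    "/name/pa--ge/",
    "/name/a/",
    "/name/a-b-c/",
]

for uri in samples:
    by_original = ORIGINAL.search(uri) is not None
    by_simplified = SIMPLIFIED.search(uri) is not None
    print(uri, by_original, by_simplified)
# /name/page-name/ False False
# /name/-page/ True True
# /name/page-/ True True
# /name/pa--ge/ True True
# /name/a/ False False
# /name/a-b-c/ False False
```

In the RewriteCond both patterns are negated with !, so True here means the request would be turned away.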

Add a blank line after each RewriteRule for code clarity.
ocon

5+ Year Member



 
Msg#: 4604277 posted 1:08 am on Aug 24, 2013 (gmt 0)

So by shifting everything to PHP for the heavy duty verification I can greatly simplify the last section of code:

RewriteRule ^name/((.)(.?)(.?).*)/$ /cache/$2/$3/$4/$1.html [QSA,L] 

RewriteRule ^name/((.)(.?).*)/$ /cache/$2/$3/$1.html [QSA,L]

RewriteRule ^name/((.).*)/$ /cache/$2/$1.html [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^cache/(.+)\.html$ /scripts/createPage.php?name=$1 [QSA,L]

I guess this distills the script down to my two remaining issues:
  • I've expanded the first RewriteRule to support cached files ranging from 1 character long (stored 1 level deep), 2 characters long (stored 2 levels deep), and 3 characters or longer (stored 3 levels deep). Can these three lines be condensed?
  • In the last line is it possible to capture just the filename without any of the subfolder names, regardless of how many levels deep it is stored?

ocon

5+ Year Member



 
Msg#: 4604277 posted 2:43 am on Aug 24, 2013 (gmt 0)

The last RewriteRule I think I have figured out:

RewriteCond %{REQUEST_FILENAME} !-f 
RewriteRule ^cache/.*/([a-z0-9-]{1,255})\.html$ /scripts/createPage.php?name=$1 [QSA,L]
/cache/a/a.html => a 
/cache/a/b/ab.html => ab
/cache/a/b/c/abc.html => abc
/cache/a/b/c/abcd.html => abcd
Now if I could just figure out how to condense the first three RewriteRules into one.
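The mapping listed above can be reproduced outside Apache with the same pattern. A sketch in Python's re module follows; the paths have no leading slash because that is how a RewriteRule in a root .htaccess sees them, and the function name is invented for the example.

```python
import re

# Same pattern as the RewriteRule: skip the subfolders, capture only the filename.
CACHE_RULE = re.compile(r"^cache/.*/([a-z0-9-]{1,255})\.html$")


def name_from_cache_path(path):
    """Return the page name hidden in a cache path, or None if it does not match."""
    m = CACHE_RULE.match(path)
    return m.group(1) if m else None


print(name_from_cache_path("cache/a/a.html"))         # a
print(name_from_cache_path("cache/a/b/ab.html"))      # ab
print(name_from_cache_path("cache/a/b/c/abcd.html"))  # abcd
print(name_from_cache_path("cache/abcd.html"))        # None
```

The last line shows a side effect of the .*/ part: a file sitting directly in /cache/ with no subfolder is not matched by this rule.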
JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4604277 posted 3:16 am on Aug 24, 2013 (gmt 0)

Interesting you said something about switching to PHP.

I have a rather large site I work on and use PHP for almost all redirecting/rewriting except for canonicalization and some other "little things".

You have to have both PHP files in the example below prepared to serve a 404 for bad requests, but here's the basic "gist" of what I do in a situation similar to yours...

.htaccess

RewriteEngine on
RewriteRule ^name/ /get-the-file.php [L]

###

get-the-file.php

<?php
// Array keys must be quoted strings; a bare REQUEST_URI is treated as a constant.
$uri = $_SERVER['REQUEST_URI'];

if (is_file('/path-to'.$uri)) {
    // good + cached requests happen very quickly
    echo file_get_contents('/path-to'.$uri);
    exit;
}
elseif (is_file('/path-to'.strtolower($uri))) {
    // correct capitalization errors
    header('Location: http://www.example.com'.strtolower($uri), true, 301);
    exit;
}
elseif (preg_match('#^[pattern]$#', $uri, $page_var)) {
    // file_get_contents() on a local path does not run the PHP file, so include it
    // and capture its output. The included file reads $page_var[1], [2], [3] directly.
    ob_start();
    include '/path-to/php-file-to-generate-the-page.php';
    $show_save = ob_get_clean();
    echo $show_save;
    file_put_contents('/the-save-to-path', $show_save);
    exit;
    // rinse and repeat as necessary
}
else {
    header('HTTP/1.1 404 Not Found');
    include_once('/path-to/custom-404-error.php');
}

[edited by: JD_Toims at 3:36 am (utc) on Aug 24, 2013]

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4604277 posted 3:31 am on Aug 24, 2013 (gmt 0)

One more note:

You could likely use a single preg if you set your file generation script to disregard empty or non-matching variables by making them optional in the pattern and only using what's available and matches in the file generation script.

ocon

5+ Year Member



 
Msg#: 4604277 posted 3:42 am on Aug 24, 2013 (gmt 0)

JD, wouldn't always invoking PHP create an unnecessary step?

.htaccess => run PHP script to check if page exists => serve cached page, OR
.htaccess => run PHP script to check if page exists => run PHP script to create page

As opposed to:

.htaccess => serve cached page, OR
.htaccess => page not found => run PHP script to create page

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4604277 posted 3:51 am on Aug 24, 2013 (gmt 0)

To the best of my knowledge, using -f in the .htaccess can invoke hundreds of extra lines of code and forces the server to "walk the file path + scan the disk" to see if a file exists, so it's likely slower than just rewriting to a PHP file that does a "single check" for the file, even though it may seem like an extra step.

File-exists and directory-exists checks are inefficient. In most cases, they will invoke several hundreds of lines of (machine) code at the OS file-handler level, and in some cases --especially on heavily-shared virtual servers-- the filesystem and ACLs may be only partially-cached due to excessive swapping. In that case, the OS is going to have to actually go read the physical disk, and compared to *any* code execution, that is going to be *very* slow.

jdMorgan - 3rd post - This thread: [webmasterworld.com...]

The site I did what I outlined on was having speed issues after a hosting change and we couldn't figure out why, but it slowed to a crawl, so I switched everything to a system similar to my preceding posts and it was back to loading so fast if you weren't paying attention you couldn't tell the page changed sometimes, especially when the page was cached.

Note: I also built in an "auto update" with a filemtime() check and if it had been more than a week since the file was updated it was regenerated. Yes, this might make generation/serving time slower for one user, but only one user would have any type of slowdown from it, so I'm fine with it -- My dynamic XML sitemap files are coded in a very similar way to keep them up-to-date -- If it's been over 7 days since they were modified, they're regenerated automatically.
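The weekly refresh described above boils down to comparing a file's modification time with the clock. The sketch below shows that check in Python, where os.path.getmtime plays the role of PHP's filemtime(); the function name and the optional now parameter are assumptions added to make the check easy to test.

```python
import os
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


def needs_regeneration(path, max_age=WEEK_SECONDS, now=None):
    """True when the cached file is missing or older than max_age seconds."""
    if not os.path.isfile(path):
        return True
    if now is None:
        now = time.time()
    # getmtime returns the last-modified time as seconds since the epoch.
    return now - os.path.getmtime(path) > max_age
```

A caller would regenerate and re-save the page whenever this returns True, so only the first visitor after the cutoff pays the generation cost.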

[edited by: JD_Toims at 3:55 am (utc) on Aug 24, 2013]

ocon

5+ Year Member



 
Msg#: 4604277 posted 3:55 am on Aug 24, 2013 (gmt 0)

I know this is pseudocode, but is it possible to create something like:

RewriteRule ^name/((.)(.?)(.?).*)/$ /cache/$2(/$3)(/$4)/$1.html [QSA,L]

Where (/$3)(/$4) only populate on the right side if it matched something on the left? I know the third and fourth parenthetical sets on the left are optional since they are flagged with the question mark and therefore would be null on the right if they don't match anything, but I only want them to insert the subdirectory slash on the right if they do match something.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4604277 posted 4:00 am on Aug 24, 2013 (gmt 0)

Edit: I was typing while the last few posts were coming in, so scroll back about half an hour.

JD, you may not have noticed that there are two parallel threads on the same question. The other one's next door in php, but the underlying questions are still about constructing Regular Expressions.

RewriteRule ^name/((.)(.?)(.?).*)/$ /cache/$2/$3/$4/$1.html

You can't do this. If the filename is shorter than three characters, captures 3 and 4 may be empty and you'll be rewriting to, at worst,
/cache/a///a.html

A bit further up this thread I suggested leaving hyphens out of the directory names by setting up everything in the form
((\w)-?(\w)-?(\w)blahblah) >> /$2/$3/$4/$1

This rule comes before the ones made for supershort filenames
\w-?\w-
and
\w

But if you can dump the whole thing on your php, you don't need any of this in htaccess. All that's left is

^name/blahblah /something.php?blahblah
and
^cache/blahblah /somethingelse.php?blahblah

The object is to collect all the Regular Expressions into one place. Depending on how you code, you can either have two different php files that both call the same subfile (which contains the RegExes), or start out in a single php file which then branches out in different directions for creating a new file or retrieving an existing one.

Edit as we overlapped:
Where (/$3)(/$4) only populate on the right side if it matched something on the left?

Not in htaccess, but it can easily be done in php. I think JD lays it out in the post right before mine.
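To make the difference concrete, here is a sketch of both behaviours in Python's re module: the plain substitution, which leaves empty path segments for short names, and a script-side version that only adds a slash for captures that matched something. The function names are invented, and the second function stands in for whatever the PHP script would do.

```python
import re

RULE = re.compile(r"^name/((.)(.?)(.?).*)/$")


def naive_target(request_path):
    """Mimic the plain substitution /cache/$2/$3/$4/$1.html."""
    m = RULE.match(request_path)
    return m.expand(r"/cache/\2/\3/\4/\1.html") if m else None


def conditional_target(request_path):
    """Only emit a directory level for captures that are not empty."""
    m = RULE.match(request_path)
    if m is None:
        return None
    segments = [g for g in m.group(2, 3, 4) if g]
    return "/cache/" + "/".join(segments) + "/" + m.group(1) + ".html"


print(naive_target("name/a/"))                 # /cache/a///a.html
print(conditional_target("name/a/"))           # /cache/a/a.html
print(conditional_target("name/ab/"))          # /cache/a/b/ab.html
print(conditional_target("name/page-name/"))   # /cache/p/a/g/page-name.html
```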

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4604277 posted 4:12 am on Aug 24, 2013 (gmt 0)

.htaccess => run PHP script to check if page exists => serve cached page, OR
.htaccess => run PHP script to check if page exists => run PHP script to create page

Just to make sure it's clear:
This is not exactly what I outlined does.

What I outlined [minus the capitalization correction] is:

.htaccess => Immediate rewrite to PHP [we know the PHP file exists] => Single PHP check for the requested file [no extra code, file path walk or disk scan invoked] => Serves the file's contents or *includes* a single PHP file [we know it exists -- again, no extra code, file path walk or disk scan invoked] and the *included* file creates the page, then serves its contents, then saves the page [saving comes after serving it to the visitor] or it serves a 404 error.

Added Note: One thing to keep in mind on this type of system is the more a file is used the closer it's "pushed to the front of the server cache", even on shared hosting, so I personally want a single file used as much as I can use it, because if it's used enough it'll be cached in RAM rather than loaded from the hard-disk and a file cached in RAM makes it *screaming fast* for the server to use.

ocon

5+ Year Member



 
Msg#: 4604277 posted 6:38 am on Aug 24, 2013 (gmt 0)

Wow, what a bitter pill to swallow! In my effort to speed up the site I've been trying to reduce my PHP usage and switch to using mod_rewrite and cached static files, but now I have to align with an even less efficient method? Here's my realignment attempt.

.htaccess:
RewriteEngine On
RewriteBase /

# Prevent direct access to the cache by the browser
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /cache/
RewriteRule ^.* - [F]

# Previously omitted line now included to avoid any conflicts
RewriteRule ^name/_(\d+)/$ /scripts/assignName.php?id=$1 [QSA,L]

# Name verified before sending to file loader
RewriteCond %{REQUEST_URI} !(^/name/-|--|-/$)
RewriteRule ^name/([a-z0-9-]{1,255})/$ /scripts/loadFile.php?name=$1 [QSA,L]

loadFile:
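// The rewrite rule passes the page name as ?name=..., so read that key.
$name = $_GET['name'];
// Build a filesystem path; a bare '/cache/' would point at the server's root directory.
$file = $_SERVER['DOCUMENT_ROOT'].'/cache/'.join('/', str_split(substr($name, 0, 3))).'/'.$name.'.html';

if (is_file($file)) {
    echo file_get_contents($file);
    die();
}
else { /* Create, save to cache, and display newly generated page */ }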
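For comparison, the same serve-or-create flow can be sketched end to end in a few lines of Python. This is not the PHP script itself: cache_root, the generate callback and both function names are assumptions for the example, and the name is assumed to have already passed the [a-z0-9-] check done by the rewrite rule.

```python
import os


def cache_file_for(name, cache_root):
    """Shard by the first three characters, e.g. page-name -> p/a/g/page-name.html.

    Assumes name was already validated against [a-z0-9-]; unvalidated input
    containing slashes or dots must never reach this function.
    """
    return os.path.join(cache_root, *name[:3], name + ".html")


def load_page(name, cache_root, generate):
    """Return the page body, creating and caching it on a miss."""
    path = cache_file_for(name, cache_root)
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    body = generate(name)  # hypothetical callback that builds the HTML
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(body)
    return body
```

Short names simply get fewer directory levels, because slicing the first three characters of a one- or two-character name yields only what is there.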

So, for me, this is a big shakeup on how I was approaching things before, please let me know if I'm on a better track now.

I know one of my biggest concerns with this new approach is not knowing how to deliver pre-gzipped files to supported users.

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4604277 posted 8:43 am on Aug 24, 2013 (gmt 0)

I know one of my biggest concerns with this new approach is not knowing how to deliver pre-gzipped files to supported users.

I personally stick with simple and just use mod_deflate:

AddOutputFilterByType DEFLATE text/html text/plain text/xml

[httpd.apache.org...]

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4604277 posted 9:14 am on Aug 24, 2013 (gmt 0)

# Prevent direct access to the cache by the browser
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /cache/
RewriteRule ^.* - [F]

This seems a bit extreme. Why not simply redirect, exactly the same as you'd do if someone asked for /index.html by name? Or anything else in the rewrite:redirect package.

ocon

5+ Year Member



 
Msg#: 4604277 posted 10:25 am on Aug 24, 2013 (gmt 0)

Lucy, I wanted to take the position that only someone familiar with the backend would even know about the existence of these folders, that there should never be a reason they are directly accessed by the public, and that any attempt to directly access them should be viewed suspiciously.

JD, I'm not familiar with mod_deflate, but it seems like it generates the compressed file on-the-fly. It just seems wrong to compress a known static file that is large in size, such as my homepage, every single time a browser that supports gzipped content requests the file. I worked hard to engineer that page to where I could serve a static shell and load dynamic content asynchronously. (I was fine with that because the nature of the site required JavaScript.) For something like the homepage where I know there is a gzipped version, is there something more direct than my heavy-handed approach?

<FilesMatch "\.html\.gz$">
ForceType text/html
</FilesMatch>
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [L]

And if there is a direct approach is it something I could also incorporate into loadFile.php, maybe something like sending a redirect header to open the file instead of using file_get_contents($file)?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved