SetEnv performance

         

csdude55

9:27 pm on Apr 5, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A few weeks ago, I discussed setting constants in PHP instead of regular variables that are set on every page load, and whether they were beneficial. But after several speed tests, I found that regular variables were faster to process, so other than preventing them from being changed by accident there was no real advantage to using a constant.

But today I discovered SetEnv! I don't have mod_env installed, but in theory I should be able to set, say:

SetEnv DB_USER [MySQL username]
SetEnv DB_PASS [MySQL password]
SetEnv BASE /home/example
SetEnv DATA /home/example/data
SetEnv IMAGES https://images.example.com

and then access them in PHP via $_ENV['BASE'], etc.

What do you think, would these be better / faster to process than a regular PHP variable?

w3dk

11:18 pm on Apr 5, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



You probably don't want to be storing passwords in system-wide environment variables. (?) Env vars aren't just global to your script, but potentially global to every script running on the server.

"I don't have mod_env installed" - mod_env is installed by default, so if you don't have it installed then presumably you've explicitly removed it?

But after several speed tests, I found that regular variables were faster to process, so other than preventing them from being changed by accident there was no real advantage to using a constant.


"Performance" should not be a factor in determining whether you use a "constant" or a "variable"!?

lucy24

11:53 pm on Apr 5, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



would these be better / faster to process
Well, “better” and “faster” may be entirely different questions. Unless the time factor involved is absolutely vast--like, say, three whole seconds for Method A as opposed to three microseconds for Method B--time should be pretty far down the list. Does time correlate directly with CPU load? Or are you choosing between occupying the entire server for one millisecond vs. taking a tenth of its computing capacity for ten milliseconds?

I’d weigh this the same way you choose between, for example, slightly different ways of achieving the same end in mod_rewrite, involving various conditions in various orders, all of which get you there in the end. Server processing time is generally less significant than how it works for you, the human editor.

csdude55

12:58 am on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't "really" expect much of a performance boost or anything; this is just one of those things that I came across and wondered if it would add any value. I have a variables.php script that I include on every page of the site, and I set a handful of static variables in that script that never ever change... so I figure that if I can minimize the size of variables.php then it will open marginally faster, which would then turn into a few microseconds shaved off of the load time.

I'll eventually move everything in the htaccess to the httpd.conf. Having never really done much with that... if I set variables in the Apache configuration then would they just be set once (at compile time) and then never have to be set again? Or would they still load at each page load?

Env vars aren't just global to your script, but potentially global to every script running on the server.

Good point, @w3dk! In this case my server only runs my sites so it's not a big deal, but that's a good thing to know.

tangor

4:18 am on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Beware the unintended consequences. If you create linkages in your site that rely on single variables, then if anything should happen to one or more of them, your site becomes toast ... or happy hunting grounds for bad actors.

Or not. Careful coding is generally a good thing. :)

Just how much "time" is expected to be saved?

csdude55

5:56 am on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just how much "time" is expected to be saved?

It's tough to say... at peak time it's not unusual to have 1,000 users on the site at one time. I've done a ton of research and found that load time is directly proportionate to pages per session, so microseconds shaved during production could turn into 1 second during peak time. If that results in an average of 1 more page per session, it could turn into an extra $1,000 from Adsense.

(or maybe $10, considering the value of ads these days)

I've shaved microseconds here and there, and (excluding Adsense load time) I have my homepage down to around 3 seconds. I'd like to get it to 2 seconds, so if I shave 50ms here, 25ms there... it might add up.

tangor

7:02 am on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Time, of course, is a prime consideration.

One reason I dumped adsense a century or so ago was the delay introduced by the third party.

HOWEVER that does not quite apply as adsense is necessary for your presentation/income so keep that in mind.

The lag is always going to be third party load, so my question is, once again, how big a difference if you shave off 25ms on YOUR side if the third party is going to take up 3 seconds no matter what you do on your end?

phranque

7:59 am on Apr 6, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You probably don't want to be storing passwords in system-wide environment variables. (?) Env vars aren't just global to your script, but potentially global to every script running on the server.

the environment variables set by mod_env are internal environment variables, available only to the apache process that created them.
they are not the same as system environment variables that can be passed to the apache process.

Environment Variables in Apache [httpd.apache.org]
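
A minimal sketch of the distinction described above, assuming mod_env is loaded. The variable names are only illustrative: BASE comes from the first post, and LANG stands in for whatever OS-level variable happens to exist in the environment httpd was started from.

```apache
<IfModule mod_env.c>
    # Internal variable: Apache attaches it to each request it handles.
    SetEnv BASE /home/example

    # Copies an existing OS environment variable (from the environment
    # httpd was started in) into Apache's internal set.
    PassEnv LANG
</IfModule>
```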

csdude55

6:33 am on Apr 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I don't have mod_env installed" - mod_env is installed by default, so if you don't have it installed then presumably you've explicitly removed it?

This might be interesting... this gave me an error of Invalid command 'SetEnv', perhaps misspelled or defined by a module not included in the server configuration:

SetEnv BASE /home/example
SetEnv DATA /home/example/data
SetEnv IMAGES https://images.example.com

But THIS worked:

RewriteRule ^ - [E=BASE:/home/example]
RewriteRule ^ - [E=DATA:%{ENV:BASE}/data]
RewriteRule ^ - [E=IMAGES:https://images.example.com]

It wouldn't let me combine the flags like [E=foo:bar, E=blah:blah], though, I had to have a new rule for each of them :-(

It also wouldn't let me do this:

RewriteRule ^ - [E=PUNC:'"`~!@#$%^&*()-_=+[]{}\|;:/?.><1234567890,]

It said unknown flag '', and escaping it with \" didn't change anything. It's not a crucial variable or anything, it's just interesting... is there a magic way to escape " here?

w3dk

9:26 am on Apr 10, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month




It wouldn't let me combine the flags like [E=foo:bar, E=blah:blah], though,


Because you have an erroneous space between the flags (which effectively splits the argument in two)... it should be [E=foo:bar,E=blah:blah]
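
For reference, a sketch of the combined form with no space after the comma, using two of the values from the earlier post. Whether a later flag can reliably refer to a variable set earlier in the same rule (as DATA does with %{ENV:BASE}) is not something this thread confirms, so that one is left out here.

```apache
RewriteEngine On
# One rule, two E flags, comma-separated with no whitespace in between.
RewriteRule ^ - [E=BASE:/home/example,E=IMAGES:https://images.example.com]
```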



RewriteRule ^ - [E=PUNC:'"`~!@#$%^&*()-_=+[]{}\|;:/?.><1234567890,]


It said unknown flag '', and escaping it with \" didn't change anything.


(That's two single quotes, not a double quote.) The problem here is the trailing comma - since the comma is the flag delimiter, you effectively have an "empty flag" at the end.

(You also need a double backslash to represent the literal backslash in the middle of the value - since the backslash is the escape character.)

However, I can't see a way to escape that trailing comma (to make it part of the env var value)? A conventional backslash escape does not seem to work here (or using 2 commas). Surrounding the value or name:value pair in double quotes does not help either - the trailing " simply becomes part of the value.

You can also set env vars with SetEnvIf (mod_setenvif) - which is more useful than SetEnv - and is arguably preferable to using mod_rewrite, when that is all you are doing. (However, if mod_env isn't available on your system then maybe mod_setenvif isn't either?)
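
A hedged sketch of the SetEnvIf route, assuming mod_setenvif (and, for the second block, mod_env) is loaded. The regex ^ matches every request, so the variable is always set. The PUNC value here is a shortened, hypothetical string ending in a comma; the comma is not a delimiter in these directives.

```apache
<IfModule mod_setenvif.c>
    # Request_URI is tested against the regex "^", which always matches.
    SetEnvIf Request_URI "^" BASE=/home/example
</IfModule>

<IfModule mod_env.c>
    # Quoted value; the trailing comma is simply part of the string.
    SetEnv PUNC ";:/?.1234567890,"
</IfModule>
```

One design point from the Apache docs: SetEnv runs late in request processing, so a variable that mod_rewrite needs to read is better set with SetEnvIf, which runs early. On the PHP side these typically show up via getenv() or $_SERVER; whether $_ENV is populated depends on PHP's variables_order setting.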

csdude55

8:17 pm on Apr 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ahh, great catch on the spaces! I hate that I can't space things out better, hard-to-read code is a peeve of mine... I really want to do something like this (with the E's lined up vertically):

RewriteRule ^ -
[E = BASE:/home/example,
E = DATA:%{ENV:BASE}/data,
E = IMAGES:https://images.example.com]


The problem here is the trailing comma - since the comma is the flag delimiter, you effectively have an "empty flag" at the end.

Yup, you're right again! I've tried everything I can think of, too, but no go. I even tried variations of encoding it, like %2C, %{2C}, {2C}, etc. And I can't find where others have escaped a comma on Google, so I'm about to give up on it.

It doesn't help that the tester site I was using doesn't throw an error with the comma, so I'm having to test it on the live server :-(

I checked and I can definitely install mod_env, I don't know why it wasn't installed by default unless I turned it off when I built it last time. But mod_setenvif isn't in the list of options, so I don't know what's up with that. I suspect that using SetEnv would bypass the problem with the comma, I was just hesitant because I suspect that another module would slow down each pageload by a few microseconds... which could be why I turned off mod_env in the first place.

csdude55

11:43 pm on Apr 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Continuing down this path...

Can you define a pattern in [E] to be used later? This doesn't work, but I hope it explains what I'm getting at:

RewriteRule ^ - [E=PATTERN:foo|bar]

# blah/foo => blah/?var=foo
# blah/bar => blah/?var=bar
RewriteRule ^blah/(%{ENV:PATTERN})(?:/|$) blah/?var=$1

lucy24

1:36 am on Apr 11, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^blah/(%{ENV:PATTERN})(?:/|$) blah/?var=$1
Can you do that in the body of the rule? I thought it had to go in a RewriteCond.

w3dk

11:12 am on Apr 11, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month




RewriteRule ^blah/(%{ENV:PATTERN})(?:/|$) blah/?var=$1


The RewriteRule pattern is a regex (PCRE), you can't use variable expansion of the form %{VARIABLE} here, just as you can't use %{REQUEST_URI} either.

It doesn't help that the tester site I was using doesn't throw an error with the comma


? What do you mean by "the tester site"? If it's not throwing an error, what is it doing? (Setting the variable?!)

I suspect that another module would slow down each pageload by a few microseconds...


Although the .htaccess file itself is processed (and these env vars are therefore set) on every single HTTP request (CSS, JS, images, other scripts, etc. etc. as well as the page itself) - multiple times per "pageload". So these "env vars" are being unnecessarily (and possibly undesirably) set on and made available to "everything". (I had mentioned above, "global to every script running on the server", which isn't really what I meant and not strictly correct - as @phranque correctly pointed out above.)

csdude55

8:06 pm on Apr 11, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The RewriteRule pattern is a regex (PCRE), you can't use variable expansion of the form %{VARIABLE} here, just as you can't use %{REQUEST_URI} either.

Ohhhh, I see. I was of the impression that REQUEST_URI was implied so that HTTP_HOST, etc, wouldn't work, but that QUERY_STRING might... I've never needed to test that one, though, it's just how I thought it was.

Based on that, this alternative seems to work:

RewriteRule ^ - [E=PATTERN:foo|bar]

# blah/foo => blah/?var=foo
# blah/bar => blah/?var=bar
RewriteCond ^/blah/(%{ENV:PATTERN})(?:/|$)
RewriteRule ^blah/(.+)(?:/|$) blah/?var=$1


In my real script I have a list of 10 acceptable matches for PATTERN, and I have 5 RewriteRules that all test for those matches. After typing it all out, changing it to the above would actually be 14 bytes larger, so it's not the big space-saver that I'd hoped... but it would still be easier to maintain if I add more topics in the future. So unless an [E] is slower to process, I might still go in this direction.

What do you mean by "the tester site"? If it's not throwing an error, what is it doing? (Setting the variable?!)

Oh sorry, I'd mentioned it before in other threads and forgot that I hadn't posted it here:

[htaccess.madewithlove.be?share=e730cc07-6239-5cab-967f-c258552c88b2...]

I see now that, even though it doesn't throw an error, it dies after <. I removed the <, and it still dies at the ,. So no error, it just doesn't include anything after them. So probably a flaw on their site.

Although the .htaccess file itself is processed (and these env vars are therefore set) on every single HTTP request (CSS, JS, images, other scripts, etc. etc. as well as the page itself) - multiple times per "pageload". So these "env vars" are being unnecessarily (and possibly undesirably) set on and made available to "everything". (I had mentioned above, "global to every script running on the server", which isn't really what I meant and not strictly correct - as @phranque correctly pointed out above.)

Ohhhh, now I understand. You guys did say that before but it didn't quite click.

Hmm, that's something to consider. In my actual script I have this line to stop the htaccess from processing scripts where it shouldn't, which includes CSS and JS:

RewriteRule ^(?:(?:includes|images|cgi-bin|a_few_other_things)/)|404\.php - [L]

But I'm guessing that it still downloads the entire file for each request, even if they're not processed. That's a concern, I think... unless the .htaccess is cached or something, so it wouldn't actually be downloaded each time?

[edited by: phranque at 8:39 pm (utc) on Apr 11, 2020]
[edit reason] disable graphic smile faces [/edit]

lucy24

8:21 pm on Apr 11, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond ^/blah/(%{ENV:PATTERN})(?:/|$)
Is something missing?

unless the .htaccess is cached or something, so it wouldn't actually be downloaded each time?
Nope. htaccess is read--and its Regular Expressions are compiled--from top to bottom on every single request. That's what makes htaccess so useful for testing, even if most of the work is done in config: you can change things on the fly. In the other thread I mentioned my test site. I open the htaccess--yup, the “live” one, just the way you would never ever do for a real site--make changes, save, see what happens in the browser, repeat ad lib.

But I'm guessing that it still downloads the entire file for each request, even if they're not processed.
Not sure what this means. Can you re-word? * The unintended smiley didn't help either ;) which is another reason for using [ code ] markup.


* Punch line to Garrison Keillor story: “Uhm ... OK ... What are the ages of your children?”

csdude55

4:39 am on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond ^/blah/(%{ENV:PATTERN})(?:/|$)
Is something missing?

Haha, mea culpa... I didn't test or anything, I was just typing. I guess this would have been better:

RewriteCond %{REQUEST_URI} ^/blah/(%{ENV:PATTERN})/?$
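
One caveat worth flagging: like the RewriteRule pattern, the CondPattern is a regex and is not subject to %{...} expansion, so the condition above would likely be treated as regex text rather than as foo|bar. If the rules eventually live in the server config, the Define directive with ${...} substitution may be closer to the original goal. Define is not permitted in .htaccess, and both the name TOPICS and the directory path are made up for this sketch.

```apache
# Server config or <VirtualHost> only (Apache 2.4); not valid in .htaccess.
Define TOPICS "foo|bar"

<Directory "/home/example/public_html">
    # Per-directory rewrites need symlink following enabled.
    Options +FollowSymLinks

    RewriteEngine On
    # ${TOPICS} is substituted when the config is parsed,
    # before the regex is compiled.
    RewriteRule ^blah/(${TOPICS})(?:/|$) blah/?var=$1 [L]
</Directory>
```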


Not sure what this means. Can you re-word?

Sorry about that. Let me see if I can be more clear...

My .htaccess is currently about 6kb in size, with 182 lines (including comments that I'll eventually remove). I have a rule set up to stop running if the request belongs to certain directories or extensions on line 42, with roughly 1.8kb of text before that line:

RewriteRule ^(?:(?:includes|images|cgi-bin)/)|(?:404\.php|\.js|\.css)$ - [L]

So if the request is a CSS file, for example, it'll end on line 42 without processing the remaining 140 lines... but it still had to download the full 6kb .htaccess file.

Assuming that someone loads a page with 5 images, 2 js, and 1 css file, am I understanding correctly that there would be an additional 54kb of data downloaded (the 6kb .htaccess downloaded 9 times)?

Or is it going to download the .htaccess once, and then just run it 9 times?

If it's the first, then obviously a smaller file would always be better, and it defeats my goal for using [E].

But if it's the second, then it might still process faster.

lucy24

5:23 am on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



but it still had to download the full 6kb .htaccess file
Say what now? Nobody “downloads” the .htaccess file, any more than they “download” the config file. The server reads it.

In fact, people couldn’t download .htaccess if they wanted to; that’s why your generic config file (the one that came with the server) has a line saying something like
<FilesMatch ^\.ht>
Require all denied
</FilesMatch>
(and this, in turn, is why it's an abominably bad idea to put all your access control rules inside a <Files *> envelope, as I’ve sometimes seen in tutorials).

csdude55

6:33 am on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Maybe "download" is the wrong word here... I want to make sure we're saying the same thing here.

Let's say that I have a PHP script with all of my variables in it, called variables.php. It's 20kb.

Then I have index.php that uses
require_once 'variables.php';
at the top of the page. index.php is 15kb.

So when the user opens index.php, it's my understanding that the download size is 35kb (20kb for variables.php + 15kb for index.php).

If .htaccess is 6kb, would the browser be opening 35kb, or 41kb (the 35kb for the PHP scripts + 6kb for .htaccess)? Or worse, 35kb for the PHP scripts, 6kb for index.php, and another 6kb for variables.php?

Putting it another way, let's say that the user is on dial-up and has a download speed of 10kb per second. Ignoring compression, etc, would this page theoretically open in 3.5 seconds, 4.1 seconds, or other?

** Further, does this change once the data is in the Apache configuration instead of .htaccess?

phranque

9:29 am on Apr 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



So when the user opens index.php, it's my understanding that the download size is 35kb (20kb for variables.php + 15kb for index.php).

the download size is essentially the size of the document sent in the response as generated by index.php.
index.php could be very small yet create a very large html document or vice versa.

If .htaccess is 6kb, would the browser be opening 35kb, or 41kb (the 35kb for the PHP scripts + 6kb for .htaccess)?

the .htaccess file is opened by the server, not the browser.

Putting it another way, let's say that the user is on dial-up and has a download speed of 10kb per second. Ignoring compression, etc, would this page theoretically open in 3.5 seconds, 4.1 seconds, or other?

the time to open a page would include some server processing time and then some download time and some browser rendering time as well as some latency here and there.

Further, does this change once the data is in the Apache configuration instead of .htaccess?

if the server directives are in the server config file the file is read once when the server is (re)started.
if the server directives are in the .htaccess file the file is read essentially once each time the directory containing the .htaccess file is traversed by the server. (so one or more times per request)
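
A rough sketch of what the move into the server config might look like. The directory path is an assumption (only /home/example appears in this thread). The directives are parsed once at startup, but an E flag or SetEnv still applies to each request's own environment as it is processed; what goes away is the per-request search for, and parsing of, .htaccess files.

```apache
<Directory "/home/example/public_html">
    # Tell Apache not to look for .htaccess files in this tree.
    AllowOverride None

    # Per-directory rewrites need symlink following enabled.
    Options +FollowSymLinks

    RewriteEngine On
    # Same per-directory syntax as .htaccess: no leading slash in the pattern.
    RewriteRule ^ - [E=BASE:/home/example]
</Directory>
```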

lucy24

5:04 pm on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If .htaccess is 6kb, would the browser be opening 35kb, or 41kb (the 35kb for the PHP scripts + 6kb for .htaccess)?
Nobody is opening .htaccess. Except the server.

one or more times per request
Hm, interesting point. Someone hereabouts--it may even have been you (phranque, not csdude)--once explained to me that, when receiving a request, each module in succession does its stuff, potentially including following the htaccess trail all the way up the line. So each separate module--mod_dir, mod_rewrite and so on--has to read through htaccess in order to pick out its own directives. This all takes time ... but on the broader scale of things, definitely not a whole lot of time.

csdude55

7:21 pm on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's the most circular answer you've ever given me, phranque! LOL

But if I understand correctly, my takeaway is that once I move all of this to the Apache configuration (and maybe somehow turn off .htaccess completely) then it would be considerably more efficient to set [E] variables (when possible) in lieu of setting variables in the variables.php script that's included in every other PHP script of the site. In the configuration, they would be set once when Apache is rebuilt / restarted, and that's it.

In comparison, setting variables in the .htaccess would be considerably less efficient, as the server would read through every .htaccess in the current directory and every child directory for every HTTP request made.

Is that at least in the ballpark of being correct?

lucy24

8:48 pm on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is that at least in the ballpark of being correct?
If you’ve now got a firm grip on the fact that the htaccess file never leaves the server, and nobody but the server can ever set eyes on it ... Yup.

SetEnvIf is based on looking at some aspect of the request and setting a variable accordingly.

SetEnv--which I wish they had called by a different name so it doesn’t sound like an extension of the same mod--simply sets the variable regardless.

Docs say [httpd.apache.org], among other things:
For portability reasons, the names of environment variables may contain only letters, numbers, and the underscore character. In addition, the first character may not be a number. Characters which do not match this restriction will be replaced by an underscore when passed to CGI scripts and SSI pages.
That kind of detail is important to internalize: the name has to be in the form [A-Za-z_]\w* ... but when they say “letters” do they really mean “plain-ASCII letters”? Seems like it would have to mean that.

Elsewhere [httpd.apache.org] there’s some business about PassEnv which is probably worth closer study.

If you omit the value argument, the variable is set to an empty string.
This, too, is worth internalizing, because it’s different from SetEnvIf where if you don’t specify a value, it defaults to 1.
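
A small sketch of that difference, assuming both mod_env and mod_setenvif are loaded. The variable names FEATURE_FLAG and is_css are hypothetical.

```apache
# SetEnv with no value: FEATURE_FLAG is set to an empty string.
SetEnv FEATURE_FLAG

# SetEnvIf with no value: is_css is set to 1 when the regex matches.
SetEnvIf Request_URI "\.css$" is_css
```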