Forum Moderators: coopster


"Improving" extract()

         

csdude55

5:04 am on Sep 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I often have a ton of params that I would like to extract from the query string and convert to regular variables, mainly so that I can default to whatever was entered but overwrite it later if necessary. But, of course, extract() leads to all kinds of security risks.

I wrote this up as an alternative:

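// Usage example (variable names are passed as strings):
// parse($_GET, 'varOne', 'varTwo:is_numeric', 'varThree');
function parse($arr = [], ...$vars) {
    foreach ($vars as $key) {
        // 'varTwo:is_numeric' will make sure that the value is numeric
        // can use any function here, is_numeric() is just an example
        [$str, $func] = array_pad(explode(':', $key, 2), 2, null);
        global $$str;

        // fall back to false when the key is missing from $arr
        $val = $arr[$str] ?? false;

        // assign only when no validator was given or the validator passes
        if (!$func || $func($val))
            $$str = $val;
    }
}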


This lets me define which variables to extract from $_GET instead of grabbing everything, and it lets me require a function like "is_numeric" to ensure that the data is as expected.

But it's about 10 times slower than extract()! On 10,000 iterations my function finishes in 0.0199s, compared to extract() that ends in 0.002s.
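
For anyone wanting to reproduce this kind of timing, here is a minimal benchmark sketch. The helper name bench() and the sample input are assumptions made for illustration, not the harness that produced the numbers above, and hrtime() needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
function bench(callable $fn, int $iterations = 10000): float {
    $start = hrtime(true);               // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

// Sample input standing in for $_GET (assumed values).
$input = ['varOne' => 'abc', 'varTwo' => '42', 'varThree' => 'xyz'];

$extractTime = bench(function () use ($input) {
    extract($input);                     // imports into the closure's local scope
});

printf("extract(): %.4fs for 10,000 iterations\n", $extractTime);
```

Swapping the closure body for a call to the custom function gives a like-for-like comparison on the same machine.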

Before I go down the rabbit hole, is there a built-in PHP function that does what I want and is safer than extract()?

If not, any suggestions on improvements for my script?

dstiles

8:03 am on Sep 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a function that can request a single known variable from GET or POST. Each variable is sanitized on read. I have a list of known variables for a given application and ask for each as required. If a variable is not known it will not be asked for.

I am not concerned with speed but with security so my solution may not suit you.
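
A rough sketch of what such a whitelist-based reader could look like is below. The names KNOWN_PARAMS and request_param() and the chosen filters are assumptions for illustration only; the actual function described above was not posted.

```php
<?php
// Hypothetical whitelist: variable name => filter applied on read.
const KNOWN_PARAMS = [
    'page'  => FILTER_VALIDATE_INT,
    'query' => FILTER_SANITIZE_SPECIAL_CHARS,
];

// Returns the filtered value, or null if the name is unknown or absent.
function request_param(string $name, array $source) {
    if (!array_key_exists($name, KNOWN_PARAMS) || !isset($source[$name])) {
        return null;
    }
    return filter_var($source[$name], KNOWN_PARAMS[$name]);
}

// $source would normally be $_GET or $_POST; a literal array is used here.
$source = ['page' => '3', 'query' => '<b>hi</b>', 'evil' => 'x'];

$page  = request_param('page', $source);   // validated as an integer
$query = request_param('query', $source);  // HTML special characters encoded
$evil  = request_param('evil', $source);   // null: not in the whitelist
```

Because unknown names are never read, nothing outside the whitelist can reach the rest of the script.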

csdude55

6:05 pm on Sep 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not AS worried about security, since I block non-US IPs at the firewall and use mod_security. I've only had one issue with security in the last several years; it didn't accomplish anything, but the stream of attacks crashed the server.

Using filter_var() to filter and sanitize isn't a bad idea, though. Instead of using : to allow a function, I could use it to allow a constant filter instead:

Filter options:
[php.net...]

// parse($_GET, 'varOne:FILTER_VALIDATE_INT', 'varTwo:FILTER_SANITIZE_NUMBER_INT', 'varThree');
function parse($arr = [], ...$vars) {
    foreach ($vars as $key) {
        [$key, $filter] = array_pad(explode(':', $key, 2), 2, null);
        $filter ??= 'FILTER_DEFAULT';

        global $$key;

        // missing keys become false without calling filter_var()
        $$key = isset($arr[$key]) ?
            filter_var($arr[$key], constant($filter)) :
            false;
    }
}


That was considerably slower than the original (0.0369, compared to 0.0199), but arguably better.

This variation only applies a filter when one is given, so it was a bit faster at 0.028. But of course it would probably be faster or slower depending on how many filters are specified:

function parse($arr = [], ...$vars) {
    foreach ($vars as $key) {
        [$key, $filter] = array_pad(explode(':', $key, 2), 2, null);

        global $$key;
        $val = $arr[$key] ?? false;

        // only call filter_var() when a filter was given and the key exists
        $$key = ($filter && $val !== false) ?
            filter_var($val, constant($filter)) :
            $val;
    }
}


Finally, I tried this version with filter_var_array(), but it was the slowest at 0.0389:

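function parse($arr = [], ...$vars) {
    // initialise so the function still works when no names are passed
    $data = $args = [];

    foreach ($vars as $key) {
        [$key, $filter] = array_pad(explode(':', $key, 2), 2, null);
        $filter ??= 'FILTER_DEFAULT';

        $data[$key] = $arr[$key] ?? null;
        $args[$key] = constant($filter);
    }

    // filter everything in one call, then copy the results to globals
    $placeholder = filter_var_array($data, $args);

    foreach ($placeholder as $key => $val) {
        global $$key;
        $$key = $val;
    }
}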


So of these, the original function is the fastest, but the second function in this post is probably the best one.

csdude55

7:10 pm on Sep 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Those are supposed to be $filter ??= and ?? false, sorry. It looks like [ code ] is stripping the ?? to singles :-/

phranque

10:10 pm on Sep 29, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> It looks like [ code ] is stripping the ? to singles

it's not the style code that is stripping the "consecutive punctuation" - that happens everywhere.

i have reported this particular issue to the proper authorities...

csdude55

7:44 am on Sep 30, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's not a tragedy, just something that'll confuse future readers.

I could understand limiting to 4 repeat characters or something, but limiting to 1? What if I want to ask something emphatically?? LOL

dstiles

10:08 am on Sep 30, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I block non-US IPs at the firewall

It's the other way around here. A major source of baddies in UK is USA. :( UK access is usually ok. I've firewalled most of Russia and China and use geoip to block a few others.

Dimitri

11:18 pm on Sep 30, 2022 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



> But it's about 10 times slower than extract()! On 10,000 iterations my function finishes in 0.0199s, compared to extract() that ends in 0.002s.

Optimizing code is good, but here you are chasing microseconds, and your script is not going to run 10k iterations per request. This will make no difference to the end user.

If you haven't already, update to the latest version of PHP; that alone can save you these microseconds.

You can certainly spend your time on more useful things.

That being said:

> global $$key;

It is risky to proceed like that. If your code becomes complex, or if you reuse it in different scripts, you may end up overwriting other global variables with the same name. So this is not good practice, and it does not help the readability of the code.
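
To make the risk concrete, here is a small sketch of the collision. The function name import_global() and the variable name user are hypothetical; the function is just the global $$key pattern reduced to a single key.

```php
<?php
// A value the application set earlier (hypothetical name).
$GLOBALS['user'] = 'admin';

// Same pattern as global $$key, reduced to one key for illustration.
function import_global(array $arr, string $key): void {
    global $$key;
    $$key = $arr[$key] ?? false;
}

// A request carrying ?user=guest silently replaces the application's value.
import_global(['user' => 'guest'], 'user');

var_dump($GLOBALS['user']); // string(5) "guest"
```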

I would do something like this (it's late, and I haven't tested this code):

const PARAM_TYPE_INT = 0;

function parse($parameters_data)
{
    // initialise once, before the loop, so earlier results are kept
    $parameters = [];
    foreach ($parameters_data as $key => $type)
    {
        switch ($type)
        {
            case PARAM_TYPE_INT:
                if (is_numeric($_GET[$key] ?? false))
                {
                    $parameters[$key] = (int)$_GET[$key];
                }
                break;
            // other cases...
        }
    }
    return $parameters;
}

$parameters = parse(["test" => PARAM_TYPE_INT]);


Or you can avoid using a function:


$parameter_test=is_numeric($_GET["test"]??false)?((int)$_GET["test"]):false;
// ...


But as I said, don't waste your time chasing this kind of microsecond optimization. Even if you have millions of visitors, this is not going to make a difference. There are certainly a lot of more useful things you should focus on.

csdude55

5:49 pm on Oct 1, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> It's the other way around here. A major source of baddies in UK is USA. :( UK access is usually ok. I've firewalled most of Russia and China and use geoip to block a few others.

@dstiles, I suspect that you're seeing faked IPs... a whole 'nother headache for me. Most baddies come from countries with no alliance to your country because they know that they can safely hack without any type of legal response.

I used to block anyone that appeared to be using a fake IP, but now the popularity of VPNs has ruined that. I swear, just about every security precaution I've ever taken has been overruled by so-called "safety" software!

> But as I said, don't waste your time chasing this kind of microsecond optimization. Even if you have millions of visitors, this is not going to make a difference. There are certainly a lot of more useful things you should focus on.

@Dimitri, I've been micro-optimizing because I'm actually preparing for potentially millions of visitors. I'm planning a marketing campaign to start in January that should reach about 11 million people, so if I have a 10% response rate then micros matter.

I've discovered that the more time I can shave off of a page load, the more pages per session I get. It's like everyone has a subconscious amount of time that they intend to spend on my site, and it's up to me whether they look at 5 pages or 10 pages in that time. I've done a lot of little fixes here and there and reduced the average page load time significantly, so if I can make a simple change to shave 20ms then I'll do it. It MIGHT not help... but it can't hurt.

In this case, I was hoping that there might be a built-in PHP function that I didn't know about, though, instead of reinventing the wheel.
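
For what it's worth, the closest built-in is probably the filter extension's array functions, which take a spec of key => filter and return only those keys. The sketch below uses filter_var_array() on a literal array so it can run anywhere; the key names and sample values are assumptions for illustration.

```php
<?php
// Spec: which keys to pull out and how to filter each (example names).
$spec = [
    'one'   => FILTER_VALIDATE_INT,
    'two'   => FILTER_DEFAULT,
    'three' => FILTER_SANITIZE_NUMBER_INT,
];

// Stand-in for $_GET. With real request data,
// filter_input_array(INPUT_GET, $spec) accepts the same spec.
$input = ['one' => '12', 'two' => 'hello', 'three' => 'a1b2', 'extra' => 'ignored'];

$clean = filter_var_array($input, $spec);

// $clean contains only the keys named in $spec; 'extra' is dropped.
var_dump(array_keys($clean));
```

The result is an array rather than a set of global variables, so nothing outside the spec can leak into scope.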

robzilla

6:14 pm on Oct 1, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> so if I can make a simple change to shave 20ms then I'll do it

And you should, of course. 20ms is significant enough. A 0.0017ms difference per request is not. Not even with a million users. It's practically immeasurable. Can it hurt? It can if there are other performance bottlenecks that go unnoticed. Make sure you profile your code to find the actual bottlenecks, and don't forget that with a million users it's very possible you'll have bottlenecks unrelated to PHP.
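
As a back-of-the-envelope check, the per-call difference can be worked out from the timings posted earlier in the thread (0.0199s and 0.002s per 10,000 iterations). The variable names are just for illustration.

```php
<?php
// Timings reported earlier in the thread, in seconds per 10,000 iterations.
$iterations   = 10000;
$customTotal  = 0.0199;
$extractTotal = 0.002;

// Extra cost of one call to the custom function, in milliseconds.
$perCallMs = ($customTotal - $extractTotal) / $iterations * 1000;
printf("Extra cost per call: %.5f ms\n", $perCallMs);

// Number of such calls that would add up to a 20 ms saving.
$callsPer20ms = 20 / $perCallMs;
printf("Calls needed to reach 20 ms: %d\n", $callsPer20ms);
```

That works out to a bit under 0.002 ms per call, so it would take on the order of ten thousand calls in a single request to add up to 20 ms.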

Dimitri

10:34 pm on Oct 1, 2022 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



I perfectly understand your concerns, and I always value developers who optimize their code, so my remark is not a criticism at all. But keep in mind that when you run your 10k test, you are executing your calls sequentially (one after another), whereas in real life your server will execute several scripts at the same time.

So this kind of function will have absolutely no impact on your TTFB, or on the time to download and render the page.

To speed up your scripts and pages, you have to review your data structures, optimize the HTML code that your scripts are producing, but also the page layout, media files, etc...

Simple examples:

- For the database, be sure to have indexes on the keys you are using and load these indexes into memory. If you have enough RAM, you can even preload your whole DB into the OS's file cache!

- WebP images are smaller than their equivalent in JPEG (or PNG).

- Try to output HTML code as soon as possible, and not only when your script is finished. Some template engines require the whole page to be generated before they begin sending it to the client. To send HTML code earlier, you can use flush commands. This might help your page start rendering sooner.

- Do not hesitate to abuse caching. For example, say your pages show a list of the latest news. You don't have to fetch and regenerate the HTML code of this list each time a page is accessed. Your code can generate the list and then save the HTML code into a file; for subsequent requests, your script can retrieve this file instead of regenerating the HTML. You can check the last generation time if you want to refresh the HTML code every 10 minutes, for example. You can even generate fragments of your pages in the background, using cron scripts.

- Again, use the latest PHP version, and have a server with as much RAM as possible, SSD drives if possible, and a good number of CPU cores. Among the three, from my experience, RAM comes first.

etc...
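
The file-based caching idea from the list above could be sketched roughly like this. The helper name cached_fragment(), the cache path, and the 10-minute lifetime are assumptions for illustration, and the closure stands in for the real query and templating.

```php
<?php
// Hypothetical helper: returns cached HTML if it is fresh, otherwise rebuilds it.
function cached_fragment(string $file, int $ttl, callable $build): string {
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }
    $html = $build();
    // LOCK_EX keeps two requests from interleaving their writes.
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

$file = sys_get_temp_dir() . '/latest_news.html';   // assumed cache location
$html = cached_fragment($file, 600, function () {
    // Stand-in for the expensive database query and templating.
    return '<ul><li>Example headline</li></ul>';
});
echo $html;
```

Within the 600-second window, the builder is skipped and the saved file is served as-is.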

But again, from my experience, the most important thing is the readability of your code. It is very important to keep your code clean and easy to understand.

csdude55

3:51 am on Oct 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I totally get what you're saying, Dimitri. I'm really just making sure to cross my i's and dot my t's before the big campaign, ya know? Last thing I want to do is have a big response that crashes the server.

But I've taken some of your advice to heart and modified the function to this:

// $arr is the array we're parsing, usually $_GET
// $vars is an associative array, $variable => $filter
// $more is a list of variables with no filter, pushed to a numeric array
function parse($arr, $vars = [], ...$more) {
    if (is_string($vars))
        $vars = [$vars => false];

    // array_merge() rather than += so that numeric keys in $more can't
    // collide with numeric keys already in $vars
    if (!empty($more))
        $vars = array_merge($vars, $more);

    foreach ($vars as $key => $filter) {
        // if there's no $filter then it becomes [0-9]+ => $variable, have
        // to modify it to make $variable => false
        if ($filter && is_numeric($key)) {
            $key = $filter;
            $filter = false;
        }

        // the goal here is to overwrite any pre-existing variables with the
        // same name, so make sure you use unique variable names
        global $$key;

        // $filter will actually print as a number, and if it's not valid then it
        // won't be a number. This is MUCH faster to process than validating
        // with filter_list(). Missing keys fall back to false.
        $$key = is_numeric($filter) ?
            filter_var($arr[$key] ?? false, $filter) :
            ($arr[$key] ?? false);
    }
}


Instead of defining a function, I'm using PHP filters:

[php.net...]

Then instead of using the : delimiter, I'm using Dimitri's earlier suggestion and sending the filter as the value in an associative array.

I'm trying to cover my bases, so these should all work fine:

// preferred, with all of the filters in an array and then those without filters next
parse($_GET, ['one' => FILTER_VALIDATE_INT, 'three' => FILTER_SANITIZE_NUMBER_INT, 'four' => FILTER_VALIDATE_INT], 'two', 'five');

// "two" without a filter is inside of the array, but it's OK
parse($_GET, ['one' => FILTER_VALIDATE_INT, 'two', 'three' => FILTER_SANITIZE_NUMBER_INT, 'four' => FILTER_VALIDATE_INT], 'five');

// filters are totally optional
parse($_GET, 'one', 'two', 'three', 'four', 'five');

// same as above but unnecessarily in an array, but it's OK
parse($_GET, ['one', 'two', 'three', 'four', 'five']);


The speed over 10,000 iterations is only slightly slower than my original, 0.027s vs 0.0199s.

csdude55

5:07 am on Oct 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But for future readers, I just discovered that an invalid filter in PHP 8.x will throw a fatal error :-/ In PHP 7.x I can just check that it's numeric, but not in 8.

This works for PHP 8.x, but it's three times slower :-O

function parse($arr, $vars = [], ...$more) {
    if (is_string($vars))
        $vars = [$vars => false];

    // array_merge() rather than += so numeric keys in $more can't collide
    if (!empty($more))
        $vars = array_merge($vars, $more);

    // returns an associative array of all valid filters, filter ID => name
    $filter_list = array_flip(array_map('filter_id', array_combine(filter_list(), filter_list())));

    foreach ($vars as $key => $filter) {
        // if there's no $filter then it becomes [0-9]+ => $variable, modify to $variable => false
        if ($filter && is_numeric($key)) {
            $key = $filter;
            $filter = false;
        }

        global $$key;

        // missing keys fall back to false
        $$key = ($filter && isset($filter_list[$filter])) ?
            filter_var($arr[$key] ?? false, $filter) :
            ($arr[$key] ?? false);
    }
}
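
Part of the slowdown is likely that the filter lookup is rebuilt on every call. One possible way around that, sketched below, is to build it once in a static variable. The helper name is_valid_filter() is hypothetical, and I have not benchmarked this against the timings above.

```php
<?php
// Hypothetical helper: true if $filter is the ID of a filter PHP knows about.
function is_valid_filter($filter): bool {
    static $ids = null;          // built on the first call, reused afterwards

    if ($ids === null) {
        // filter_list() gives the filter names, filter_id() maps a name to its ID
        $ids = array_flip(array_map('filter_id', filter_list()));
    }
    return is_int($filter) && isset($ids[$filter]);
}

var_dump(is_valid_filter(FILTER_VALIDATE_INT)); // bool(true)
var_dump(is_valid_filter(false));               // bool(false)
```

Inside parse(), the isset($filter_list[$filter]) check could then become a call to this helper, leaving the rest of the loop unchanged.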

Dimitri

9:54 am on Oct 4, 2022 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Just in case it can help or bring ideas: if you want to test whether a variable contains an integer, you can do this:
$b_is_int=($var==(int)$var);

If you want to convert a double to an integer (if $var is a non-numeric string, the result is zero):
$var_int=(int)$var;
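
One caveat with the loose == comparison is that its result for non-numeric strings differs between PHP 7 and PHP 8, and the (int) cast silently drops trailing characters. As an alternative sketch, FILTER_VALIDATE_INT rejects anything that is not a well-formed integer. The helper name to_int_or_null() is hypothetical.

```php
<?php
// Returns the integer value, or null when $var is not a well-formed integer.
function to_int_or_null($var): ?int {
    $result = filter_var($var, FILTER_VALIDATE_INT);
    return $result === false ? null : $result;
}

var_dump(to_int_or_null('42'));    // int(42)
var_dump(to_int_or_null('42abc')); // NULL
var_dump(to_int_or_null('4.2'));   // NULL
var_dump((int) '42abc');           // int(42): the cast drops the suffix
```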