Forum Moderators: coopster


Alternative to using two preg_replace() calls

         

csdude55

5:05 am on Oct 25, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My goal here is to ONLY have letters and numbers, with words delimited with a -. This is used to make search-engine-friendly links that look like:

example.com/this-is-a-test/123

In my case, the "this-is-a-test" part doesn't actually do anything, I'm just sticking the title of the page into the URL for search engines. I'd probably be better off if I removed stop words like "a" and "the", but maybe later.

This is the code I'm using:

$text = 'this is a ... "TEST" ?!';

$text = preg_replace("/\s+/", '-',
preg_replace("/&[\w#]{2,5};|^[^a-z0-9]+|[^a-z0-9\s]|[^a-z0-9]+$/", '',
strtolower($text))
);

// same as above, but with comments to explain each step
// (the /x modifier tells PCRE to ignore whitespace and # comments inside the pattern;
// a // comment inside the pattern would end it at the first /)
$text =
    // finally, the only thing left should be letters, numbers, and \s
    // convert \s to -
    preg_replace('/\s+/', '-',

        // line breaks and comments are allowed inside the pattern because of /x
        preg_replace('/
            &[\w#]{2,5};  |  # remove &...;
            ^[^a-z0-9]+   |  # remove opening characters that are not [a-z0-9]
            [^a-z0-9\s]   |  # remove anything that is not a letter, number, or \s
            [^a-z0-9]+$      # remove trailing characters that are not [a-z0-9]
        /x', '',

            // lowercase first, so no need to use /i above
            strtolower($text))
    );
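For reference, preg_replace() also accepts arrays of patterns and replacements and applies each pattern in turn to the result of the previous one, so the nested calls above can be written as a single call. This is a readability change only: with the same two patterns, PCRE still makes two passes over the string. A minimal sketch, with slugify() as a hypothetical function name:

```php
<?php
// Hypothetical helper name; the two patterns are the ones from the post above.
function slugify(string $text): string
{
    // preg_replace() applies each pattern in order to the result of the
    // previous one, so this is the same two-pass logic as the nested calls.
    return preg_replace(
        [
            '/&[\w#]{2,5};|^[^a-z0-9]+|[^a-z0-9\s]|[^a-z0-9]+$/', // strip unwanted characters
            '/\s+/',                                               // whitespace runs become -
        ],
        ['', '-'],
        strtolower($text) // lowercase first, so no /i needed
    );
}

echo slugify('this is a ... "TEST" ?!'), "\n"; // this-is-a-test
```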


Do you see any way to improve it? My concern is that this function is called 20-1000 times on a page, and two preg_replace() calls plus a strtolower() on every call seems like a lot.
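For the stop-word idea mentioned above, one option, assuming it runs on the finished, hyphenated slug, is a small filter like the one below. The function name and the default word list are assumptions, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: drops stop words from an already-hyphenated slug.
function remove_stop_words(string $slug, array $stopWords = ['a', 'an', 'the']): string
{
    // array_flip() turns the list into keys, so each lookup is an isset() hash check.
    $stop = array_flip($stopWords);

    $kept = array_filter(
        explode('-', $slug),
        fn (string $word): bool => !isset($stop[$word])
    );

    // If every word was a stop word, keep the original slug rather than return ''.
    return $kept ? implode('-', $kept) : $slug;
}

echo remove_stop_words('this-is-a-test'), "\n"; // this-is-test
```

Filtering after the slug is built keeps the regex untouched, so the stop-word list can change without revisiting the patterns.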

explorador

2:07 am on May 15, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



csdude55: My goal here is to ONLY have letters and numbers, with words delimited with a -. This is used to make search-engine-friendly links that look like:

example.com/this-is-a-test/123

Did you solve this? I'm surprised nobody has replied; this is a bit old.

csdude55: Do you see any way to improve it?

Big yes.
You say you are using two preg_replace() calls. OK, I see them, but it's not just the number of calls that matters; it's also the complexity of what you ask the patterns to do, and the complexity of the strings you send for conversion.

I recently faced a similar scenario, but one that also included accents (converting á, é, etc. for Spanish, not just English). Sorry, I can't just "share" my code for purely technical reasons, but I'll try to explain my approach to help with your challenge. Once you read the explanation, you'll understand why I don't think this is a "copy and paste" solution (not that I'm saying you would use it that way).

1. I tried different approaches, and at the moment I'm using three preg_replace() calls (so it's worse... right? No). The rest is str_replace() and a custom strtolower function.
2. I tested many combinations of calls, over and over, to measure the script's impact on execution time and memory. This was the key for me, rather than just "optimizing" the preg_replace(), because it let me compare different approaches for efficiency.
3. I tried different versions of PHP. Yes, that's extra work, but unless you control the server you never know which version will be installed. Interestingly enough, different versions of PHP used different amounts of resources for the same code.
4. As strange as it may seem, some things worked better when I detected runs of text with loops instead of preg_replace(). Yep, not kidding.
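Points 2 and 4 (measure everything; a plain loop sometimes beats preg_replace()) can be tried with a sketch like the one below. This is not explorador's code: slug_loop() and bench() are hypothetical names, the loop version assumes plain-text titles (it does not strip HTML entities such as &amp;), and which version wins will depend on the PHP version and the inputs, so the timings are something to measure, not a known result.

```php
<?php
// Regex version from the original post, for comparison.
function slug_regex(string $text): string
{
    return preg_replace(
        ['/&[\w#]{2,5};|^[^a-z0-9]+|[^a-z0-9\s]|[^a-z0-9]+$/', '/\s+/'],
        ['', '-'],
        strtolower($text)
    );
}

// Hypothetical loop version: a single pass over the bytes, no regex.
// Assumes plain-text titles: HTML entities are NOT stripped.
function slug_loop(string $text): string
{
    $text = strtolower($text);
    $out = '';
    $pendingDash = false;
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $o = ord($text[$i]);
        if (($o >= 97 && $o <= 122) || ($o >= 48 && $o <= 57)) { // a-z or 0-9
            // Emit one - for the whitespace seen since the previous word,
            // but never at the start of the slug.
            if ($pendingDash && $out !== '') {
                $out .= '-';
            }
            $pendingDash = false;
            $out .= $text[$i];
        } elseif ($o === 32 || ($o >= 9 && $o <= 13)) { // space, \t, \n, \v, \f, \r
            $pendingDash = true;
        }
        // Any other byte (punctuation, non-ASCII) is dropped.
    }
    return $out;
}

// Minimal timing harness: runs a function over the same inputs and returns seconds.
function bench(callable $fn, array $inputs, int $rounds): float
{
    $start = hrtime(true); // nanoseconds, PHP 7.3+
    for ($r = 0; $r < $rounds; $r++) {
        foreach ($inputs as $s) {
            $fn($s);
        }
    }
    return (hrtime(true) - $start) / 1e9;
}

$titles = ['this is a ... "TEST" ?!', 'Little House on the Prairie!', 'Hello,   World!'];
foreach (['slug_regex', 'slug_loop'] as $name) {
    printf("%-10s %.4f s\n", $name, bench($name, $titles, 1000));
}
printf("peak memory: %d bytes\n", memory_get_peak_usage());
```

Running the same script under each PHP version available on the server is one way to check point 3 as well.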

I don't know if this extra info helps, but watch out for UTF-8. I can't go into all the weird things I found, but when grabbing text from a known file I saw some strings being treated differently. I was going nuts with strtolower() / mb_strtolower() / utf8_encode() / utf8_decode(), and even tried to detect UTF-8... I found bugs, and spent hours trying different approaches until I found out that yes, there are bugs with UTF-8 conversions, and instead of fixing them PHP decided to deprecate those functions. In my case this mattered because of the tildes (accent marks), and because, as explained, PHP would somehow treat strings differently.
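Since explorador didn't share his code, the following is only one possible approach, not his: a hypothetical lookup table for the Spanish accented letters, applied with strtr() before the existing slug logic. It avoids utf8_encode()/utf8_decode() (deprecated as of PHP 8.2; they only convert between ISO-8859-1 and UTF-8) and does not need mbstring. It assumes both the input and the PHP source file are UTF-8; accented letters outside the table will generally just be removed by the existing pattern as non-alphanumeric bytes.

```php
<?php
// Hypothetical helper: map Spanish accented letters to plain ASCII.
// Assumes the input and this source file are both UTF-8.
function strip_spanish_accents(string $text): string
{
    // strtr() with an array replaces whole byte sequences, so two-byte
    // UTF-8 characters like 'á' are matched and replaced as a unit.
    return strtr($text, [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
    ]);
}

function slug_es(string $text): string
{
    // After strtr() the mapped letters are ASCII, so plain strtolower() is enough.
    return preg_replace(
        ['/&[\w#]{2,5};|^[^a-z0-9]+|[^a-z0-9\s]|[^a-z0-9]+$/', '/\s+/'],
        ['', '-'],
        strtolower(strip_spanish_accents($text))
    );
}

echo slug_es('¿Qué tal, Señor Núñez?'), "\n"; // que-tal-senor-nunez
```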

I don't know how you are going to use your script, but if you are converting the same strings repeatedly, consider storing the results in a cache file or database. That way each string is converted only once, and after that it's just a matter of looking it up.
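The caching idea also applies within a single page: if titles repeat among the 20-1000 calls, a static array means each distinct title is converted only once per request. This is an in-memory, per-request sketch with slug_cached() as a hypothetical name; a cache file or a database column, as suggested above, would persist the slugs across requests.

```php
<?php
// Hypothetical memoizing wrapper: each distinct title is converted once per request.
function slug_cached(string $text): string
{
    static $cache = []; // keeps its contents between calls within the same request

    if (!isset($cache[$text])) {
        $cache[$text] = preg_replace(
            ['/&[\w#]{2,5};|^[^a-z0-9]+|[^a-z0-9\s]|[^a-z0-9]+$/', '/\s+/'],
            ['', '-'],
            strtolower($text)
        );
    }
    return $cache[$text];
}

echo slug_cached('this is a ... "TEST" ?!'), "\n"; // converted
echo slug_cached('this is a ... "TEST" ?!'), "\n"; // served from $cache
```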

Good luck.

lucy24

5:28 am on May 15, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



explorador: I'm surprised nobody has replied, this is a bit old.
Sometimes posts mysteriously become invisible--and then, days or weeks or months later, just as mysteriously reappear. I don't think I remember this one either; the preg_replace() would have jumped out at me because I only speak about three words of PHP, and that’s one of them.

The question reminds me vaguely of one page cluster where I've got a list of titles, like “Little House!”, paired with a group of image files, like “littlehouse.jpg”, so to get from point A to point B--flatten case and get rid of any non-alphanumerics--I have a line that says
$filename = strtolower(preg_replace("/\W/","",$inner[0]));
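The same one-liner, wrapped in a hypothetical flatten_title() function so it can be tried on its own. \W matches any non-word character; without the /u modifier, word characters are letters, digits and the underscore, so underscores survive while spaces and punctuation are removed.

```php
<?php
// lucy24's one-liner, wrapped in a hypothetical function for testing.
function flatten_title(string $title): string
{
    // Remove every non-word character, then lowercase what is left.
    return strtolower(preg_replace('/\W/', '', $title));
}

echo flatten_title('Little House!') . '.jpg', "\n"; // littlehouse.jpg
```

Unlike csdude55's version there is no word separator: "Little House!" becomes "littlehouse", which suits filenames but reads less clearly in a URL.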