
Forum Moderators: coopster & jatar k


Regex grab HTML tags keep back reference

     
10:56 am on Mar 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Dec 13, 2009
posts:945
votes: 0


I recently wrote a script that handles a list of allowed HTML, and I'm wondering if anyone with a bit more experience than me could compare the three regexes I wrote to acquire the non-self-closing tags (all three work) and tell me which one is likely to incur the least overhead. This will be going into effect for user comments and could end up looping several hundred times on (some) page loads.

In use:
/<([^ \/>]+)([^>]+)?>(?m)([^(\<\/\\1\>)]+)(?-m)<\/\\1>/is

Others:
/<([^ \/>]+)([^>]+)?>(?m)(.*?)(?-m)<\/\\1>/is
/<([^ \/>]+)([^>]+)?>(?m)(.*?(?!<\/\\1>).*?)(?-m)<\/\\1>/is

Cheers in advance,

Mike
4:36 pm on Mar 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 12, 2005
posts:5966
votes: 0


>> looping several hundred times on (some) page loads.

I think this is more of a problem than how much overhead each of these regexes will have. To be honest, if I had to guess, I'd say they would all perform close to the same, if not exactly the same, as far as anyone would be able to tell. If you want a more detailed analysis of their performance, that should be done on your box: record the timestamp (to microseconds) before and after, and find the difference. I still don't think it will make much of a difference, though. You should work more on getting it so that it doesn't have to run on page loads at all, but only, for example, when a user submits a comment.
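
A minimal sketch of that before-and-after timing, using the three patterns from the first post copied verbatim. The sample string and the loop count of 200 are made up for illustration, and `microtime(true)` is used on the assumption that the box runs PHP 5.0 or later:

```php
<?php
// Hypothetical sample comment; substitute real data from your own site.
$sample = str_repeat('<b class="x">bold text</b> plain text <i>italic</i> ', 50);

// The three candidate patterns from the first post, copied verbatim.
$patterns = array(
    'in use'    => '/<([^ \/>]+)([^>]+)?>(?m)([^(\<\/\\1\>)]+)(?-m)<\/\\1>/is',
    'lazy'      => '/<([^ \/>]+)([^>]+)?>(?m)(.*?)(?-m)<\/\\1>/is',
    'lookahead' => '/<([^ \/>]+)([^>]+)?>(?m)(.*?(?!<\/\\1>).*?)(?-m)<\/\\1>/is',
);

$timings = array();
foreach ($patterns as $label => $pattern) {
    $start = microtime(true);          // float seconds with microsecond resolution
    for ($i = 0; $i < 200; $i++) {     // arbitrary repeat count to smooth out noise
        preg_match_all($pattern, $sample, $matches);
    }
    $timings[$label] = microtime(true) - $start;
}

foreach ($timings as $label => $seconds) {
    printf("%-10s %.6f s\n", $label, $seconds);
}
```

The numbers only mean something relative to each other, and only on the machine and data they were measured with.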
4:50 pm on Mar 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Dec 13, 2009
posts:945
votes: 0


Hmm, the problem is that the way the site has been coded, it applies the convert-to-HTML step as it pulls the stuff from the MySQL database...

Still, I'm not going to be letting people edit comments after posting, so I suppose I could just write a completely new system for comment saving that applies this during the insert.

Thanks for replying. Unfortunately, the owner of the server I use is relying on Gentoo Portage for PHP updates and they still haven't cleared PHP 5.3, so I can't do microseconds :(
5:16 pm on Mar 12, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 17, 2009
posts:41
votes: 0


I'd go:
<([A-Z][A-Z0-9]*)>.*?</\1>

Personally...
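
For reference, a minimal sketch of how that pattern might be dropped into `preg_match_all`. The `#` delimiters, the `i` and `s` flags, and the sample string are additions for illustration, not part of the pattern as posted:

```php
<?php
// Delimited with # so the slash in the closing tag needs no escaping.
// i: match lowercase tag names too; s: let . cross newlines.
$pattern = '#<([A-Z][A-Z0-9]*)>.*?</\1>#is';

// Hypothetical input.
$html = 'Some <b>bold</b> and <em>emphasised</em> text, <a href="#">link</a>.';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds the tag names of the elements that matched.
print_r($matches[1]);
```

Note that the `<a href="#">` element is not matched, because the pattern expects `>` immediately after the tag name and so skips any tag that carries attributes.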
9:54 am on Mar 13, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Dec 13, 2009
posts:945
votes: 0


The problem with that, Chasehx, is that I want to allow the use of some attributes (which are of course checked against a list of invalid ones), and it is by far easier to validate both the tag and its attributes with separate back references.
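
A rough sketch of that idea, with the tag name and the attribute string captured in separate groups. The two lists, the helper `is_allowed()`, and the attribute-name regex are hypothetical stand-ins for illustration; the actual lists and checks are not shown in this thread:

```php
<?php
// Hypothetical lists for illustration only.
$allowedTags  = array('b', 'i', 'a');
$blockedAttrs = array('onclick', 'onload', 'style');

// Group 1: tag name, group 2: raw attribute string, group 3: inner content.
$pattern = '/<([^ \/>]+)([^>]+)?>(.*?)<\/\\1>/is';

function is_allowed($tag, $attrString, $allowedTags, $blockedAttrs)
{
    if (!in_array(strtolower($tag), $allowedTags, true)) {
        return false;
    }
    // Collect attribute names (the word before each "=") from the second group.
    preg_match_all('/([a-z][a-z0-9_:-]*)\s*=/i', $attrString, $m);
    foreach ($m[1] as $name) {
        if (in_array(strtolower($name), $blockedAttrs, true)) {
            return false;
        }
    }
    return true;
}

// Hypothetical input with one harmless and one blocked attribute.
$html = '<a href="/x" onclick="evil()">link</a>';
if (preg_match($pattern, $html, $m)) {
    $attrs = isset($m[2]) ? $m[2] : '';
    var_dump(is_allowed($m[1], $attrs, $allowedTags, $blockedAttrs));
}
```

Because the tag and its attributes arrive as separate captures, each can be checked against its own list without re-parsing the whole match.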

Anyways, I've had a thought on a way of modifying every use of my BB code function, my HTML function and my "webify" function to seriously reduce my overheads while still allowing editing (where I want it), and it is so simple I can't believe it didn't occur to me before.

Apply the functions during the insert, and save both pre-function and post-function content in the database.
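
A minimal sketch of that approach, assuming a hypothetical `render_comment()` in place of the BB code, HTML and "webify" functions, and made-up column names `body_raw` and `body_html`:

```php
<?php
// Hypothetical stand-in for the BB code / HTML / "webify" pipeline.
function render_comment($raw)
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// Build the row once, at insert time. Column names are assumptions.
function build_comment_row($raw)
{
    return array(
        'body_raw'  => $raw,                  // kept untouched for later editing
        'body_html' => render_comment($raw),  // output as-is on page loads
    );
}

$row = build_comment_row("Hello <b>world</b>\nsecond line");
print_r($row);
```

Page loads then read `body_html` directly, and an edit form would load `body_raw`, rebuild the row and overwrite both columns.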