
PHP Server Side Scripting Forum

    
Regex grab HTML tags keep back reference
Readie

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4096215 posted 10:56 am on Mar 12, 2010 (gmt 0)

I recently wrote a script for a list of allowed HTML, and I'm wondering if anyone with a bit more experience than me could compare the three regexes I wrote (all of them work) to acquire the non-self-closing tags, and tell me which one is likely to incur the least overhead. This will be going into effect for user comments and could end up looping several hundred times on (some) page loads.

In use:
/<([^ \/>]+)([^>]+)?>(?m)([^(\<\/\\1\>)]+)(?-m)<\/\\1>/is

Others:
/<([^ \/>]+)([^>]+)?>(?m)(.*?)(?-m)<\/\\1>/is
/<([^ \/>]+)([^>]+)?>(?m)(.*?(?!<\/\\1>).*?)(?-m)<\/\\1>/is

Cheers in advance,

Mike
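
Before timing the three patterns it may be worth checking that they accept the same input. A minimal sketch, assuming the patterns are written as single-quoted PHP strings and using a made-up sample whose content contains a "/". Inside a character class PCRE does not treat \1 as a back reference, so the pattern in use appears to exclude the individual characters ( < / > ) rather than the closing tag as a whole:

```php
<?php
// The three patterns from the post, as single-quoted PHP strings.
$patterns = [
    'char class' => '/<([^ \/>]+)([^>]+)?>(?m)([^(\<\/\\1\>)]+)(?-m)<\/\\1>/is',
    'lazy dot'   => '/<([^ \/>]+)([^>]+)?>(?m)(.*?)(?-m)<\/\\1>/is',
    'lookahead'  => '/<([^ \/>]+)([^>]+)?>(?m)(.*?(?!<\/\\1>).*?)(?-m)<\/\\1>/is',
];

// Hypothetical sample: the tag content contains a "/" character.
$sample = '<b>and/or</b>';

$results = [];
foreach ($patterns as $name => $pattern) {
    // Keep the captured content (group 3), or null when the pattern fails.
    $results[$name] = preg_match($pattern, $sample, $m) === 1 ? $m[3] : null;
}
var_dump($results);
```

On this sample the first pattern does not match at all, while the other two capture "and/or", so any speed comparison is between patterns that do slightly different jobs.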

 

eelixduppy

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4096215 posted 4:36 pm on Mar 12, 2010 (gmt 0)

>> looping several hundred times on (some) page loads.

I think this is more of a problem than how much overhead each of these regexes will have. To be honest, if I had to guess, I'd say each of them would perform pretty close to the same, if not exactly the same, as far as anyone would be able to tell. If you want a more detailed analysis of their performance, that should be done on your box: record the timestamp (to microseconds) before and after, and find the difference. I still don't think it will make much of a difference, though. You should work more on getting it so that it doesn't have to run on page loads at all but, perhaps, only when a user submits a comment.
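
The before-and-after timing described above could be sketched roughly as follows. The helper name time_pattern(), the sample comment text and the iteration count are all assumptions made for illustration; microtime(true) returns the time as a float and has been available since PHP 5.0:

```php
<?php
// Hypothetical helper: seconds spent running one pattern $iterations times.
function time_pattern($pattern, $subject, $iterations = 200)
{
    $start = microtime(true);            // float seconds, microsecond resolution
    for ($i = 0; $i < $iterations; $i++) {
        preg_match_all($pattern, $subject, $matches);
    }
    return microtime(true) - $start;
}

// Assumed sample; substitute real comment text from the site.
$subject = str_repeat('<b>bold</b> and <a href="#top">a link</a> ', 50);

$patterns = [
    'char class' => '/<([^ \/>]+)([^>]+)?>(?m)([^(\<\/\\1\>)]+)(?-m)<\/\\1>/is',
    'lazy dot'   => '/<([^ \/>]+)([^>]+)?>(?m)(.*?)(?-m)<\/\\1>/is',
    'lookahead'  => '/<([^ \/>]+)([^>]+)?>(?m)(.*?(?!<\/\\1>).*?)(?-m)<\/\\1>/is',
];

$timings = [];
foreach ($patterns as $name => $pattern) {
    $timings[$name] = time_pattern($pattern, $subject);
}
foreach ($timings as $name => $seconds) {
    printf("%-10s %.6f s\n", $name, $seconds);
}
```

The numbers only mean something on the server that will actually run the code, and with input that resembles real comments.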

Readie

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4096215 posted 4:50 pm on Mar 12, 2010 (gmt 0)

Hmm, the problem is that, the way the site has been coded, it applies the convert-to-HTML step as it pulls the content from the MySQL database...

Still, I'm not going to be letting people edit comments after posting, so I suppose I could just write a completely new system for comment saving that applies this during the insert.

Thanks for replying - and unfortunately the owner of the server I use is relying on Gentoo Portage for PHP updates and they still haven't cleared PHP 5.3 - so I can't do microseconds :(

chasehx

5+ Year Member



 
Msg#: 4096215 posted 5:16 pm on Mar 12, 2010 (gmt 0)

I'd go:
<([A-Z][A-Z0-9]*)>.*?</\1>

Personally...
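
For reference, a quick sketch of how the shorter pattern behaves in PHP. The "#" delimiters and the "i" and "s" flags are assumptions added here, since the post gives only the bare expression:

```php
<?php
// The suggested pattern with "#" delimiters; the flags are assumed.
$simple = '#<([A-Z][A-Z0-9]*)>.*?</\\1>#is';

$plain     = preg_match($simple, '<b>bold</b>');           // tag without attributes
$with_attr = preg_match($simple, '<a href="#top">x</a>');  // tag with an attribute
var_dump($plain, $with_attr);
```

The first call matches and the second does not, because the pattern expects ">" straight after the tag name.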

Readie

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4096215 posted 9:54 am on Mar 13, 2010 (gmt 0)

The problem with that, chasehx, is that I want to allow the use of some attributes (which are of course checked against an invalid list), and it is by far easier to validate both the tag and its attributes with separate back references.

Anyways, I've had a thought on a way of modifying every use of my BB code function, my HTML function and my "webify" function to seriously reduce my overheads and still allow editing (where I want it), which is so simple I can't believe it didn't occur to me before.
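
A rough sketch of what validating with separate back references might look like. The function name is_allowed_tag(), the $allowed list and the attribute-name regex are all hypothetical, and this version checks against an allow-list rather than the invalid list mentioned above:

```php
<?php
// Hypothetical allow-list: tag name => attribute names permitted on it.
$allowed = ['a' => ['href', 'title'], 'b' => [], 'i' => []];

// Returns true when the tag and every attribute name are on the allow-list.
function is_allowed_tag($html, $allowed)
{
    $pattern = '/<([^ \/>]+)([^>]+)?>(.*?)<\/\\1>/is';
    if (preg_match($pattern, $html, $m) !== 1) {
        return false;
    }
    $tag = strtolower($m[1]);            // back reference 1: tag name
    if (!isset($allowed[$tag])) {
        return false;
    }
    // Back reference 2: raw attribute string, e.g. ' href="#top"'.
    // Simplified name extraction; it can misread "=" inside attribute values.
    preg_match_all('/([a-z][a-z0-9-]*)\s*=/i', $m[2], $attrs);
    foreach ($attrs[1] as $name) {
        if (!in_array(strtolower($name), $allowed[$tag], true)) {
            return false;
        }
    }
    return true;
}

var_dump(is_allowed_tag('<a href="#top">x</a>', $allowed));
var_dump(is_allowed_tag('<a onclick="x()">x</a>', $allowed));
```

Keeping the tag name and the attribute string in separate groups means each can be checked with its own rule, which is the advantage described above.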

Apply the functions during the insert, and save both pre-function and post-function content in the database.
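
One way the insert-time approach could look, as a sketch only. render_comment() is a stand-in for the real BB code, HTML and "webify" functions, and the column names body_raw and body_html are invented for illustration:

```php
<?php
// Hypothetical stand-in for the BB code / HTML / "webify" functions.
function render_comment($raw)
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// Builds the row to insert: raw text for later editing, rendered text for display.
function build_comment_row($raw)
{
    return [
        'body_raw'  => $raw,                   // hypothetical column names
        'body_html' => render_comment($raw),
    ];
}

$row = build_comment_row("1 < 2\nsecond line");
var_dump($row);

// With PDO the row could then be saved once per submission, for example:
// $stmt = $pdo->prepare(
//     'INSERT INTO comments (body_raw, body_html) VALUES (:body_raw, :body_html)'
// );
// $stmt->execute($row);
```

Page loads then read the rendered column directly, and an edit form would load the raw column and run the functions again on save.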
