Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

Extracting text enclosed between HTML tags

 10:56 am on Oct 22, 2009 (gmt 0)

Hi there, I'm a PHP rookie, so I'm still experimenting with regular expressions.

My question is: what code can I use to extract each string of text enclosed between any possible HTML tags?
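
One way to approach this with a regular expression is to split the markup on its tags and keep only the pieces in between. The snippet below is a minimal sketch, not a robust parser: the function name `extractTextPieces` is made up for illustration, and the pattern assumes well-formed tags with no `>` inside attribute values, and no comments or scripts.

```php
<?php
// Hypothetical helper: returns the text pieces found between tags.
// Assumes simple, well-formed markup; a regex cannot handle every HTML edge case.
function extractTextPieces(string $html): array
{
    // Split on anything that looks like a tag, keeping the tags in the result
    // (PREG_SPLIT_DELIM_CAPTURE) and dropping empty pieces (PREG_SPLIT_NO_EMPTY).
    $pieces = preg_split(
        '/(<[^>]+>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // Keep only the pieces that are not tags.
    $texts = array_filter($pieces, function ($piece) {
        return strncmp($piece, '<', 1) !== 0;
    });

    return array_values($texts);
}

print_r(extractTextPieces('<p>Hello <b>world</b></p>'));
```

Because the tags stay in the split result, the same array could also be used to modify the text pieces and glue everything back together with `implode('', $pieces)`.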



 12:30 pm on Oct 22, 2009 (gmt 0)

echo strip_tags(preg_replace(
    array(
        // Elements removed together with their content
        '@<head[^>]*?>.*?</head>@siu',
        '@<style[^>]*?>.*?</style>@siu',
        '@<script[^>]*?>.*?</script>@siu',
        '@<object[^>]*?>.*?</object>@siu',
        '@<embed[^>]*?>.*?</embed>@siu',
        '@<applet[^>]*?>.*?</applet>@siu',
        '@<noframes[^>]*?>.*?</noframes>@siu',
        '@<noscript[^>]*?>.*?</noscript>@siu',
        '@<noembed[^>]*?>.*?</noembed>@siu',
        // Block-level tags that get a line break in front of them
        '@<((br)|(hr))@iu',
        '@</?((address)|(blockquote)|(center)|(del))@iu',
        '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
        '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
        '@</?((table)|(th)|(td)|(caption))@iu',
        '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
        '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
        '@</?((frameset)|(frame)|(iframe))@iu',
    ),
    array(
        ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
        "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
    ),
    $html
));


 12:32 pm on Oct 22, 2009 (gmt 0)

Let me know if it works for you. Also, if the pipes in the patterns show up as broken bars (¦), you'll need to replace them with solid ones (|), because the forum software converts them.

I'm pretty sure that's the right piece. I might have accidentally left something out, because my filter is extremely long (I put all the code on one line instead of building it up in my system's memory).


 12:45 pm on Oct 22, 2009 (gmt 0)

The thing is, I need to manipulate the text strings without their HTML tags, one by one, because after manipulating each string I need to put it back where it was.


<div class="someclass">this is enclosed text</div>
<p> other text to be manipulated here</p>

-- do the job --
<div class="someclass">this is the new enclosed text</div>
<p> other text that has been manipulated here</p>

Basically I should (pseudocode):
- parse the HTML code
- for each piece of text enclosed in HTML tags, do:
----- check whether the text is inside a suitable tag; for example, skip (but keep) text inside <meta>, <script>, <link> & <style> tags
----- manipulate the enclosed text if needed
- move on to the next match and repeat from the top
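
The steps above could be sketched with PHP's DOM extension instead of regular expressions, since a parser keeps each text node in place while it is edited. This is only a sketch under some assumptions: the function name `transformTextNodes`, the wrapper id `gr-wrapper`, and the `strtoupper` callback are made up for illustration, the input is assumed to be ASCII or entity-encoded, and the serializer may normalize whitespace or line breaks between block elements.

```php
<?php
// Hypothetical helper: applies $callback to every text node in $html,
// leaving text inside the $skip elements (e.g. script, style) untouched.
function transformTextNodes(string $html, callable $callback, array $skip = ['script', 'style']): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // silence warnings about sloppy markup
    // Wrap the fragment so that several top-level elements stay together.
    $doc->loadHTML('<div id="gr-wrapper">' . $html . '</div>');
    libxml_clear_errors();

    $xpath   = new DOMXPath($doc);
    $wrapper = $xpath->query('//div[@id="gr-wrapper"]')->item(0);

    // Take a snapshot of the text nodes before modifying anything.
    $nodes = iterator_to_array($xpath->query('.//text()', $wrapper));

    foreach ($nodes as $node) {
        // Skip (but keep) text that sits inside one of the excluded elements.
        $skipThis = false;
        for ($p = $node->parentNode; $p !== null; $p = $p->parentNode) {
            if ($p instanceof DOMElement && in_array(strtolower($p->nodeName), $skip, true)) {
                $skipThis = true;
                break;
            }
        }
        if ($skipThis || trim($node->data) === '') {
            continue;
        }
        // Manipulate the text and put it back where it was.
        $node->data = $callback($node->data);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<div class="someclass">this is enclosed text</div>'
      . '<p>other text to be manipulated here</p>'
      . '<script>var a = "text";</script>';

echo transformTextNodes($html, 'strtoupper');
```

Elements such as `<meta>` and `<link>` carry no text nodes, so only `script` and `style` are in the default skip list; further tag names can be passed in the third argument.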


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved