
PHP Server Side Scripting Forum

    
Extracting text enclosed between HTML tags
Blutarsky




msg:4011450
 10:56 am on Oct 22, 2009 (gmt 0)

Hi there, I'm a big PHP rookie, so I'm still experimenting with regular expressions.

That is, I'm asking myself: what code should I use to extract a string of text enclosed between any possible HTML tags?

 

miketheman




msg:4011478
 12:30 pm on Oct 22, 2009 (gmt 0)

echo strip_tags(preg_replace(
    array(
        // Remove invisible or non-text content entirely
        '@<head[^>]*?>.*?</head>@siu',
        '@<style[^>]*?>.*?</style>@siu',
        '@<script[^>]*?>.*?</script>@siu',
        '@<object[^>]*?>.*?</object>@siu',
        '@<embed[^>]*?>.*?</embed>@siu',
        '@<applet[^>]*?>.*?</applet>@siu',
        '@<noframes[^>]*?>.*?</noframes>@siu',
        '@<noscript[^>]*?>.*?</noscript>@siu',
        '@<noembed[^>]*?>.*?</noembed>@siu',
        // Put a line break before block-level tags
        '@<((br)|(hr))@iu',
        '@</?((address)|(blockquote)|(center)|(del))@iu',
        '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
        '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
        '@</?((table)|(th)|(td)|(caption))@iu',
        '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
        '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
        '@</?((frameset)|(frame)|(iframe))@iu',
    ),
    array(
        // One replacement per pattern, in the same order
        ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
        "\n\$0", "\n\$0", "\n\$0", "\n\$0",
        "\n\$0", "\n\$0", "\n\$0", "\n\$0",
    ),
    $html
));

miketheman




msg:4011479
 12:32 pm on Oct 22, 2009 (gmt 0)

Let me know if it works for you. Also, if the pipes in the patterns show up as broken bars (¦), you'll need to replace them with solid ones (|); the forum screens them out.

I'm pretty sure that's the right piece, but I might have accidentally left something out because my filter is extremely long (I put all the code on one line instead of building it up in my system's memory).

Blutarsky




msg:4011480
 12:45 pm on Oct 22, 2009 (gmt 0)

The thing is, I need to manipulate each text string, without its HTML tags, one by one, because after manipulating the text I need to put it back where it was.

Example


<div class="someclass">this is enclosed text</div>
<p> other text to be manipulated here</p>

-- do the job --
<div class="someclass">this is the new enclosed text</div>
<p> other text has been manipulated here</p>

Basically I should (pseudocode):
- parse the HTML code
- for each piece of text enclosed in HTML tags:
----- check whether the text is inside a suitable tag; for example, skip (but keep) text inside <meta>, <script>, <link> & <style> tags
----- manipulate the enclosed text if needed
- move on to the next match, working from the top
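
The steps above could be sketched with PHP's DOM extension rather than regular expressions, since a parser keeps track of where each piece of text sits. This is only a minimal sketch under some assumptions: the function name transformEnclosedText and its parameters are made up for illustration, the DOM extension must be enabled, the markup is assumed to be ASCII or already entity-encoded (loadHTML() does not assume UTF-8), and only <script> and <style> are skipped by default.

```php
<?php
// Hypothetical helper: applies $transform to every text node that does not
// sit directly inside one of $skipTags, then returns the rebuilt fragment.
function transformEnclosedText(string $html, callable $transform, array $skipTags = ['script', 'style']): string
{
    $doc = new DOMDocument();

    // Real-world markup is often sloppy; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so that it can be serialized
    // back without the <html>/<body> wrappers that loadHTML() adds.
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="fragment-root"]')->item(0);

    // Visit every text node below the container, in document order.
    foreach ($xpath->query('.//text()', $root) as $textNode) {
        $parentName = strtolower($textNode->parentNode->nodeName);
        if (in_array($parentName, $skipTags, true)) {
            continue; // skip, but keep, text inside excluded tags
        }
        if (trim($textNode->data) === '') {
            continue; // leave whitespace-only nodes untouched
        }
        // Replace the text in place; the surrounding tags are not touched.
        $textNode->data = $transform($textNode->data);
    }

    // Serialize only the children of the container.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with the example markup from this post; strtoupper stands in for
// whatever manipulation is actually needed.
$html = '<div class="someclass">this is enclosed text</div>' . "\n"
      . '<p> other text to be manipulated here</p>';
echo transformEnclosedText($html, 'strtoupper');
```

Because the text is changed inside the parsed tree, each string ends up back where it was without any bookkeeping. The serializer may normalize the markup slightly (attribute quoting, line breaks around block tags), so the output is not guaranteed to be byte-for-byte identical to the input outside the changed text.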

