| Extracting text enclosed between HTML tags
|
Blutarsky

msg:4011450 | 10:56 am on Oct 22, 2009 (gmt 0) | Hi there, I'm a big PHP rookie, so I'm still experimenting with regular expressions. My question is: what code can I use to extract the strings of text enclosed between all possible HTML tags?
|
miketheman

msg:4011478 | 12:30 pm on Oct 22, 2009 (gmt 0) | echo strip_tags(preg_replace(array('@<head[^>]*?>.*?</head>@siu','@<style[^>]*?>.*?</style>@siu','@<script[^>]*?.*?</script>@siu','@<object[^>]*?.*?</object>@siu','@<embed[^>]*?.*?</embed>@siu','@<applet[^>]*?.*?</applet>@siu','@<noframes[^>]*?.*?</noframes>@siu','@<noscript[^>]*?.*?</noscript>@siu','@<noembed[^>]*?.*?</noembed>@siu','@<((br)¦(hr))@iu','@</?((address)¦(blockquote)¦(center)¦(del))@iu','@</?((div)¦(h[1-9])¦(ins)¦(isindex)¦(p)¦(pre))@iu','@</?((dir)¦(dl)¦(dt)¦(dd)¦(li)¦(menu)¦(ol)¦(ul))@iu','@</?((table)¦(th)¦(td)¦(caption))@iu','@</?((form)¦(button)¦(fieldset)¦(legend)¦(input))@iu','@</?((label)¦(select)¦(optgroup)¦(option)¦(textarea))@iu','@</?((frameset)¦(frame)¦(iframe))@iu',),array(' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',"\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0","\n\$0", "\n\$0",),$html));
|
miketheman

msg:4011479 | 12:32 pm on Oct 22, 2009 (gmt 0) | Let me know if it works for you. Also, you'll need to replace the broken pipes (¦) with solid ones (|), because the forum screens them out. I'm pretty sure that's the right piece, but I might have accidentally excluded something because my filter is extremely long (I put all the code on one line instead of building it up in my system's memory).
|
Blutarsky

msg:4011480 | 12:45 pm on Oct 22, 2009 (gmt 0) | The fact is that I need to manipulate the text strings without their HTML tags, one by one, because after manipulating the text I need to put it back where it was.

Example:
<div class="someclass">this is enclosed text</div> <p> other text to be manipulated here</p>
-- do the job --
<div class="someclass">this is the new enclosed text</div> <p> other text has been manipulated here</p>

Basically I should (pseudocode):
- parse the HTML code
- for each match of text enclosed in HTML tags do:
----- check whether the text is inside an appropriate tag; for example, skip (but keep) text inside <meta>, <script>, <link> & <style> tags
----- manipulate the enclosed text if needed
- next match, start from top
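The pseudocode above could be sketched with PHP's DOM extension rather than regular expressions, since a parser keeps track of where each piece of text sits and lets it be written back in place. This is only a minimal sketch under some assumptions: the function name transformText() and its $skip parameter are made up for illustration, the input is assumed to be a UTF-8 fragment of body markup, and the serialized markup may differ slightly from the input (for example, unclosed tags get closed).

```php
<?php
// Hypothetical helper (not from the thread): applies $fn to every text node
// that is not inside one of the tags listed in $skip, leaving markup intact.
function transformText(string $html, callable $fn, array $skip = ['script', 'style']): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in a full document and declare UTF-8 so the parser
    // does not fall back to its default encoding.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build an XPath filter such as [not(ancestor::script or ancestor::style)].
    $filter = '';
    if ($skip) {
        $parts = array_map(fn($tag) => "ancestor::$tag", $skip);
        $filter = '[not(' . implode(' or ', $parts) . ')]';
    }

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//body//text()' . $filter) as $node) {
        // Leave whitespace-only nodes between tags alone.
        if (trim($node->nodeValue) === '') {
            continue;
        }
        // Replace the text in place; the surrounding tags are untouched.
        $node->nodeValue = $fn($node->nodeValue);
    }

    // Serialize only the children of <body>, dropping the wrapper added above.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

A call such as transformText($html, 'strtoupper') would uppercase the enclosed text while leaving the contents of <script> and <style> as they were. Tags like <meta> and <link> carry no enclosed text, so they need no special handling here.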
|