PHP regular expression experience

orion_rus

12:11 pm on Aug 30, 2005 (gmt 0)

Hello world, I need to analyze a page with regular expressions to extract its content. The problem is this: the page is built very simply, all of its content is placed inside <p>{content}</p> tags, and when I extract it this way I get a lot of junk along with it.
Has anybody faced this problem of extracting exactly the right content from pages?
Thanks in advance!

Rebrandt

1:27 pm on Aug 30, 2005 (gmt 0)



You can try this:


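$reg_exp = array(
    '@<style[^>]*>.*?</style>@si',   // <style> blocks and their CSS
    '@<script[^>]*>.*?</script>@si', // <script> blocks and their JavaScript
    '@<!--.*?-->@s',                 // HTML comments
    '@<[^<>]*>@',                    // all remaining tags
    '@<[^<>]*$@',                    // a tag left unclosed at the end of the input
    '@/\*.*?\*/@s',                  // stray /* ... */ comments
    '@&nbsp;?@i',                    // &nbsp; entities
);
$results = preg_replace($reg_exp, '', $your_html_goes_here);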

After that, $results should hold only the text content. Be warned that this code still has some problems (mostly with detecting JavaScript).
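Since the page in the question keeps all of its content inside <p>{content}</p> tags, another option is to capture only the paragraph text instead of stripping everything else. Below is a minimal sketch of that idea, assuming paragraphs are not nested and the markup is reasonably well formed; the function name extract_paragraphs and the sample $page are made up for illustration. It is still regex-based, so a real HTML parser such as PHP's DOMDocument would probably cope better with messy pages.

```php
<?php
// Hypothetical helper: returns the text of every <p>...</p> element in $html.
// Assumes paragraphs are not nested and the markup is reasonably well formed.
function extract_paragraphs(string $html): array
{
    // Drop blocks whose contents are never visible text.
    $html = preg_replace(
        array('@<script[^>]*>.*?</script>@si', '@<style[^>]*>.*?</style>@si', '@<!--.*?-->@s'),
        '',
        $html
    );

    // Capture the inner HTML of each <p> element (case-insensitive, across lines).
    // The (?:\s[^>]*)? part allows attributes but avoids matching <pre> or <param>.
    preg_match_all('@<p(?:\s[^>]*)?>(.*?)</p>@si', $html, $matches);

    $paragraphs = array();
    foreach ($matches[1] as $inner) {
        // Remove inline tags such as <b> or <a>, then decode entities like &nbsp;.
        $text = html_entity_decode(strip_tags($inner), ENT_QUOTES, 'UTF-8');
        // Collapse runs of whitespace (including U+00A0 from &nbsp;) to one space.
        $text = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text));
        // Skip paragraphs that are empty after cleaning.
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Made-up sample page for illustration.
$page = '<html><head><style>p{color:red}</style></head><body>'
      . '<p class="intro">Hello&nbsp;<b>world</b></p>'
      . '<script>document.write("<p>ad</p>");</script>'
      . '<p>  </p><P>Second paragraph</P></body></html>';

print_r(extract_paragraphs($page));
```

On the sample page this should return only the two non-empty paragraphs, "Hello world" and "Second paragraph"; the <p> written by the script and the whitespace-only paragraph are dropped.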