PHP regular expression experience

12:11 pm on Aug 30, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 3, 2004
posts:598


Hello world, I need to analyze a page with regular expressions to extract its content. The problem is this: the page is built very simply, all the content is put inside <p>{content}</p> tags, and when I extract it this way I get a lot of junk along with it.
Has anybody dealt with this problem of extracting exactly the right content from pages?
Thanks in advance!
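Since the page reportedly puts all real content inside <p> tags, one option is to collect only what sits between <p> and </p> and ignore everything else (menus, scripts, and so on). The following is a minimal sketch, assuming the paragraphs are plain and never nested. extract_paragraphs() is a hypothetical helper name, not something from this thread:

```php
<?php
// Minimal sketch: collect the inner text of every <p>...</p> element.
// Assumes plain, non-nested <p> tags; extract_paragraphs() is a hypothetical name.
function extract_paragraphs(string $html): array
{
    // (?:\s[^>]*)? allows attributes such as <p class="x"> but does not match <pre>.
    // s: let "." span newlines inside a paragraph; i: accept <P> as well as <p>.
    preg_match_all('@<p(?:\s[^>]*)?>(.*?)</p>@si', $html, $matches);

    $paragraphs = array();
    foreach ($matches[1] as $inner) {
        // Drop inline tags (<b>, <a>, ...) and decode entities such as &amp;.
        $text = trim(html_entity_decode(strip_tags($inner), ENT_QUOTES, 'UTF-8'));
        if ($text !== '') {
            // Skip paragraphs that are empty after cleanup.
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

$html = '<html><body><div class="menu"><a href="/">Home</a></div>'
      . '<p>First <b>real</b> paragraph.</p><p class="note">Fish &amp; chips</p>'
      . '<p>   </p></body></html>';
print_r(extract_paragraphs($html));
```

Here the menu link is never captured, because it is not inside a <p>. Pages that nest block elements inside paragraphs, or leave <p> unclosed, would need a real HTML parser instead.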
1:27 pm on Aug 30, 2005 (gmt 0)

New User

joined:Feb 2, 2005
posts:24


You can try this:


$reg_exp = array('@<style[^>]*?>.*?</style>@si', '@<script[^>]*?>.*?</script>@si', '@<!--.*?-->@s',
                 '@<[/!]*?[^<>]*?>@si', '@/\*.*?\*/@s', '/&nbsp;/i');
// Every pattern is applied in order and its matches are replaced with an empty string
$results = preg_replace($reg_exp, '', $your_html_goes_here);

After that, $results should hold only the page content. I should warn you that this code still has some problems (mostly with JavaScript detection).
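To try the stripping approach on a sample page, the patterns can be wrapped in a function. This is a sketch under assumptions: strip_html_noise() is a hypothetical name, and matches are replaced with a space (then whitespace is collapsed) rather than an empty string, so text from adjacent elements does not run together.

```php
<?php
// Sketch of the answer's approach wrapped in a function (strip_html_noise() is hypothetical).
// Order matters: whole <style>/<script> blocks and comments are removed first,
// so their contents are not left behind once the tags themselves are stripped.
function strip_html_noise(string $html): string
{
    $patterns = array(
        '@<style[^>]*>.*?</style>@si',   // embedded CSS, including its content
        '@<script[^>]*>.*?</script>@si', // embedded JavaScript, including its content
        '@<!--.*?-->@s',                 // HTML comments
        '@<[/!]?[^<>]*>@s',              // any remaining tag, e.g. <p>, </div>, <!DOCTYPE ...>
        '/&nbsp;/i',                     // non-breaking space entities
    );
    // Replace each match with a space so words from adjacent elements stay separate...
    $text = preg_replace($patterns, ' ', $html);
    // ...then collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$page = '<html><head><style>p { color: red; }</style>'
      . '<script>var x = "<b>";</script></head>'
      . '<body><!-- menu --><p>Hello&nbsp;world</p></body></html>';
echo strip_html_noise($page), "\n"; // prints: Hello world
```

The lazy `.*?</script>` stops at the first "</script>" it finds. A script that contains that string literally will therefore be cut short, which is one example of the JavaScript problems mentioned above.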
