PHP for HTML string parsing with CSS selectors

OscarHiboux

10:09 pm on Oct 16, 2009 (gmt 0)


Hello!

Today I tried to parse an HTML stream in order to inject some content in a few places. First I tried the native DOMDocument class. The problem is that it gets confused by CDATA sections wrapped in script comments, like:

<script type='text/javascript'>/*<![CDATA[*/
var urls = new Array();
urls[0] = 'blablah';
// urls[...] = ...
var url = urls[Math.floor(Math.random()*urls.length)];
document.write('<div class="banner" style="background:transparent url('+url+') top left no-repeat;"></div>');
/*]]>*/</script>

I don't know whether it happens during parsing or when the DOM tree is printed back, but the output differs from the input and my page is simply broken. This happens without doing anything to the DOM after loading it, just:

$document = new DOMDocument();
@$document->loadHTMLFile('page.html');
echo $document->saveHTML();

I'm using libxml 2.7.3, for your information. I don't know whether this is a known issue.

So, do you guys know a safe way to parse HTML documents that might not be completely strict, and then modify them slightly before rendering them?

I had a look at HTML Purifier, but it seems complicated at first glance. I don't know whether it could help me keep going with the DOM API...

Maybe there are some cool HTML swallow parsers in PHP?

Any ideas would be great! :)

Thanks!

dreamcatcher

6:31 am on Oct 24, 2009 (gmt 0)


HTML swallow parsers

Interesting terminology, lol. I'm not really sure I understand what you are attempting to do. Sorry.

dc
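
For the DOMDocument round-trip problem described in the first post, one workaround that may be worth trying is to keep the script blocks away from libxml entirely: swap each <script> element for a placeholder comment before loading, do the DOM edits, then put the original text back after saveHTML(). This is only a sketch and assumes the scripts themselves do not need to be edited. The helper names (protectScripts, restoreScripts, roundTrip) and the placeholder format are made up for illustration and are not part of any library, and the regular expression is deliberately simple, so it would be fooled by a literal "</script>" inside a JavaScript string.

```php
<?php
// Sketch only: helper names and the placeholder format are hypothetical.

// Replace every <script>...</script> block with a unique HTML comment
// and remember the original text in $stash.
function protectScripts(string $html, array &$stash): string
{
    return preg_replace_callback(
        '#<script\b[^>]*>.*?</script>#is',
        function (array $m) use (&$stash): string {
            $key = '<!--SCRIPT_PLACEHOLDER_' . count($stash) . '-->';
            $stash[$key] = $m[0];
            return $key;
        },
        $html
    ) ?? $html; // fall back to the untouched input if PCRE reports an error
}

// Put the original script blocks back in place of the placeholders.
function restoreScripts(string $html, array $stash): string
{
    return $stash ? strtr($html, $stash) : $html;
}

// Load, optionally modify, and serialize a page without letting
// libxml touch the contents of the script blocks.
function roundTrip(string $html): string
{
    $stash = [];
    $safe = protectScripts($html, $stash);

    // Collect parser warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($safe);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // ... DOM modifications would go here ...

    return restoreScripts($document->saveHTML(), $stash);
}
```

Usage would be something like echo roundTrip(file_get_contents('page.html')); where page.html is the file from the first post. Note that saveHTML() may still normalize the rest of the markup (for example by adding a doctype or missing html/body tags), so this only protects the script contents, not the whole page.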