Parsing HTML in Mozilla

I'm trying to load the contents of a URL in JavaScript and get back a fully functioning DOM document. Things are easy if one is loading XML or XHTML, but I need to load HTML.

I've tried with XMLHttpRequest:
1. Overriding the content type with text/html and hoping responseXML would have a usable DOM
2. Inserting responseText into a newly created iframe element as innerHTML
3. Setting responseText as innerHTML on a DocumentFragment
4. Removing <html>, <head>, and <body> from responseText, and setting the result as innerHTML on a newly created div element
And with iframes (which I'd rather avoid):
1. Setting the URL as src on an unattached iframe element and trying to read its document's DOM
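
For reference, item 4 in the XMLHttpRequest list could look roughly like the sketch below. The names `extractBodyMarkup` and `htmlToDiv` are hypothetical, the regular expression is a simplifying assumption (it will not cope with a `>` inside a body attribute), and `doc` is assumed to be whatever HTML document the extension already has access to.

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>,
// or the whole string unchanged when no body tags are found.
function extractBodyMarkup(html) {
  var match = /<body\b[^>]*>([\s\S]*?)<\/body\s*>/i.exec(html);
  return match ? match[1] : html;
}

// Hypothetical helper: let the browser parse the markup by assigning it
// to a detached div. `doc` is assumed to be an existing HTML document.
function htmlToDiv(doc, html) {
  var div = doc.createElement("div");
  div.innerHTML = extractBodyMarkup(html);
  return div;
}
```

Something like `htmlToDiv(document, request.responseText)` would then give a div whose children are the parsed body content, owned by `document` rather than by a separate document of its own.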

On the few occasions I've been able to coax out a DOM, XPath queries fail against it completely. The nodes are there, but I can't get a single match. I'm no XPath expert by any means, and it's quite possible that I'm not getting all the contexts right.
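
To make the "contexts" concrete, here is a minimal sketch of one way to run the query, assuming the mismatch is between the document that evaluates the expression and the document that owns the nodes. That is only a guess at the cause, not a diagnosis. `xpathAll` is a hypothetical helper name; `evaluate`, `snapshotLength`, and `snapshotItem` are the standard DOM XPath calls.

```javascript
// XPathResult.ORDERED_NODE_SNAPSHOT_TYPE has the numeric value 7.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: run `expr` relative to `contextNode` and return
// the matching nodes as a plain array.
function xpathAll(contextNode, expr) {
  // Evaluate on the document that owns the node. A Document node has no
  // ownerDocument of its own, so fall back to the node itself.
  var doc = contextNode.ownerDocument || contextNode;
  // No namespace resolver is passed (third argument); that is an assumption
  // that the expression uses no prefixes.
  var result = doc.evaluate(expr, contextNode, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

A call such as `xpathAll(div, ".//a")` uses a relative expression, which keeps the search under the context node.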

Anybody pull this off in Mozilla? It seems like this should be simple, since, you know, the main thing the browser does is load HTML.

Don't need any cross-platform advice, thanks; this is for an (unprivileged) Mozilla extension.

jollymcfats