Parsing HTML in Mozilla

         

jollymcfats

1:16 am on Mar 24, 2005 (gmt 0)

I'm trying to load the contents of a URL in JavaScript and get back a fully functioning DOM document. Things are easy if one is loading XML or XHTML, but I need to load HTML.

  • I've tried with XMLHttpRequest:
    1. Overriding content-type with text/html and hoping responseXML would have a usable DOM
    2. Inserting responseText into a newly created iframe element as innerHTML
    3. Setting responseText as innerHTML on a DocumentFragment
    4. Removing <html>, <head>, and <body> from responseText, and setting this as innerHTML on a newly created div element

  • And with iframes (which I'd rather avoid):
    1. Setting the URL as src on an unattached iframe element and trying to read out its document's DOM

On the few occasions I've been able to coax out a DOM, XPath queries fail completely against it. The nodes are there, but I can't get a match. I'm no XPath expert by any means, and it's quite possible that I'm not getting all the contexts correct.

Anybody pull this off in Mozilla? It seems like this should be simple, since, you know, the main thing the browser does is load HTML.

I don't need any cross-platform advice, thanks; this is for an (unprivileged) Mozilla extension.

jollymcfats

8:15 pm on Mar 24, 2005 (gmt 0)

The XPath problem seemed to be simply that the context must be attached to the document you're calling evaluate() on. I would have been happier with XPath on my unattached fragments, but linking into the DOM is no big deal.
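
A minimal sketch of that fix, assuming an HTML document with a body and DOM Level 3 XPath support. The function name xpathOnDetachedNode and the hidden holder div are illustrative assumptions, not code from the extension itself:

```javascript
// Sketch: temporarily attach a detached node to `doc`, run an XPath
// expression with that node as the context, then detach it again.
// `doc` is assumed to be an HTML document exposing evaluate() and body.
function xpathOnDetachedNode(doc, node, expression) {
  // Hypothetical helper container; hidden so nothing is rendered.
  var holder = doc.createElement("div");
  holder.style.display = "none";
  holder.appendChild(node);
  doc.body.appendChild(holder);

  var matches = [];
  try {
    // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
    var result = doc.evaluate(expression, node, null, 7, null);
    for (var i = 0; i < result.snapshotLength; i++) {
      matches.push(result.snapshotItem(i));
    }
  } finally {
    // Always unlink the holder, even if evaluate() throws.
    doc.body.removeChild(holder);
  }
  return matches;
}
```

Something like xpathOnDetachedNode(document, div, ".//a") would then return the anchors under div as a plain array. A snapshot result is used so the matches stay usable after the holder is removed from the document.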

Number 4 above is working ok for me now, though I still feel like there's probably a better way.
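
For reference, approach 4 can be sketched roughly as below. This is an assumption-laden illustration rather than the exact code in use: the helper names stripDocumentWrappers and htmlToDetachedDiv are made up, and the regex is deliberately crude (it will misfire if an attribute value contains ">" or if those tags appear inside a script or comment).

```javascript
// Remove the <html>, <head> and <body> open/close tags so the remaining
// markup can be assigned as innerHTML on a div. The contents of <head>
// (title, meta, and so on) are left in place.
function stripDocumentWrappers(html) {
  // \b keeps tags such as <header> from matching "head".
  return html.replace(/<\/?(html|head|body)\b[^>]*>/gi, "");
}

// Build a detached div holding the parsed markup.
// `doc` is assumed to be an HTML document; `html` would typically be
// the responseText of an XMLHttpRequest.
function htmlToDetachedDiv(doc, html) {
  var div = doc.createElement("div");
  div.innerHTML = stripDocumentWrappers(html);
  return div;
}
```

The resulting div is not part of any document, so it still has to be linked into the DOM before evaluate() will match anything against it.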