parse DOM from HTML file

         

bwstyle

3:05 pm on Aug 8, 2007 (gmt 0)

10+ Year Member



I'm writing a Firefox plugin. I want to parse an HTML file that my JavaScript code loads.

I'm using Mozilla's XPCOM file-reading interfaces to load this HTML file, e.g.:


var fstream = Components.classes["@mozilla.org/network/file-input-stream;1"].createInstance(Components.interfaces.nsIFileInputStream);
var sstream = Components.classes["@mozilla.org/scriptableinputstream;1"].createInstance(Components.interfaces.nsIScriptableInputStream);
fstream.init(file, -1, 0, 0); // "file" (hypothetical name) is an nsIFile for the saved HTML page
sstream.init(fstream);
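For completeness, the scriptable stream still has to be drained before you have FILE_AS_STRING. Here is a minimal sketch of that loop, assuming the stream behaves like nsIScriptableInputStream, where read(count) returns an empty string once the end of the file is reached. The helper name readAllFromStream and the 4096-character chunk size are hypothetical choices of mine, not part of any Mozilla API:

```javascript
// Hypothetical helper: read an nsIScriptableInputStream-like object to the end.
// Assumes read(count) returns at most `count` characters and "" at end of file.
function readAllFromStream(sstream, chunkSize) {
  var size = chunkSize || 4096; // arbitrary chunk size
  var data = "";
  var chunk = sstream.read(size);
  while (chunk.length > 0) {
    data += chunk;
    chunk = sstream.read(size);
  }
  return data;
}

// Inside the extension (privileged chrome code only), after both init() calls:
//   var FILE_AS_STRING = readAllFromStream(sstream);
//   sstream.close();
//   fstream.close();
```

Reading in chunks avoids depending on a single available() call reporting the whole file size.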

As I understand it, if this HTML were well-formed XHTML, there would be no problem parsing it into a DOM. I could use getElementsByTagName() to walk through it, or even XPath. But the following generates a parse error because the file is not valid XHTML:

var parser = new DOMParser();
var Gdom = parser.parseFromString(FILE_AS_STRING, "text/xml" );
var Gdoc = Gdom.documentElement;
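One way to avoid the hard failure is to detect the parser's error document and fall back to an HTML parse. This sketch assumes a DOMParser that accepts the "text/html" MIME type, which Firefox only added in later releases (Firefox 12 and up), so it would not have worked in a 2007-era browser. It relies on the fact that a failed "text/xml" parse returns a document containing a parsererror element. The names hasParseError and parseMarkup are hypothetical:

```javascript
// Hypothetical helper: true if an XML parse produced an error document.
// Gecko (and other engines) report XML errors with a <parsererror> element.
function hasParseError(doc) {
  return doc.getElementsByTagName("parsererror").length > 0;
}

// Hypothetical helper: prefer strict XML, fall back to the forgiving HTML parser.
// "parser" is expected to behave like a browser DOMParser instance.
function parseMarkup(markup, parser) {
  var xmlDoc = parser.parseFromString(markup, "text/xml");
  if (!hasParseError(xmlDoc)) {
    return xmlDoc; // well-formed XHTML: keep the XML DOM
  }
  // Tag soup: requires DOMParser support for "text/html".
  return parser.parseFromString(markup, "text/html");
}

// In a browser: var Gdom = parseMarkup(FILE_AS_STRING, new DOMParser());
```

If you never expect well-formed XHTML, calling parseFromString(FILE_AS_STRING, "text/html") directly is simpler.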

I'd rather use the DOM than write complicated regular expressions to work through an entire HTML page.

Does anyone have insight into my dilemma?

Fotiman

3:26 pm on Aug 8, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've never done this sort of thing before, but is the global 'document' object available to you? If so, then you shouldn't need to parse anything, as the document object can be used with regular DOM methods.
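To illustrate: once you hold a Document (the global document, or one returned by a parser), the standard DOM methods do the walking for you. A minimal sketch, where collectLinkTargets is a hypothetical name and simply gathers the href attribute of every <a> element:

```javascript
// Hypothetical helper: list the href of every <a> element in a Document.
function collectLinkTargets(doc) {
  var anchors = doc.getElementsByTagName("a");
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    var href = anchors[i].getAttribute("href");
    if (href !== null) { // skip anchors without an href (e.g. named anchors)
      hrefs.push(href);
    }
  }
  return hrefs;
}

// In a browser or extension: var hrefs = collectLinkTargets(document);
```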

bwstyle

4:02 pm on Aug 8, 2007 (gmt 0)

10+ Year Member



Yes, the "document" object is available to me, but I need to inspect a separate HTML file on the filesystem. It seems I can create a DOM from XML, but not from a standalone HTML document.

Originally, I was fetching this HTML with a GET request, loading it into an iframe, and trying to parse it there. That runs into a browser security restriction (I think): you can't access the DOM of a document that isn't on your domain (the same-origin policy).

That's why I've written a procedure that GETs the HTML, writes it to the filesystem, reads it back, and then tries to parse it. But getting it back into DOM form is now killing me...
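If the extension's code can make the request directly, the filesystem round trip may not be needed at all. This is a different technique from the one described above: later browsers (Firefox 11 and up) can return an already-parsed HTML Document from XMLHttpRequest when responseType is set to "document", and this only works for asynchronous requests. Whether a cross-domain request is allowed depends on the privileges of the calling code. A sketch under those assumptions; loadHtmlDocument and its callback shape are hypothetical:

```javascript
// Hypothetical helper: GET a URL and receive it as a parsed HTML Document.
// Assumes XMLHttpRequest supports responseType = "document" (async only).
function loadHtmlDocument(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url); // async by default
  xhr.responseType = "document"; // ask for a Document instead of text
  xhr.onload = function () {
    // onload also fires for HTTP errors such as 404, so check the status.
    if (xhr.status >= 200 && xhr.status < 300) {
      callback(null, xhr.response);
    } else {
      callback(new Error("HTTP " + xhr.status));
    }
  };
  xhr.onerror = function () {
    callback(new Error("Network error for " + url));
  };
  xhr.send();
}

// Usage: loadHtmlDocument(pageUrl, function (err, doc) {
//   if (!err) { var links = doc.getElementsByTagName("a"); }
// });
```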

mltsy

4:50 pm on Aug 23, 2007 (gmt 0)

10+ Year Member



Have you tried using "XML for <SCRIPT>"? I haven't used it, but I'm about to try it for a related issue, after searching around the web and finding your post along the way :)

[xmljs.sourceforge.net...]