|How to get content of html page using DOM|
How to get source of html page using DOM to save as separate file
| 4:57 am on Jun 1, 2010 (gmt 0)|
I need to save a webpage's source to another .html file while maintaining its current state.
| 6:54 am on Jun 1, 2010 (gmt 0)|
There are several issues with doing so (why would you want to?):
1. There are no good, reliable cross-browser methods of accessing the page's doctype declaration from the client side. It would be best retrieved on the server side by parsing the raw file for its doctype declaration with PHP.
2. document.getElementsByTagName('html')[0].innerHTML (or document.documentElement.innerHTML) will contain the contents of the html element in its "live" state. Note that getElementsByTagName returns a collection, so you need the [0]. One problem I see already is that IE does not seem to include quotes around element attributes such as id in elements that were added via innerHTML (I have not checked createElement or many other attributes, but that is irrelevant now that the problem exists). As for the html element itself, you may want to check its attributes collection and include them. Parsing them on the server side should also be fine, since it is highly unlikely that any client-side code would change the html element's attributes.
p.s. welcome to webmasterworld!
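To make points 1 and 2 a bit more concrete, here is a minimal sketch. It assumes a browser that exposes document.doctype with name, publicId and systemId (not every browser did at the time), and the helper names serializeDoctype and serializePage are made up for illustration. It rebuilds the opening html tag from the attributes collection and appends the live innerHTML.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any library.

// Rebuild a doctype declaration from a DocumentType-like node
// (name, publicId, systemId). Returns '' when no doctype is exposed.
function serializeDoctype(doctype) {
  if (!doctype) return '';
  var out = '<!DOCTYPE ' + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) out += ' "' + doctype.systemId + '"';
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  return out + '>';
}

// Rebuild the opening root tag from the element's attributes collection,
// then append the live innerHTML and the closing tag.
function serializePage(doc) {
  var root = doc.documentElement;
  var name = root.tagName.toLowerCase();
  var tag = '<' + name;
  for (var i = 0; i < root.attributes.length; i++) {
    var attr = root.attributes[i];
    // Escape & and " so the attribute value stays well-formed.
    var value = attr.value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
    tag += ' ' + attr.name + '="' + value + '"';
  }
  var doctype = serializeDoctype(doc.doctype);
  return (doctype ? doctype + '\n' : '') + tag + '>' + root.innerHTML + '</' + name + '>';
}
```

In a page you would call serializePage(document) and hand the resulting string to whatever does the saving. The attribute quoting problem inside innerHTML itself is not addressed by this.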
| 7:02 am on Jun 1, 2010 (gmt 0)|
5. LOL, if the user is using Firebug, there will be an extra div added in to your page: <div firebugversion="1.5.3" style="display: none;" id="_firebugConsole"></div>
not to mention greasemonkey issues too...
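One possible workaround for injected elements like that, as a rough sketch only: serialize a deep clone of the root and remove known injected nodes from the clone first. The helper name cloneWithoutInjected is hypothetical, it assumes querySelector is available, and it only catches elements whose ids you already know about (the Firebug id above is the only one taken from this thread).

```javascript
// Hypothetical helper: clone a root element and drop nodes injected by
// tools, identified by id. The live page is left untouched.
function cloneWithoutInjected(root, injectedIds) {
  var copy = root.cloneNode(true); // deep clone, including descendants
  for (var i = 0; i < injectedIds.length; i++) {
    var node = copy.querySelector('#' + injectedIds[i]);
    if (node && node.parentNode) {
      node.parentNode.removeChild(node);
    }
  }
  return copy;
}
```

Usage would be along the lines of cloneWithoutInjected(document.documentElement, ['_firebugConsole']).innerHTML. Anything greasemonkey scripts add without a predictable id would still slip through.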
| 7:16 am on Jun 1, 2010 (gmt 0)|
Thanks for the response.
2. Here I have a problem saving the .html file.
As you suggested, I tried to get the html content as follows:
document.getElementsByTagName('html')[0].innerHTML; this also maintains state. But entities like &nbsp; and &amp; are also displayed.
3. So basically I need some way to get the html source using a standard DOM interface that maintains state and also displays entities as proper characters.
Thanks in advance
[edited by: kokilakr at 7:36 am (utc) on Jun 1, 2010]
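On the entity question, a minimal sketch of one approach, assuming the unwanted entities are non-breaking spaces that innerHTML serializes as &nbsp; (the helper name decodeNbsp is made up). It swaps them for the real U+00A0 character and deliberately leaves &amp;, &lt; and &gt; alone, because decoding those would change the markup itself.

```javascript
// Hypothetical helper: replace &nbsp; and its numeric forms with the real
// U+00A0 character. &amp;, &lt; and &gt; are left untouched on purpose,
// since turning them into raw characters would alter the markup.
function decodeNbsp(html) {
  return html.replace(/&nbsp;|&#0*160;|&#[xX]0*[aA]0;/g, '\u00a0');
}
```

The saved file then needs an encoding that can represent U+00A0 (UTF-8, for example), otherwise the character will not survive the round trip.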
| 7:31 am on Jun 1, 2010 (gmt 0)|
some more information,
2. I need this to work for QT Webkit.
| 7:34 am on Jun 1, 2010 (gmt 0)|
| 7:35 am on Jun 1, 2010 (gmt 0)|
Oh, well I'd have to say I think you may be SOL then..., but I don't know much about QT Webkit
| 7:43 am on Jun 1, 2010 (gmt 0)|
|Oh, well I'd have to say I think you're SOL then... |
What do you mean by SOL? Sadly Outta Luck?
How can I go forward?
| 7:49 am on Jun 1, 2010 (gmt 0)|
Not "Sadly" but that's very close, think doggy doo-doo
I don't know, those are all my suggestions, maybe someone else will pipe up with an idea or two.
| 9:13 pm on Jun 1, 2010 (gmt 0)|
| 1:23 pm on Jun 3, 2010 (gmt 0)|
| 1:36 pm on Jun 3, 2010 (gmt 0)|
Note, I seem to recall a few years ago having a problem with password field values not being included when doing something similar to this, so if you use the innerHTML approach, make sure you test it well.
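The likely reason is that innerHTML serializes attributes, while what the user types only changes the element's value property. A rough sketch of a workaround, with a made-up helper name syncFormState: copy the live state into the attributes before reading innerHTML. This is an assumption about the cause, not something verified in Qt WebKit.

```javascript
// Hypothetical helper: copy the live state of input controls into their
// attributes so that innerHTML reflects what the user currently sees.
function syncFormState(controls) {
  for (var i = 0; i < controls.length; i++) {
    var el = controls[i];
    if (el.type === 'checkbox' || el.type === 'radio') {
      // Checked state lives in the "checked" property, not the attribute.
      if (el.checked) {
        el.setAttribute('checked', 'checked');
      } else {
        el.removeAttribute('checked');
      }
    } else {
      // Text-like inputs, including password fields.
      el.setAttribute('value', el.value);
    }
  }
}
```

You would call syncFormState(document.getElementsByTagName('input')) just before grabbing innerHTML. Keep in mind this writes password values into the saved file as plain text, and textarea and select elements would need their own handling.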