Any way to read raw source code of a page?

(or read a remote page from a server)

         

Lance

7:48 pm on Nov 1, 2004 (gmt 0)

10+ Year Member



Is there any way to just read the raw, unparsed source code of the loaded page?

Short of that, is there any kind of "getPage" method that will let me read a page from a server and just have it as a string object?

dmorison

8:08 pm on Nov 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does document.body.innerHTML return what you want?

Lance

8:52 pm on Nov 1, 2004 (gmt 0)

10+ Year Member



No, it doesn't... It doesn't give me enough.

And I've tried document.documentElement.outerHTML, but that appears to have already been through the rendering engine, because what comes out looks nothing like what went in. That's on IE anyway, which is where I need this to work. All the <TAGS> are <UPPERCASE> regardless of how they were in the source, and some other really strange things™ appear on some pages.

Here is an example:
Original:

<span>

Via document.documentElement.outerHTML:

<SPAN fixed_bound="true">*

Additionally, the layout (white space) is truncated.

*And as a side note:
I will worship at the feet of whoever can tell me what fixed_bound="true" means.

Bernard Marx

11:20 pm on Nov 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



inner/outerHTML will return the browser's own internal view of the document.
This is 'live' - it shows the current state, so it's very useful for debugging any kind of script that does dynamic creation/amendment.

IE, in particular, is well known for returning not completely valid code (upper case, unquoted attributes etc) - although it's getting better.

IE will also treat JavaScript 'expando' properties as attributes. This is the only possible explanation I have for your fixed_bound="true". Do you have a script that is adding this as a property to that element?

For getting the original code (other than view-source:), here are a couple of avenues to follow (both same domain only):

1. The download behavior (quick'n'easy, but IE-only)
[msdn.microsoft.com...]

2. XMLHttpRequest (most modern browsers; differing implementations)
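
For the second avenue, here is a rough sketch of the kind of "getPage" helper asked about at the top of the thread. It is only an illustration, not a tested cross-browser library: the names getPage and createRequest and the optional factory argument are made up for this example, and it assumes the page is requested from the same domain. On older IE the request object is created through the "Microsoft.XMLHTTP" ProgID; elsewhere the native XMLHttpRequest constructor is used.

```javascript
// Returns an XMLHttpRequest-style object for the current browser.
// (createRequest is a hypothetical helper name, not a built-in.)
function createRequest() {
  if (typeof XMLHttpRequest !== "undefined") {
    return new XMLHttpRequest();               // native object
  }
  if (typeof ActiveXObject !== "undefined") {
    return new ActiveXObject("Microsoft.XMLHTTP"); // older IE
  }
  throw new Error("No XMLHTTP implementation available");
}

// Requests `url` and hands the unparsed response text to
// callback(err, text). `factory` is optional and exists only so a
// stub request object can be swapped in for testing.
function getPage(url, callback, factory) {
  var xhr = (factory || createRequest)();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) {
      return; // 4 means the response is complete
    }
    if (xhr.status === 200) {
      callback(null, xhr.responseText); // the source as a plain string
    } else {
      callback(new Error("HTTP status " + xhr.status), null);
    }
  };
  xhr.send(null);
}
```

Something like getPage(window.location.href, function (err, source) { ... }) should then hand back the page as the server sent it, with the original tag case and white space, rather than the browser's rebuilt view that outerHTML gives.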

Lance

2:01 am on Nov 2, 2004 (gmt 0)

10+ Year Member



Microsoft.XMLHTTP seems to be doing the trick. I got an "Access Denied" on the behavior. And apparently, XMLHTTP doesn't trigger any ActiveX security warnings, so there is nothing to get in the way.

Thanks.

Oh, no expando properties... I have no idea where fixed_bound is coming from. It's not hurting anything, and can only even be seen when viewing the source using outerHTML, so I guess I'm not really worried about it.

And for some reason, in their infinite wisdom, MS decided view-source: was no longer needed in XP SP2.