

Frames and accessing content

screen scraping possible?

         

Miki

6:06 am on Jan 8, 2004 (gmt 0)

10+ Year Member



I hope this is the right forum, as it is a two part question (frames + screenscraping).

frame1 loads page1.html. I would like to get mypage.html in frame2 to scrape the contents of frame1 using Javascript. Is this possible? frame1 will only ever hold dynamic content determined by stdin variables, btw, so I can't scrape by passing mypage.html a URL with a querystring. Is there a way to access the cached browser frame1 contents?

Also, when experimenting with frames and the javascript dom, I noticed I could only get mypage.html to output top.frames['frame1'].name when frame1 held a local html page. mypage.html would output nothing when frame1 held an external webpage. Is something amiss?

(I'm using IE5.5 on Windows 2000, btw.)
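A minimal sketch of what a script in frame2 can and cannot do here, assuming the frame names from the question. `tryFrameAccess` is a hypothetical helper, not a browser API. Reading a property such as `.name` or `.document` of a frame that holds another site's page throws an error, so the read is wrapped in try/catch. The script can detect that the access was refused, but it cannot get around the refusal.

```javascript
// Hypothetical helper: run `reader` against a named sibling frame and
// report whether the browser allowed the access.
function tryFrameAccess(topWin, frameName, reader) {
  try {
    var frameWin = topWin.frames[frameName];
    if (!frameWin) {
      return { ok: false, value: null, reason: "no such frame" };
    }
    return { ok: true, value: reader(frameWin) };
  } catch (err) {
    // Reading .name or .document of a frame from another site throws
    // (a SecurityError in current browsers), so nothing is returned.
    return { ok: false, value: null, reason: String(err) };
  }
}

// Usage from mypage.html in frame2 (browser only, `top` is the frameset):
// var nameResult = tryFrameAccess(top, "frame1", function (w) { return w.name; });
// var htmlResult = tryFrameAccess(top, "frame1", function (w) {
//   return w.document.documentElement.outerHTML;
// });
// if (!htmlResult.ok) { /* frame1 holds a page from another site */ }
```

When frame1 holds a page from the same site, `htmlResult.value` is the markup currently loaded in that frame. When it holds the other site's page, both results come back with `ok` set to false.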

tedster

7:11 am on Jan 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is something amiss?

Not at all - this is a standard and very necessary security restriction. Imagine the kinds of exploits the unscrupulous could generate if it were otherwise.

I'm assuming this is the situation you described in this thread:
[webmasterworld.com...]

Sorry, but I don't see javascript as a viable way to resolve your challenge because of the cross-domain security which you've already stumbled onto.
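A sketch of the comparison behind that cross-domain restriction (the same-origin policy): a script in one frame may read another frame only when both pages share scheme, host and port. `isSameOrigin` is a hypothetical helper. It relies on the standard `URL` class, which current browsers and Node.js provide but IE5.5 does not. As a conservative simplification, it treats opaque origins (serialized as the string "null", e.g. for `data:` URLs) as never matching.

```javascript
// Hypothetical helper illustrating the same-origin comparison a browser
// makes before letting one frame's script read another frame.
function isSameOrigin(urlA, urlB) {
  var a = new URL(urlA);
  var b = new URL(urlB);
  // Opaque origins serialize as "null"; treat them as never matching.
  if (a.origin === "null" || b.origin === "null") {
    return false;
  }
  // URL#origin combines scheme, host and port (default ports are omitted,
  // so http://example.com and http://example.com:80 compare equal).
  return a.origin === b.origin;
}
```

For example, `isSameOrigin("http://example.com/page1.html", "http://example.com/mypage.html")` is true, while changing the scheme, the host (including `www.`), or the port makes it false. With page1.html served by the other site and mypage.html served by your own, the hosts differ, so frame2 is refused.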

Miki

8:04 am on Jan 8, 2004 (gmt 0)

10+ Year Member



Hey, tedster, thanks for the reply. :) It's the same situation, but instead of trying to fake a POST, I want to know what I can do with frames.

I am currently using a Python script to perform a screen scrape, which appears to be allowed by the Mother site. Regarding the frames, however, something does seem wrong.

Normally, I'm able to access the name of the frame using Javascript. So if page1.html sits in frame1 and page2.html sits in frame2, I am able to

document.write(top.frames['frame1'].name)

in page2.html and have it output: frame1

However, this only works when I'm viewing a local HTML file. Why does accessing the name variable of my own frame impede security? :(

Thanks again, btw. :)