My page includes a tag like <script src="http://blahblah"></script>. I remember somebody telling me you can do what I want with Perl, but I no longer have the info on how to do it. The script goes out, gets updated data, and displays it in the browser. I'd like to run a cron job that gets that data once a day and puts it into the page in a static format rather than a dynamic one.
Thanks in advance for any suggestions!
Right now, if I do a View Source on the page, it just shows the <script> tag and not the actual HTML that my browser receives and displays.
I have no experience writing CGI scripts, so I will need some "leading" :)
I also tried using cURL last night and didn't get any different results.
Thanks :)
The info provided to me by my merchant is in the form of a <script> tag, which builds the page in the browser on the fly with the latest info from the merchant. All you see when you do a View Source in the browser is the <script src="http://blahblah"></script> part. But the page displays multiple items, text and images, and that is the part I'm trying to retrieve.
I'd like to run a cron job once a day that gets the HTML the script produces and saves it to a text file. That way I can use an include to present the results in a static format rather than a dynamic one. But I have no clue how to do this.
For example, you might want the same bit of JavaScript in all your web pages. Rather than repeat the same JS code in every page, you can put it in a separate file and then use a script tag with a SRC attribute to refer to it.
What happens then is that when the page is loaded, the script is also fetched and executed IN THE BROWSER. But when you view the page source, you don't see the contents of the script file, just the script tag (as you found).
Your fetch idea would only work if the script executed on the web server, which is not the case here.
Simon
There are a few different ways to do it, but basically:
1. Grab the HTML from the merchant site with a Perl or PHP script (or wget) run daily from cron, and save it to a text file.
2. Use SSI in your HTML file to include that text file.
Or, do step 1, but have the Perl or PHP script write the data together with the remainder of your page out as your HTML file. No text file or SSI needed.
My Perl is rusty, so I can't give code examples, sorry.
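A minimal sketch of step 1 in Perl, assuming Perl 5.14 or later (for the core HTTP::Tiny module). The script name fetch_snapshot.pl, the output path and the schedule are hypothetical. Keep Simon's point in mind: fetching the script's src URL returns the JavaScript itself, not the HTML it writes, so this only gives you static HTML if the URL you fetch actually returns HTML (or if you post-process the JavaScript).

```perl
#!/usr/bin/perl
# fetch_snapshot.pl (hypothetical name): fetch a URL and save the body to a file.
# Usage: perl fetch_snapshot.pl URL OUTPUT_FILE
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Download $url and return the response body; die on any HTTP or network error.
sub fetch_body {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 30)->get($url);
    die "Fetching $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Write to "$path.tmp" first, then rename it into place, so a page that
# includes $path never sees a half-written file.
sub write_snapshot {
    my ($path, $content) = @_;
    my $tmp = "$path.tmp";
    open my $fh, '>:raw', $tmp or die "Cannot write $tmp: $!\n";
    print {$fh} $content;
    close $fh or die "Cannot close $tmp: $!\n";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!\n";
    return 1;
}

# Do the work only when called with both arguments (e.g. from cron).
if (@ARGV == 2) {
    my ($url, $path) = @ARGV;
    write_snapshot($path, fetch_body($url));
}
```

A crontab entry such as 0 6 * * * /usr/bin/perl /path/to/fetch_snapshot.pl 'http://blahblah' /path/to/htdocs/merchant.txt would run it every day at 6am, and for step 2 the page could pull the file in with an SSI directive like <!--#include virtual="/merchant.txt" -->. The write-then-rename step is there because cron and visitors hit the file independently.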
var lsn_hid='***'; var lsn_eid='******'; var lsn_oid='***'; var lsn_u1=''; var lsn_click='http://somewebsite.com/cgi-bin/click?id=******&var=****.'+'***'+'&type=14&catid='+'1'+'&hid='+'***';
document.write('');
1) Command line switches and Mozilla?
2) There is a JavaScript interpreter module out there for Perl.
Check CPAN for JavaScript, and Mozilla for SpiderMonkey:
[search.cpan.org...]
[mozilla.org...]
(I have only used them ready-installed. They are not perfect, but in combination with LWP and Perl they can do what you need... perhaps ^^)
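Short of embedding SpiderMonkey, one possible shortcut (a different technique from the interpreter suggested above, and only a sketch): if the merchant's script does nothing but assign quoted strings to variables, join them with +, and pass the result to document.write, as in the snippet quoted earlier, a few dozen lines of plain Perl can replay it and produce the HTML. The function names are hypothetical. Anything outside that narrow pattern (function calls, conditionals, dates, random rotation) makes it die, which is the signal that you really do need a JavaScript engine.

```perl
use strict;
use warnings;

# Undo JavaScript backslash escapes: \n, \t, \r, and otherwise just drop the
# backslash (\' becomes ', \/ becomes /). \uXXXX escapes are NOT decoded.
sub js_unescape {
    my ($s) = @_;
    my %map = (n => "\n", t => "\t", r => "\r");
    $s =~ s/\\(.)/exists $map{$1} ? $map{$1} : $1/gse;
    return $s;
}

# Parse  term ( + term )*  at pos($$src), where a term is a quoted string
# literal or the name of a variable assigned earlier. Returns the joined text.
sub parse_concat {
    my ($src, $vars) = @_;
    my $out = '';
    while (1) {
        if ($$src =~ /\G\s*'((?:[^'\\]|\\.)*)'/gcs
            || $$src =~ /\G\s*"((?:[^"\\]|\\.)*)"/gcs) {
            $out .= js_unescape($1);
        }
        elsif ($$src =~ /\G\s*([A-Za-z_\$][\w\$]*)/gc) {
            die "Unknown variable '$1'\n" unless exists $vars->{$1};
            $out .= $vars->{$1};
        }
        else {
            die "Unsupported expression at offset " . (pos($$src) // 0) . "\n";
        }
        last unless $$src =~ /\G\s*\+/gc;
    }
    return $out;
}

# Replay  var x = ...;  and  document.write(...);  statements in order and
# return everything document.write would have written. Comments and other
# statements are skipped over, not understood.
sub render_document_writes {
    my ($js) = @_;
    my %vars;
    my $html = '';
    while ($js =~ /\G.*?\b(?:var\s+([A-Za-z_\$][\w\$]*)\s*=|document\.write\s*\()/gcs) {
        if (defined $1) {
            my $name = $1;
            $vars{$name} = parse_concat(\$js, \%vars);
        }
        else {
            $html .= parse_concat(\$js, \%vars);
            $js =~ /\G\s*\)/gc or die "Expected ')' after document.write(...)\n";
        }
    }
    return $html;
}
```

The daily cron job would fetch the script text, pass it to render_document_writes, and save the returned HTML for the include.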
Let's call the current static page on your site "mypage.htm"
If that page contains JS or SSI that produces content and inserts it into "mypage.htm" on-the-fly, it MAY or MAY NOT be possible to pre-fetch the content and automatically insert it as static content.
If the content can be retrieved (so you can paste it into your page and make the entire page appear as static content originating from your domain), you will most likely be able to do it with LWP (as others have suggested).
The possibility that you MAY NOT be able to do this could arise from:
1. the server where the content originates relies on the URL of the calling page to produce the content (i.e., their server says, "produce HTML for JS or SSI requests originating from http://domain.tld/specific-page.htm");
2. the content constantly changes and is generated at the moment the request is made (you could end up publishing outdated info even if you ran your scraper script every hour);
3. the approach you are taking is against the TOS of the data provider you are attempting to scrape from.
There is most likely "some way to do it", but you may not get what you expect from the resulting data.
Chances are you can build the var data as a query string in the form of:
#
use strict;
use warnings;
use URI;

# Base URL from lsn_click; parameter names as guessed above.
# Replace 'data' with the real values from the merchant's script.
my $url = URI->new('http://somewebsite.com/cgi-bin/click');
$url->query_form(
    lsn_hid => 'data',
    lsn_eid => 'data',
    lsn_oid => 'data',
    # etc..
);
#
Then feed that URL to LWP as a request to scrape it:
#
use LWP::Simple qw(get);
my $html = get($url->as_string);
die "Could not fetch $url\n" unless defined $html;
#
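The values do not have to be copied by hand each time. A minimal sketch, assuming the merchant's script keeps using plain var name='value'; assignments like the lsn_* ones quoted above, that pulls those values out of the fetched script and builds the query string with core Perl only. The function names are hypothetical, and which variables belong in the query string is still the guess made above, so compare the result against the merchant's own lsn_click URL.

```perl
use strict;
use warnings;

# Collect assignments of the exact form  var name='value';  from the script.
# Values built with + (like lsn_click) do not match and are skipped.
sub extract_simple_vars {
    my ($js) = @_;
    my %vars;
    while ($js =~ /\bvar\s+(\w+)\s*=\s*'((?:[^'\\]|\\.)*)'\s*;/g) {
        my ($name, $value) = ($1, $2);
        $value =~ s/\\(.)/$1/gs;    # drop JavaScript backslash escapes (\' -> ')
        $vars{$name} = $value;
    }
    return \%vars;
}

# Percent-encode every byte outside the RFC 3986 unreserved set.
# Assumes byte strings (no characters above 0xFF).
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build  base?name1=value1&name2=value2...  for the requested names that exist.
sub build_query_url {
    my ($base, $vars, @names) = @_;
    my $query = join '&',
        map  { uri_escape_simple($_) . '=' . uri_escape_simple($vars->{$_}) }
        grep { exists $vars->{$_} } @names;
    return "$base?$query";
}
```

In the daily cron script this would look like: my $vars = extract_simple_vars($script_text); my $scrape_url = build_query_url('http://somewebsite.com/cgi-bin/click', $vars, qw(lsn_hid lsn_eid lsn_oid)); and then fetch $scrape_url with LWP or HTTP::Tiny.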