Forum Moderators: coopster

Scraping a Secure Site

is it possible

Knowles

10:45 am on Jun 8, 2004 (gmt 0)

10+ Year Member



I am working on a project that needs data from another website we use at work. We do not have access to a data feed, so we need to scrape the pages for the information. So far we have tried file_get_contents, but all we get back is the site's access denied page. Does anyone know an effective way to scrape a secure site?

jpjones

10:52 am on Jun 8, 2004 (gmt 0)

10+ Year Member



By secure, I'm guessing you mean requires username & password, rather than SSL?

If so, you'll need to figure out how access to the page is controlled. Is it HTTP Basic authentication, set up through something like .htaccess? If yes, you can use curl and supply a username and password.
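A rough sketch of the basic authentication case, assuming PHP's curl extension is installed. The function names, URL and credentials are placeholders I made up for illustration, not anything from the site in question.

```php
<?php
// Hypothetical helper: builds the cURL options for an HTTP Basic auth request.
function basicAuthOptions(string $url, string $user, string $pass): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,           // return the body instead of printing it
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC, // use HTTP Basic authentication
        CURLOPT_USERPWD        => $user . ':' . $pass,
        CURLOPT_FOLLOWLOCATION => true,           // follow redirects, if any
    ];
}

// Hypothetical helper: fetches a page protected by HTTP Basic auth.
function fetchWithBasicAuth(string $url, string $user, string $pass): string
{
    $ch = curl_init();
    curl_setopt_array($ch, basicAuthOptions($url, $user, $pass));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $body;
}

// Usage (placeholder URL and credentials):
// $html = fetchWithBasicAuth('https://example.com/protected/', 'myuser', 'mypass');
```

If the page still comes back as access denied with this, the site is probably not using basic authentication.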

If it's something a little more elaborate, it could be cookie based. If so, you'll need to write a script that logs in first (presumably through a web form on the site), reads the cookies the site sets, and passes them back on each request. Again, curl can do this.

Excuse me if I'm barking up the wrong tree.
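Something along these lines might work for the cookie case, again assuming the curl extension. The function names, URLs and the form field names ('username', 'password') are guesses for illustration; the real ones have to be read from the site's login form.

```php
<?php
// Hypothetical helper: options shared by every request in the session.
function sessionOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // login forms often redirect after a POST
        CURLOPT_COOKIEFILE     => '',   // empty string enables curl's in-memory cookie handling
    ];
}

// Hypothetical helper: options for submitting the login form.
function loginOptions(string $loginUrl, array $fields): array
{
    return sessionOptions() + [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields), // form-encode the fields
    ];
}

// Hypothetical helper: logs in, then fetches a page with the same handle
// so the cookies set at login are sent back on the second request.
function scrapeAfterLogin(string $loginUrl, array $fields, string $pageUrl): string
{
    $ch = curl_init();

    curl_setopt_array($ch, loginOptions($loginUrl, $fields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Switch back to GET and request the page we actually want.
    curl_setopt_array($ch, [CURLOPT_URL => $pageUrl, CURLOPT_HTTPGET => true]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }

    return $body;
}

// Usage (placeholder URLs and field names):
// $html = scrapeAfterLogin(
//     'https://example.com/login',
//     ['username' => 'myuser', 'password' => 'mypass'],
//     'https://example.com/reports'
// );
```

To keep the session between script runs, CURLOPT_COOKIEJAR can point at a file that is then passed as CURLOPT_COOKIEFILE on the next run.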

RichD

10:57 am on Jun 8, 2004 (gmt 0)

10+ Year Member



file_get_contents should work. You may need to change the user agent that PHP sends if the remote site uses it to decide what content to send: ini_set('user_agent', 'put your user agent here');

You will also need to ensure that PHP is compiled with the OpenSSL library. I can't find the link where I found that out, but it only started working for me once I added that in.

Edit: I assumed it was secure as in SSL; hopefully one of the replies will be on the right track ;)
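Put together, it might look roughly like this. The function names, the URL and the user agent string are made up for the example; the check simply asks PHP whether the https stream wrapper is registered, which depends on the OpenSSL extension.

```php
<?php
// Hypothetical helper: true if this PHP build can open https:// URLs.
function httpsAvailable(): bool
{
    return in_array('https', stream_get_wrappers(), true);
}

// Hypothetical helper: sets the User-Agent that PHP's HTTP wrapper sends.
function useUserAgent(string $userAgent): void
{
    ini_set('user_agent', $userAgent);
}

// Hypothetical helper: fetches an https:// page with file_get_contents.
function fetchHttps(string $url, string $userAgent): string
{
    if (!httpsAvailable()) {
        throw new RuntimeException('No https wrapper: PHP needs the OpenSSL extension.');
    }

    useUserAgent($userAgent);

    $body = file_get_contents($url);
    if ($body === false) {
        throw new RuntimeException('Request failed for ' . $url);
    }

    return $body;
}

// Usage (placeholder URL and user agent):
// $html = fetchHttps('https://example.com/', 'Mozilla/5.0 (compatible; ExampleBot/1.0)');
```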

Knowles

10:14 pm on Jun 8, 2004 (gmt 0)

10+ Year Member



To be honest, I am not sure it isn't a bit of both. I will give both options a shot.