Reading a webpage

Grr...

adni18

9:56 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all. I need to make a PHP script that reads the source code of a webpage (e.g. [google.com...]). How can this be done? TIA. :-)

adni18

10:28 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



edit: this is really important. I need an answer ASAP. (don't mean to be impatient)

jatar_k

10:51 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



look at curl
[curl.haxx.se...]
[php.net...]

or open a socket
[php.net...]
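
As a rough sketch of the cURL route, assuming the cURL extension is loaded on the server: the helper name fetch_page() and the 30-second timeout are made up for illustration and are not from this thread.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL's body with cURL.
function fetch_page(string $url, int $timeoutSeconds = 30): ?string
{
    if (!function_exists('curl_init')) {
        return null; // the cURL extension is not loaded
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow HTTP redirects, if the server sends any.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Give up after the given number of seconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch); // false on failure
    return $body === false ? null : $body;
}
```

Calling fetch_page("http://www.google.com/") should hand back the page source without the response headers, or null if the request failed.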

henry0

10:51 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Tia,

Don't shoot me if I am wrong!

I think I have seen a PEAR package that does it
just by calling the class.

Does anyone have the info?

regards
Henry

edit: changed "function" to "class"! /edit.

[edited by: henry0 at 11:04 pm (utc) on Aug. 11, 2005]

adni18

10:59 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lol. TIA (Thanks In Advance). Thanks!

Edit: OK, now I have:

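<?php
// Open a TCP connection to the web server (port 80, 30-second timeout).
$fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    // The Host header must name the server being requested.
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.google.com\r\n";
    $out .= "Connection: Close\r\n\r\n";

    fwrite($fp, $out);
    // Echo the raw response (status line, headers, then body) as it arrives.
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>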

Now how do I get rid of all the headers at the top?
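
In an HTTP response the headers end at the first blank line (CRLF CRLF), so one way to drop them is to collect the whole response in a string instead of echoing it, then split once at that marker. A minimal sketch; the helper name split_http_response() and the sample response are invented for illustration.

```php
<?php
// Hypothetical helper: split a raw HTTP response into [headers, body].
function split_http_response(string $raw): array
{
    // Split only once, at the first blank line, so the body stays intact.
    $parts = explode("\r\n\r\n", $raw, 2);
    $headers = $parts[0];
    $body = $parts[1] ?? ''; // no blank line found: treat the body as empty
    return [$headers, $body];
}

// Made-up sample response, standing in for what the socket loop would collect.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>";
list($headers, $body) = split_http_response($raw);
echo $body; // prints <html>hi</html>
```

In the socket loop this would mean writing $raw .= fgets($fp, 128); in place of the echo, and calling the helper after fclose().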

henry0

11:06 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry,
I knew I was not remembering it well;
it is indeed a cURL package :)

adni18

11:08 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I edited my post.

Iguana

11:17 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm a bit confused by the answers here. Surely you can just use fopen()?

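$query = "http://www.widget.com/page.htm";

$ret = "";
// Opening a URL with fopen() relies on allow_url_fopen being enabled;
// fopen() returns false on failure, and this snippet does not check for that.
$handle = fopen($query, "r");
while (!feof($handle)) {
    // Read up to 4096 bytes per line and append to the result.
    $buffer = fgets($handle, 4096);
    $ret .= $buffer;
}
fclose($handle);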

It works for me

adni18

11:24 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I get an infinite number of these:

Warning: <fgets() or feof()>: supplied argument is not a valid stream resource in <snipped>.php on line <6 or 7>

jatar_k

11:47 pm on Aug 11, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



[php.net...]

If PHP has decided that filename specifies a registered protocol, and that protocol is registered as a network URL, PHP will check to make sure that allow_url_fopen is enabled. If it is switched off, PHP will emit a warning and the fopen call will fail.

that setting should really be off
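
A sketch of how a script could check the setting and fail cleanly when fopen() cannot open the URL, rather than looping on an invalid handle. The helper name read_url() is hypothetical, and returning null on failure is just one possible convention.

```php
<?php
// Hypothetical helper: read a URL with fopen(), failing cleanly instead of looping.
function read_url(string $url): ?string
{
    // ini_get() returns the setting as a string such as "1", "0", "" or "On".
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null; // URL wrappers are disabled on this server
    }
    $handle = @fopen($url, 'r');
    if ($handle === false) {
        return null; // could not open; never enter the read loop
    }
    $ret = '';
    while (!feof($handle)) {
        $chunk = fgets($handle, 4096);
        if ($chunk === false) {
            break; // read error or end of stream
        }
        $ret .= $chunk;
    }
    fclose($handle);
    return $ret;
}
```

If read_url() returns null on a host where the setting cannot be changed, the socket or cURL approaches are the remaining options.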

adni18

12:23 am on Aug 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I don't have access to the server's settings, so I need something else.

adni18

2:48 am on Aug 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



....any ideas?

jatar_k

2:51 am on Aug 12, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



are we talking about headers?

this thread wandered a bit, I got confused

adni18

11:18 pm on Aug 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm talking about getting rid of the headers at the beginning, and the 0 that appears at the end, of the output of the code I posted.
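
The trailing 0 is most likely the end marker of a chunked response: a server may answer an HTTP/1.1 request with "Transfer-Encoding: chunked", where each chunk is preceded by its size in hexadecimal and a zero-size chunk closes the body. Requesting with HTTP/1.0 usually avoids this. Otherwise the body can be decoded, roughly as below. The helper name decode_chunked() is invented, and the sketch ignores trailers and assumes the headers were already removed.

```php
<?php
// Hypothetical helper: decode a body sent with "Transfer-Encoding: chunked".
function decode_chunked(string $body): string
{
    $decoded = '';
    $pos = 0;
    $length = strlen($body);
    while ($pos < $length) {
        $lineEnd = strpos($body, "\r\n", $pos);
        if ($lineEnd === false) {
            break; // malformed or truncated input
        }
        // The chunk size is hexadecimal; ignore any ";extension" after it.
        $sizeField = substr($body, $pos, $lineEnd - $pos);
        $size = hexdec(trim(explode(';', $sizeField)[0]));
        if ($size === 0) {
            break; // a zero-length chunk marks the end of the body
        }
        $decoded .= substr($body, $lineEnd + 2, $size);
        // Skip the size line's CRLF, the chunk data, and the CRLF after the data.
        $pos = $lineEnd + 2 + $size + 2;
    }
    return $decoded;
}

// Made-up sample: two chunks followed by the terminating zero-size chunk.
$chunked = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
echo decode_chunked($chunked); // prints Hello, world
```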

adni18

12:07 am on Aug 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Never mind, I know how to get rid of them (use explode()). But now, how is it possible to get a specific file off Google, like /images/logo.gif?

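<?php
// Open a TCP connection to the web server (port 80, 30-second timeout).
$fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    // "GET /" asks for the site root; the Host header must name the server.
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.google.com\r\n";
    $out .= "Connection: Close\r\n\r\n";

    fwrite($fp, $out);
    // Echo the raw response (status line, headers, then body) as it arrives.
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>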
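
The file requested is whatever path follows GET on the request line, so fetching a specific file should only need that path changed. A sketch under some assumptions: the helper names build_get_request() and fetch_file() are hypothetical, HTTP/1.0 is used to sidestep chunked responses, and redirects and error status codes are not handled.

```php
<?php
// Hypothetical helper: build the request text for a given host and path.
function build_get_request(string $host, string $path = '/'): string
{
    // HTTP/1.0 is an assumption here: servers do not send chunked bodies to 1.0 clients.
    $out  = "GET $path HTTP/1.0\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";
    return $out;
}

// Hypothetical helper: fetch one file over a raw socket and return only its body.
function fetch_file(string $host, string $path): ?string
{
    $fp = @fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        return null; // connection failed
    }
    fwrite($fp, build_get_request($host, $path));
    $raw = '';
    while (!feof($fp)) {
        // fread() reads fixed-size blocks, which suits binary data such as images.
        $raw .= fread($fp, 8192);
    }
    fclose($fp);
    // Drop the headers: everything before the first blank line.
    $parts = explode("\r\n\r\n", $raw, 2);
    return $parts[1] ?? '';
}
```

For example, file_put_contents("logo.gif", fetch_file("www.google.com", "/images/logo.gif")); would save the image locally, provided that path exists on the server and the request is not redirected.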