Web scraping

Need some help with web scraping

     
1:44 pm on Mar 10, 2014 (gmt 0)

New User

5+ Year Member

joined:May 3, 2010
posts: 7
votes: 0


I'm trying to get baseball scores each day and use them in a script to show on my site. Is anyone familiar with web scraping who can point me to some sample PHP scripts for scraping? I don't want RSS. Thanks.
2:26 am on Mar 26, 2014 (gmt 0)

Administrator

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
posts:12533
votes: 0


Welcome to WebmasterWorld, bayridge.

You can build your own spider/bot using PHP and the cURL API. The PHP manual pages have some examples:

[php.net...]
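A minimal sketch of the cURL approach, assuming PHP 7 or later with the curl extension enabled. fetchPage() and the User-Agent string are made-up names for illustration, and the URL in the usage line is only a placeholder:

```php
<?php
// Minimal sketch: fetch a page with PHP's cURL extension.
// fetchPage() and the User-Agent value are illustrative, not part of any library.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException("Could not initialise cURL for $url");
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 20,    // seconds allowed for the whole transfer
        CURLOPT_USERAGENT      => 'MyScoreBot/0.1', // hypothetical bot name
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // For http(s) URLs this is the response status; it is 0 for protocols such as file://.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status returned for $url");
    }
    return $body; // the handle is released when $ch goes out of scope
}

// Usage (placeholder URL):
// $html = fetchPage('http://www.example.com/');
```

Throwing on failure, rather than returning false, means a dead site or a 404 can't silently hand an empty string to the parsing step.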
3:52 am on Apr 5, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


You can even do it using $data=file_get_contents("http://example.com");

Then you can process $data, which is one big string containing the HTML of your web page.
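If you go the file_get_contents() route, it is worth giving the request a timeout and checking for failure, because the function returns false (plus a warning) when it cannot fetch the page. A minimal sketch, assuming allow_url_fopen is enabled on the server; fetchWithStreams() and the User-Agent value are hypothetical names:

```php
<?php
// Minimal sketch: file_get_contents() with a stream context that sets a timeout
// and a User-Agent, and an explicit check for failure.
// fetchWithStreams() and the User-Agent value are illustrative, not from the thread.
// Fetching http:// URLs this way requires allow_url_fopen to be enabled.
function fetchWithStreams(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => 20,               // seconds to wait for data
            'user_agent' => 'MyScoreBot/0.1', // hypothetical bot name
        ],
    ]);

    // @ hides the warning; the false return value is handled below instead.
    $data = @file_get_contents($url, false, $context);
    if ($data === false) {
        $err = error_get_last();
        throw new RuntimeException("Could not fetch $url: " . ($err['message'] ?? 'unknown error'));
    }
    return $data;
}

// Usage (placeholder URL):
// $data = fetchWithStreams('http://example.com');
```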

...or something more sophisticated:

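$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings for sloppy real-world HTML
$html = file_get_contents("http://www.example.com");
if ($html !== false) { // file_get_contents() returns false if the fetch fails
    $doc->loadHTML($html);
}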

Now you have the page loaded in a DOMDocument object as $doc and you can extract anything you need using getElementsByTagName.
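A rough sketch of that step, assuming the scores sit in an HTML table. The markup, team names, and the extractRows() helper are invented for illustration, so the tag names will need adjusting to whatever the real page uses:

```php
<?php
// Sketch: pull the cell text of every table row out of an HTML string with
// DOMDocument::getElementsByTagName(). The scoreboard markup below is made up.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells) { // skip header rows that only contain <th> cells
            $rows[] = $cells;
        }
    }
    return $rows;
}

$sample = '<table>
  <tr><th>Away</th><th>Runs</th><th>Home</th><th>Runs</th></tr>
  <tr><td>Mets</td><td>3</td><td>Braves</td><td>5</td></tr>
  <tr><td>Cubs</td><td>2</td><td>Cardinals</td><td>1</td></tr>
</table>';

print_r(extractRows($sample));
```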

Or use DOMXPath and its query() method.
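XPath lets you target elements by attribute instead of walking tags by hand. A minimal sketch, assuming a page where each game is a div with class "game" containing two "team" and two "runs" spans; those class names, the sample markup, and extractScores() are all hypothetical, so inspect the real page's source and adjust the expressions:

```php
<?php
// Sketch: extract scores with DOMXPath::query().
// The class names ("game", "team", "runs") are hypothetical placeholders.
function extractScores(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $games = [];
    foreach ($xpath->query('//div[@class="game"]') as $game) {
        // Relative queries (leading ".") search only inside the current game block.
        $teams = $xpath->query('.//span[@class="team"]', $game);
        $runs  = $xpath->query('.//span[@class="runs"]', $game);
        if ($teams->length !== 2 || $runs->length !== 2) {
            continue; // skip blocks that don't look like a two-team game
        }
        $games[] = [
            'away'      => trim($teams->item(0)->textContent),
            'away_runs' => (int) trim($runs->item(0)->textContent),
            'home'      => trim($teams->item(1)->textContent),
            'home_runs' => (int) trim($runs->item(1)->textContent),
        ];
    }
    return $games;
}

$sample = '<div class="game"><span class="team">Yankees</span> <span class="runs">4</span>'
        . ' @ <span class="team">Red Sox</span> <span class="runs">6</span></div>';

print_r(extractScores($sample));
```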

Get the idea?

Several ways to do it.

Here are some examples using the methods I described:
[anchetawern.github.io...]

There's a really nice step-by-step tutorial on DIY scraping here:
[oooff.com...]
3:01 pm on Apr 5, 2014 (gmt 0)

New User

5+ Year Member

joined:May 3, 2010
posts: 7
votes: 0


Thanks for your help. I will give it a try.
4:37 pm on Apr 5, 2014 (gmt 0)

New User

5+ Year Member

joined:May 3, 2010
posts: 7
votes: 0


Not working for me. I tried it and got errors:

Warning: file_get_contents() [function.file-get-contents]: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/content/m/i/k/mikey/html/mysite/scrape01.php on line 4

Warning: file_get_contents(http://www.example.com) [function.file-get-contents]: failed to open stream: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/content/m/i/k/mikey/html/mysite/scrape01.php on line 4

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Empty string supplied as input in /home/content/m/i/k/mikey/html/mysite/scrape01.php on line 5

I used your script:
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$html=file_get_contents("http://www.example.com");
$doc->loadHTML( $html);
?>
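The first two warnings say the server's host-name lookup (getaddrinfo) failed, so file_get_contents() returned false and loadHTML() was then handed an empty string. A quick check can tell a lookup failure apart from a disabled allow_url_fopen setting before any parsing happens. A minimal sketch; diagnoseFetch() is a hypothetical helper, not a PHP function:

```php
<?php
// Minimal sketch: report why fetching a URL is likely to fail before parsing it.
// diagnoseFetch() is a hypothetical helper, not part of PHP.
function diagnoseFetch(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return "no host name in $url";
    }
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'allow_url_fopen is off, so file_get_contents() cannot open http:// URLs';
    }
    // gethostbyname() returns its argument unchanged when the lookup fails.
    // An IP-address host also comes back unchanged, so skip the check for those.
    if (filter_var($host, FILTER_VALIDATE_IP) === false && gethostbyname($host) === $host) {
        return "DNS lookup failed for $host; the server cannot resolve this name";
    }
    return 'ok';
}

// Usage (placeholder URL):
// echo diagnoseFetch('http://www.example.com');
```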
4:40 pm on Apr 5, 2014 (gmt 0)

New User

5+ Year Member

joined:May 3, 2010
posts: 7
votes: 0


The first example worked OK with the Pokemon site.

The second one didn't work:

Parse error: syntax error, unexpected T_VARIABLE in /home/content/m/i/k/mikey/html/mysite/scrape02.php on line 2

Script:
<?php
2. $url = 'http://www.oooff.com';
3. $output = file_get_contents($url);
4. echo $output;
5. ?>
4:50 pm on Apr 5, 2014 (gmt 0)

New User

5+ Year Member

joined:May 3, 2010
posts: 7
votes: 0


Oops. I are a idiot.

I forgot to remove the line numbers 1-5 from the last example.

Thanks.