
PHP Server Side Scripting Forum

    
Web scraping
Need some help with web scraping
bayridge



 
Msg#: 4652704 posted 1:44 pm on Mar 10, 2014 (gmt 0)

I'm trying to get baseball scores each day and use them in a script that shows them on my site. Is anyone familiar with web scraping who can point me to some sample PHP scripts? I don't want to use RSS. Thanks.

 

coopster




 
Msg#: 4652704 posted 2:26 am on Mar 26, 2014 (gmt 0)

Welcome to WebmasterWorld, bayridge.

You can build your own spider/bot using PHP and the cURL API. The PHP manual pages have some examples:

[php.net...]
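Here is a minimal sketch of such a fetch. It assumes the cURL extension (ext-curl) is installed. The function name `fetch_page`, the timeouts and the user-agent string are illustrative choices and do not come from the manual:

```php
<?php
// Minimal cURL fetch helper (sketch; requires the cURL extension).
// fetch_page(), the timeouts and the user-agent string are hypothetical choices.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // max seconds to establish the connection
        CURLOPT_TIMEOUT        => 30,    // max seconds for the whole transfer
        CURLOPT_USERAGENT      => 'MyScoreBot/0.1', // hypothetical bot name
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transfer-level failure: DNS lookup, refused connection, timeout, etc.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // For http(s) URLs this is the HTTP status code; for file:// URLs it is 0.
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status returned for $url");
    }
    // The handle is released automatically when $ch goes out of scope.
    return $body;
}
```

Usage would be `$html = fetch_page('http://www.example.com');`. The resulting string can then be parsed like any other HTML. Compared with file_get_contents(), cURL gives finer control over timeouts, redirects and headers.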

incrediBILL




 
Msg#: 4652704 posted 3:52 am on Apr 5, 2014 (gmt 0)

You can even do it using $data=file_get_contents("http://example.com");

Then you can process $data, which is one big string containing the whole web page.
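The sketch below covers both steps. It assumes `allow_url_fopen` is enabled on the server, which file_get_contents() needs in order to open http:// URLs. A stream context adds a timeout to the request, and plain string functions then pull one value out of $data. Both helper names and the marker strings are hypothetical:

```php
<?php
// Fetch a URL with file_get_contents() plus a timeout (needs allow_url_fopen=On for http URLs).
// fetch_with_timeout() is a hypothetical helper name.
function fetch_with_timeout(string $url, float $seconds = 10.0)
{
    $context = stream_context_create([
        'http' => ['timeout' => $seconds], // read timeout for http:// and https:// URLs
    ]);
    return file_get_contents($url, false, $context); // string on success, false on failure
}

// Return the text between two marker strings in $data, or null if either marker is missing.
// text_between() is hypothetical; the markers depend on the target page's markup.
function text_between(string $data, string $start, string $end): ?string
{
    $from = strpos($data, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);          // skip past the opening marker
    $to = strpos($data, $end, $from); // look for the closing marker after it
    if ($to === false) {
        return null;
    }
    return substr($data, $from, $to - $from);
}
```

For example, `$score = text_between($data, '<td class="score">', '</td>');` would work if the real page wraps each score in that markup. String matching like this breaks as soon as the page's markup changes, which is one reason to switch to DOMDocument.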

...or something more sophisticated:

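$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parse warnings internally instead of printing them
$html = file_get_contents("http://www.example.com"); // the whole page as one string (false on failure)
$doc->loadHTML($html); // parse the string into a DOM tree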

Now you have the page loaded in a DOMDocument object as $doc and you can extract anything you need using getElementsByTagName.
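As an example, suppose the scores sit in an HTML table. The markup below is invented for illustration. In that case getElementsByTagName() can walk the rows and cells:

```php
<?php
// Sketch: read every table row into an array of cell texts using getElementsByTagName().
// The sample markup is invented; a real scores page will be structured differently.
function table_rows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml warnings about sloppy HTML out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells) { // skip header rows that contain only <th> cells
            $rows[] = $cells;
        }
    }
    return $rows;
}

$sample = '<table>
  <tr><th>Away</th><th>Home</th><th>Score</th></tr>
  <tr><td>Mets</td><td>Braves</td><td>5-3</td></tr>
  <tr><td>Yankees</td><td>Red Sox</td><td>2-4</td></tr>
</table>';

print_r(table_rows($sample));
```

In practice, $sample would be replaced by the downloaded page string.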

Or use DOMXPath and its query() and evaluate() methods.
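The same idea can be written with DOMXPath. In this version the class name `score` in the query is an assumed detail of the target page's markup, chosen only for illustration:

```php
<?php
// Sketch: select elements with an XPath expression instead of walking tags by hand.
// The class name "score" is a hypothetical example of the target page's markup.
function scores_by_xpath(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed HTML quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Every <td> whose class attribute contains the token "score".
    $nodes = $xpath->query('//td[contains(concat(" ", normalize-space(@class), " "), " score ")]');

    $scores = [];
    foreach ($nodes as $node) {
        $scores[] = trim($node->textContent);
    }
    return $scores;
}

$sample = '<table>
  <tr><td class="team">Mets</td><td class="score final">5-3</td></tr>
  <tr><td class="team">Yankees</td><td class="score">2-4</td></tr>
</table>';

print_r(scores_by_xpath($sample));
```

query() always returns a node list. evaluate() can also return plain values: for example, `$xpath->evaluate('count(//tr)')` gives the number of rows as a float.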

Get the idea?

Several ways to do it.

Here are some examples using the methods I described:
[anchetawern.github.io...]

There's a really nice step-by-step tutorial on DIY scraping here:
[oooff.com...]

bayridge



 
Msg#: 4652704 posted 3:01 pm on Apr 5, 2014 (gmt 0)

Thanks for your help. I will give it a try.

bayridge



 
Msg#: 4652704 posted 4:37 pm on Apr 5, 2014 (gmt 0)

It's not working for me. I tried it and got these errors:

Warning: file_get_contents() [function.file-get-contents]: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/content/m/i/k/mikey/html/mysite/scrape01.php on line 4

Warning: file_get_contents(http://www.example.com) [function.file-get-contents]: failed to open stream: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/content/m/i/k/mikey/html/mysite/scrape01.php on line 4

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Empty string supplied as input in /home/content/m/i/k/mikey/html/mysite/scrape01.php on line 5

I used your script:
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$html=file_get_contents("http://www.example.com");
$doc->loadHTML( $html);
?>
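The third warning follows from the first two. When the host name cannot be resolved, file_get_contents() returns false, and loadHTML() then receives an empty string. The getaddrinfo message itself means the server could not look up the host name. That points to a DNS or network restriction on the server rather than a bug in the script. The sketch below checks the download before parsing. The helper name `load_remote_html` is hypothetical:

```php
<?php
// Sketch: only parse the page if it was actually downloaded.
// load_remote_html() is a hypothetical helper name.
function load_remote_html(string $url): DOMDocument
{
    $html = @file_get_contents($url); // @ hides the warning; it is reported via the exception below
    if ($html === false) {
        $err = error_get_last(); // still holds the suppressed warning, e.g. the getaddrinfo failure
        throw new RuntimeException("Could not fetch $url: " . ($err['message'] ?? 'unknown error'));
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed HTML without printing warnings
    // Check for '' first: newer PHP versions reject an empty string in loadHTML().
    if ($html === '' || !$doc->loadHTML($html)) {
        throw new RuntimeException("No usable HTML returned from $url");
    }
    libxml_clear_errors();
    return $doc;
}
```

Wrapping the call in `try { ... } catch (RuntimeException $e) { ... }` lets the page show a fallback message instead of a series of warnings.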

bayridge



 
Msg#: 4652704 posted 4:40 pm on Apr 5, 2014 (gmt 0)

The first example, from the Pokemon site, worked OK.

The second one didn't work:

Parse error: syntax error, unexpected T_VARIABLE in /home/content/m/i/k/mikey/html/mysite/scrape02.php on line 2

Script:
<?php
2. $url = 'http://www.oooff.com';
3. $output = file_get_contents($url);
4. echo $output;
5. ?>

bayridge



 
Msg#: 4652704 posted 4:50 pm on Apr 5, 2014 (gmt 0)

Oops. I'm an idiot.

I forgot to remove the line numbers 1-5 in the last example.

Thanks.

 


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved