Trouble with spider links and count keyword script

php spider, php count

         

ToastyPost

5:17 pm on Sep 24, 2011 (gmt 0)

10+ Year Member



Hi,

I am writing a script to spider my page of Twitter results, collect the URLs and their content, and then count how many times the search term shows up in each URL's content, so I can find the URL with the most occurrences of the keyword.

I wrote the script below and have it working - but it still takes a looooong time to return the results.

There has to be a more efficient or better way to do this.

Can you help?

Thanks,
Robert

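<?php

// Fetch the page of Twitter results.
$linkpage = file_get_contents("http://www.toastypost.com/twit/?q=orlando%20brown");
if ($linkpage === false) {
    die('Could not fetch the results page.');
}

// Keep only the part of the page between the two markers.
// If either marker is missing, scan the whole page instead.
$start = strpos($linkpage, 'border="0" /></a>');
$end = ($start === false) ? false : strpos($linkpage, 'Copyright', $start);
if ($start !== false && $end !== false) {
    $linkpage = substr($linkpage, $start, $end - $start);
}

// Collect the t.co links (the dot is escaped so it matches a literal ".").
preg_match_all(
    '/href="http:\/\/t\.co\/(.*?)"/s',
    $linkpage,
    $links,
    PREG_SET_ORDER
);

foreach ($links as $link) {
    $alink = 'http://t.co/' . $link[1];

    echo $alink . ' has ';

    // Fetch the linked page; a failed fetch counts as zero occurrences.
    $html = file_get_contents($alink);
    // The keyword is a fixed string, so substr_count() replaces the regex.
    $result = ($html === false) ? 0 : substr_count($html, 'Orlando Brown');
    echo $result . ' instances of Orlando Brown<br><br>';
}
?>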
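
The script prints a count per link but does not yet pick out the URL with the most occurrences, which is the stated goal. A minimal sketch of that last step follows. The helper names count_keyword and rank_by_keyword are hypothetical, the sample pages are made up, and the match is case-insensitive here, which is an assumption (the script above matches case-sensitively).

```php
<?php
// Hypothetical helper: count case-insensitive occurrences of $keyword in $text.
function count_keyword(string $text, string $keyword): int
{
    if ($keyword === '') {
        return 0; // substr_count() rejects an empty needle
    }
    return substr_count(strtolower($text), strtolower($keyword));
}

// Hypothetical helper: given url => page content, return url => count,
// sorted so the URL with the most occurrences comes first.
function rank_by_keyword(array $pages, string $keyword): array
{
    $counts = [];
    foreach ($pages as $url => $content) {
        $counts[$url] = count_keyword((string) $content, $keyword);
    }
    arsort($counts); // sort by count, descending, keeping the URL keys
    return $counts;
}

// Made-up sample data standing in for fetched pages.
$pages = [
    'http://example.com/a' => 'Nothing relevant here.',
    'http://example.com/b' => 'Orlando Brown was here. orlando brown again.',
    'http://example.com/c' => 'Orlando Brown once.',
];

$ranked   = rank_by_keyword($pages, 'Orlando Brown');
$topCount = reset($ranked); // first value; also rewinds the array pointer
$topUrl   = key($ranked);   // key at the current (first) position

echo $topUrl . ' has the most occurrences: ' . $topCount . "\n";
```

Keeping the counts in an array keyed by URL means the fetching and the ranking stay separate, so the fetch step can be swapped out later without touching the counting.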

penders

8:10 pm on Sep 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How looooong does it take to run?

Some of those links take quite a few seconds to return a page, and the script fetches them one at a time, so multiply that by the number of links and you have quite a delay, it would seem.
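
If the delay is roughly the sum of the per-link response times, fetching the links in parallel should bring it closer to the time of the slowest single link. A rough sketch using PHP's curl_multi functions follows, assuming the cURL extension is installed. The function name fetch_all and the timeout values are made up for illustration, and a failed fetch simply comes back as an empty string.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently.
// Returns an array of url => body; a failed fetch yields an empty string.
function fetch_all(array $urls, int $timeoutSeconds = 10): array
{
    $multi   = curl_multi_init();
    $handles = [];

    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true, // t.co links redirect to the real page
            CURLOPT_MAXREDIRS      => 5,
            CURLOPT_CONNECTTIMEOUT => 5,               // seconds to wait for a connection
            CURLOPT_TIMEOUT        => $timeoutSeconds, // cap on each whole transfer
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $bodies;
}

// Usage (not executed here, since it needs network access):
//   $bodies = fetch_all($alinks);   // $alinks: the t.co URLs collected earlier
//   foreach ($bodies as $url => $html) {
//       echo $url . ' has ' . substr_count($html, 'Orlando Brown') . '<br>';
//   }
```

The per-transfer timeout also bounds the total run time, since one unresponsive link can no longer hold up the rest.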

ToastyPost

9:07 pm on Sep 24, 2011 (gmt 0)



It runs for 10 to 30 seconds. If it takes too long, it times out and gives me an internal server error.

Is there a better way to do this?

TY :)
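
On the timeout: one small change that may help, while keeping file_get_contents(), is to give each fetch its own time limit so a single slow link is skipped instead of stalling the whole script. A minimal sketch, assuming the HTTP stream wrapper's timeout context option is what is wanted here. The helper name fetch_with_timeout and the 5-second default are made up for illustration.

```php
<?php
// Hypothetical helper: fetch a URL, giving up if it is too slow.
// Returns the body as a string, or false when the fetch fails or times out.
function fetch_with_timeout(string $url, float $seconds = 5.0)
{
    $context = stream_context_create([
        'http' => [
            'timeout' => $seconds, // read timeout, in seconds
        ],
    ]);

    // @ suppresses the warning on failure; the caller checks for false.
    return @file_get_contents($url, false, $context);
}

// Usage inside the existing loop (not executed here, needs network access):
//   $html   = fetch_with_timeout($alink, 5.0);
//   $result = ($html === false) ? 0 : substr_count($html, 'Orlando Brown');
```

With a limit of a few seconds per link, the worst case becomes the number of links times that limit, which can be checked against whatever execution limit the server enforces.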