Using Perl on Linux+Apache, I'm trying to parse this webpage:
[www2.google.com...]
in order to obtain the number of backlinks.
But I get an error message. On the other hand, I'm able to parse pages like:
[google.com...] or
[google.com...]
I'm using the 'LWP::Simple' Perl module.
I suppose that Google does not allow parsing of their results. Is this right?
Does anybody have any experience?
Thank you very much.
I suppose that Google does not allow parsing of their results. Is this right?
Don't [s]end automated queries to Google in an attempt to monitor your site's ranking. [google.com]
That's the legal side of things[1]. However, technically there is nothing they can do to prevent you from parsing their results once you have obtained them. It's just some stupid HTML document after all.
Running your queries from a fixed IP address might be a bad idea. Running thousands of automated queries from a dynamic IP address might also be a bad idea, since Google might block access to their site for the whole IP block.
Using automated queries in a sensible manner (don't hammer Google's server with requests - one query a day) will probably work OK.
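One way to hold a script to something like "one query a day" is to remember when the last query was made and refuse to run again too soon. The following is only a rough sketch under my own assumptions: the helper names (last_query_time, may_query, mark_queried) and the idea of a small stamp file are made up for illustration and are not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: read the epoch time of the last query from a stamp
# file. Returns undef if the file is missing or holds no number.
sub last_query_time {
    my ($stamp_file) = @_;
    open my $fh, '<', $stamp_file or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line && $line =~ /^(\d+)/;
    return $1;
}

# Hypothetical helper: true if no query was recorded yet, or if at least
# $min_interval seconds have passed between the last query and $now.
sub may_query {
    my ($stamp_file, $min_interval, $now) = @_;
    my $last = last_query_time($stamp_file);
    return 1 unless defined $last;
    return ( $now - $last ) >= $min_interval ? 1 : 0;
}

# Hypothetical helper: record $now (epoch seconds) as the last query time.
sub mark_queried {
    my ($stamp_file, $now) = @_;
    open my $fh, '>', $stamp_file or die "Cannot write $stamp_file: $!";
    print {$fh} "$now\n";
    close $fh or die "Cannot close $stamp_file: $!";
    return;
}
```

In a cron job you would call may_query($file, 24 * 60 * 60, time) first, run the request only if it returns true, and then call mark_queried($file, time).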
The get() function of LWP::Simple [search.cpan.org] takes no option for changing the UA string. But something like the following code will work:
use strict;
use warnings;
use LWP::UserAgent;

# Send a browser-like UA string instead of the default libwww-perl one
my $ua = LWP::UserAgent->new(
    timeout => 30,
    agent   => 'some real browser UA string',
);

my $response =
    $ua->get('http://www2.google.com/search?q=link:http://www.yahoo.com');
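Since the original question only mentions "an error message", it helps to look at what the server actually answered. Below is a minimal sketch, assuming a helper I have named fetch_page (the name is hypothetical). It relies only on the standard HTTP::Response methods is_success, decoded_content and status_line, and it does not try to extract the backlink count, because that depends on Google's HTML.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url with the given user agent object.
# Returns (content, undef) on success, or (undef, status line) on failure,
# so the caller can see why the request was refused.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return ( $response->decoded_content, undef ) if $response->is_success;
    return ( undef, $response->status_line );
}
```

Called as my ($html, $error) = fetch_page($ua, $url), it leaves the HTTP status line (the numeric code plus the server's message) in $error whenever the request fails, which should show whether the request was rejected outright or failed for some other reason.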
Your aim of building a "Google-update-Start-Detect-Machine" suggests, however, that you won't use this in a sensible manner, since you probably want to run it every minute or so to detect the start of an update. There is no doubt that such a high frequency of requests will get you into trouble.
Andreas
--------------------------
[1] One might argue that trying to figure out when a new update starts is not an "attempt to monitor your site's ranking". But...
Not if you use a large number of proxies, and change them every day. :)
<added>Ouch! Just read that you want to do this using Perl-Linux-Apache, so I think you mean from a server, maybe your own server. Be very, very careful, hehe ;)</added>
cminblues