Count indexed pages

count indexed google pages

         

conm

1:33 pm on May 31, 2004 (gmt 0)



Hello,

I am wondering whether it is possible to get a count or percentage of the pages that are not indexed (URL shown as the title and no cache) but still show up in Google when doing a 'site:' search.

For smaller sites one can get an idea of what the percentage is, but for a site with 250K+ pages it is too difficult, and we are seeing a lot of pages 'disappearing' (becoming unsearchable).

Any feedback will be appreciated.
Michael

bsterz

2:03 pm on May 31, 2004 (gmt 0)




Liberally stolen from other posts:

#!/usr/bin/perl -w
use strict;
use SOAP::Lite;

my $key        = 'Your Key Here';    # Google API key
my $query      = "site:www.example.com";
my $maxResults = 50;                 # total number of results to fetch
my $perPage    = 10;                 # results requested per call
my $rescount   = 0;

open(OUTFILE, ">NoCache.txt") or die "couldn't open the data file : $!";

# Build the SOAP client once rather than on every pass through the loop.
my $googleSearch = SOAP::Lite->service("http://api.google.com/GoogleSearch.wsdl");

# Step the start offset by a full page (0, 10, 20, ...) so the pages don't overlap.
for (my $start = 0; $start < $maxResults; $start += $perPage) {
    my $result = $googleSearch->doGoogleSearch($key, $query, $start, $perPage, "false", "", "false", "", "latin1", "latin1");
    foreach my $r (@{ $result->{'resultElements'} }) {
        $rescount++;
        print "$rescount:\n";
        print $r->{'URL'} . " " . $r->{'title'} . "\n";
        if (!$r->{'cachedSize'}) {    # If there is no size listed for the cached page.
            print OUTFILE "NoCache " . $r->{'URL'} . "\n";
        }
    }
}
close(OUTFILE);

You have to install the SOAP modules for Perl (SOAP::Lite). This is not a problem, as you would run this type of script on your local box anyhoo. I did this one on a Windoze box, but it should work fine in a *nix environment. Shoot out to Google and snag an API key if you don't have one; it's free. The script prints each result and writes the URLs that have no cached size to NoCache.txt. This seems to work fine on a couple of sites I have. I had never used this code for this purpose before, just hacked it out real quick. This is not intended to be an example of advanced Perl coding, yada yada, just tryin' to help a brutha out. Your mileage may vary, performed on a closed course by a trained driver, do not try this at home.
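Since the original question asked for a count/percentage, here is a minimal sketch of the tally step on its own. It assumes result records shaped like the resultElements entries the script above loops over (a hash with URL and cachedSize). The helper name uncached_stats and the sample data are made up for illustration and are not part of the Google API. The percentage only covers the results actually fetched, so for a big site treat it as a sample, not a full count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above or of the Google API):
# given result records shaped like resultElements entries, return the total,
# the number with no cachedSize, and that number as a percentage.
sub uncached_stats {
    my @results = @_;
    my $total   = scalar @results;
    my $nocache = grep { !$_->{cachedSize} } @results;   # grep in scalar context counts matches
    my $percent = $total ? 100 * $nocache / $total : 0;  # avoid dividing by zero
    return ($total, $nocache, $percent);
}

# Made-up sample data, for illustration only.
my @sample = (
    { URL => 'http://www.example.com/a.html', cachedSize => '12k' },
    { URL => 'http://www.example.com/b.html', cachedSize => ''    },
    { URL => 'http://www.example.com/c.html', cachedSize => '8k'  },
    { URL => 'http://www.example.com/d.html' },
);

my ($total, $nocache, $percent) = uncached_stats(@sample);
printf "%d of %d results have no cache (%.1f%%)\n", $nocache, $total, $percent;
```

In the real script you could push each $r onto an array inside the foreach loop and hand that array to the helper once the loop is done.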

See ya,

Bill