Forum Moderators: DixonJones & mademetop

GoogleBot log tool?

Log exact times of indexing activity?

   
12:53 pm on Oct 17, 2006 (gmt 0)

5+ Year Member



Is it possible to install a script of some kind that logs Googlebot's activity on my site?

One thing I would like to know is the exact times the bot indexes my index page.

1:08 pm on Oct 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some time ago I used a PHP browser-sniffer script that sent me an email every time Googlebot visited a page. But I'm sure there are more sophisticated options available.

You could also download the raw logs and run a script (Perl or suchlike) to pull out the Googlebot records.
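
For anyone who does have the raw logs, the filtering step might look roughly like the sketch below. It is in PHP rather than Perl, to match the rest of this thread, and it assumes the Apache "combined" log format. The function name googlebot_hits() and the two sample lines are made up for illustration.

```php
<?php
// Sketch: pull Googlebot requests out of raw access-log lines.
// Assumes the Apache "combined" log format; adjust the pattern if your host logs differently.
function googlebot_hits(array $lines): array
{
    // Captures: 1 = IP, 2 = timestamp, 3 = requested URL, 4 = user agent
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"/';
    $hits = [];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m) && stripos($m[4], 'Googlebot') !== false) {
            $hits[] = ['ip' => $m[1], 'time' => $m[2], 'url' => $m[3]];
        }
    }
    return $hits;
}

// Hypothetical sample data; in practice use file('access.log', FILE_IGNORE_NEW_LINES).
$sample = [
    '66.249.66.1 - - [17/Oct/2006:12:53:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [17/Oct/2006:12:54:10 +0000] "GET /about.html HTTP/1.1" 200 2048 "http://example.com/" "Mozilla/5.0 (Windows)"',
];

foreach (googlebot_hits($sample) as $hit) {
    echo $hit['time'], ' ', $hit['ip'], ' ', $hit['url'], "\n";
}
```

This only matches on the user agent string, so it will also pick up anything pretending to be Googlebot.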

1:19 pm on Oct 17, 2006 (gmt 0)

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Quick and dirty from the commandline:-

tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log

Run that in a screen to leave it going in the background.

TJ

2:20 pm on Oct 17, 2006 (gmt 0)

5+ Year Member



Trillian - Where do I have to run that line - in the command prompt? It says Windows doesn't recognise the command tail...

Harry - My ISP won't let me access the raw log files

[edited by: djingo at 2:43 pm (utc) on Oct. 17, 2006]

2:49 pm on Oct 17, 2006 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Where do I have to run that line - in the command prompt? It says Windows doesn't recognise the command tail...

You'd have to have shell access to the server.

2:57 pm on Oct 17, 2006 (gmt 0)

5+ Year Member



It's not possible with my ISP.

4:50 pm on Oct 17, 2006 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



a host with no raw log access and no shell access is a bad host

raw log access is necessary

11:24 pm on Oct 17, 2006 (gmt 0)

5+ Year Member



I am going to get a new ISP when my subscription runs out.

I found a little script though that tracks raw bot activity: [psi-tech.com.au...] and it seems to do the job effectively.

11:17 am on Oct 18, 2006 (gmt 0)

5+ Year Member



you can write a five-line script that checks if $_SERVER['HTTP_USER_AGENT'] contains 'Googlebot'

5:09 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



those are pretty easy to write

you can also do them by user agent as well as ip, just in case there is a sneaky one around ;)

though you will get false positives from people spoofing the Googlebot user agent, which is simple to switch in some browsers or when they are using cURL
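
One way to combine the two checks is sketched below: accept a hit only if the user agent claims to be Googlebot, the IP reverse-resolves to a Google crawler host, and that host resolves back to the same IP. The function names are made up here, and the host suffixes (googlebot.com and google.com) are an assumption you should confirm against Google's own guidance. It handles IPv4 only, because gethostbynamel() returns IPv4 addresses.

```php
<?php
// Sketch: user agent check plus reverse and forward DNS confirmation.
// Assumption: genuine crawler hosts end in .googlebot.com or .google.com.
function is_google_crawler_host(string $host): bool
{
    return (bool) preg_match('/\.(googlebot|google)\.com$/i', $host);
}

// $reverse and $forward can be swapped out so the logic can be tried without DNS;
// by default they are PHP's gethostbyaddr() and gethostbynamel().
function is_real_googlebot(string $ip, string $userAgent,
    ?callable $reverse = null, ?callable $forward = null): bool
{
    if (stripos($userAgent, 'Googlebot') === false) {
        return false;                         // cheap check first, no DNS needed
    }
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);                    // gives back the IP itself if there is no PTR record
    if (!is_string($host) || !is_google_crawler_host($host)) {
        return false;
    }
    $ips = $forward($host);                   // false on failure, else a list of IPv4 addresses
    return is_array($ips) && in_array($ip, $ips, true);
}

// Usage on a live page (performs up to two DNS lookups per claimed Googlebot hit):
// if (is_real_googlebot($_SERVER['REMOTE_ADDR'], $_SERVER['HTTP_USER_AGENT'] ?? '')) { ... }
```

The user agent test runs first so ordinary visitors never trigger a DNS lookup.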

1:43 pm on Oct 20, 2006 (gmt 0)

5+ Year Member



Actually I would like to get my hands on a script that logs "everything about everybody"

Basically

ip
referring url
target url
date

for all activity on my site.

Does anyone know where I can get such a script?

(I can't do PHP myself, unfortunately)
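
A minimal sketch of such a logger follows, assuming PHP is available and the file is included at the top of every page. It appends one tab-separated line per request with the four fields listed above. The function name format_visit() and the visits.log path are placeholders, not part of any existing script.

```php
<?php
// Sketch: append one line per request with date, IP, referring URL and target URL.
function format_visit(array $server, int $timestamp): string
{
    $fields = [
        gmdate('Y-m-d H:i:s', $timestamp),   // date (GMT)
        $server['REMOTE_ADDR']  ?? '-',      // ip
        $server['HTTP_REFERER'] ?? '-',      // referring url ("-" when the browser sends none)
        $server['REQUEST_URI']  ?? '-',      // target url
    ];
    // Replace tabs and line breaks so each visit stays on a single line.
    $clean = array_map(function ($value) {
        return preg_replace('/[\t\r\n]+/', ' ', (string) $value);
    }, $fields);
    return implode("\t", $clean) . "\n";
}

// Hypothetical log location; keep it outside the web root if your host allows it.
$logFile = __DIR__ . '/visits.log';

// Only write when actually serving a web request.
if (isset($_SERVER['REQUEST_URI'])) {
    file_put_contents($logFile, format_visit($_SERVER, time()), FILE_APPEND | LOCK_EX);
}
```

This is a stand-in for raw logs, not a replacement: it only sees requests for pages that include it, so images and other static files go unrecorded.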

3:17 pm on Oct 20, 2006 (gmt 0)

10+ Year Member



Is it possible to install a script of some kind that logs Googlebot's activity on my site?
One thing I would like to know is the exact times the bot indexes my index page.

The command

tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log
as suggested by trillianjedi, will only work if you have shell access to a Linux (or other Unix-like) server!

Here is a small PHP script that will do the job.

The script will send an email when Google hits your site.
(If you want, you can also write this to a database, but then you will need a bit more code.)

It will do a host name lookup (to see if the hit is really coming from the Google network), and if so send you an email saying which page was hit.
The time of the email will give you the exact time of the visit.

You will need to add (or include) this code on every page.

<?php
// Lets send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
if ($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]!= ""){
$host = @gethostbyaddr($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]);
}else{
$IP = $HTTP_SERVER_VARS["REMOTE_ADDR"];
$host = @gethostbyaddr($HTTP_SERVER_VARS["REMOTE_ADDR"]);
}

if(eregi("googlebot",$host))
{
$emailaddress = "your@emailaddress";
mail("".$emailaddress."", "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']."");
}
?>

[edited by: Noel at 3:21 pm (utc) on Oct. 20, 2006]

3:55 pm on Oct 20, 2006 (gmt 0)

5+ Year Member



Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?

4:36 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



if you want php advice then go to the PHP forum [webmasterworld.com]

but

$HTTP_SERVER_VARS is deprecated, use $_SERVER

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

also, everything you are asking for would be available if you had raw logs

just get a new host, look at how much time and therefore money this host is already costing you.

8:37 pm on Oct 21, 2006 (gmt 0)

10+ Year Member



Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?

Sure. Just add (or include) it only in your index.php
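
If the script sits in a shared include that every page loads, a check on the requested path does the same job. A rough sketch, assuming the index page is reachable as "/" or "/index.php"; the function name is_index_request() is invented for this example.

```php
<?php
// Sketch: decide whether a request is for the index page.
// Assumption: the index page answers on "/" and "/index.php"; adjust to your site.
function is_index_request(string $requestUri): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);   // drops any ?query string
    return in_array($path, ['/', '/index.php'], true);
}

// Usage: wrap the mail() call from the script in this check, e.g.
// if (is_index_request($_SERVER['REQUEST_URI'])) { mail(...); }
```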

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

jatar_k is correct about this. The script WILL DO a host lookup every time to see if the hit is really from Google. If you have access to a database with all the "Google" IPs it will be way faster!
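
A rough sketch of the IP-list idea, using a plain array of CIDR ranges in place of a database table. The function names are made up, and the single range shown is only an example: build the real list from your own verified hits or from ranges Google publishes. It handles IPv4 only.

```php
<?php
// Sketch: check an IPv4 address against a list of CIDR ranges instead of doing a DNS lookup.
function ip_in_cidr(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    $bits = isset($parts[1]) ? (int) $parts[1] : 32;   // a bare IP counts as a /32
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($parts[0]);
    if ($ipLong === false || $subnetLong === false || $bits < 0 || $bits > 32) {
        return false;
    }
    $mask = -1 << (32 - $bits);                        // keeps the top $bits bits of the address
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

function ip_in_list(string $ip, array $cidrs): bool
{
    foreach ($cidrs as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// Example range only (an assumption, not a complete or current list).
$googleRanges = ['66.249.64.0/19'];

// Usage: if (ip_in_list($_SERVER['REMOTE_ADDR'], $googleRanges)) { ... }
```

The trade-off is speed against upkeep: the lookup is instant, but the list goes stale whenever Google adds or changes ranges.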

$HTTP_SERVER_VARS is deprecated, use $_SERVER

Again true! See the updated script below!


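<?php
// Send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
$emailaddress = "your@emailaddress";

// Use the forwarded address when a proxy supplies one, else the direct one.
// X-Forwarded-For can hold a comma-separated list (client first) and can be forged.
if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
    $parts = explode(",", $_SERVER["HTTP_X_FORWARDED_FOR"]);
    $ip = trim($parts[0]);
} else {
    $ip = $_SERVER["REMOTE_ADDR"];
}
$host = @gethostbyaddr($ip);

// eregi() was removed in PHP 7; stripos() does the same case-insensitive match.
if ($host !== false && stripos($host, "googlebot") !== false) {
    mail($emailaddress, "Google detected",
        "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>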

12:20 pm on Oct 30, 2006 (gmt 0)

10+ Year Member



The new Google tool in [google.com...] will allow you to see a graph of when Googlebot visited and how much work it did.