Forum Moderators: DixonJones & mademetop

GoogleBot log tool?

Log exact times of indexing activity?

     
12:53 pm on Oct 17, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 28, 2006
posts:63
votes: 0


Is it possible to install a script of some kind that logs Googlebot's activity on my site?

One thing I would like to know is the exact times the bot indexes my index page.

1:08 pm on Oct 17, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 21, 2002
posts:1051
votes: 0


Some time ago I used a PHP browser sniffer script which would send me an email every time Googlebot visited a page. But I'm sure there are more sophisticated options available.

You could also download the raw logs and run a script (Perl or suchlike) to pull out the Googlebot records.
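A minimal sketch of that raw-log approach, assuming the downloaded log is in Apache "combined" format. The file name access.log, the function name googlebot_hits and the sample lines are made up for illustration. It keeps every line that mentions Googlebot and prints the request time and the requested path.

```shell
#!/bin/sh
# Sample data in Apache "combined" log format (hypothetical lines for illustration).
cat > access.log <<'EOF'
66.249.66.1 - - [17/Oct/2006:12:53:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.10 - - [17/Oct/2006:12:54:10 +0000] "GET /about.html HTTP/1.1" 200 2048 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"
66.249.66.1 - - [17/Oct/2006:13:02:45 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Print "timestamp path" for every log line that mentions Googlebot.
# In combined format, field 4 is "[date:time" and field 7 is the requested path.
googlebot_hits() {
    grep -i 'Googlebot' "$1" | awk '{ print substr($4, 2), $7 }'
}

googlebot_hits access.log
```

To see only hits on the index page, the output could be filtered further, for example with grep ' /$'. Note that this matches on the user agent string only, which a visitor can fake.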

1:19 pm on Oct 17, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 15, 2003
posts:7249
votes: 0


Quick and dirty from the command line:

tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log

Run that in a screen to leave it going in the background.

TJ

2:20 pm on Oct 17, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 28, 2006
posts:63
votes: 0


Trillian - Where do I have to run that line - in the command prompt? It says Windows doesn't recognise the command tail...

Harry - My ISP won't let me access the raw logfiles

[edited by: djingo at 2:43 pm (utc) on Oct. 17, 2006]

2:49 pm on Oct 17, 2006 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12929
votes: 200


Where do I have to run that line - in the command prompt? It says Windows doesn't recognise the command tail...

You'd have to have shell access to the server.

2:57 pm on Oct 17, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 28, 2006
posts:63
votes: 0


It's not possible with my ISP.

4:50 pm on Oct 17, 2006 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15756
votes: 0


a host with no raw log access and no shell access is a bad host

raw log access is necessary

11:24 pm on Oct 17, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 28, 2006
posts:63
votes: 0


I am going to get a new ISP when my subscription runs out.

I found a little script though that tracks raw bot activity: [psi-tech.com.au...] and it seems to do the job effectively.

11:17 am on Oct 18, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Feb 15, 2006
posts:201
votes: 0


You can write a five-line script that checks whether $_SERVER['HTTP_USER_AGENT'] contains 'Googlebot'.

5:09 pm on Oct 18, 2006 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15756
votes: 0


Those are pretty easy to write.

You can also check by IP as well as by user agent, just in case there is a sneaky one around ;)

Checking the user agent alone will give you false info when people spoof the Googlebot user agent, which is simple to switch in some browsers or when using cURL.
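One way to weed out spoofed user agents is a reverse DNS lookup followed by a forward lookup. The sketch below is only an illustration under some assumptions: the function names are made up, the `host` utility must be installed, and the accepted domains (googlebot.com and google.com) should be checked against Google's current guidance.

```shell
#!/bin/sh
# Succeed when a host name ends in .googlebot.com or .google.com.
is_google_host() {
    case "$1" in
        *.googlebot.com|*.google.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Reverse-resolve an IP, check the domain, then forward-resolve the name
# and confirm it maps back to the same IP. Needs `host` and network access.
verify_googlebot_ip() {
    ip=$1
    name=$(host "$ip" 2>/dev/null | awk '/domain name pointer/ { print $NF; exit }')
    name=${name%.}                  # drop the trailing dot from the PTR answer
    [ -n "$name" ] || return 1
    is_google_host "$name" || return 1
    host "$name" 2>/dev/null | grep -qFx "$name has address $ip"
}
```

Usage would look like `verify_googlebot_ip 66.249.66.1 && echo genuine`, where the address is just a sample taken from a log line. The forward lookup matters because anyone who controls the reverse DNS for their own IP range can make it return a name that looks like Google's.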

1:43 pm on Oct 20, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 28, 2006
posts: 63
votes: 0


Actually I would like to get my hands on a script that logs "everything about everybody"

Basically

ip
referring url
target url
date

for all activity on my site.

Does anyone know where I can get such a script?

(I can't do PHP myself, unfortunately.)

3:17 pm on Oct 20, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 21, 2003
posts:159
votes: 0


Is it possible to install a script of some kind that logs Googlebot's activity on my site?
One thing I would like to know is the exact times the bot indexes my index page.

The

tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log

command, as suggested by trillianjedi, will only work on a Linux (or other Unix-like) system!

Here is a small PHP script that will do the job.

The script will send an email when Google hits your site.
(If you want, you can also write this to a database, but then you will need a bit more code.)

It will do a host name lookup (to see if the hit is really coming from the Google network), and if so send you an email saying which page the hit was on.
The time of the email will give you the exact time of the visit.

You will need to add (or include) this code on every page.

<?php
// Let's send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
if ($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"] != "") {
    $host = @gethostbyaddr($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]);
} else {
    $IP = $HTTP_SERVER_VARS["REMOTE_ADDR"];
    $host = @gethostbyaddr($HTTP_SERVER_VARS["REMOTE_ADDR"]);
}

if (eregi("googlebot", $host)) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>

[edited by: Noel at 3:21 pm (utc) on Oct. 20, 2006]

3:55 pm on Oct 20, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 28, 2006
posts: 63
votes: 0


Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?

4:36 pm on Oct 20, 2006 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15756
votes: 0


if you want php advice then go to the PHP forum [webmasterworld.com]

but

$HTTP_SERVER_VARS is deprecated, use $_SERVER

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

also, everything you are asking for would be available if you had raw logs

just get a new host, look at how much time and therefore money this host is already costing you.
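To illustrate the point about raw logs: the fields asked for earlier in the thread (IP, referring URL, target URL, date) are all present in Apache's "combined" log format. A rough sketch, assuming that format; the file name sample.log, the function name summarize_log and the sample line are invented for the example.

```shell
#!/bin/sh
# One hypothetical line in Apache "combined" format, used as sample input.
cat > sample.log <<'EOF'
192.0.2.10 - - [20/Oct/2006:13:43:00 +0000] "GET /products.html HTTP/1.1" 200 2048 "http://example.com/index.html" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

# Split each line on double quotes: piece 1 holds the IP and date, piece 2 the
# request line, piece 4 the referrer. Output is tab-separated:
# ip, date, target URL, referring URL.
summarize_log() {
    awk -F'"' '{
        split($1, head, " ")      # head[1] = IP, head[4] = "[date:time"
        split($2, req, " ")       # req[2] = requested path
        printf "%s\t%s\t%s\t%s\n", head[1], substr(head[4], 2), req[2], $4
    }' "$1"
}

summarize_log sample.log
```

Splitting on the quote character is a shortcut that works for typical lines; a request or referrer that itself contains an escaped double quote would need a more careful parser.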

8:37 pm on Oct 21, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 21, 2003
posts:159
votes: 0


Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?

Sure. Just add (or include) it only in your index.php.

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

jatar_k is correct about this. The script WILL DO a host lookup every time to see if the hit is really from Google. If you have access to a database with all the "Google" IPs, it will be way faster!

$HTTP_SERVER_VARS is deprecated, use $_SERVER

Again true! See the updated script below!


<?php
// Send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
// Note: X-Forwarded-For is supplied by the client or a proxy and can be faked.
if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
    $IP = $_SERVER["HTTP_X_FORWARDED_FOR"];
} else {
    $IP = $_SERVER["REMOTE_ADDR"];
}
// Reverse DNS lookup; returns false on malformed input.
$host = @gethostbyaddr($IP);

// eregi() was removed in PHP 7; stripos() does the same case-insensitive check.
if ($host !== false && stripos($host, "googlebot") !== false) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>

12:20 pm on Oct 30, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:May 19, 2003
posts:70
votes: 0


The new Google tool in [google.com...] will allow you to see a graph of when Googlebot visited and how much work it did.
 
