
Website Analytics - Tracking and Logging Forum

    
GoogleBot log tool?
Log exact times of indexing activity?
djingo




msg:3124034
 12:53 pm on Oct 17, 2006 (gmt 0)

Is it possible to install a script of some kind that logs Googlebot's activity on my site?

One thing I would like to know is the exact times the bot indexes my index page.

 

HarryM




msg:3124049
 1:08 pm on Oct 17, 2006 (gmt 0)

Some time ago I used a PHP browser-sniffer script that sent me an email every time Googlebot visited a page. But I'm sure there are more sophisticated options available.

You could also download the raw logs and run a script (Perl or similar) to pull out the Googlebot records.
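To make the "download the raw logs and run a script" idea concrete, here is a minimal PHP sketch of such a filter. It assumes the host writes Apache's "combined" log format; the script name googlebot-hits.php and the helper names are hypothetical. It prints the time and path of every request whose user agent mentions Googlebot, which also answers the "exact times" question.

```php
<?php
// Minimal sketch: pull Googlebot hits out of a downloaded raw access log.
// Assumes Apache "combined" log format; all names here are hypothetical.

/**
 * Parse one combined-format log line into its parts,
 * or return null if the line does not match that layout.
 */
function parseLogLine(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'ip'      => $m[1],
        'time'    => $m[2],
        'method'  => $m[3],
        'path'    => $m[4],
        'status'  => (int) $m[5],
        'referer' => $m[6],
        'agent'   => $m[7],
    ];
}

/** Return the parsed entries whose user agent mentions Googlebot. */
function googlebotHits(iterable $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        $entry = parseLogLine($line);
        if ($entry !== null && stripos($entry['agent'], 'Googlebot') !== false) {
            $hits[] = $entry;
        }
    }
    return $hits;
}

// Command-line usage: php googlebot-hits.php access.log
if (PHP_SAPI === 'cli' && isset($argv[1]) && is_readable($argv[1])) {
    foreach (googlebotHits(new SplFileObject($argv[1])) as $hit) {
        echo $hit['time'], "  ", $hit['path'], PHP_EOL;
    }
}
```

Keep in mind this only trusts the user-agent string; anyone can send "Googlebot" in that header, so an IP or DNS check is still needed to be sure.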

trillianjedi




msg:3124054
 1:19 pm on Oct 17, 2006 (gmt 0)

Quick and dirty from the command line:

tail -f path/to/httpd/access.log | grep --line-buffered "Mediapartners-Google" >> /home/djingo/adsense-bot.log

Run that inside a screen session to leave it running in the background.

TJ

djingo




msg:3124107
 2:20 pm on Oct 17, 2006 (gmt 0)

Trillian: where do I run that line, in the command prompt? Windows says it doesn't recognise the command tail...

Harry: my ISP won't let me access the raw log files.

[edited by: djingo at 2:43 pm (utc) on Oct. 17, 2006]

netmeg




msg:3124135
 2:49 pm on Oct 17, 2006 (gmt 0)

Where do I run that line, in the command prompt? Windows says it doesn't recognise the command tail...

You'd have to have shell access to the server.

djingo




msg:3124147
 2:57 pm on Oct 17, 2006 (gmt 0)

It's not possible with my ISP.

jatar_k




msg:3124315
 4:50 pm on Oct 17, 2006 (gmt 0)

a host with no raw log access and no shell access is a bad host

raw log access is necessary

djingo




msg:3124820
 11:24 pm on Oct 17, 2006 (gmt 0)

I am going to get a new ISP when my subscription runs out.

I did find a little script that tracks raw bot activity, [psi-tech.com.au...], and it seems to do the job effectively.

webdudek




msg:3125392
 11:17 am on Oct 18, 2006 (gmt 0)

you can write a five-line script that checks whether $_SERVER['HTTP_USER_AGENT'] contains 'Googlebot'
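Roughly what that short script could look like, as a hedged sketch: looksLikeGooglebot() and the googlebot.log file name are hypothetical, and the log file must be writable by the web server. Including it on a page appends the time and requested URL whenever the user agent says Googlebot.

```php
<?php
// Minimal sketch of the user-agent check described above.
// The User-Agent header comes from the client, so it can be faked.

function looksLikeGooglebot(string $userAgent): bool
{
    // Case-insensitive substring test.
    return stripos($userAgent, 'Googlebot') !== false;
}

if (looksLikeGooglebot($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    // Hypothetical log file next to this script: one line per hit, time + page.
    file_put_contents(
        __DIR__ . '/googlebot.log',
        date('Y-m-d H:i:s') . ' ' . ($_SERVER['REQUEST_URI'] ?? '') . PHP_EOL,
        FILE_APPEND | LOCK_EX
    );
}
```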

jatar_k




msg:3125925
 5:09 pm on Oct 18, 2006 (gmt 0)

those are pretty easy to write

you can also check by IP as well as by user agent, just in case there is a sneaky one around ;)

though with a user-agent check alone you will get false hits from people spoofing the Googlebot user agent, which is simple to do in some browsers or with cURL
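One way to do the IP side, sketched in PHP under the assumption that genuine Googlebot addresses reverse-resolve to a host under googlebot.com or google.com (the method Google describes for verifying its crawler). The sketch reverse-resolves the IP, checks the host name, then resolves that name forward and requires it to point back to the same IP, so a faked PTR record alone is not enough. Function names are hypothetical, and gethostbynamel() only returns IPv4 addresses, so IPv6 visitors are not handled here.

```php
<?php
// Sketch: confirm a claimed Googlebot by IP, not just by user agent.
// Reverse DNS -> check the domain -> forward DNS must map back to the IP.

function isGoogleHostname(string $host): bool
{
    // Host must end in .googlebot.com or .google.com
    return (bool) preg_match('/\.(googlebot|google)\.com$/i', $host);
}

function isVerifiedGooglebot(string $ip): bool
{
    $host = @gethostbyaddr($ip);
    // gethostbyaddr() returns false for malformed input and the IP itself
    // when there is no reverse (PTR) record.
    if ($host === false || $host === $ip || !isGoogleHostname($host)) {
        return false;
    }
    $forward = @gethostbynamel($host);   // array of IPv4 addresses, or false
    return is_array($forward) && in_array($ip, $forward, true);
}
```

Each call costs two DNS lookups, so it makes sense to call isVerifiedGooglebot($_SERVER['REMOTE_ADDR']) only for requests whose user agent already claims to be Googlebot.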

djingo




msg:3128452
 1:43 pm on Oct 20, 2006 (gmt 0)

Actually I would like to get my hands on a script that logs "everything about everybody"

Basically:

IP address
referring URL
target URL
date

for all activity on my site.

Does anyone know where I can get such a script?

(Unfortunately I can't write PHP myself.)
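A minimal PHP sketch of such a "log everything" script, to be included at the top of each page. The visits.log file name, the tab-separated layout and the extra user-agent column (handy for spotting bots) are assumptions, not part of any existing tool; formatVisit() is a hypothetical helper.

```php
<?php
// Minimal sketch: one tab-separated line per page view with
// date, IP, referring URL, target URL and user agent.

function formatVisit(array $server, int $timestamp): string
{
    $fields = [
        date('Y-m-d H:i:s', $timestamp),
        $server['REMOTE_ADDR'] ?? '-',
        $server['HTTP_REFERER'] ?? '-',    // absent when there is no referrer
        $server['REQUEST_URI'] ?? '-',
        $server['HTTP_USER_AGENT'] ?? '-',
    ];
    // Replace tabs/newlines so a visit always stays on a single line.
    $fields = array_map(fn($f) => str_replace(["\t", "\r", "\n"], ' ', $f), $fields);
    return implode("\t", $fields) . "\n";
}

if (isset($_SERVER['REQUEST_URI'])) {      // only when serving a web request
    file_put_contents(__DIR__ . '/visits.log', formatVisit($_SERVER, time()), FILE_APPEND | LOCK_EX);
}
```

This is essentially a poor man's version of the raw access log, so it is only worth it while the host refuses to provide the real one.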

Noel




msg:3128650
 3:17 pm on Oct 20, 2006 (gmt 0)

Is it possible to install a script of some kind that logs the google bots activity on my site?
One thing I would like to know is the exact times the bot indexes my index page.

The command
tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log
suggested by trillianjedi will only work on a Unix-like system such as Linux, and only if you have shell access!

Here is a small PHP script that will do the job.

The script sends an email when Google hits your site.
(If you want, you can also write this to a database, but then you will need a bit more code.)

It does a host name lookup (to check that the request really comes from the Google network) and, if so, sends you an email saying which page was hit.
The time of the email gives you the exact time of the visit.

You will need to add (or include) this code on every page.

<?php
// Let's send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
if ($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]!= ""){
$host = @gethostbyaddr($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]);
}else{
$IP = $HTTP_SERVER_VARS["REMOTE_ADDR"];
$host = @gethostbyaddr($HTTP_SERVER_VARS["REMOTE_ADDR"]);
}

if(eregi("googlebot",$host))
{
$emailaddress = "your@emailaddress";
mail("".$emailaddress."", "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']."");
}
?>

[edited by: Noel at 3:21 pm (utc) on Oct. 20, 2006]

djingo




msg:3128726
 3:55 pm on Oct 20, 2006 (gmt 0)

Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email when it's the index page that's hit?

jatar_k




msg:3128795
 4:36 pm on Oct 20, 2006 (gmt 0)

if you want PHP advice then go to the PHP forum [webmasterworld.com]

but

$HTTP_SERVER_VARS is deprecated, use $_SERVER

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

also, everything you are asking for would be available if you had raw logs

just get a new host, look at how much time and therefore money this host is already costing you.

Noel




msg:3130146
 8:37 pm on Oct 21, 2006 (gmt 0)

Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email when it's the index page that's hit?

Sure. Only add (or include) it in your index.php.
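If the script is already included on every page, a hedged alternative is to check the requested URL instead. A minimal sketch, assuming the home page is reached as "/" or "/index.php" (adjust the list for your site); isIndexPage() is a hypothetical helper name.

```php
<?php
// Sketch: react only to requests for the home page.
// Which URLs count as "the index page" is an assumption.

function isIndexPage(string $requestUri): bool
{
    // Ignore any query string, e.g. "/index.php?ref=x" -> "/index.php"
    $path = parse_url($requestUri, PHP_URL_PATH);
    return in_array($path, ['/', '/index.php'], true);
}
```

In Noel's script you would then wrap the mail() call in if (isIndexPage($_SERVER['REQUEST_URI'])) { ... }.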

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

jatar_k is right about this. The script does a host lookup every time to check whether the hit really comes from Google. If you have access to a database of all of Google's IP addresses, it will be much faster!
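A sketch of the IP-list approach in PHP: instead of a DNS lookup per hit, compare the visitor's IPv4 address against a list of CIDR ranges. Where the list comes from is an assumption: you would keep it in a file or database and refresh it from whatever ranges Google currently publishes. 66.249.64.0/19 has historically been used by Googlebot and appears below purely as an illustration. The helper names are hypothetical, and the mask arithmetic assumes 64-bit PHP.

```php
<?php
// Sketch: lookup-free check of an IPv4 address against CIDR ranges.

function ipInCidr(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr);
    if (count($parts) !== 2) {
        return false;
    }
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($parts[0]);
    $bits = (int) $parts[1];
    if ($ipLong === false || $subnetLong === false || $bits < 0 || $bits > 32) {
        return false;
    }
    // Keep the top $bits bits (64-bit PHP: ip2long() is never negative).
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

function ipInAnyRange(string $ip, array $ranges): bool
{
    foreach ($ranges as $cidr) {
        if (ipInCidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// Illustrative only: load the real list from your own file or database.
$googleRanges = ['66.249.64.0/19'];
```

On each request you would then test ipInAnyRange($_SERVER['REMOTE_ADDR'], $googleRanges), which is just integer arithmetic, and fall back to the DNS check only when the list might be out of date.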

$HTTP_SERVER_VARS is deprecated, use $_SERVER

Again true! See the updated script below!


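<?php
// Let's send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
    // Behind a proxy: the header may hold a comma-separated chain of
    // addresses; the first is the original client. Clients can also
    // fake this header, so treat it with care.
    $ip = trim(explode(",", $_SERVER["HTTP_X_FORWARDED_FOR"])[0]);
} else {
    $ip = $_SERVER["REMOTE_ADDR"];
}
$host = @gethostbyaddr($ip);

// eregi() was removed in PHP 7; stripos() does the same case-insensitive match
if ($host !== false && stripos($host, "googlebot") !== false) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected",
        "Host name is: " . $host . "\nPage hit was on: " . $_SERVER['REQUEST_URI']);
}
?>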


justguy




msg:3139639
 12:20 pm on Oct 30, 2006 (gmt 0)

Google's new tool at https://www.google.com/webmasters/sitemaps shows a graph of when Googlebot visited and how much crawling it did.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved