
Website Analytics - Tracking and Logging

  
GoogleBot log tool?
Log exact times of indexing activity?
djingo


#:3124034
 12:53 pm on Oct. 17, 2006 (utc 0)

Is it possible to install a script of some kind that logs Googlebot's activity on my site?

One thing I would like to know is the exact times the bot indexes my index page.

HarryM


#:3124049
 1:08 pm on Oct. 17, 2006 (utc 0)

Some time ago I used a PHP browser sniffer script which would send me an email every time Googlebot visited a page. But I'm sure there are more sophisticated options available.

You could also download the raw logs and run a script (Perl or suchlike) to pull out the Googlebot records.
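
For anyone who does have the raw logs, here is a minimal sketch of that filtering step in PHP rather than Perl. The function name googlebot_lines and the two sample log lines are made up for illustration, and the match is a plain substring test on each line, so it will also catch anything that merely claims to be Googlebot.

```php
<?php
// Hypothetical helper: return the lines of an access log that mention Googlebot.
function googlebot_lines(array $lines): array
{
    return array_values(array_filter($lines, function ($line) {
        // Case-insensitive substring match on the whole log line.
        return stripos($line, 'Googlebot') !== false;
    }));
}

// Two made-up log lines in Apache combined format, for illustration only.
$sample = [
    '192.0.2.1 - - [17/Oct/2006:13:08:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [17/Oct/2006:13:09:00 +0000] "GET /about.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
];

foreach (googlebot_lines($sample) as $line) {
    echo $line, "\n";
}
```

For a downloaded log you would pass in file('access.log', FILE_IGNORE_NEW_LINES) instead of the sample array.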

trillianjedi


#:3124054
 1:19 pm on Oct. 17, 2006 (utc 0)

Quick and dirty from the command line:

tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log

Run that in a screen to leave it going in the background.

TJ

djingo


#:3124107
 2:20 pm on Oct. 17, 2006 (utc 0)

Trillian - Where do I have to run that line? In the command prompt? It says Windows doesn't recognise the command tail...

Harry - My ISP won't let me access the raw log files.

[edited by: djingo at 2:43 pm (utc) on Oct. 17, 2006]

netmeg


#:3124135
 2:49 pm on Oct. 17, 2006 (utc 0)

Where do I have to run that line? In the command prompt? It says Windows doesn't recognise the command tail...

You'd have to have shell access to the server.

djingo


#:3124147
 2:57 pm on Oct. 17, 2006 (utc 0)

It's not possible with my ISP.

jatar_k


#:3124315
 4:50 pm on Oct. 17, 2006 (utc 0)

a host with no raw log access and no shell access is a bad host

raw log access is necessary

djingo


#:3124820
 11:24 pm on Oct. 17, 2006 (utc 0)

I am going to get a new ISP when my subscription runs out.

I found a little script though that tracks raw bot activity: http://www.psi-tech.com.au/psi_sewatcher.html and it seems to do the job effectively.

webdudek


#:3125392
 11:17 am on Oct. 18, 2006 (utc 0)

You can write a five-line script that checks whether $_SERVER['HTTP_USER_AGENT'] contains 'Googlebot'.
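
A rough sketch of such a script, assuming a reasonably recent PHP and a writable directory next to the script. The file name googlebot.log and the function name is_googlebot_agent are made up for illustration. It appends one line per hit with a UTC timestamp, which gives the exact visit times without needing raw log access.

```php
<?php
// Hypothetical log file name; the directory must be writable by the web server.
$logFile = __DIR__ . '/googlebot.log';

// True when the user agent string mentions Googlebot (case-insensitive).
function is_googlebot_agent(string $userAgent): bool
{
    return stripos($userAgent, 'Googlebot') !== false;
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (is_googlebot_agent($userAgent)) {
    // One line per hit: UTC timestamp, then the requested URL.
    $entry = gmdate('Y-m-d H:i:s') . ' ' . ($_SERVER['REQUEST_URI'] ?? '-') . "\n";
    file_put_contents($logFile, $entry, FILE_APPEND | LOCK_EX);
}
```

Include it at the top of every page you want tracked. It trusts the user agent string as sent, so anything claiming to be Googlebot gets logged too.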

jatar_k


#:3125925
 5:09 pm on Oct. 18, 2006 (utc 0)

those are pretty easy to write

you can also do them by user agent as well as IP, just in case there is a sneaky one around ;)

though you will get false info from people spoofing the Googlebot user agent, which is simple to switch in some browsers or when using cURL
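
One way to catch a spoofed user agent is a reverse DNS lookup on the IP followed by a forward lookup to confirm it. The sketch below assumes the crawler's host name ends in googlebot.com or google.com; the function names are made up for illustration, and gethostbyname() only handles IPv4.

```php
<?php
// True when the host name ends in a domain assumed to belong to Google's crawlers.
function is_google_host(string $host): bool
{
    return (bool) preg_match('/\.(googlebot|google)\.com$/i', $host);
}

// Reverse lookup, domain check, then forward lookup to confirm the IP.
function is_verified_googlebot(string $ip): bool
{
    // gethostbyaddr() returns false on bad input and the IP itself when
    // no host name is found.
    $host = @gethostbyaddr($ip);
    if ($host === false || $host === $ip || !is_google_host($host)) {
        return false;
    }
    // Forward-confirm: the host name must resolve back to the same IPv4 address.
    return gethostbyname($host) === $ip;
}
```

Call is_verified_googlebot($_SERVER['REMOTE_ADDR']) only after the user agent check has matched, since each call costs up to two DNS lookups.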

djingo


#:3128452
 1:43 pm on Oct. 20, 2006 (utc 0)

Actually I would like to get my hands on a script that logs "everything about everybody"

Basically

ip
referring url
target url
date

for all activity on my site.

Does anyone know where I can get such a script?

(I can't write PHP myself, unfortunately.)
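
A minimal sketch of a logger for those four fields, assuming PHP is available and the script's directory is writable. The file name visits.log and the function name format_visit are made up for illustration. It writes one tab-separated line per request, and it only sees requests for pages that include it, so images and other static files go unrecorded.

```php
<?php
// Hypothetical log file name; the directory must be writable by the web server.
$logFile = __DIR__ . '/visits.log';

// Build one tab-separated line: date (UTC), IP, referring URL, target URL.
function format_visit(array $server, int $time): string
{
    $fields = [
        gmdate('Y-m-d H:i:s', $time),
        $server['REMOTE_ADDR'] ?? '-',
        $server['HTTP_REFERER'] ?? '-',
        $server['REQUEST_URI'] ?? '-',
    ];
    // Replace tabs and line breaks so one request is always one line.
    $fields = array_map(function ($field) {
        return preg_replace('/[\t\r\n]+/', ' ', (string) $field);
    }, $fields);
    return implode("\t", $fields) . "\n";
}

// Only log real web requests, not runs from the command line.
if (PHP_SAPI !== 'cli') {
    file_put_contents($logFile, format_visit($_SERVER, time()), FILE_APPEND | LOCK_EX);
}
```

Include it at the top of every page. The resulting file opens in any spreadsheet program as tab-separated text.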

Noel


#:3128650
 3:17 pm on Oct. 20, 2006 (utc 0)

Is it possible to install a script of some kind that logs Googlebot's activity on my site?
One thing I would like to know is the exact times the bot indexes my index page.

The command
tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log
as suggested by trillianjedi, will only work on a Linux (or other Unix-like) system where you have shell access!

Here is a small PHP script that will do the job.

The script will send an email when Google hits your site.
(If you want, you can also write this to a database, but then you will need a bit more code.)

It will do a host name lookup (to see if the hit is really coming from the Google network), and if so send you an email saying which page the hit was on.
The time of the email will give you the exact times.

You will need to add (or include) this code on every page.

<?php
// Let's send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
if ($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"] != "") {
    $host = @gethostbyaddr($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]);
} else {
    $IP = $HTTP_SERVER_VARS["REMOTE_ADDR"];
    $host = @gethostbyaddr($HTTP_SERVER_VARS["REMOTE_ADDR"]);
}

if (eregi("googlebot", $host)) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>

[edited by: Noel at 3:21 pm (utc) on Oct. 20, 2006]

djingo


#:3128726
 3:55 pm on Oct. 20, 2006 (utc 0)

Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?

jatar_k


#:3128795
 4:36 pm on Oct. 20, 2006 (utc 0)

if you want php advice then go to the PHP forum

but

$HTTP_SERVER_VARS is deprecated, use $_SERVER

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.
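
A rough sketch of the IP list idea, using an in-memory array of CIDR ranges instead of a database and assuming 64-bit PHP and IPv4 addresses. The range shown is a documentation placeholder, not a real Google range, and the function names are made up for illustration; you would fill the list with ranges you have verified yourself.

```php
<?php
// Placeholder list (192.0.2.0/24 is reserved for documentation).
$knownRanges = ['192.0.2.0/24'];

// True when the IPv4 address falls inside the CIDR range.
function ip_in_cidr(string $ip, string $cidr): bool
{
    // A bare address without "/bits" is treated as a /32.
    list($network, $bits) = array_pad(explode('/', $cidr, 2), 2, '32');
    $ipLong  = ip2long($ip);
    $netLong = ip2long($network);
    if ($ipLong === false || $netLong === false) {
        return false;
    }
    $bits = (int) $bits;
    // Mask with the top $bits bits set; a /0 prefix matches everything.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// True when the address matches any range in the list.
function ip_in_list(string $ip, array $ranges): bool
{
    foreach ($ranges as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}
```

This avoids a DNS lookup on every page view, at the cost of having to keep the list up to date.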

also, everything you are asking for would be available if you had raw logs

just get a new host, look at how much time and therefore money this host is already costing you.

crawltrack


#:3129956
 4:58 pm on Oct. 21, 2006 (utc 0)

Hi,

To track Googlebot and many other bots you can use CrawlTrack.

You will see that it is easy to install, and you will know exactly when Googlebot visited and which pages of your site it visited.

Noel


#:3130146
 8:37 pm on Oct. 21, 2006 (utc 0)

Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?

Sure. Just add (or include) it only in your index.php.
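
If the code has to stay in a shared include that every page loads, another option is to check the requested path before sending the mail. This is only a sketch: the function name is_index_request is made up, and it assumes the index page is served as / or /index.php at the site root.

```php
<?php
// True when the requested path is the site root or /index.php.
function is_index_request(string $requestUri): bool
{
    // Drop any query string and compare the path only.
    $path = parse_url($requestUri, PHP_URL_PATH);
    return $path === '/' || $path === '/index.php';
}
```

Wrap the mail() call in if (is_index_request($_SERVER['REQUEST_URI'])) { ... } so hits on other pages are ignored.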

also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.

jatar_k is correct about this. The script WILL DO a host lookup every time to see if the hit is really from Google. If you have access to a database with all the "Google" IPs, it will be way faster!

$HTTP_SERVER_VARS is deprecated, use $_SERVER

Again true! See the updated script below!


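<?php
// Send an email when Googlebot hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part

// Use the forwarded address when a proxy supplies one (note that this header
// can be forged by the client); otherwise use the connecting address.
if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
    $ip = $_SERVER["HTTP_X_FORWARDED_FOR"];
} else {
    $ip = $_SERVER["REMOTE_ADDR"];
}

// Reverse DNS lookup: gives false on bad input, or the IP itself if no host name is found.
$host = @gethostbyaddr($ip);

// Case-insensitive match; eregi() has been removed from current PHP versions.
if ($host !== false && stripos($host, "googlebot") !== false) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>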


justguy


#:3139639
 12:20 pm on Oct. 30, 2006 (utc 0)

The new Google tool at https://www.google.com/webmasters/sitemaps will allow you to see a graph of when Googlebot visited and how much work it did.

 

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld ® and PubCon ® are a Registered Trademarks of WebmasterWorld Inc.
© WebmasterWorld Inc. / SearchEngineWorld 1996-2008 all rights reserved