HarryM

msg:3124049 | 1:08 pm on Oct 17, 2006 (gmt 0) |
Some time ago I used a PHP browser-sniffer script that sent me an email every time Googlebot visited a page, but I'm sure there are more sophisticated options available. You could also download the raw logs and run a script (Perl or suchlike) to pull out the Googlebot records.
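As a rough illustration of the "run a script over the raw logs" idea, here is a minimal Python sketch. It assumes the server writes the common Apache "combined" log format; the function name `googlebot_hits` and the regular expression are made up for this example, and it matches on the user agent string only, so it cannot tell a real Googlebot from a spoofed one.

```python
import re

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Yield (time, request) for each log line whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            yield match.group("time"), match.group("request")
```

To use it, open the downloaded log file and loop over `googlebot_hits(log_file)`, printing each time and request. Lines that do not fit the assumed format are silently skipped.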
|
trillianjedi

msg:3124054 | 1:19 pm on Oct 17, 2006 (gmt 0) |
Quick and dirty from the command line:
tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log
Run that in a screen session to leave it going in the background.
TJ
|
djingo

msg:3124107 | 2:20 pm on Oct 17, 2006 (gmt 0) |
Trillian - Where do I have to run that line? In the command prompt? It says Windows doesn't recognise the command tail...
Harry - My ISP won't let me access the raw log files.
[edited by: djingo at 2:43 pm (utc) on Oct. 17, 2006]
|
netmeg

msg:3124135 | 2:49 pm on Oct 17, 2006 (gmt 0) |
| Where do I have to run that line? In the command prompt? It says Windows doesn't recognise the command tail... |
| You'd have to have shell access to the server.
|
djingo

msg:3124147 | 2:57 pm on Oct 17, 2006 (gmt 0) |
It's not possible with my ISP.
|
jatar_k

msg:3124315 | 4:50 pm on Oct 17, 2006 (gmt 0) |
A host with no raw log access and no shell access is a bad host. Raw log access is necessary.
|
djingo

msg:3124820 | 11:24 pm on Oct 17, 2006 (gmt 0) |
I am going to get a new ISP when my subscription runs out. I found a little script though that tracks raw bot activity: [psi-tech.com.au...] and it seems to do the job effectively.
|
webdudek

msg:3125392 | 11:17 am on Oct 18, 2006 (gmt 0) |
You can write a five-line script that checks whether $_SERVER['HTTP_USER_AGENT'] contains 'Googlebot'.
|
jatar_k

msg:3125925 | 5:09 pm on Oct 18, 2006 (gmt 0) |
Those are pretty easy to write. You can also check by user agent as well as IP, just in case there is a sneaky one around ;) You will get false info from people spoofing the Googlebot user agent, though, which is simple to switch in some browsers or when they are using cURL.
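One way to combine the user agent check with an IP check, so that a spoofed user agent alone is not enough, is a reverse DNS lookup followed by a confirming forward lookup. The Python sketch below is only an illustration under assumptions: the function name `is_real_googlebot` and the injectable `reverse`/`forward` parameters are invented for this example, the accepted host suffixes are the ones Google documents for its crawlers, and the forward lookup used here handles IPv4 only.

```python
import socket

# Host name suffixes accepted as Google crawler hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, user_agent,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True only if the user agent claims Googlebot AND DNS agrees."""
    if "googlebot" not in user_agent.lower():
        return False
    try:
        # Reverse lookup: IP -> host name.
        host = reverse(ip)[0]
        if not host.lower().endswith(GOOGLE_SUFFIXES):
            return False
        # Forward lookup: host name -> IPs; the original IP must be among them.
        return ip in forward(host)[2]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

Checking that the host name ends with the suffix (rather than merely contains "googlebot") keeps names like googlebot.com.evil.example from passing. The two DNS lookups are slow, so results are worth caching per IP.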
|
djingo

msg:3128452 | 1:43 pm on Oct 20, 2006 (gmt 0) |
Actually, I would like to get my hands on a script that logs "everything about everybody": basically the IP, referring URL, target URL, and date for all activity on my site. Does anyone know where I can get such a script? (I can't write PHP myself, unfortunately.)
|
Noel

msg:3128650 | 3:17 pm on Oct 20, 2006 (gmt 0) |
| Is it possible to install a script of some kind that logs the Google bot's activity on my site? One thing I would like to know is the exact times the bot indexes my index page. |
| The command
tail -f path/to/httpd/access.log | grep "Mediapartners-Google" >> /home/djingo/adsense-bot.log
as suggested by trillianjedi will only work on a Linux (or other Unix-like) system!
Here is a small PHP script that will do the job. It sends an email when Google hits your site. (If you want, you can also write this to a database, but then you will need a bit more code.)
It does a host name lookup (to check that the hit really comes from the Google network) and, if so, sends you an email saying which page was hit. The time of the email gives you the exact times.
You will need to add (or include) this code on every page.
<?php
// Let's send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
if ($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"] != "") {
    $host = @gethostbyaddr($HTTP_SERVER_VARS["HTTP_X_FORWARDED_FOR"]);
} else {
    $IP = $HTTP_SERVER_VARS["REMOTE_ADDR"];
    $host = @gethostbyaddr($HTTP_SERVER_VARS["REMOTE_ADDR"]);
}
if (eregi("googlebot", $host)) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>
[edited by: Noel at 3:21 pm (utc) on Oct. 20, 2006]
|
djingo

msg:3128726 | 3:55 pm on Oct 20, 2006 (gmt 0) |
Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit?
|
jatar_k

msg:3128795 | 4:36 pm on Oct 20, 2006 (gmt 0) |
If you want PHP advice then go to the PHP forum [webmasterworld.com], but $HTTP_SERVER_VARS is deprecated; use $_SERVER. Also, if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something.
Also, everything you are asking for would be available if you had raw logs. Just get a new host; look at how much time, and therefore money, this host is already costing you.
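The "IP list instead of a DNS lookup" idea can be sketched in Python as below. This is an assumption-laden illustration, not anyone's actual setup: the function name `in_known_networks` is invented, and the networks listed are RFC 5737 documentation ranges used purely as placeholders, NOT real Google ranges. You would have to fill in and maintain the real list yourself.

```python
import ipaddress

# PLACEHOLDER networks (RFC 5737 documentation ranges), not real crawler ranges.
KNOWN_BOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_known_networks(ip, networks=KNOWN_BOT_NETWORKS):
    """Return True if the address falls inside one of the listed networks."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IP address string at all.
        return False
    return any(address in network for network in networks)
```

The lookup is a handful of in-memory comparisons, so it avoids the per-request DNS delay, at the cost of having to keep the list current.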
|
Noel

msg:3130146 | 8:37 pm on Oct 21, 2006 (gmt 0) |
| Noel, looks cool, thanks. Is it possible to alter the script so it only sends an email if it's the index page that's hit? |
| Sure. Just add (or include) it only in your index.php.
| also if you want speed then you should probably stay away from gethostbyaddr and use an IP list in a db or something. |
| jatar_k is correct about this. The script WILL DO a host lookup every time to see if the hit is really from Google. If you have access to a database with all the "Google" IPs, it will be way faster!
| $HTTP_SERVER_VARS is deprecated, use $_SERVER |
| Again true! See the updated script below!
<?php
// Let's send an email when Google hits the site :-)
// MAKE SURE that you set the "your@emailaddress" part
// Note: X-Forwarded-For is supplied by the client or a proxy, so it can be forged.
if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
    $ip = $_SERVER["HTTP_X_FORWARDED_FOR"];
} else {
    $ip = $_SERVER["REMOTE_ADDR"];
}
$host = @gethostbyaddr($ip);
// stripos() stands in for eregi(), which was removed in PHP 7;
// is_string() skips the check when the lookup returns false.
if (is_string($host) && stripos($host, "googlebot") !== false) {
    $emailaddress = "your@emailaddress";
    mail($emailaddress, "Google detected", "Host name is: " . $host . "\n page hit was on: " . $_SERVER['REQUEST_URI']);
}
?>
|
|
|
justguy

msg:3139639 | 12:20 pm on Oct 30, 2006 (gmt 0) |
The new Google tool at https://www.google.com/webmasters/sitemaps will let you see a graph of when Googlebot visited and how much work it did.
|
|