Create a Googlebot log with perl and SSI?

         

Jesse_Smith

8:04 pm on Feb 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there some way to use SSI and a perl script to make a log that only includes the Googlebot?

Damian

8:18 pm on Feb 11, 2003 (gmt 0)

10+ Year Member




Here's one way:

Find a tracking script that logs through SSI.

In the file that you include with the SSI, wrap the logging code in a test like the following, or something similar:

# /i makes the match case-insensitive, so "Googlebot/2.1" matches too
if ($ENV{'HTTP_USER_AGENT'} =~ /googlebot/i) {

# do whatever the script usually does.
# ..writing the visitor data to a log

} # end if

WebRankInfo

8:43 pm on Feb 11, 2003 (gmt 0)

10+ Year Member



If you're interested, you could try GoogleStats. I developed this free, open source application...

Jesse_Smith

9:17 pm on Feb 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I might use that for my vBulletin domain.

For perl, would this work?

$database = "/home/sites/site21/web/cgi-local/google.txt";

if ($ENV{'HTTP_USER_AGENT'} =~ /googlebot/) {
open (DATABASE,">>$database");
print DATABASE "REMOTE_ADDR - HTTP_USER_AGENT\n";
close(DATABASE);
}

This is exactly the same code that I used in a contest to collect visitors' e-mail addresses. I put it in the Perl script that generates the money-maker text links area at the top of my pages.

Update: Uggg, in the google.txt file it shows up as

REMOTE_ADDR - HTTP_USER_AGENT
REMOTE_ADDR - HTTP_USER_AGENT

Now I'll try...

print DATABASE " $ENV{'REMOTE_ADDR'} - $ENV{'HTTP_USER_AGENT'}\n";

xlcus

9:27 pm on Feb 11, 2003 (gmt 0)

10+ Year Member



Is there some way to use SSI and a perl script to make a log that only includes the Googlebot?

Are you using Apache? And do you have access to its configuration?

If so, you can easily make a separate Googlebot log without any scripting. Here's a section of my Apache configuration file. The important bits are the SetEnvIfNoCase line, the "SetEnvIf isrobot" line and the two CustomLog lines...

<VirtualHost 192.168.0.100>
ServerName example.com
ServerAlias example.com *.example.com
DocumentRoot /sites/example.com/htdocs
ErrorLog /sites/example.com/logs/error_log
SetEnvIfNoCase User-Agent Googlebot isrobot=true
SetEnvIf Remote_Addr "192\.168\.0" dontlog
SetEnvIf isrobot true dontlog
CustomLog /sites/example.com/logs/access_log combined env=!dontlog
CustomLog /sites/example.com/logs/robot_log combined env=isrobot
</VirtualHost>

hetzeld

6:17 am on Feb 12, 2003 (gmt 0)

10+ Year Member



Hi xlcus,
Wouldn't the trivial command "grep -i googlebot < path_to_full_logfile > google_logfile" give the same result, although not in real time, or am I missing something?

Dan
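
For anyone who wants to try the grep route on a throwaway file first, here is a minimal sketch. The temporary directory, the file names and the two sample log lines are invented for the demonstration. Note that grep's -v flag inverts the match (it keeps the lines that do NOT match), so it is left out here.

```shell
#!/bin/sh
# Sketch only: the paths and sample log lines are made up for illustration.
workdir=$(mktemp -d)
full_log="$workdir/access_log"
google_log="$workdir/google_log"

# Two fake requests in combined log format: one Googlebot, one browser.
cat > "$full_log" <<'EOF'
216.239.46.10 - - [11/Feb/2003:20:04:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
10.0.0.5 - - [11/Feb/2003:20:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

# -i ignores case; without -v, grep keeps only the matching lines.
grep -i 'googlebot' "$full_log" > "$google_log"

# Count the lines that ended up in the Googlebot-only file.
google_hits=$(wc -l < "$google_log" | tr -d ' ')
echo "Googlebot lines: $google_hits"
```

On a real server you would point full_log at your own access log and pick a permanent place for google_log.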

hetzeld

6:28 am on Feb 12, 2003 (gmt 0)

10+ Year Member



Hi all,

For those of you running an Apache/PHP/MySQL environment, GoogleStats would do the job. It's an open source product available in English and French.

Dan

PS: I'm not sure whether posting the URL would be against the TOS but, once again, Google is your friend with that single-word search term. You may even use the "I'm Feeling Lucky" button for once ;)

littleman

8:40 am on Feb 12, 2003 (gmt 0)



It's great to see you contributing to the webmaster community with an open source project. Now that we are all aware of your project, please respect the board and do not over-promote it here.

pfritz

9:57 am on Feb 12, 2003 (gmt 0)

10+ Year Member



All my log files have the *.log extension.
I use similar commands to parse the logs:
cat *.log | grep oogleb >crawl
or
cat *.log | grep 216.239.46 >deepcrawl
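
A slightly tighter variant of the same idea, as a sketch rather than a recommendation. grep can read several files itself (-h drops the file-name prefix), and an unescaped dot in a pattern matches any character, so here the address prefix is escaped and anchored to the start of the line. This assumes the client address is the first field of each line, as in Apache's common and combined log formats. The temporary directory and the sample log lines are invented for the demonstration, and 216.239.46 is simply the prefix used above.

```shell
#!/bin/sh
# Sketch only: the directory, file names and log lines are made up.
workdir=$(mktemp -d)

cat > "$workdir/a.log" <<'EOF'
216.239.46.10 - - [12/Feb/2003:09:57:00 +0000] "GET /a.html HTTP/1.0" 200 100 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
10.0.0.5 - - [12/Feb/2003:09:58:00 +0000] "GET /id/2160239146 HTTP/1.1" 200 100 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
EOF

cat > "$workdir/b.log" <<'EOF'
64.68.82.5 - - [12/Feb/2003:09:59:00 +0000] "GET /b.html HTTP/1.0" 200 100 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
EOF

# Every request whose line mentions Googlebot, in any letter case.
grep -hi 'googlebot' "$workdir"/*.log > "$workdir/crawl"

# Only requests whose client address starts with 216.239.46.
grep -h '^216\.239\.46\.' "$workdir"/*.log > "$workdir/deepcrawl"

# For comparison: the unescaped pattern also matches "/id/2160239146".
loose_hits=$(cat "$workdir"/*.log | grep -c '216.239.46')

crawl_hits=$(wc -l < "$workdir/crawl" | tr -d ' ')
deep_hits=$(wc -l < "$workdir/deepcrawl" | tr -d ' ')
echo "crawl: $crawl_hits  deepcrawl: $deep_hits  unescaped pattern: $loose_hits"
```

The escaped, anchored pattern costs nothing extra and avoids counting ordinary visitors whose request happens to contain a similar run of digits.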

xlcus

10:15 am on Feb 12, 2003 (gmt 0)

10+ Year Member



Wouldn't the trivial command "grep -i googlebot < path_to_full_logfile > google_logfile" give the same result

Yeah, but this can take some time with really big log files (e.g. 150 MB or more), and with a separate real-time log file you can "tail -f" the file to monitor the Google requests as they come in.
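
If you can't touch the Apache configuration, something close to a live view is still possible by filtering the full log as it grows. A rough sketch, assuming a grep that supports --line-buffered (GNU and BSD grep do); the function name googlebot_only and the sample lines are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: pass through only the lines that mention Googlebot.
# --line-buffered makes grep flush after every line, so matches show up
# immediately when the input is a live stream.
googlebot_only() {
    grep -i --line-buffered 'googlebot'
}

# Demonstration on two invented log lines instead of a live log.
sample_output=$(printf '%s\n' \
    '216.239.46.10 - - "GET / HTTP/1.0" 200 "Googlebot/2.1"' \
    '10.0.0.5 - - "GET / HTTP/1.1" 200 "Mozilla/4.0"' \
    | googlebot_only)
echo "$sample_output"

# Real use (runs until interrupted), with your own log path:
#   tail -f /sites/example.com/logs/access_log | googlebot_only
```

This still reads every request, so the separate robot_log remains the cheaper option on a busy server.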

WebRankInfo

10:16 pm on Feb 12, 2003 (gmt 0)

10+ Year Member



littleman, did you know that hetzeld isn't the creator of GoogleStats?
He's a user and didn't even give the URL. What's the problem?

littleman

10:40 pm on Feb 12, 2003 (gmt 0)



Please check your StickyMail.