How to track bot visits to my website?

         

yigber

9:35 am on Nov 6, 2007 (gmt 0)

10+ Year Member



For some odd reason, I would like to get an e-mail alert when a bot like Googlebot visits my site. A simple way to do that is to put a RewriteRule for /robots.txt in .htaccess that redirects to a PHP script. <snip>

This method sucks for several reasons: it depends on the bot sending a GET for robots.txt, it can't tell which URL is actually being visited, etc.

I was wondering if there's a better way, and how it could be implemented on a shared-hosting account running Apache.

Thanks!

[edited by: jdMorgan at 1:03 pm (utc) on Nov. 6, 2007]
[edit reason] No personal URLs, please. See TOS. [/edit]

jdMorgan

1:09 pm on Nov 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're on shared hosting, the solutions are limited. Include a PHP snippet on each page, or use SSI on each page to invoke a Perl script when each page is fetched. The script can then log the access and send you an e-mail. In this model, the pages are the "wrappers" for the script.

Alternatively, you could use a script to serve all requests from your site, after logging and sending an e-mail. In this case, the script is the "wrapper" for the pages.
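As a rough illustration of the logging step such a script would perform, here is a minimal Python sketch. It is not the method used in this thread (which talks about PHP and Perl); the user-agent list, the function names `detect_bot` and `format_log_line`, and the log format are all assumptions made for the example. Note that a user-agent string can be spoofed, so a match only suggests that the visitor is the named bot.

```python
import re
from datetime import datetime, timezone

# Hypothetical list of user-agent substrings to watch for; extend as needed.
BOT_PATTERN = re.compile(r"googlebot|slurp|msnbot|bingbot", re.IGNORECASE)


def detect_bot(user_agent):
    """Return the matched bot token in lower case, or None for other visitors."""
    match = BOT_PATTERN.search(user_agent or "")
    return match.group(0).lower() if match else None


def format_log_line(user_agent, url, ip, when=None):
    """Build one log line for a bot visit, or return None for non-bot requests.

    Unlike the robots.txt approach, the requested URL is recorded here.
    """
    bot = detect_bot(user_agent)
    if bot is None:
        return None
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()} {ip} {bot} {url}"
```

A wrapper script would call `format_log_line` with the request's User-Agent header, requested URL, and remote address, append the result to a log file if it is not `None`, and then serve the page as usual.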

In either case, you may want to take steps so that an e-mail is not sent for each and every page request and, with the second method, for each image requested from your server. This will likely involve a timer and a dynamic list of recently-seen robot user-agents or IP addresses.
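The timer-plus-list idea can be sketched as follows, again in Python purely for illustration. The class name `AlertThrottle`, the one-hour default, and the in-memory dictionary are assumptions; on shared hosting each request typically runs in a fresh process, so a real script would have to keep this list in a file or database instead.

```python
import time


class AlertThrottle:
    """Allow at most one alert per key (user-agent or IP) per cooldown period."""

    def __init__(self, cooldown_seconds=3600, clock=time.time):
        self.cooldown = cooldown_seconds
        self.clock = clock          # injectable so the timer can be tested
        self.last_alert = {}        # key -> time the last alert was sent

    def should_alert(self, key):
        now = self.clock()
        # Drop entries older than the cooldown so the list stays small.
        self.last_alert = {
            k: t for k, t in self.last_alert.items() if now - t < self.cooldown
        }
        if key in self.last_alert:
            return False            # already alerted recently for this key
        self.last_alert[key] = now
        return True
```

The script would send an e-mail only when `should_alert` returns `True` for the visiting robot's user-agent or IP address, so a crawl of many pages and images produces a single message per cooldown period.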

Jim

yigber

2:55 pm on Nov 6, 2007 (gmt 0)

10+ Year Member



Thanks jdMorgan.

To implement a "wrapper", will I have to put a line in every .htaccess file in every directory of the site? Is there a way to do this with a few lines of code in one place?

Thanks again.

jdMorgan

4:53 pm on Nov 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Neither proposed solution involves .htaccess, because you stated that you're on shared hosting. Shared hosting precludes using a RewriteMap to execute a script before the server enters the content-handling phase of the API, as you could do if you had full server config access.

You must include a PHP or SSI script call in each page for which you wish to log access/send an e-mail. If using PHP, you might also be able to use PHP's auto_prepend_file directive to include the script call on every PHP page.

Jim

yigber

5:34 pm on Nov 6, 2007 (gmt 0)

10+ Year Member



Both Bluehost and HostGator give me .htaccess.
The solution I used [to rewrite robots.txt requests to a script that sends me e-mail] is based on .htaccess and works OK, but it could be done better, and that's what I'm trying to figure out.

The PHP/SSI solutions you mentioned require modifying each served page!

[edited by: jdMorgan at 7:00 pm (utc) on Nov. 6, 2007]
[edit reason] No personal URLs, please. See Terms Of Service. [/edit]

jdMorgan

10:26 pm on Nov 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In total, I proposed three solutions above, one of which requires modifying the pages. The other two do not.

Jim