We have a directory that we are setting up to charge advertisers on a pay-per-click basis. A CGI program records the click-throughs from a client link on our directory to the client's website.
My question is whether the variable:
$ENV{HTTP_USER_AGENT}
is the one we should use to determine which clicks come from spiders/bots rather than from actual directory users.
In other words, would we need to set up a list of spider/bot user-agent names in order to identify the clicks we don't want to record as paid click-throughs? (When the user agent matches an entry in our list, the CGI program would still let the spider follow the link to the website, but it would not record the click in our stats for the client.)
I mainly want to confirm whether we should use $ENV{HTTP_USER_AGENT} to identify the spiders/bots, or whether there is another, better way to do this, so that clients don't get charged for visits by spiders/bots that come from our directory to their sites.
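For what it's worth, here is a minimal sketch of that approach in Perl. It assumes the click script receives the destination as a url parameter; the bot substrings and log path are placeholders you would replace with your own:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI qw(param);

    # Placeholder list of substrings that mark known spiders/bots;
    # you would build and maintain this list yourself.
    my @bot_patterns = qw(googlebot slurp msnbot crawler spider);

    my $ua  = lc( $ENV{HTTP_USER_AGENT} || '' );
    my $url = param('url');   # assumes the client link is passed as ?url=...

    # Scalar grep returns the number of matching patterns (0 = human).
    my $is_bot = grep { index($ua, $_) >= 0 } @bot_patterns;

    unless ($is_bot) {
        # Record the billable click-through (placeholder log file).
        open my $log, '>>', '/path/to/clicks.log' or die "log: $!";
        print $log join("\t", time(), $url, $ua), "\n";
        close $log;
    }

    # Either way, send the visitor (human or bot) on to the client site.
    print "Location: $url\n\n";

Note that this only catches bots that announce themselves honestly in the user-agent header, which is exactly the limitation raised below.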
Spambots and the like come to mind; a user-agent list won't catch those, since many of them send a fake browser user agent. You could try using a bot trap to prevent them from getting too far into the directory. It won't be 100% effective, and I can't think of anything that would be.
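If it helps, here's a rough sketch of one common way a bot trap works: a link that humans can't see, pointing at a path disallowed in robots.txt, so anything that requests it is almost certainly a misbehaving bot. All paths and file names here are just examples:

    #!/usr/bin/perl
    # Hypothetical trap script, e.g. at /trap/index.cgi.  The /trap/
    # path is disallowed in robots.txt and linked only via a link
    # hidden from humans, so any visitor here is a suspect bot.
    use strict;
    use warnings;

    my $ip = $ENV{REMOTE_ADDR} || 'unknown';

    # Record the offender's IP; the click-tracking script could then
    # check this file and skip billing clicks from these addresses.
    open my $log, '>>', '/path/to/bad_ips.txt' or die "log: $!";
    print $log "$ip\n";
    close $log;

    print "Content-type: text/html\n\n";
    print "<html><body></body></html>\n";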
Maybe someone else will have additional input.