I have built a very basic spider trap that records possible spiders in a tbl_possible_spiders table in my cloaking database. I then run a class C check to see whether any known spider shares the same class C address range (the first three octets of the IP); if one does, I add the new address to the tbl_known_spiders table.
At the moment the only criterion I use to add a request to tbl_possible_spiders is that the User Agent does not contain "mozilla".
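The trap logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual implementation: the function names are hypothetical, a plain list of IP strings stands in for tbl_known_spiders, the "mozilla" rule is assumed to be a case-insensitive substring test, and only IPv4 addresses are handled.

```python
import ipaddress


def class_c_prefix(ip: str) -> str:
    """Return the first three octets of an IPv4 address, e.g. '192.0.2'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(str(addr).split(".")[:3])


def is_possible_spider(user_agent: str) -> bool:
    # Assumed reading of the current rule: anything whose User Agent
    # does not contain "mozilla" is treated as a possible spider.
    return "mozilla" not in user_agent.lower()


def classify(ip: str, user_agent: str, known_spider_ips: list[str]) -> str:
    """Label a request as 'browser', 'known_spider' or 'possible_spider'."""
    if not is_possible_spider(user_agent):
        return "browser"
    # known_spider_ips stands in for the rows of tbl_known_spiders.
    known_prefixes = {class_c_prefix(k) for k in known_spider_ips}
    if class_c_prefix(ip) in known_prefixes:
        return "known_spider"    # would be added to tbl_known_spiders
    return "possible_spider"     # would be added to tbl_possible_spiders


if __name__ == "__main__":
    known = ["192.0.2.77"]  # made-up documentation addresses
    print(classify("192.0.2.10", "ExampleBot/1.0", known))
    print(classify("198.51.100.5", "ExampleBot/1.0", known))
```

In this sketch the known prefixes are rebuilt on every call for simplicity; building the set once and reusing it would make each class C check a single set lookup.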
Q1) What other common phrases in the User Agent can I use to rule a request out as a possible spider? For example, is there a standard term that Opera and other common browsers use in their User Agent string?
Q2) Apart from the User Agent, are there any other criteria I can check when the page is requested that might help identify a possible spider request? For example, would running the class C check on each request make the page take too long to load? Are there any techniques that could be applied without slowing the page down?
Q3) If a spider request doesn't come from a known spider and is therefore served the non-optimised page, what kind of negative effect will this have if another of the same engine's spiders, one that was a known spider, indexed the optimised page a few minutes before?
Hope all that makes sense,
thanks,
Nick