I then carry out a class C check to see whether any known spiders have the same class C address as the request, and if they do, I add the requesting address to the tbl_known_spiders table.
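For anyone following along, a class C check of this kind can be sketched roughly as below. This is only an illustration in Python, assuming IPv4 addresses and treating "same class C" as "same first three octets" (a /24); the function names and the sample addresses are made up, and the real lookup would run against tbl_known_spiders rather than a list.

```python
import ipaddress

def same_class_c(ip_a, ip_b):
    """Return True when both IPv4 addresses share their first three octets."""
    # strict=False lets us build the /24 network from a host address.
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network

def matches_known_spider(request_ip, known_spider_ips):
    """Return True if request_ip sits in the same /24 as any known spider IP."""
    return any(same_class_c(known_ip, request_ip) for known_ip in known_spider_ips)

# Hypothetical data standing in for the known-spider table.
known_spider_ips = ["192.0.2.15", "198.51.100.7"]
print(matches_known_spider("192.0.2.200", known_spider_ips))
print(matches_known_spider("203.0.113.9", known_spider_ips))
```

The comparison itself is plain arithmetic on the address, so it should add very little to page load time; any slowness is more likely to come from how the known-spider list is stored and queried.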
At the moment, the only criterion I am using to add a possible spider to my possible_spiders table is the User Agent not containing "mozilla".
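A rough sketch of that User Agent test, in Python, assuming the check is a case-insensitive search for "mozilla" anywhere in the string. The function name, the browser_tokens parameter and the sample user agents are made up for illustration; further tokens could be added to the tuple if other browsers need to be ruled out.

```python
def is_possible_spider(user_agent, browser_tokens=("mozilla",)):
    """Flag a request when its User Agent contains none of the browser tokens."""
    # A missing User Agent is treated as an empty string, so it is flagged too.
    agent = (user_agent or "").lower()
    return not any(token in agent for token in browser_tokens)

print(is_possible_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
print(is_possible_spider("ExampleBot/1.0"))
```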
Q1) What other common phrases in the user agent can I use to rule out a request as a possible spider? For example, is there a standard term that Opera and other common browsers use in their User Agent string?
Q2) Are there any other criteria I can check when the page is requested, apart from the user agent, that might help identify a possible spider request? Would the class C check make the page take too long to load? Are there any techniques that could be applied without causing the page to load slowly?
Q3) If the request comes from a spider that isn't a known spider and it gets delivered the non-optimised page, what kind of negative effect will this have if another of the engine's spiders, which was a known spider, indexed the optimised page a few minutes before?
Hope all that makes sense,
thanks,
Nick
Are there any other criteria I can check when the page is requested that might help identify a possible spider request apart from user agent?
Not in the user agent, but take a look at HTTP_FROM.
Most major spiders use this variable to show their e-mail address. By contrast, 99.99 percent of normal users surf with browsers that are properly configured NOT to use this variable.
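A minimal sketch of that check in Python, assuming the request headers are available as a CGI or WSGI style environ dictionary, where the From request header shows up under the key HTTP_FROM. The function names and the sample address are hypothetical, and a set HTTP_FROM is best treated as one hint among several rather than proof.

```python
def from_address(environ):
    """Return the HTTP_FROM value if the client sent one, otherwise None."""
    value = environ.get("HTTP_FROM", "").strip()
    return value or None

def has_spider_hint(environ):
    """Treat a non-empty HTTP_FROM as a hint that the request is a spider."""
    return from_address(environ) is not None

# Hypothetical requests: a crawler that identifies itself, and a browser.
print(has_spider_hint({"HTTP_FROM": "crawler@example.com"}))
print(has_spider_hint({"HTTP_USER_AGENT": "Mozilla/4.0 (compatible)"}))
```

Since this is a single dictionary lookup, it should not slow the page down at all.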