


Phrases to look for in a spider trap

     

nickc001

10:42 am on May 16, 2002 (gmt 0)

10+ Year Member



I have built a very basic spider trap that records possible spiders into a tbl_possible_spiders table in my cloaking database.

I then carry out a class C check to see whether any known spiders share the same class C address range, and if they do, I add the request to the tbl_known_spiders table.
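A minimal Python sketch of the class C comparison described above, assuming "class C" means the /24 range (the first three octets) of an IPv4 address. The in-memory lists stand in for rows of tbl_possible_spiders and tbl_known_spiders; the function names are hypothetical, and the sample IPs come from the reserved documentation ranges, not from real spiders.

```python
import ipaddress
from typing import Iterable, List


def class_c(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 ("class C") network that contains an IPv4 address."""
    # strict=False lets us pass a host address and get its enclosing network
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def promote_possible_spiders(possible_ips: Iterable[str],
                             known_ips: Iterable[str]) -> List[str]:
    """Return the possible-spider IPs that share a /24 with a known spider.

    Hypothetical helper: possible_ips and known_ips stand in for rows of
    tbl_possible_spiders and tbl_known_spiders.
    """
    # Build the set of known /24 networks once; each lookup is then a hash check.
    known_nets = {class_c(ip) for ip in known_ips}
    return [ip for ip in possible_ips if class_c(ip) in known_nets]


if __name__ == "__main__":
    # 203.0.113.0/24 and 192.0.2.0/24 are documentation ranges.
    print(promote_possible_spiders(["203.0.113.200", "192.0.2.7"],
                                   ["203.0.113.10"]))  # ['203.0.113.200']
```

Building the set of known /24 ranges once and reusing it keeps each per-request check to a single hash lookup, which is one way to keep the class C test from adding noticeable load time.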

At the moment the only criterion I am using to add a request to my possible_spiders table is the User Agent not containing "Mozilla".

Q1) What other common phrases in the user agent can I use to dismiss a request as a possible spider? For example, is there a standard term that Opera and other common browsers use in their User Agent string?
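One hedged way to express the user-agent test: most mainstream browser user agents (Internet Explorer, Netscape, Mozilla) contain the token "Mozilla", and Opera's user agent string contains "Opera". The sketch below flags a request as a possible spider when neither token appears, or when the string contains a self-identifying crawler word. Both token lists are assumptions for illustration, not a complete catalogue, and some crawlers also put "Mozilla" in their user agents, so this can only be a first filter.

```python
# Assumed tokens: "mozilla" appears in most mainstream browser UAs,
# "opera" in Opera's UA string. Not exhaustive.
BROWSER_TOKENS = ("mozilla", "opera")

# Hypothetical, non-exhaustive words that self-identifying crawlers often use.
CRAWLER_HINTS = ("bot", "crawl", "spider", "slurp")


def is_possible_spider(user_agent: str) -> bool:
    """Heuristic first filter: True if the request should be logged as a possible spider."""
    ua = (user_agent or "").lower()
    if not ua:
        # No user agent at all: browsers normally send one, so flag it.
        return True
    if any(hint in ua for hint in CRAWLER_HINTS):
        # Crawler words win even when the string also contains "Mozilla".
        return True
    # Otherwise flag anything that carries none of the usual browser tokens.
    return not any(token in ua for token in BROWSER_TOKENS)


if __name__ == "__main__":
    print(is_possible_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # False
    print(is_possible_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))  # True
```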

Q2) Are there any other criteria I can check when the page is requested, apart from the user agent, that might help identify a possible spider request? For example, would a class C check make the page take too long to load? Are there any techniques that could be applied without causing the page to load slowly?

Q3) If a spider request doesn't come from a known spider and gets delivered the non-optimised page, what kind of negative effect will this have if another of that engine's spiders, one that was a known spider, indexed the optimised page a few minutes before?

Hope all that makes sense,

thanks,

Nick

johnhamman

12:37 pm on May 16, 2002 (gmt 0)

10+ Year Member



What language are you programming this in?
Also, a class C check probably won't hurt your load time at all.

nickc001

3:26 pm on May 16, 2002 (gmt 0)

10+ Year Member



I am using ASP

Everyman

3:32 pm on May 16, 2002 (gmt 0)



Are there any other criteria i can check when the page is requested that might help identify a possible spider request apart from user agent.

Not in the user agent, but take a look at HTTP_FROM.

Most major spiders use this variable to show their e-mail address, whereas 99.99 percent of normal users surf with browsers that are properly configured NOT to send it.
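A minimal sketch of the HTTP_FROM check, written in Python rather than ASP. HTTP_FROM is how ASP and CGI expose the HTTP "From" request header; the sketch accepts either name, assumes the headers arrive as a simple name-to-value mapping, and only treats a value containing "@" as a signal. The function name is hypothetical, and a missing From header proves nothing on its own.

```python
from typing import Mapping


def from_header_signal(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries a From / HTTP_FROM value that looks like an e-mail address."""
    # Header names are case-insensitive in HTTP; ASP/CGI expose this one as HTTP_FROM.
    value = next(
        (v for name, v in headers.items() if name.lower() in ("from", "http_from")),
        "",
    )
    # Only count it if it plausibly holds an address (assumption: contains "@").
    return "@" in value.strip()
```

In classic ASP the same value would be read with Request.ServerVariables("HTTP_FROM"), and the result could be combined with the user agent and class C checks rather than used alone.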

volatilegx

4:45 pm on May 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Everyman

Do you have a list of spiders that use HTTP_FROM?

johnhamman

7:00 pm on May 17, 2002 (gmt 0)

10+ Year Member



Another thing you may want to do for a performance increase is to program the spider trap into a .dll file, if you have that available. I am doing the same, except in ASP.NET.
john
 
