Cloaking Forum

    
Phrases to look for in a spider trap
nickc001




msg:675993
 10:42 am on May 16, 2002 (gmt 0)

I have built a very basic spider trap that records possible spiders into a tbl_possible_spiders table in my cloaking database.

I then carry out a class C domain check to see whether any known spiders have the same class C address and if they do I add it to the tbl_known_spiders table.
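As a rough illustration of the class C check described above, here is a minimal sketch in Python (the poster works in ASP, so treat this as pseudocode-with-tests rather than the actual implementation). It assumes plain dotted-quad IPv4 addresses and treats "same class C" as "same first three octets". The function names and the sample addresses are made up for the example.

```python
def class_c_prefix(ip):
    """Return the first three octets of a dotted-quad IPv4 address."""
    octets = ip.strip().split(".")
    if len(octets) != 4:
        raise ValueError("not an IPv4 address: %r" % ip)
    for octet in octets:
        # int() raises ValueError itself on non-numeric input
        if not 0 <= int(octet) <= 255:
            raise ValueError("octet out of range in %r" % ip)
    return ".".join(octets[:3])


def shares_class_c(candidate_ip, known_spider_ips):
    """True if candidate_ip has the same first three octets as any known spider IP."""
    known_prefixes = {class_c_prefix(ip) for ip in known_spider_ips}
    return class_c_prefix(candidate_ip) in known_prefixes


# Hypothetical data: addresses taken from the reserved documentation ranges.
known = ["192.0.2.10", "198.51.100.7"]
print(shares_class_c("192.0.2.200", known))   # same first three octets as 192.0.2.10
print(shares_class_c("203.0.113.5", known))   # no match
```

In a real trap the set of known prefixes would presumably be built once (or stored as its own column in the known-spiders table) rather than recomputed on every request.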

At the moment the only criterion I am using to add a possible spider to my possible_spiders table is the User Agent not containing "mozilla".
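The user agent test described above might look something like the following Python sketch. It is only an approximation of the idea: the name `is_possible_spider` and the `BROWSER_TOKENS` tuple are invented for the example, and the comparison is assumed to be a case-insensitive substring match.

```python
# Tokens that mark a request as "probably a browser". Only "mozilla" comes
# from the post; the tuple is an assumption so more tokens can be added later.
BROWSER_TOKENS = ("mozilla",)


def is_possible_spider(user_agent):
    """Flag a request as a possible spider when no browser token appears in its UA."""
    ua = (user_agent or "").lower()   # a missing header is treated as an empty string
    return not any(token in ua for token in BROWSER_TOKENS)


print(is_possible_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"))
print(is_possible_spider("ExampleBot/1.0"))   # hypothetical spider user agent
```

A check like this only produces candidates: a spider can send a browser-like user agent, which is why the post follows it up with the class C comparison.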

Q1) What other common phrases in the user agent can I use to rule out a request as a possible spider? For example, is there a standard term that Opera and other common browsers use in their User Agent string?

Q2) Are there any other criteria I can check when the page is requested, apart from the user agent, that might help identify a possible spider request? For example, would a class C check make the page take too long to load? Are there any techniques that could be applied without causing the page to load slowly?

Q3) If the request comes from a spider that isn't a known spider and it gets delivered the non-optimised page, what kind of negative effect will this have if another of that engine's spiders, one that was a known spider, indexed the optimised page a few minutes before?

hope all that makes sense,

thanks,

Nick

 

johnhamman




msg:675994
 12:37 pm on May 16, 2002 (gmt 0)

What language are you programming this in?
Also, a class C check probably won't hurt your load time at all.

nickc001




msg:675995
 3:26 pm on May 16, 2002 (gmt 0)

I am using ASP

Everyman




msg:675996
 3:32 pm on May 16, 2002 (gmt 0)

Are there any other criteria I can check when the page is requested that might help identify a possible spider request apart from user agent?

Not in user agent, but take a look at HTTP_FROM

Most major spiders use this variable to show their e-mail address. But for normal users, 99.99 percent of them surf with browsers that are properly configured NOT to use this variable.
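A minimal sketch of the HTTP_FROM idea, in Python rather than ASP: it assumes the request headers are available as a CGI/WSGI-style dictionary, where the From header shows up under the key `HTTP_FROM`. The function name and the sample values are hypothetical.

```python
def has_from_header(environ):
    """True if the request carries a non-empty From header (HTTP_FROM)."""
    return bool(environ.get("HTTP_FROM", "").strip())


# Hypothetical requests expressed as CGI-style environment dictionaries.
spider_like = {"HTTP_USER_AGENT": "ExampleBot/1.0", "HTTP_FROM": "crawler@example.com"}
browser_like = {"HTTP_USER_AGENT": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"}

print(has_from_header(spider_like))
print(has_from_header(browser_like))
```

As with the user agent test, a hit here is best treated as one more reason to log the request as a possible spider, not as proof.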

volatilegx




msg:675997
 4:45 pm on May 17, 2002 (gmt 0)

Everyman

Do you have a list of spiders that use HTTP_FROM?

johnhamman




msg:675998
 7:00 pm on May 17, 2002 (gmt 0)

Another thing you may want to do for a performance increase is to program the spider trap into a .dll file, if you have that available. I am doing the same, except in ASP.NET.
john

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved