New Forum Proposal: Spider Traps

         

pendanticist

10:02 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Greetings,

I've mentioned before that I feel there is a need for some type of tutorial regarding Spider Traps [webmasterworld.com].

A site search a few minutes ago turned up additional posts about this subject.

In Spider Trap needs GOOD Spider list [webmasterworld.com] - Dreamquick [webmasterworld.com] brings up an important point...

First I'm probably going to ask a *very* stupid question, but if you want to identify good/bad spiders (presumably by behaviour) then surely the bad ones are bad irregardless of whether they are spambots or just a name-brand SE having a bad day?


In Phrases to look for in a spider trap [webmasterworld.com], nickc001 mentions a "class C domain check", which raises a couple of unanswered questions that could use addressing, especially for those of us who know next to nothing about class C domain checks, "ASP" or "HTTP_FROM".
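For anyone in the same boat, here is a minimal sketch of one common reading of a "class C check": treating two IPv4 addresses as related when their first three octets match (a /24 block). Whether this is what nickc001 had in mind is an assumption, and the function name `same_class_c` and the sample addresses (reserved documentation ranges) are made up for illustration.

```python
import ipaddress

def same_class_c(ip_a: str, ip_b: str) -> bool:
    """Return True when both IPv4 addresses fall in the same /24 block."""
    # strict=False lets us build the /24 network from any host address in it.
    block = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in block

if __name__ == "__main__":
    # Hypothetical addresses taken from reserved documentation ranges.
    print(same_class_c("192.0.2.10", "192.0.2.200"))
    print(same_class_c("192.0.2.10", "198.51.100.7"))
```

A trap could use a check like this to treat a whole block as one visitor, which is also how an innocent neighbour on the same block can end up banned.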

Skimmer_lid raises a specific point about Thunderstone spider violating robots.txt? [webmasterworld.com] that could be important regarding other spiders, not so much Thunderstone.

In SpiderTrap Caught these IPs [webmasterworld.com] prozz provides a listing of spiders in need of banning. There are also several others who posted relevant material well worth perusing.

Reading through Google WAP Proxy and robots.txt [webmasterworld.com] we can see how a spider trap may end up banning a cell phone user as well as a solution.

sinyala1 also poses an interesting concept in Combination User Agent / IP How does it work? [webmasterworld.com].
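One way a user agent / IP combination could work, sketched under assumptions: keep a table of the networks each known crawler is expected to come from, and flag any request that claims the crawler's name from somewhere else. The table `EXPECTED_NETWORKS`, the bot name `examplebot` and the address range are all hypothetical, and this is not necessarily the scheme discussed in the linked thread.

```python
import ipaddress

# Hypothetical table: user-agent token -> networks that crawler should use.
# The range below is a reserved documentation block, not a real crawler range.
EXPECTED_NETWORKS = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}

def classify(user_agent: str, remote_ip: str) -> str:
    """Return 'verified', 'spoofed' or 'unknown' for a single request."""
    agent = user_agent.lower()
    address = ipaddress.ip_address(remote_ip)
    for token, networks in EXPECTED_NETWORKS.items():
        if token in agent:
            # The name matches a known crawler: check where it came from.
            if any(address in network for network in networks):
                return "verified"
            return "spoofed"
    # Not a name we track, so this check says nothing either way.
    return "unknown"
```

A request such as `classify("ExampleBot/1.0", "203.0.113.9")` comes back as "spoofed", since the name is known but the address is outside the listed block. Ordinary browsers fall through as "unknown" and would need some other test.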

The point to all this is not entirely about spider traps!

I'm issuing a Quorum Call to look into the possibility of instituting some methodology of:

  • Notifying everyone about those spiders in need of trapping

    and

  • The building of spider traps.
    Many of our sites are being bludgeoned by misconfigured, malicious or otherwise stupid [webmasterworld.com] spiders that download, scrape or otherwise rob us of our content.

    We must take a stand.

    How 'bout a new forum dedicated specifically to spider traps?

    If not, how 'bout some notification system?

    I realize there are inherent dangers to this concept, but the continued abuse by spiders must be addressed.

    Now that I've opened a big old can of worms: Anyone?

    I will not be back until later in the day as I have some appointments.

    Thank You.

    Pendanticist.

    fathom

    12:04 pm on Jan 24, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Not to take away from pendanticist's spiders

    New Forum Proposal - WebmasterWorld Community Center should have

    Quote of the Day chronological - since Brett doesn't have much else to do. heehee ;)

    Dreamquick

    2:28 pm on Jan 24, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Interesting idea but I'm very much on the fence about whether it would work or not...

    A good port of call for spider questions is generally "Search Engine Spider Identification": aside from the SE stuff, a lot of other "might be an SE, might not - help!" type topics appear there.

    If I spot something odd in my logs, internal tracking or spider trap I'll often post details there so that I can learn more about it and so that anyone else who sees it can benefit from that discussion.

    At the moment the spider trap stuff doesn't really have an absolute and obvious home (I was pondering posting a work-in-progress document about spider-traps last night and I was torn between "webmaster general" and "spider identification" as to where it would go).

    - Tony

    pendanticist

    2:36 pm on Jan 24, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    The original title was "Quorum Call", not "New Forum Proposal: Spider Traps".

    Additionally, for those who might wonder, this thread was moved here. I posted in Webmaster General.

    Please, stay on topic. I do not wish this thread to become polluted with requests for other forums.

    Thank You. :)

    Pendanticist.

    Check6

    2:46 pm on Jan 24, 2003 (gmt 0)

    10+ Year Member



    I'm not sure how many people are aware of it, but the main audit bureaus keep lists. The ABCe in the UK and their equivalent in the USA do, and I know they share them. It's a requirement of the audit process that the list is applied to the log files, as robotic traffic isn't user traffic in audit terms.

    Spider traps and lists of spiders IMNSHO go hand in hand.
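Applying a robot list to log files could look roughly like the sketch below. It assumes Apache-style combined log lines, where the user agent is the last quoted field, and a plain tuple of lowercase name fragments. The names in `ROBOT_TOKENS` are invented and bear no relation to any bureau's actual list.

```python
# Hypothetical robot list: lowercase fragments matched against the user agent.
ROBOT_TOKENS = ("examplebot", "samplecrawler")

def split_traffic(log_lines):
    """Split combined-format log lines into (user_lines, robot_lines)."""
    user_lines, robot_lines = [], []
    for line in log_lines:
        line = line.rstrip()
        # In the combined format the user agent is the last quoted field,
        # so splitting on the final two quote marks isolates it.
        parts = line.rsplit('"', 2)
        agent = parts[1].lower() if len(parts) == 3 else ""
        if any(token in agent for token in ROBOT_TOKENS):
            robot_lines.append(line)
        else:
            user_lines.append(line)
    return user_lines, robot_lines
```

Anything that lands in the robot pile is excluded from the audited figures, and the same pile doubles as a starting list for a trap.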

    amznVibe

    3:29 pm on Jan 24, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I applaud a proposal for a new forum. It would make it much easier to find and follow specific threads on such important subject matter. I for one never know whether to post my questions/ideas in the spider identification forum or the cgi/perl scripting forum (and neither is quite right).

    nancyb

    3:51 pm on Jan 24, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Oh, yes! I haven't a clue how to write a spider trap, and although I follow all the posts in the Search Engine Spider Ident forum, I'm still playing "catch up" in order to learn how to trap some of these bad bots. I can see from my logs that some come and grab hundreds of pages a day, over and over, and I know this is not good.

    I'm also scared stiff of using .htaccess to disallow them, because the first time I used an htaccess I had to get the hosting service to clean up the mess I made when my site couldn't be accessed at all! :( So now I only use htaccess to 301 redirect changed files.
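Spotting the visitors that grab hundreds of pages a day does not require touching .htaccess at all. The sketch below only reads a log and counts requests per address. It assumes each log line begins with the client IP, as in the common Apache formats, and the threshold of 200 is an arbitrary illustrative number, not a recommendation.

```python
from collections import Counter

def heavy_hitters(log_lines, threshold=200):
    """Return {ip: request_count} for addresses at or above the threshold."""
    # Assumes the client address is the first space-separated field.
    counts = Counter(
        line.split(" ", 1)[0] for line in log_lines if line.strip()
    )
    return {ip: hits for ip, hits in counts.items() if hits >= threshold}
```

Run against one day's log, the result is a short list of addresses worth looking up before deciding whether any of them deserve a ban.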

    Brett_Tabke

    5:10 pm on Jan 26, 2003 (gmt 0)

    WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



    I think the spider id forum can handle those types of questions, pendanticist. It might just be a case of expanding the scope a bit more.

    pendanticist

    1:45 am on Jan 30, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    It might just be a case of expanding the scope a bit more.

    Perhaps you could elaborate?

    Pendanticist.