I've mentioned before that I feel there is a need for some type of tutorial regarding Spider Traps [webmasterworld.com].
A site search a few minutes ago turned up additional posts about this subject.
In Spider Trap needs GOOD Spider list [webmasterworld.com] - Dreamquick [webmasterworld.com] brings up an important point...
First I'm probably going to ask a *very* stupid question, but if you want to identify good/bad spiders (presumably by behaviour) then surely the bad ones are bad irregardless of whether they are spambots or just a name-brand SE having a bad day?
In Phrases to look for in a spider trap [webmasterworld.com], nickc001 mentions a "class C domain check", which raised a couple of unanswered questions that could use addressing, especially for those of us who know next to nothing about class C domain checks, "ASP" or "HTTP_FROM".
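For anyone unfamiliar with the term, a "class C check" is usually taken to mean testing whether two IPv4 addresses share their first three octets (the same /24 block). Here is a minimal Python sketch under that assumption. The function name `same_class_c` is hypothetical and the code is not taken from the linked thread.

```python
import ipaddress


def same_class_c(ip_a: str, ip_b: str) -> bool:
    """Return True when both IPv4 addresses fall in the same /24 block."""
    # strict=False lets us pass a host address and have it masked down to x.y.z.0/24.
    block = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in block
```

A trap could use a check like this to treat repeat visits from neighbouring addresses as one visitor. Whether that is wise depends on the site, since unrelated users can share a /24.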
In Thunderstone spider violating robots.txt? [webmasterworld.com], Skimmer_lid raises a specific point that could be important for other spiders, more so than for Thunderstone itself.
In SpiderTrap Caught these IPs [webmasterworld.com] prozz provides a listing of spiders in need of banning. There are also several others who posted relevant material well worth perusing.
Reading through Google WAP Proxy and robots.txt [webmasterworld.com], we can see how a spider trap may end up banning a cell phone user, along with a solution to that problem.
sinyala1 also poses an interesting concept in Combination User Agent / IP How does it work? [webmasterworld.com].
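One way to read the combined user agent / IP idea is: only trust a spider's name when the request also comes from an address that spider is expected to use. The Python sketch below is my own illustration of that reading, not the method from the linked thread. The table `EXPECTED_NETWORKS`, the token "examplebot" and the function `classify` are all hypothetical, and the address range is a documentation placeholder rather than a real crawler range.

```python
import ipaddress

# Hypothetical table: user-agent token -> networks that spider is expected to crawl from.
# 192.0.2.0/24 is a reserved documentation range, used here only as a placeholder.
EXPECTED_NETWORKS = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify(user_agent: str, remote_ip: str) -> str:
    """Return 'verified', 'spoofed' or 'unknown' for a user agent / IP pair."""
    ip = ipaddress.ip_address(remote_ip)
    ua = user_agent.lower()
    for token, networks in EXPECTED_NETWORKS.items():
        if token in ua:
            # The name matches a known spider; accept it only from an expected address.
            if any(ip in net for net in networks):
                return "verified"
            return "spoofed"
    # No known spider name in the user agent, so this check has nothing to say.
    return "unknown"
```

For example, `classify("ExampleBot/1.0", "198.51.100.7")` comes back as "spoofed", because the name matches but the address does not.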
The point to all this is not entirely about spider traps!
I'm issuing a Quorum Call to look into the possibility of instituting some methodology of:
and
Many of our sites are being bludgeoned by malconfigured, malicious or otherwise stupid [webmasterworld.com] spiders that download, scrape or otherwise rape us of our content.
We must take a stand.
How 'bout a new forum dedicated specifically to spider traps?
If not, how 'bout some notification system?
I realize there are inherent dangers to this concept, but the continued abuse by spiders must be addressed.
Now that I've opened a big old can a worms: Anyone?
I will not be back until later in the day as I have some appointments.
Thank You.
Pendanticist.
A good port of call for spider questions is generally "Search Engine Spider Identification": aside from the SE stuff, a lot of other "might be an SE, might not - help!" type topics appear there.
If I spot something odd in my logs, internal tracking or spider trap I'll often post details there so that I can learn more about it and so that anyone else who sees it can benefit from that discussion.
At the moment the spider trap stuff doesn't really have an absolute and obvious home (I was pondering posting a work-in-progress document about spider-traps last night and I was torn between "webmaster general" and "spider identification" as to where it would go).
- Tony
Additionally, for those who might wonder, this thread was moved here. I posted in Webmaster General.
Please, stay on topic. I do not wish this thread to become polluted with requests for other forums.
Thank You. :)
Pendanticist.
Spider traps and lists of spiders IMNSHO go hand in hand.
I am also scared stiff of using .htaccess to disallow them, because the first time I used an .htaccess I had to get the hosting service to clean up the mess I made when my site couldn't be accessed at all! :( So now I only use .htaccess to 301 redirect changed files.