| 4:04 pm on Apr 29, 2008 (gmt 0)|
Thanks for the suggestion.
Crawler or Spider, or robot (bot), does that really matter to those that know?
| 8:45 pm on Apr 29, 2008 (gmt 0)|
What I meant was that the title is too limiting and needs to include crawlers that are not originating from search engines.
| 7:54 am on May 1, 2008 (gmt 0)|
Good point, and becoming more of a problem every day.
| 4:15 pm on May 1, 2008 (gmt 0)|
It is a good point. I think a lot of questions, including ones in other forums like Webmaster General and sometimes PHP, deal with things like honeypots, ban lists, catching bad bots and such.
Might not be a bad idea to lump the good guys with the bad guys in one set of discussions since IDing the good guys is only an issue because there are bad guys and vice versa.
| 6:46 pm on May 1, 2008 (gmt 0)|
How about something more inclusive like:
"Search Engine Spider and Other Automated Activity Identification"
With the charter being to help identify automated activity from Search Engines, Crawlers, Link Checkers, Scrapers, Botnets, etc.
Just a thought.
| 10:45 pm on May 1, 2008 (gmt 0)|
A little clumsy. I like Hobbs' proposal, unless you want to just cut right to it:
Good Bot, Bad Bot Identification
| 4:26 pm on May 8, 2008 (gmt 0)|
I like Hobbs' short one, too.
"Search Engine & Crawler Identification"
Or focusing on agents being identified:
"Search Engine Spider and Automated Client Identification"
| 12:46 pm on May 12, 2008 (gmt 0)|
The problem is that we really want to stay away from general-purpose bot/IP identification. We don't like to get into scenarios where we are ID'ing private individuals. Posting their IP address is often considered an attack on privacy. We have also had numerous incidents where people would post IPs and hope that the intended IP would be the victim of a DDoS attack (which has happened several times). The victim then comes back here and rants, raves, and posts all sorts of threatening stuff.
JD is on the right track there, but I wonder how we avoid the privacy issues?
| 8:31 am on May 15, 2008 (gmt 0)|
|We don't like to get into scenarios where we are ID'ing private individuals |
And I think that's why, Brett, you guys made that forum pre-moderated. The rule of hiding the last IP octet, except for recognized search engines, also takes care of those scenarios.
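For anyone who wants to apply the last-octet rule before pasting log lines, here is a rough sketch of how it could be automated. The function name `mask_for_posting` and the `RECOGNIZED_NETWORKS` list are made up for illustration, and the single range in that list is a reserved documentation block, not a real search engine range. It is an assumption about how the rule might be coded, not how the moderators actually do it.

```python
import ipaddress

# Hypothetical allowlist. 192.0.2.0/24 is a reserved documentation range,
# used here only as a stand-in for "recognized search engine" networks.
RECOGNIZED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def mask_for_posting(ip: str) -> str:
    """Return the IPv4 address in a form suitable for posting.

    Addresses inside a recognized network are returned in full.
    For everything else the last octet is replaced with 'xxx'.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    if any(addr in net for net in RECOGNIZED_NETWORKS):
        return str(addr)
    octets = str(addr).split(".")
    octets[-1] = "xxx"
    return ".".join(octets)


if __name__ == "__main__":
    print(mask_for_posting("198.51.100.37"))  # not in the allowlist, gets masked
    print(mask_for_posting("192.0.2.10"))     # in the allowlist, posted in full
```

Keeping the allowlist as an explicit list of networks means the default is to mask, so an unknown crawler never gets its full address posted by accident.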
If anything, we need to spot, analyze, and discuss everything new crawling our sites, search engines or not, especially those originating from known hosting data centers. This interest ranges from business survival down to the hobby level, like plane and train spotting. I sure would like to know more about them before getting run over by one.
Deep down, I wish it could be as simple as "Crawler Identification", with search engines thrown in as a bonus :-)