
Forum Moderators: lawman

Requesting forum name change

add Crawlers please

   
8:38 am on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"Search Engine Spider Identification" is good by itself, but its scope needs to expand to cover identifying benign and creepy crawlers by user agent, IP range and behavior. These crawlers are not search engine related, yet the topic can't find a better home on WW than here, where it is covered anyway but always feels like an illegitimate child.

How about changing:
"Search Engine Spider Identification"
to become
"Search Engine & Crawler Identification"

Thank you

4:04 pm on Apr 29, 2008 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Thanks for the suggestion.

Crawler or Spider, or robot (bot), does that really matter to those that know?

8:45 pm on Apr 29, 2008 (gmt 0)

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member



What I meant was that the title is too limiting and needs to include crawlers that do not originate from search engines.

7:54 am on May 1, 2008 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Good point, and becoming more of a problem every day.

4:15 pm on May 1, 2008 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It is a good point. I think a lot of questions, including ones in other forums like Webmaster General and sometimes PHP, deal with things like honeypots, ban lists, catching bad bots and such.

Might not be a bad idea to lump the good guys with the bad guys in one set of discussions since IDing the good guys is only an issue because there are bad guys and vice versa.

6:46 pm on May 1, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



How about something more inclusive like:
"Search Engine Spider and Other Automated Activity Identification"

With the charter being to help identify automated activity from Search Engines, Crawlers, Link Checkers, Scrapers, Botnets, etc.

Just a thought.

10:45 pm on May 1, 2008 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



A little clumsy. I like Hobbs' proposal, unless you want to just cut right to it:

Good Bot, Bad Bot Identification

4:26 pm on May 8, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I like Hobbs' short one, too.

"Search Engine & Crawler Identification"

Or focusing on agents being identified:

"Search Engine Spider and Automated Client Identification"

Jim

12:46 pm on May 12, 2008 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The problem is that we really want to stay away from general-purpose bot/IP identification. We don't like to get into scenarios where we are ID'ing private individuals. Posting their IP address is often considered an attack on privacy. We have also had numerous incidents where people would post IPs hoping that the posted IP would become the victim of a DDoS attack (which has happened several times). The victim then comes back here and rants, raves, and posts all sorts of threatening stuff.

JD is on the right track there, but I wonder how we avoid the privacy issues?

8:31 am on May 15, 2008 (gmt 0)

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"We don't like to get into scenarios where we are ID'ing private individuals"

And I think that's why, Brett, you guys made that forum pre-moderated. The rule of hiding the last IP octet, except for recognized search engines, also takes care of those scenarios.
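
For anyone who wants to apply the last-octet rule to their own log excerpts before posting, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `mask_last_octet`, the `xxx` placeholder, and the `RECOGNIZED_CRAWLER_NETWORKS` allowlist are all hypothetical, and the one network listed is a documentation range, not a real search engine range.

```python
import ipaddress

# Hypothetical allowlist of networks whose addresses may be posted in full.
# 192.0.2.0/24 is a reserved documentation range used here as a placeholder;
# it is NOT a real search engine range.
RECOGNIZED_CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def mask_last_octet(address: str) -> str:
    """Hide the last octet of an IPv4 address unless it is on the allowlist."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError on bad input
    if any(ip in network for network in RECOGNIZED_CRAWLER_NETWORKS):
        return str(ip)
    first_three = str(ip).split(".")[:3]
    return ".".join(first_three + ["xxx"])


if __name__ == "__main__":
    print(mask_last_octet("203.0.113.57"))  # not on the allowlist, so masked
    print(mask_last_octet("192.0.2.10"))    # on the allowlist, shown in full
```

The allowlist would need to be filled in by whoever uses it; the sketch makes no claim about which ranges count as "recognized".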

If anything, we need to spot, analyze and discuss everything new that crawls our sites, search engines or not, especially crawlers originating from known hosting data centers. This interest ranges from business survival down to hobby level, like plane and train spotting. I sure would like to know more about them before getting run over by one.

Deep down I wish it would become as simple as "Crawler Identification", with search engines thrown in as a bonus :-)

 
