Forum Moderators: phranque
What I'm thinking about is more immediate in nature.
A daily compilation of newly emerging negative entities, posted to one running thread, the way I did in this thread, msg #:3 [webmasterworld.com].
Perhaps such a "Daily Listing" of rogue bots/spiders (and harvesters, along with the spambots) would help Webmasters preemptively protect themselves from these entities by just checking the "Daily Listing" against their own logs.
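For illustration, here is a minimal sketch of what checking such a listing against your own logs could look like. It assumes an Apache-style "combined" access log; the listed user-agents and IPs, and the function name find_listed_hits, are hypothetical placeholders rather than anything actually posted in the forum.

```python
import re

# Hypothetical sample of a "Daily Listing": user-agent substrings and exact
# IPs. These values are placeholders made up for this example.
LISTED_AGENTS = ["EmailSiphon", "ExampleHarvester"]
LISTED_IPS = {"192.0.2.15"}

# Matches the Apache/NCSA "combined" log format (an assumption about your logs).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def find_listed_hits(log_lines, agents=LISTED_AGENTS, ips=LISTED_IPS):
    """Return (ip, agent) pairs from the log that match the listing."""
    lowered = [a.lower() for a in agents]
    hits = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        ip, agent = m.group("ip"), m.group("agent")
        # Some entries can only be identified by IP, so check both.
        if ip in ips or any(a in agent.lower() for a in lowered):
            hits.append((ip, agent))
    return hits

sample_log = [
    '192.0.2.15 - - [10/Oct/2003:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 "-" "Mozilla/4.0 (compatible)"',
    '198.51.100.7 - - [10/Oct/2003:13:56:01 -0700] "GET /contact.html HTTP/1.0" 200 512 "-" "EmailSiphon"',
    '203.0.113.9 - - [10/Oct/2003:13:57:12 -0700] "GET /robots.txt HTTP/1.0" 200 64 "-" "Googlebot/2.1"',
]

for ip, agent in find_listed_hits(sample_log):
    print(ip, agent)
```

A hit only tells you the entity has visited; whether it is worth banning is still a judgement for each site.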
Just my thoughts.
Pendanticist.
Pretty idealistic idea, but i would post what i find. The problem i see, though, is that there are so many rogue bots and agents that a daily list would quickly become unmanageable. Some of them can only be identified by their IP. An alphabetical list or small db would do the job. OTOH, this kind of blacklist is a popular target for DDoS attacks - i wouldn't wish that on WebmasterWorld!
Personally, i have sometimes considered making an "accept list" instead of a b-list, as there are so many bots around, but i constantly see cases where this or that malicious bot also has legit use(r)s, and that makes it hard to use either option.
/claus
That's definitely not nice. Especially as i think too many readers might be too quick to ban bots, without first estimating if this or that IP/UA will likely be a problem to them, and perhaps without even seeing traffic from it.
Otoh, control will not be feasible or even possible in that type of forum, imho. Competitor postings will likely drown in the daily buzz - at least they do for me, but then again, i don't rush off to ban all and everything on rumours alone.
>> moth balls
I'm very sorry to hear this. It's not a discussion-type forum quite like the others, but for more, say, "serious" bot watchers that forum is one-of-a-kind. The occasional post that really is of great value more than compensates for the daily double-triple-postings, insignificant finds, or postings that are directly wrong.
There are so many odd user-agents listed in one place that you'll always be able to find some pointers to a specific one. I believe this has some SE marketing value for WebmasterWorld as well, as this is often the only place, apart from those webalizer files, where you can find them.
Plus, changes in UA-strings and IPs are often indicators of more important changes at the SE companies. One such example is the recent set of threads mentioning a well known SE that started violating robots.txt, continued to do so for a while and now has at least three new beta-bots out there.
Only thing nagging me is that the postings are sometimes done in other forums, but that's due to uninformed posters, and there's no cure for that as RTFM does not work. Mothballing will possibly just make those posts appear all over the place instead.
/claus
But, we don't look at rogue bots with the same perspective on the Micro level....
I can hear the chuckles of those who unleash these rogue agents in the background now....
Pendanticist.
There are also a lot of new SEs under development out there. I think that so long as a bot follows the rules and does not hog resources, it should be allowed.
It is also very easy to make false judgements about what a bot is up to. Perhaps sometimes bot operators are doing a worthy project and the bot encounters an error and fails. The site owner then sees it requesting excessive pages or failing to get robots.txt.
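As a rough aid for that kind of judgement, a sketch along these lines could summarize what each user-agent actually did before anyone reaches for a ban. It assumes an Apache-style "combined" log; the function name summarize_agents and the sample agents are hypothetical, and the two signals shown (request count, whether robots.txt was fetched) are only hints, not proof of good or bad intent.

```python
import re
from collections import defaultdict

# Combined log format is an assumption; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_agents(log_lines):
    """Map each user-agent to its request count and whether it fetched robots.txt."""
    summary = defaultdict(lambda: {"requests": 0, "fetched_robots": False})
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # ignore lines in an unexpected format
        entry = summary[m.group("agent")]
        entry["requests"] += 1
        if m.group("path") == "/robots.txt":
            entry["fetched_robots"] = True
    return dict(summary)

# Made-up sample lines; both agent names are placeholders.
sample_log = [
    '203.0.113.9 - - [10/Oct/2003:13:57:12 -0700] "GET /robots.txt HTTP/1.0" 200 64 "-" "ExampleBot/0.1"',
    '203.0.113.9 - - [10/Oct/2003:13:57:20 -0700] "GET /a.html HTTP/1.0" 200 100 "-" "ExampleBot/0.1"',
    '198.51.100.7 - - [10/Oct/2003:13:58:01 -0700] "GET /a.html HTTP/1.0" 200 100 "-" "GrabberBot"',
    '198.51.100.7 - - [10/Oct/2003:13:58:01 -0700] "GET /b.html HTTP/1.0" 200 100 "-" "GrabberBot"',
    '198.51.100.7 - - [10/Oct/2003:13:58:02 -0700] "GET /c.html HTTP/1.0" 200 100 "-" "GrabberBot"',
]

for agent, stats in summarize_agents(sample_log).items():
    print(agent, stats)
```

A bot that never requested robots.txt in one day's log may simply have cached it or hit an error, so the summary is a starting point for a closer look rather than a verdict.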
I think bot banning has to be left to the individual site admin.
Mack.
>> I think bot banning has to be left to the individual site admin.
I would think you'd have a hard time finding anyone to disagree with you on that point.
Cloaking Device Made for Spammers [webmasterworld.com] raises some critical issues which, to me, are part and parcel of 'tracking and logging'.
Far be it from me to appear as though I'm intentionally challenging board protocols. I am not.
I am concerned that this technology (Cloaking Device Made for Spammers) is being offset by newer, faster rogue bots harvesting our material at ever-increasing rates, and that we have fewer resources with which to pre-emptively protect our content. To that end, I am definitely interested in the Creepy Crawler Collection (weekly date) [webmasterworld.com] thread.
Too bad....
Pendanticist.