You know, a system where anyone who has registered can submit a bot they discover. A moderator could then check it out, possibly with the aid of some automated tools (a rough sketch of the first check follows the list):
* perform a reverse lookup
* visit the IP address over HTTP on port 80
* search Google for the IP address or user-agent name to see what is already known about the bot
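Here is a rough sketch of what that reverse-lookup check could look like in PHP (my proposed language). The forward-confirmed reverse DNS idea is my own assumption, just a common way of catching forged hostnames:

    <?php
    // Hypothetical helper for the "reverse lookup" step: resolve the
    // IP to a hostname, then resolve that hostname back to an IP and
    // make sure the two match (forward-confirmed reverse DNS).
    function check_bot_ip($ip)
    {
        $host = gethostbyaddr($ip);  // reverse lookup
        // PHP returns the unmodified IP when the lookup fails
        if ($host === false || $host === $ip) {
            return array('host' => null, 'confirmed' => false);
        }
        $forward = gethostbyname($host);  // forward lookup of the claimed name
        return array('host' => $host, 'confirmed' => ($forward === $ip));
    }

    print_r(check_bot_ip('192.0.2.1'));  // replace with an IP from your logs
    ?>

If the reverse and forward lookups disagree, that is a strong hint the user-agent string is forged.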
In addition to a database that could be searched online, you could put out various periodically updated files of use to webmasters (a sample format is sketched after the list). For example:
* malicious bots
* known search-engine spiders
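Just to make that concrete, the files could be as simple as one record per line. This format is purely hypothetical:

    # malicious-bots.txt, updated nightly (hypothetical format)
    # user-agent|sample IP|category|last-seen
    BadBot/1.0|192.0.2.10|email-collector|2003-06-14
    SomeOfflineBot/2.1|192.0.2.23|offline-browser|2003-06-12

A flat pipe-delimited file would stay trivial to parse from PHP, Perl, or a shell script.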
If such a system does not exist, do people think there would be a need for one? Is anyone keen to develop and promote it? I could do the database coding (I'm good with PHP/MySQL) and could possibly host it...
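For what it's worth, the schema would not need to be complicated to start with. A first stab, with purely hypothetical table and column names:

    CREATE TABLE bot_reports (
        id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        ip          VARCHAR(15)  NOT NULL,       -- dotted-quad IPv4
        user_agent  VARCHAR(255) NOT NULL,
        category    ENUM('malicious','spider','unknown') DEFAULT 'unknown',
        reported_by INT UNSIGNED NOT NULL,       -- registered user who submitted it
        confirmed   TINYINT(1)   DEFAULT 0,      -- flipped once a moderator checks it
        reported_at DATETIME     NOT NULL,
        KEY ip_ua (ip, user_agent)
    );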
digitaleus's original idea might work, but I think having a person as the absolute moderator introduces a single point of failure where there doesn't need to be one...
A distributed client model works much better, because you are not relying on individual users to correctly identify the required elements and you can pre-process the data to a certain degree.
If you add to this by requiring multiple confirmations from separate sources, alongside a trust model based on past performance, you start to diminish the role of the moderator and move that job toward a more administrative one.
The reason I say this is that nothing sucks more than data not being added because the owner/moderator is away or sick, or simply doesn't want to add the particular user-agent you have found.
A nice automated system side-steps this problem but still allows you to maintain a "pure" feed for those who want absolutes, while at the same time providing a "dev" feed for those who want the bleeding-edge data (see the sketch below).
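To illustrate, a minimal sketch in PHP; the trust threshold and the number of required confirmations are just numbers I pulled out of the air:

    <?php
    // Decide which feed a reported user-agent belongs in.
    // $confirmations holds the trust scores (0 to 1) of the distinct
    // sources that have confirmed this report.
    function feed_for_report($confirmations)
    {
        $trusted = 0;
        foreach ($confirmations as $trust) {
            if ($trust >= 0.8) {  // only count sources with a good track record
                $trusted++;
            }
        }
        // Three independent trusted confirmations earn the "pure" feed;
        // everything else stays in the bleeding-edge "dev" feed.
        return ($trusted >= 3) ? 'pure' : 'dev';
    }

    echo feed_for_report(array(0.9, 0.85, 0.95));  // prints "pure"
    ?>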
- tony
These days I'm quite content with the WHOIS facilities and web searches (most of mine are through Google), along with the data I've accumulated. If only I could get it updated and gathered into one file (it's too large for Notepad).
There you can find:
Unknown UAs
Indexing UAs
Search Engine UAs
Other UAs
Offline Browser UAs
Validator UAs
Email Collector/Spam UAs
and you can generate config files, such as .htaccess files, for blocking whatever sorts of bots you specify. A generated blocklist might look something like the sketch below.
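This is only a guess at what the generated output looks like; the user-agent substrings are placeholders for whatever you tick off:

    # sample generated blocklist (placeholder patterns)
    SetEnvIfNoCase User-Agent "EmailCollector" bad_bot
    SetEnvIfNoCase User-Agent "SomeOfflineBot" bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot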
Greetings from Germany,
Marcel.
Then, each time you get a hit on your website (or whenever a new IP address/user-agent combination comes along), you can query the database to see whether you should serve the page. Queries could be made through something quick, such as the DNS lookups that some blackhole lists use.
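A DNSBL-style query is cheap to try from PHP, too. A sketch, assuming a made-up zone name of bots.example.org:

    <?php
    // Hypothetical DNSBL-style check: reverse the IP's octets, append
    // the list's zone, and look for an A record. Listed IPs get an
    // answer; unlisted ones return NXDOMAIN, the same scheme the
    // email blackhole lists use.
    function ip_is_listed($ip, $zone = 'bots.example.org')
    {
        $reversed = implode('.', array_reverse(explode('.', $ip)));
        return checkdnsrr($reversed . '.' . $zone . '.', 'A');
    }

    if (ip_is_listed('192.0.2.10')) {
        // send a 403 Forbidden instead of the page
    }
    ?>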
I've outlined it a bit at [gotany.org...]
I can't figure out a few parts of it, though, such as how to prevent the database from being filled with false reports, and how to agree on how much of what behavior gets a spider banned.
If you have any suggestions or input, I'd love to hear it.
I think this is impossible. With email and RBLs it's irrelevant whether the email is delivered $now or $now+10s, but when accessing a web server it's important that the machine can serve pages as fast as possible. There's no time to look up the UA or IP in a remote database.
That's my point of view: I think this plan is doomed to failure.
Greetings,
Marcel.