You know, where anyone (who has registered) can submit a bot that they discover. A moderator could then check it out, possibly with the aid of some automated tools:
* perform a reverse lookup
* visit the IP using http:80
* search Google for the bot's IP address or name
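As a rough illustration of the first automated check, here is a minimal Python sketch of a reverse lookup. It goes one step beyond what the post asks for by also confirming that the returned hostname resolves back to the same IP; that extra step is my assumption, not part of the proposal. The function name and the injectable `reverse`/`forward` parameters are hypothetical and exist only so the check can be tried without touching the network.

```python
import socket


def reverse_lookup_confirmed(ip, reverse=socket.gethostbyaddr,
                             forward=socket.gethostbyname_ex):
    """Return the PTR hostname for ip if it resolves back to ip, else None.

    reverse/forward default to the standard library resolvers; they are
    parameters only so that a moderator tool (or a test) can swap them out.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)             # PTR lookup
        _name, _aliases, addresses = forward(hostname)       # A lookup
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname if ip in addresses else None
```

A moderator tool could call `reverse_lookup_confirmed(ip)` for each submitted bot and show the result next to the report; the HTTP visit and the web search would still be separate steps.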
In addition to a database that could be searched online, you could put out various periodically updated files of use to webmasters. For example:
* malicious bots
* known search-engine spiders
If such a system does not exist, do people think there would be a need for one? Is anyone keen to develop and promote it? I could do the database coding, as I'm good with PHP/MySQL, and could possibly host it...
digitaleus's original idea might work, but I think having one person as the absolute moderator creates a single point of failure where there doesn't need to be one...
A distributed client model works much better because you are not relying on individual users to correctly identify the elements required and you can pre-process to a certain degree.
If you went further and required multiple confirmations from separate sources, alongside a trust model based on past performance, you would start to diminish the role of the moderator and turn that job into more of an administrative role.
The reason I say this is that nothing sucks more than data not being added because the owner/moderator is away or sick, or perhaps just not wanting to add the particular user-agent you have found.
A nice automated system side-steps this problem but still allows you to maintain a "pure" feed for those who want absolutes while at the same time providing a "dev" feed for those who want the bleeding edge data.
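To make the confirmation-plus-trust idea and the two feeds a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the smoothed trust formula, the threshold of 1.5, and the names `trust` and `build_feeds` are not from any existing system.

```python
def trust(accepted, rejected):
    """Smoothed share of a reporter's past reports that held up (0..1).

    A brand-new reporter with no history gets 0.5.
    """
    return (accepted + 1) / (accepted + rejected + 2)


def build_feeds(reports, history, threshold=1.5):
    """Split reported user-agents into a 'pure' and a 'dev' feed.

    reports   -- iterable of (user_agent, reporter) pairs
    history   -- dict: reporter -> (accepted_count, rejected_count)
    threshold -- combined trust needed before an entry is 'pure' (assumed)
    """
    scores = {}
    seen = set()
    for user_agent, reporter in reports:
        if (user_agent, reporter) in seen:
            continue  # a reporter confirms a given user-agent only once
        seen.add((user_agent, reporter))
        accepted, rejected = history.get(reporter, (0, 0))
        scores[user_agent] = scores.get(user_agent, 0.0) + trust(accepted, rejected)
    pure = sorted(ua for ua, score in scores.items() if score >= threshold)
    dev = sorted(scores)  # the dev feed carries everything reported
    return pure, dev
```

With a threshold like this, no single reporter can push an entry into the pure feed alone, and nobody has to wait for a moderator to come back from holiday.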
These days I'm quite content with the WHOIS facilities and web searches (most of mine are through Google), along with the data I've accumulated. If only I could get it updated and together in one file (it's too large for Notepad).
There you can find:
* Search Engine UAs
* Offline Browser UAs
* Email Collector/Spam UAs
You can also generate config files, such as .htaccess files, for blocking whichever kinds of bots you specify.
Greetings from Germany,
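For anyone wondering what generating such a .htaccess file might involve, here is a minimal Python sketch. It is not how the site above does it; it simply emits standard Apache mod_rewrite directives that answer 403 Forbidden to any request whose User-Agent contains one of the given strings. The function name is hypothetical and the bot names in the usage line are only examples.

```python
import re


def htaccess_block_rules(user_agents):
    """Return .htaccess text that denies the given User-Agent substrings."""
    agents = [ua for ua in user_agents if ua.strip()]
    if not agents:
        return ""
    lines = ["RewriteEngine On"]
    for i, ua in enumerate(agents):
        # Escape regex metacharacters, then any double quote, so the name
        # is matched literally inside the quoted pattern.
        pattern = re.escape(ua).replace('"', '\\"')
        # All conditions but the last are OR-ed together; NC = ignore case.
        flags = "[NC]" if i == len(agents) - 1 else "[NC,OR]"
        lines.append(f'RewriteCond %{{HTTP_USER_AGENT}} "{pattern}" {flags}')
    lines.append("RewriteRule ^ - [F]")  # F = respond with 403 Forbidden
    return "\n".join(lines) + "\n"
```

Calling `htaccess_block_rules(["EmailSiphon", "WebZIP"])` returns a four-line block that can be pasted into a site's .htaccess, provided mod_rewrite is enabled on the server.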
Then, each time that you get a hit to your website (or each time a new IP address/useragent combination comes along every once in a while) you can query the database to see if you should serve the page. Queries could be made through something quick such as a DNS lookup like some blackhole lists use.
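Here is a minimal Python sketch of that kind of DNS query, following the usual blackhole-list convention: reverse the IPv4 octets, append the list's zone, and treat any successful answer as "listed". The zone `bots.example.org` is a placeholder, not a real service, and the class and function names are hypothetical. The cache is keyed on the IP only (not the IP/useragent combination) to keep the sketch short, so the remote query happens just once per new address within the TTL.

```python
import ipaddress
import socket
import time

ZONE = "bots.example.org"  # placeholder zone; no such list is implied to exist


def dnsbl_query_name(ip, zone=ZONE):
    """Build the DNSBL-style name, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the IP
    return ".".join(reversed(octets)) + "." + zone


class BotListCache:
    """Remember lookup results so repeat visitors cost no DNS query."""

    def __init__(self, resolve=socket.gethostbyname, zone=ZONE,
                 ttl=3600.0, clock=time.monotonic):
        self._resolve = resolve
        self._zone = zone
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # ip -> (listed, expires_at)

    def is_listed(self, ip):
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and hit[1] > now:
            return hit[0]
        try:
            self._resolve(dnsbl_query_name(ip, self._zone))
            listed = True   # any answer means the address is on the list
        except OSError:
            listed = False  # NXDOMAIN or resolver failure: fail open
        self._cache[ip] = (listed, now + self._ttl)
        return listed
```

Failing open on resolver errors is a design choice in this sketch: if the list is unreachable, visitors get served rather than blocked.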
I've outlined it a bit at [gotany.org...]
I can't figure out a few parts of it, though, such as how to prevent the database from being filled with false reports, and agreeing on how much of what behavior gets a spider banned.
If you have any suggestions or input, I'd love to hear it.
I think this is impossible. With email and RBLs, it's irrelevant whether the email is delivered $now or $now+10s. But when accessing a web server, it's important that the machine can serve pages as fast as possible. There's no time for looking up a UA or IP in a remote database.
That's my point of view: I think this plan is doomed to failure.