| 8:44 am on Feb 11, 2003 (gmt 0)|
you could try spiderhunter.com
I've heard it's down at the moment, though.
| 4:50 pm on Feb 11, 2003 (gmt 0)|
Here's a bot database created by one of the members here: http://joseluis.pellicer.org/ua/
I believe it's mostly concerned with User Agents... not sure if IP addresses are tracked.
I keep track of IP numbers of search engine spiders at http://www.iplists.com/
| 8:01 pm on Feb 11, 2003 (gmt 0)|
Personally I'm a fan of http://www.psychedelix.com/agents.html
digitaleus's original idea might work, but I think having a person as the absolute moderator creates a single point of failure where there doesn't need to be one...
A distributed client model works much better because you are not relying on individual users to correctly identify the elements required and you can pre-process to a certain degree.
If you add to this by requiring multiple confirmations from separate sources, alongside a trust model based on past performance, you start to diminish the role of the moderator and move that job toward more of an administrative one.
The reason I say this is that nothing sucks more than data not being added because the owner/moderator is away or sick, or perhaps just doesn't want to add the particular user-agent you have found.
A nice automated system side-steps this problem but still allows you to maintain a "pure" feed for those who want absolutes, while at the same time providing a "dev" feed for those who want the bleeding-edge data.
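To make the confirmation/trust idea concrete, here is a minimal sketch in Python; the thresholds, trust scores, and feed names are assumptions for illustration only, not any existing system.

from collections import defaultdict

CONFIRMATIONS_NEEDED = 3   # distinct reporters before an entry is considered "pure" (assumed)
TRUST_NEEDED = 1.0         # combined trust weight required (assumed)

reporter_trust = defaultdict(lambda: 0.2)   # past-performance score per reporter
reports = defaultdict(set)                  # (user_agent, ip) -> reporters who confirmed it

def submit_report(reporter, user_agent, ip):
    """Record a report and return which feed the entry currently qualifies for."""
    reports[(user_agent, ip)].add(reporter)
    confirmers = reports[(user_agent, ip)]
    # Weight confirmations by each reporter's trust, built up from past performance.
    weight = sum(reporter_trust[r] for r in confirmers)
    if len(confirmers) >= CONFIRMATIONS_NEEDED and weight >= TRUST_NEEDED:
        return "pure"   # enough independent, trusted confirmations
    return "dev"        # stays on the bleeding-edge feed until then

Reporters whose entries keep getting confirmed could have their trust score raised over time, which is where the moderator's job becomes purely administrative.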
| 11:21 pm on Feb 11, 2003 (gmt 0)|
Initially, when I started with .htaccess and monitoring my logs, I used the aforementioned links and a few more to acquire whatever information I could.
There may be more than the four below, which I used to use:
These days I'm quite content with WHOIS facilities and web searches (most of mine are through Google), along with the data I've accumulated. If only I could get it updated and together in one file (it's too large for Notepad).
| 7:08 am on Feb 15, 2003 (gmt 0)|
I think a very good database is [joseluis.pellicer.org...]
There you can find:
Search Engine UAs
Offline Browser UAs
Email Collector/Spam UAs
and you can generate config files, such as .htaccess files, for blocking whichever bots you specify (a rough sketch of that follows below).
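For what it's worth, here is a rough sketch (not that site's actual code) of what such .htaccess generation might look like in Python; the user-agent names are only placeholders.

bad_agents = ["EmailCollector", "Offline Explorer", "WebCopier"]   # placeholder UA substrings

rules = ['SetEnvIfNoCase User-Agent "%s" bad_bot' % ua for ua in bad_agents]
rules += [
    "Order Allow,Deny",
    "Allow from all",
    "Deny from env=bad_bot",
]
print("\n".join(rules))   # paste the output into .htaccess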
Greetings from Germany,
| 6:18 pm on Feb 15, 2003 (gmt 0)|
I have been considering building a realtime malicious spider database. It would probably act like the realtime blackhole lists for open spam relays. If your website was visited by a spider that disobeyed your robots.txt, clicked through to your bot trap, or otherwise showed itself to be malicious, you could report its IP address and user-agent to the database.
Then, each time you get a hit to your website (or only when a new IP address/user-agent combination comes along every once in a while), you could query the database to see whether you should serve the page. Queries could be made through something quick, such as a DNS lookup like some blackhole lists use.
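As a rough illustration of how such a DNS-style query could work (borrowing the convention spam RBLs use), here is a small Python sketch; the zone name spiders.bl.example.org is purely hypothetical.

import socket

def is_listed(ip, zone="spiders.bl.example.org"):
    """Return True if the IP has an entry in the (hypothetical) spider blocklist."""
    # RBL convention: reverse the octets and look them up under the list's zone,
    # e.g. 1.2.3.4 -> 4.3.2.1.spiders.bl.example.org
    reversed_ip = ".".join(reversed(ip.split(".")))
    try:
        socket.gethostbyname(reversed_ip + "." + zone)
        return True    # an A record exists, so the IP is listed
    except socket.gaierror:
        return False   # lookup failed (NXDOMAIN), so it is not listed

Results could be cached per IP/user-agent combination so the lookup cost is only paid once in a while, as suggested above.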
I've outlined it a bit at [gotany.org...]
I can't figure out a few parts of it, though, such as how to prevent the database from being filled with false reports, and how to agree on how much of what behavior gets a spider banned.
If you have any suggestions or input, I'd love to hear it.
| 2:34 pm on Feb 16, 2003 (gmt 0)|
I think this is impossible. With email and RBLs it's irrelevant whether the email is delivered $now or $now+10s, but when serving a web page it's important that the machine responds as fast as possible. There's no time to look up the UA or IP in a remote database.
That's my point of view - I think this plan is doomed to failure.