[edited by: Brett_Tabke at 11:55 am (utc) on Oct. 4, 2003]
[edit reason] fix a couple urls [/edit]
There are some User-Agents that have changed names, e.g. JetCar -> FlashGet, and some bots/technologies/scripts that go by more than one name, e.g. libwww <-> LWP; in these cases both names should be listed. The "Java/..." group is an acceptable exception, and so is "libwww (all kinds of)", because in these cases the important part of the UA string is common to all variants.
I see the last entry, "Zibie spider", is not listed under "Java/...". It probably should be in both places. Anyway, 17 pages in, it's already a very good tool :)
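To make the alias point concrete, here is a minimal Python sketch, assuming a ban list stored as named groups of regex patterns. The group names, the patterns, and the `match_group` helper are hypothetical illustrations, not entries copied from the list in this thread. A renamed or multi-named agent gets every known name in one group, while a family like "Java/..." is matched only on its shared prefix.

```python
import re
from typing import Optional

# Hypothetical ban-list groups: each group lists every name a bot is known by.
# Patterns are case-insensitive regexes searched anywhere in the UA string,
# unless anchored with ^ to match only at the start (as for the Java/ family).
BAN_GROUPS = {
    "FlashGet": [r"JetCar", r"FlashGet"],      # renamed product: keep both names
    "libwww-perl": [r"libwww", r"\bLWP\b"],     # one library, two names
    "Java": [r"^Java/"],                         # any Java/x.y version string
}

# Compile once so each request only pays for the matching itself.
_COMPILED = {
    group: [re.compile(p, re.IGNORECASE) for p in patterns]
    for group, patterns in BAN_GROUPS.items()
}


def match_group(user_agent: str) -> Optional[str]:
    """Return the first ban group with a matching pattern, or None."""
    for group, patterns in _COMPILED.items():
        if any(p.search(user_agent) for p in patterns):
            return group
    return None
```

Matching the common part ("Java/", "libwww") means a new version string is caught without adding another line to the list, which is the reason those two groups can stay collapsed.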
>> If you think this is a bad idea
No way. It's the best one I've seen in a long time; not many threads make it to my bookmarks :)
/claus
[edited by: claus at 2:37 pm (utc) on Oct. 4, 2003]
It would be even nicer to have IP ranges, companies, the date each UA was first spotted, etc. I'm sure lots of people run sites with lists of UAs and IPs such as this; I'll remember to point them to this thread too :-)
Now that we're at page 30, I wanted to "owner edit" my first post to update it, but it seems impossible. Is that perhaps due to Brett's edit? Any admin, please help. Thanks.
For IP issues like Cyveillance, I'd suggest a separate thread. Yes, I'd like to update it regularly.
I've also been compiling a master list of bad IPs, requests, URIs, referrers, UAs and other such nonsense, based on observations here and my own log files. The .htaccess is now a disgusting 35.5 KB+ in size, but it is pretty comprehensive. If anyone is interested, I can post it or send it by stickymail.
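For anyone who wants the idea rather than the whole file, here is a minimal Python sketch of checking one request against several kinds of block list (IP ranges, request URIs, referrers, UAs). The list contents are placeholders (the IP ranges are reserved documentation ranges) and not entries from Mark's list, and `is_blocked` is a hypothetical helper; on Apache the same checks would normally live in .htaccess rules.

```python
import ipaddress
from typing import List, Optional

# Placeholder block lists; real entries would come from your own logs.
# Text fragments are stored lowercase because matching lowercases the input.
BAD_NETWORKS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/25")]
BAD_URI_PARTS = ["/formmail", "cmd.exe", "default.ida"]
BAD_REFERRER_PARTS = ["casino-example.invalid"]
BAD_UA_PARTS = ["webzip", "teleport", "emailsiphon"]


def _contains_any(value: str, needles: List[str]) -> bool:
    """Case-insensitive substring test against a list of lowercase fragments."""
    value = value.lower()
    return any(n in value for n in needles)


def is_blocked(ip: str, uri: str, referrer: str, user_agent: str) -> Optional[str]:
    """Return which list caught the request ("ip", "uri", ...), or None to allow it."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BAD_NETWORKS):
        return "ip"
    if _contains_any(uri, BAD_URI_PARTS):
        return "uri"
    if _contains_any(referrer, BAD_REFERRER_PARTS):
        return "referrer"
    if _contains_any(user_agent, BAD_UA_PARTS):
        return "ua"
    return None
```

Returning the category instead of a plain yes/no makes it easy to log why a request was refused, which helps when pruning a list that has grown to tens of kilobytes.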
Mark.
Other related pages on WebMasterWorld:
The Perfect Ban List [webmasterworld.com]
Modified "bad-bot" perl script from stapel/jdMorgan/Key_Master [webmasterworld.com]
How to protect from site copiers like teleport? [webmasterworld.com]
Does anyone redirect bad bots to scumware sites? [webmasterworld.com]
robots.txt tutorial [webmasterworld.com]
UA list collected by member transistor (thanks for this!): [joseluis.pellicer.org...]
All this gave me an idea: would it be possible to start some sort of permanent 'user contributor' thread (stuck at the top) where we could all report bots, 'new' bots, and the good/bad actions of each? If the conclusions of each report (the final fingering?) could then be referenced and added to a Library document, much like the result of Bull's hard work (a list of spider names, each with a link to details about it), that would make the purpose of this forum very clear, and I'm sure everyone would contribute. A bit more work for the moderator, perhaps...
Just a thought.
It likely wouldn't be an effective solution anyway :(
Despite all the effort that has gone into Bull's accumulated list, some folks still ask basic questions rather than reading it.
A good example is a recent five-post thread in which two links were provided early on, and then later in the thread another participant brought up the same IP range.
IMO the best thing each of us who has spent any time here can do is accumulate links that will assist other inquiries, rather than extending or promoting new threads.
As I recall, the forum has a date limit on threads that are no longer active. (I'm not sure what effect this has on restarting threads that existed during the months the forum was down.)
The ability to find these depends entirely on one's knowledge of their existence or one's ability to use the site search.
Yup : )
I know that this place isn't a webmaster encyclopedia but rather a place to deal with current issues; my mistake. Still, it would be nice to start an always-up-to-date 'spider database' somewhere. I'll give it some thought; some of the others I've found are horribly outdated. It would be a great tool in the war on spammers for sure, especially since the web-aware world is doubling every three years: we'll never be able to keep up with all the name-changing and other tactics if we just let things slide. Bots aren't hard to deal with today thanks to Apache : ) but fingering them and finding out what they want sometimes is.