Is it common for a bot to use many IP addresses? And if I filter those IP addresses out, could I be filtering out real traffic?
Any decent articles on this subject?
Thanks.
Bots also sometimes come from IP addresses owned by broadband ISPs, and those same addresses are used by human visitors.
Recently I've been exploring identifying bots by behavior. I use several honeypots that lure bots but not humans, and in my spare time I've begun going through my recent logs, putting each user-agent/IP combination through a series of tests to identify whether that combination shows various kinds of bot behavior:
Did they request more than 10 pages within 20 seconds? More than 10 pages within 30 seconds?
Did they request robots.txt?
Did they request any of the honeypot pages, and if so, how many?
Did they NOT request style sheets?
Did they NOT request image files?
I don't yet know how many tests a given user-agent/IP combination has to pass or fail to be identified as likely to be a bot.
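For anyone who wants to try tests like these, here is a rough Python sketch. It assumes the log lines have already been parsed into records with an IP, user-agent, timestamp, and request path. The `Hit` class, the honeypot paths, and the file extensions are made-up placeholders, not anything from a real site. It only reports which tests each combination trips; it does not pick a cutoff for calling something a bot.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical honeypot URLs; substitute the trap pages used on your own site.
HONEYPOT_PATHS = {"/trap/a.html", "/trap/b.html"}
# Assumed extensions; extend these to match the files your site serves.
STYLE_EXTENSIONS = (".css",)
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


@dataclass
class Hit:
    ip: str
    user_agent: str
    timestamp: float  # seconds since the epoch
    path: str


def clean(path):
    """Drop any query string and lowercase the path for comparisons."""
    return path.split("?", 1)[0].lower()


def is_page(path):
    """Treat anything that is not a style sheet, image, or robots.txt as a page."""
    p = clean(path)
    return p != "/robots.txt" and not p.endswith(STYLE_EXTENSIONS + IMAGE_EXTENSIONS)


def exceeds_rate(times, max_pages, window_seconds):
    """True if more than max_pages timestamps fall within any window_seconds span."""
    times = sorted(times)
    start = 0
    for end, t in enumerate(times):
        # Slide the window start forward until it is within window_seconds of t.
        while t - times[start] > window_seconds:
            start += 1
        if end - start + 1 > max_pages:
            return True
    return False


def run_tests(hits):
    """Group hits by (IP, user-agent) and report which tests each combination trips."""
    groups = defaultdict(list)
    for hit in hits:
        groups[(hit.ip, hit.user_agent)].append(hit)

    report = {}
    for combo, combo_hits in groups.items():
        paths = [clean(h.path) for h in combo_hits]
        page_times = [h.timestamp for h in combo_hits if is_page(h.path)]
        report[combo] = {
            "over_10_pages_in_20s": exceeds_rate(page_times, 10, 20),
            "over_10_pages_in_30s": exceeds_rate(page_times, 10, 30),
            "requested_robots_txt": "/robots.txt" in paths,
            "honeypot_hits": sum(p in HONEYPOT_PATHS for p in paths),
            "no_style_sheets": not any(p.endswith(STYLE_EXTENSIONS) for p in paths),
            "no_images": not any(p.endswith(IMAGE_EXTENSIONS) for p in paths),
        }
    return report
```

Calling `run_tests(hits)` returns a dictionary keyed by `(ip, user_agent)`, so the per-test results can be tallied or compared across combinations later.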
Over time, I'm hoping to discern patterns that can be associated with different kinds of bots. I'm also looking at the behavior of known bots like e-mail siphon to see if any other user-agent IP combinations in my logs exhibit similar behavior.
I don't, as yet, have any useful results, but I wanted to share the approach.
I don't recall (of course, I haven't looked in a while) a specific page giving lengthy explanations. At least not gathered together in one document.
Bots change rapidly, with new innovations appearing constantly. It is never-ending. :-(
spam
robot
bad bot
UA list
A combination of those, or even just putting the IP or the UA of a suspected bot along with "bot" or "spam" in the search keywords, will turn stuff up...
Actually, that's how I found this board; most of the useful information I have found came from here.