I was thinking of altering the SQL that does this to delete any visitor where any of the following applies:
LOWER(user_agent) LIKE '%bot%'
LOWER(user_agent) LIKE '%spider%'
LOWER(user_agent) LIKE '%crawler%'
I would then add an IN clause listing all the other critters out there that don't use those three words in their UA.
Before I go and do this, can anyone think of a situation where this would delete someone who was a real human? I don't know of any browsers with those words in the UA, but I could be missing something.
In this case, when a session is initiated, we capture the cookie information, including the IP and user agent, and write it to a database table.
I am not sure why you want to know OS/server info. My question relates to data cleansing after it is in the Oracle table. We delete records that are known to be bots using a procedure. What I want to do is alter that procedure to delete any record where "bot", "spider", or "crawl" is found in the user agent string. Before I do that I just want to make sure that there are not any common user agents used by humans for normal browsing that contain those three words. Can you think of any?
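One way to check this against your own data before changing the procedure is to list the distinct user agents the new condition would match, and read through them. Here is a minimal sketch in Python, using an in-memory SQLite table as a stand-in for the Oracle table. The table name `visitors`, the column names, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite table standing in for the real Oracle table.
# Table name, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visitors (ip TEXT, user_agent TEXT)")
conn.executemany(
    "INSERT INTO visitors VALUES (?, ?)",
    [
        ("192.0.2.1", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
        ("192.0.2.2", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0"),
        ("192.0.2.3", "Baiduspider"),
    ],
)

# The three words from the post, run as a SELECT so nothing is deleted.
PREVIEW_SQL = """
    SELECT user_agent, COUNT(*) AS hits
    FROM visitors
    WHERE LOWER(user_agent) LIKE '%bot%'
       OR LOWER(user_agent) LIKE '%spider%'
       OR LOWER(user_agent) LIKE '%crawl%'
    GROUP BY user_agent
    ORDER BY hits DESC, user_agent
"""
matches = conn.execute(PREVIEW_SQL).fetchall()
for user_agent, hits in matches:
    print(hits, user_agent)
```

Anything in that list that looks like a real browser is a record the altered procedure would delete by mistake.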
That's how I block bots: everything that doesn't match gets tossed. So much for cell phones and people using Lynx, but my site doesn't work for them anyway.
Then you need to subfilter the user agent, as there are about 100 items I knock out, and anything with "http://", "crawler", "spider", "download" and "robot" in it is pretty safe to zap.
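A minimal sketch of that substring check, written in Python rather than as part of the Oracle procedure. The function name `looks_automated` is made up for illustration, and the token list holds only the five strings named above, not the full list of about 100.

```python
# The five substrings named in the post; a real list would be longer.
SAFE_TOKENS = ("http://", "crawler", "spider", "download", "robot")

def looks_automated(user_agent: str) -> bool:
    """Return True if the user agent contains any listed token, ignoring case."""
    ua = user_agent.lower()
    return any(token in ua for token in SAFE_TOKENS)
```

For example, `looks_automated("Baiduspider")` is true, while an ordinary Firefox user agent with none of those strings in it is left alone.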
Just "bot" alone will nail things it probably shouldn't, like BOTtom, BOTher, BOTtle. You get the point. You need to do Googlebot, Spambot, etc. one at a time, or perhaps check anything that matches "bot" to make sure the word ends in "bot" and that it isn't in the middle of a word or something.
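The "word ends in bot" check can be sketched with a regular expression. This is one possible reading of the idea, in Python: match "bot" only when the next character is not a letter. The function name `ends_in_bot` is hypothetical, and the pattern should still be tried against real log data before it drives any deletes.

```python
import re

# "bot" followed by a non-letter or end of string, so "Googlebot/2.1" matches
# but "Bottom" and "Bottle" do not.
BOT_AT_WORD_END = re.compile(r"bot(?![a-z])", re.IGNORECASE)

def ends_in_bot(user_agent: str) -> bool:
    """Return True if some word in the user agent ends in 'bot'."""
    return BOT_AT_WORD_END.search(user_agent) is not None
```

With this, `ends_in_bot("msnbot-media/1.1")` is true and `ends_in_bot("BottomFeeder/4.1")` is false. A word such as "robots" is not matched either, so it would need its own entry in the list.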