Forum Moderators: DixonJones


Bad Bots Again

Bots that do not announce themselves

         

SrWebmaster

1:44 am on Jul 14, 2006 (gmt 0)

10+ Year Member



It's relatively easy to block bots that announce themselves in HTTP_USER_AGENT but what about bots that leave a footprint of:

HTTP_USER_AGENT = hkgoxvfarmyfyjfwafdgvfbel
HTTP_ACCEPT = text/html, text/plain

In all cases, the HTTP_USER_AGENT is a random-length string of random alpha characters.
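
As a rough illustration of that footprint, here is a minimal Python sketch of a check for it. It assumes "random alpha characters" means letters only and that the Accept header is exactly the two types shown; the function name is made up for this example. It is a heuristic, not a reliable detector: a bare "Mozilla" would also match the letters-only test.

```python
import re

# Assumption: a "random alpha" user agent consists of letters only.
# Real browsers and crawlers almost always include "/", digits or spaces.
ALPHA_ONLY = re.compile(r"[A-Za-z]+")


def looks_like_random_agent(user_agent: str, accept: str) -> bool:
    """Return True when a request matches the footprint described above."""
    # Split the Accept header so spacing differences do not matter.
    accept_types = [part.strip() for part in accept.split(",")]
    return (
        ALPHA_ONLY.fullmatch(user_agent) is not None
        and accept_types == ["text/html", "text/plain"]
    )
```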

Receptional

6:37 am on Jul 14, 2006 (gmt 0)



With a bit of luck, that will be using the same IP address each time. Ban it by IP.

DanA

7:53 am on Jul 14, 2006 (gmt 0)

10+ Year Member



Unfortunately, they come from everywhere. I began to make a list of IPs two weeks ago and have more than a hundred as of today.

the_nerd

9:11 am on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

I just ran a test query against a sample of 200,000 requests I had ready in a database.

SELECT Left(uagent,10) AS uagent
FROM ex060704
GROUP BY Left(uagent,10);

Result: 54 records.

Mozilla
msnbot/
Opera/9
ColdFus
Yahoo-M
Opera/8

Those 6 account for about 99.5% of all requests.
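
For anyone without the log in a database, the same grouping can be sketched in Python. This is only an illustration: it assumes the user-agent strings have already been pulled out of the log into a list, and prefix_shares is a name invented here. It also returns each prefix's share of the requests, which the query above does not.

```python
from collections import Counter


def prefix_shares(user_agents, length=10):
    """Count requests per user-agent prefix.

    Returns (prefix, share) pairs, most common first, where share is
    the fraction of all requests that start with that prefix.
    """
    counts = Counter(ua[:length] for ua in user_agents)
    total = sum(counts.values())
    return [(prefix, n / total) for prefix, n in counts.most_common()]
```

The rare prefixes at the bottom of that list are the ones worth reviewing by hand.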

Just as a starting point: analyse a couple of million requests this way and create the list of starting strings of length 10 (or whatever you like).

Hand-edit this list (you will probably weed out another half of them) and whitelist the rest. This way you'll probably get rid of the random user-agent strings.
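
A minimal sketch of such a whitelist check in Python, assuming the hand-edited list ends up as a tuple of prefixes. The entries below are just the six prefixes from the sample result, used as placeholders; a real list would come out of your own logs.

```python
# Placeholder whitelist: the six prefixes from the sample query result.
# In practice this would be the hand-edited list built from your own logs.
ALLOWED_PREFIXES = (
    "Mozilla",
    "msnbot/",
    "Opera/9",
    "ColdFus",
    "Yahoo-M",
    "Opera/8",
)


def is_whitelisted(user_agent: str) -> bool:
    """Return True if the user agent starts with any whitelisted prefix."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return user_agent.startswith(ALLOWED_PREFIXES)
```

Anything that fails the check, including the random strings, would then be refused or at least logged for review.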

Combine this with a honey pot (link that no user will normally click on) - and wave them good-bye.
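
The honey pot idea can be sketched the same way. This is an assumption-heavy illustration, not a drop-in handler: TRAP_PATH is a made-up URL, the ban list lives only in memory, and it presumes the trap URL is also disallowed in robots.txt so that well-behaved crawlers never request it.

```python
# Hypothetical target of the hidden link; pick any URL no visitor would reach.
TRAP_PATH = "/trap/do-not-follow.html"

# In-memory ban list for illustration; a real site would persist this.
banned_ips = set()


def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status code: 403 for banned clients, 200 otherwise."""
    if path == TRAP_PATH:
        # Whoever follows the hidden link gets banned from now on.
        banned_ips.add(ip)
    if ip in banned_ips:
        return 403
    return 200
```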

I'd check every once in a while whether new browsers or SEs are knocking.

2cts ...

nerd.

Receptional

1:55 pm on Jul 14, 2006 (gmt 0)



That's also going to knock out legit bots though, I would guess (as well as my PDA or WebTV).

The white list will mostly work, but once you are on the defensive like that, you will need serious ongoing maintenance to keep it up to date. That tends to get forgotten, and over time you will lose genuine visitors who are inadvertently banned for using a new beta browser or a new platform.

Dixon.