Forum Moderators: phranque


I need a bad bot trap

looking for the easiest way to ban bots that don't ask for robots.txt


onedumbear

11:29 pm on Oct 23, 2003 (gmt 0)

10+ Year Member



I wasn't sure where else to post this...
I have gotten very tired of bots that don't ask for robots.txt first.
I would like to ban all the bots that do not ask for the robots.txt file.
Can someone post or sticky working code for my .htaccess file that will do this?
I have no experience with this particular process, so please keep it idiot-proof.

caine

11:52 pm on Oct 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



try this:

Close to perfect .htaccess ban list [webmasterworld.com].

The same forum also provides quite a lot of detail about .htaccess implementation and adding rogue robots.
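
For a flavour of what that list does, user-agent bans in .htaccess generally look something like this (a minimal sketch with example strings only - the linked thread maintains the real, much longer list):

# Deny known bad robots by User-Agent string
# (example entries only - see the linked thread for the maintained list)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
RewriteRule .* - [F]

The [F] flag returns 403 Forbidden to anything whose User-Agent matches one of the conditions.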

onedumbear

12:16 am on Oct 24, 2003 (gmt 0)

10+ Year Member



Thanks, caine,
I did see that, but I was wondering if there was something simpler that would just ban any bot that didn't ask for robots.txt.
I didn't really want to learn anything either (he, he); my brain is tired today and I don't wanna. Whaaa!
So I was hoping for a simple "here you go" copy-and-paste job.
I'll just learn what I have to... tomorrow.

caine

12:21 am on Oct 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of my colleagues has read the thread and its constituent parts - it was necessary, apparently - and we're dropping it online soon.

There's no such thing as a cut-and-paste .htaccess, but near-perfect ain't bad.

If you head down the thread a good 50+ posts, the list gets pretty comprehensive, with reasonably straightforward instructions on application. Keep your eyes open - it's been a while since I read that thread.

jdMorgan

12:30 am on Oct 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



onedumbear,

There is a script (and several derivatives) posted here that is close to what you are asking for. The script does not block based on whether robots.txt is requested, it blocks based on whether robots.txt is obeyed. The problem with blocking based on robots.txt requests is that you have to "track" by IP number or hostname (which might change, a la Google) and you have to remember the robots.txt request for each IP address or hostname - sometimes for several days. This results in a database that is large, difficult to determine purge criteria for, and generally a pain.
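
In outline, the approach goes something like this (a rough sketch of the general idea, not the actual script from that thread - the /bot-trap/ path and file locations are made-up examples). You disallow a trap URL in robots.txt and link to it from your pages in a way no human will follow; any client that requests it anyway has ignored robots.txt and convicts itself:

# robots.txt - well-behaved robots will never request the trap
User-agent: *
Disallow: /bot-trap/

The trap URL then points at a small script that appends a "deny from" line to the site's .htaccess (which should already contain an "order allow,deny" / "allow from all" pair so the denials take effect). Python is used here purely for illustration:

#!/usr/bin/env python3
# /bot-trap/index.py - hypothetical CGI trap script (illustrative only).
# Anything requesting this URL has ignored robots.txt, so its IP is
# appended as a "deny from" line to the site's .htaccess file.
import os

HTACCESS = "/path/to/site/.htaccess"  # assumed path; must be writable by the server

ip = os.environ.get("REMOTE_ADDR", "")

if ip:
    with open(HTACCESS) as f:
        existing = f.read()
    # Only add each offending IP once
    if ("deny from %s" % ip) not in existing:
        with open(HTACCESS, "a") as f:
            f.write("deny from %s\n" % ip)

# Minimal CGI response for whatever tripped the trap
print("Content-Type: text/plain")
print()
print("Goodbye.")

The advantage over logging robots.txt requests is that there is no tracking database to maintain: the ban is written the moment the trap is fetched and takes effect on the bot's next request.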

Here's a link to a later thread, and you can follow the backlinks in the thread to get back to the original post by key_master containing the background and theory of operation: [webmasterworld.com...]

It works well.

Jim