Forum Moderators: open


misbehaved bots

how to trap misbehaved bots?


indiandomain

10:19 am on May 2, 2003 (gmt 0)

10+ Year Member



hey guys,
I need to block misbehaved bots from following the links on my site. I have a solution in hand, but before I implement it I'd like your advice.

This is the scenario: the bot enters my site and follows every possible link [spamming my site].

To trap the bot, I have a blind link on my site. Any IP that comes in through this link has to belong to a bot, not a human. I then record these IPs and feed them into a blacklist. On each HTML page I'll have code to check whether the requesting IP is in the blacklist; if it is, I'll redirect the bot elsewhere.
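Roughly, the logic I have in mind looks like this (a Python sketch just to illustrate the idea; the blacklist file name is a placeholder):

```python
# A sketch of the plan: record the visitor's IP when the blind link
# is requested, and check the blacklist on every other page.
# "blacklist.txt" is a placeholder; use whatever path suits your server.

BLACKLIST = "blacklist.txt"

def trap(remote_addr):
    """Called when the blind link is requested: blacklist this IP."""
    with open(BLACKLIST, "a") as f:
        f.write(remote_addr + "\n")

def is_banned(remote_addr):
    """Called on every page request: is this IP blacklisted?"""
    try:
        with open(BLACKLIST) as f:
            return remote_addr in (line.strip() for line in f)
    except FileNotFoundError:
        return False

# A bot hits the blind link, then requests a normal page:
trap("212.198.0.96")
print(is_banned("212.198.0.96"))  # True  -> redirect it elsewhere
print(is_banned("10.0.0.1"))      # False -> serve the page normally
```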

What do you guys think? Will this work?

I've noticed a problem using a blind link with the Googlebot, and I'm stuck on that issue.

Let me know.

Thanks,
indiandomain

carfac

4:43 am on May 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello ID:

Yes, it would work.... but there are BETTER ways!

Key_Master has written a wonderful little script that does just about what you describe, but perhaps a bit "cleaner." (Your line "On each HTML page I'll have code to check if this IP is in the blacklist; if yes, I'll redirect the bot elsewhere" makes me wonder a bit... KM's script does it at the request level.)

Anyway, search for Key_Master's Spider Trap... that will do you JUST RIGHT, I think!

And if you need help- let us know!

dave

jdMorgan

5:24 am on May 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



indiandomain,

Here's a recent link you can follow back to the script - there's some "recommended reading" in there that helps explain things: [webmasterworld.com...]
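As a side note on the Googlebot problem you mentioned: the usual arrangement is to Disallow the trap URL in robots.txt, so well-behaved spiders like Googlebot never follow the blind link, and only bots that ignore robots.txt get trapped. Assuming, just as an example, that the trap script lives under /trap/:

```text
User-agent: *
Disallow: /trap/
```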

Jim

indiandomain

6:36 am on May 3, 2003 (gmt 0)

10+ Year Member



thanks carfac, jim

I found the script I was looking for. You guys have been very helpful.
:-)

indiandomain

1:34 pm on May 7, 2003 (gmt 0)

10+ Year Member



Guys, a little help needed with the trap.pl script by Key_Master.

Somehow trap.pl can't write to .htaccess when I run it.

My .htaccess looks like this:

<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>

Am I missing a line? I think a SetEnvIf directive is needed.

Can someone help me here?

thanks

jdMorgan

1:57 pm on May 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



indiandomain,

Until the script can write to .htaccess, the actual contents of .htaccess don't matter.

Make sure that .htaccess permissions are set to 644, and verify with your hosting service that the script executes as "owner". Also check that the script is using the correct path to put the newly-written .htaccess file in your web root ("home page") directory.

As for the code needed in .htaccess to implement the blocking, here's what I'm using:


# Block bad-bots using lines written by bad_bot.pl script above
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" allowit
<Files *>
order deny,allow
deny from env=ban
allow from env=allowit
</Files>

This will block requests from any IP address for which the environment variable "ban" has been set, except that access to my custom 403 pages and robots.txt is always allowed - even for requests from banned IPs.

The missing SetEnvIf directives are what the script is trying to write to .htaccess. When you get it working they will appear at the beginning of your .htaccess file.
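To illustrate, once the script is writing successfully and a couple of bots have been trapped, the whole file would look something like this (the IP addresses here are invented examples):

```apache
# Lines written automatically by the trap script:
SetEnvIf Remote_Addr ^212\.198\.0\.96$ ban
SetEnvIf Remote_Addr ^203\.0\.113\.45$ ban

# Manually-added blocking code:
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" allowit
<Files *>
order deny,allow
deny from env=ban
allow from env=allowit
</Files>
```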

Note that for testing purposes, you can modify the name of the file that the script writes to. This may be handy while you are trying to figure out the file permissions settings and why the script can't write to your .htaccess file.

HTH,
Jim

indiandomain

2:13 pm on May 7, 2003 (gmt 0)

10+ Year Member



Jim, thanks.

I've seen your script at [webmasterworld.com...]

I guess your .htaccess is meant for your script at [webmasterworld.com...]

I was following Key_Master's trap.pl script at [webmasterworld.com...]

He said to use this in .htaccess:

<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>

Can't I use this .htaccess? I tried using yours and it didn't work.

-Sohail

jdMorgan

2:47 pm on May 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sohail,

You said your script could not write to .htaccess.

If it can't write to .htaccess, then what you put in .htaccess does not matter yet. I strongly recommend you get this working one step at a time.

The script writes lines which say, "SetEnvIf Remote_Addr ^212\.198\.0\.96$ ban" to the beginning of .htaccess, and the code you add manually to .htaccess checks that environment variable and actually does the blocking. Each time a spider is trapped, the script adds a new "SetEnvIf" line to the beginning of .htaccess. But if it can't write to .htaccess - as you stated - then it won't work at all. So get the writing problem fixed first.

The script I posted is a simple modification of key_master's original. I added file-locking so that multiple instances of the script, running as different process threads, would not interfere with each other and overwrite each other's new entries in .htaccess. It is a minor improvement, probably needed only on very busy sites. As an example:

1. Bad-bot A requests a disallowed file, so server process A opens .htaccess for read/write and reads the current .htaccess contents.
2. Bad-bot B requests a disallowed file; process B opens .htaccess for read/write and reads the current contents (which do not yet include Bad-bot A's IP address).
3. Process A adds Bad-bot A's IP address to its copy of .htaccess, writes the result to .htaccess, and closes it.
4. Process B adds Bad-bot B's IP address to its copy of .htaccess, writes the result to .htaccess - overwriting process A's newly-written version with Bad-bot A's IP in it - then closes it.

Process B therefore destroys the entry written by process A, because process B read the original file before process A was able to modify it. File-locking prevents this problem.

Stapel took my version and added the capability to notify the webmaster by e-mail when new intruders are trapped. If that is useful to you, then use her version. Whichever version you choose to use, make sure you use the whole "package." Get it working, and then modify it as desired to suit your needs. Once you see the whole thing in action, it's a lot easier to understand.

File paths and write permissions are actually one of the most difficult things about getting it to work, since they depend on your server setup.

HTH,
Jim

indiandomain

3:29 pm on May 7, 2003 (gmt 0)

10+ Year Member



Jim, I got it now.

When I use chmod 644 on .htaccess, the script can't write to it; when I use chmod 777, it works...

Any idea why 644 isn't working? Is 777 safe?

jdMorgan

6:50 pm on May 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



777 safe?...

It is less safe.

I don't know why yours doesn't work with 644 - This all depends on how your server is set up, and the permissions granted to scripts running in the user accounts.

Try this: Start with 444. Change each 4 to a 6 one-at-a-time. Test each setting to see if it works. If not, go back to 444, and then change each 4 to a 7 one-at-a-time. This will limit the "exposure" of the script to write and execute access.

I just checked mine again. It is set to 644, which is Owner read/write, Group read, and World read.
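To make the octal notation concrete: each digit covers owner, group, and world in turn, with read=4, write=2, and execute=1. A quick Python illustration (these are the same octal values that chmod takes):

```python
import os
import stat
import tempfile

# Each octal digit covers owner, group, world; read=4, write=2, execute=1.
fd, path = tempfile.mkstemp()
os.close(fd)

os.chmod(path, 0o644)  # owner rw-, group r--, world r--
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o644

os.chmod(path, 0o777)  # rwx for everyone - writable by any local user
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o777

os.unlink(path)
```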

Note that these settings don't affect your site's security from an HTTP (Web) access view. What they do affect is security from FTP if you allow users to log in using FTP, and from Telnet and FTP if your host does not do a good job of keeping their customers out of each other's accounts. If your hosting company does their job properly, you don't need to worry.

Jim