
Using ASP to ban bots

re: older thread

         

SuzyUK

4:23 pm on Jan 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Korkus 2000 or anyone

I just read an older thread [webmasterworld.com] which answers a question I was going to ask...

in this thread you say you are calling IP and user agent and putting the script in your global.asa

I don't have a global.asa file on the server yet. Should it just go in the root directory? And if possible, could you detail the code I would need to put in it? (Shortened will do; I can collect the IPs I want to ban.) Does it cover IP blocks? Can IPs be added and called from a database, or is it best to create an array?

Sorry if this seems cheeky to ask, but I just need an example to save me spending too many hours on it ;) I'm fairly new to ASP..

I like the response.redirect URL too...:)

Suzy

korkus2000

6:16 pm on Jan 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is a great overview of the global.asa
[w3schools.com...]

This thread also has more detail
[webmasterworld.com...]

Basically you place a file called global.asa in your root. I pull the IP address and user agent from a database. It will take regular ASP code inside the events. Several people have told me that they have done this and it works great, so it's not just me doing it successfully.

Put your code in the Session_OnStart event. Every time a bot hits, it starts a new session, so the code will fire every time.
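For what it's worth, here is a minimal sketch of that kind of global.asa. The connection string, the BannedBots table with its IP and UserAgent columns, and the /banned.asp page are all placeholders; adjust them to your own setup:

<SCRIPT LANGUAGE="VBScript" RUNAT="Server">
Sub Session_OnStart
  Dim strIP, strUA, conn, rs
  strIP = Request.ServerVariables("REMOTE_ADDR")
  strUA = Request.ServerVariables("HTTP_USER_AGENT")
  Set conn = Server.CreateObject("ADODB.Connection")
  conn.Open "your-connection-string-here" ' placeholder
  ' hypothetical BannedBots table with IP and UserAgent columns
  Set rs = conn.Execute("SELECT IP FROM BannedBots WHERE IP='" & strIP & _
    "' OR UserAgent='" & Replace(strUA, "'", "''") & "'")
  If Not rs.EOF Then Response.Redirect "/banned.asp" ' send them elsewhere
  rs.Close
  conn.Close
End Sub
</SCRIPT>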

SuzyUK

9:39 pm on Jan 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Korkus. I was on the case anyway. I'm a little scared of putting the function into the global.asa, as I've been reading that it can muck things up if you get it wrong.

However, if I write it as a sub, put it at the bottom of the global.asa, then call the sub like this:

sub Session_OnStart
subBanEm ' call the bad bots sub function
end sub

would that do?

I've tested the subBanEm function in a normal .asp page and it's working OK.

or is this a really long way to do it?

Suzy

korkus2000

11:28 pm on Jan 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds like that would be the same as just putting it in the event. I would put it in the event. Yes, you can muck things up, so do be careful.

Xoc

6:37 pm on Jan 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, a bot will probably begin a session on every page. Since a bot doesn't typically store cookies, each hit starts a new session.

Sessions work by storing a cookie with a unique session id number in it. Each time a request is made, that cookie is sent up. Each time a page is sent down, the timeout on the cookie is refreshed. The session id number is used to lookup session variables on the server. Sessions maintain the info on the server, by default, for 20 minutes.
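To see that mechanism from the server side, any .asp page can read the session id and set the timeout (in minutes):

<%
Response.Write Session.SessionID ' the id carried in the ASPSESSIONIDxxxx cookie
Session.Timeout = 20             ' how long the server keeps the session alive
%>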

However, Sessions consume memory on the server. So if you suddenly get a lot of hits on your web site, then all of the memory gets consumed, as each session has to maintain that memory for 20 minutes.

Since a bot doesn't store the cookie, the web server thinks each hit is a new session. So a bot can run through all the server memory really fast. Best advice is just turn sessions off and don't use them.
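If you do go the no-sessions route, classic ASP can switch them off per page with a processing directive on the very first line of the page (sessions can also be disabled site-wide in IIS):

<%@ EnableSessionState=False %>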

If you want to ban a bot, the IIS Manager has a place where you can ban a particular IP address or domain. Better to use the IP address, since if you ban a domain, each hit has to do a reverse IP address lookup. On the Directory Security tab of the web site properties dialog is a section that is labeled "IP address and domain name restrictions" with an edit button. You can then ban a particular IP address or range of addresses. Even better is to ban it at the firewall.

If you can't get to this dialog (because your ISP doesn't let you), then add a script to the top of each page on your web site. Easiest way to maintain that is to actually add a server-side include to the top of each page, referencing a common script.
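As a sketch of that include approach: each page would start with a line like this (bancheck.asp is a hypothetical name, and the addresses shown are just documentation examples):

<!-- #include virtual="/includes/bancheck.asp" -->

and bancheck.asp would hold the common check:

<%
Dim strIP, arrBanned, i
strIP = Request.ServerVariables("REMOTE_ADDR")
arrBanned = Array("192.0.2.1", "198.51.100.7") ' or pull these from a database
For i = 0 To UBound(arrBanned)
  If strIP = arrBanned(i) Then
    Response.Status = "403 Forbidden"
    Response.End ' stop before any page content is sent
  End If
Next
%>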

[edited by: Xoc at 9:43 pm (utc) on Jan. 19, 2003]

SuzyUK

7:46 pm on Jan 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Xoc....a lot of that went flying over my head ;)

but let me think out loud.

Actually, a bot will probably begin a session on every page. Since a bot doesn't typically store cookies, each hit starts a new session.

makes sense, but if you have the code in your Session_OnStart, would that not catch them as they try to start a new session?

Sessions work by storing a cookie with a unique session id number in it. Each time a request is made, that cookie is sent up. Each time a page is sent down, the timeout on the cookie is refreshed. The session id number is used to lookup session variables on the server. Sessions maintain the info on the server, by default, for 20 minutes.

However, Sessions consume memory on the server. So if you suddenly get a lot of hits on your web site, then all of the memory gets consumed, as each session has to maintain that memory for 20 minutes.


again, the same question: if there's no session, then there's no 20-minute interval


Since a bot doesn't store the cookie, the web server thinks each hit is a new session. So a bot can run through all the server memory really fast. Best advice is just turn sessions off and don't use them.

I would like to, but I need them for LCID (although someone did mention another way of doing this). And again, if there's no session cookie, would that not mean the script checks the URL/UA each time?

If you want to ban a bot, the IIS Manager has a place where you can ban a particular IP address or domain. Better to use the IP address, since if you ban a domain, each hit has to do a reverse IP address lookup. On the Directory Security tab of the web site properties dialog is a section that is labeled "IP address and domain name restrictions" with an edit button. You can then ban a particular IP address or range of addresses.

Even better is to ban it at the firewall. If you can't get to this dialog (because your ISP doesn't let you), then add a script to the top of each page on your web site. Easiest way to maintain that is to actually add a server-side include to the top of each page, referencing a common script.

now here's where you lost me. The site is hosted and I don't know how to configure any of the settings you mention, but I could include the (URL checker) script in each page, as I already have the header built as an include. Are you saying it would be better to put the script in this as opposed to the global.asa?

Seriously confused :)
Suzy

Xoc

9:55 pm on Jan 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, if you have to have sessions, and you can't get to the IIS Manager dialogs or firewall, then the global.asa is the right place. Make sure that you return a 403 or 404 error (see [ietf.org ] sections 10.4.4 and 10.4.5), or a response.redirect as mentioned in the previous thread.

You do have a line in your robots.txt telling the spider to go away, right? But it's being persistent.
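For completeness, that robots.txt entry would look like this, with the bot's advertised user-agent token in place of the placeholder:

User-agent: SomeBadBot
Disallow: /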

SuzyUK

9:52 am on Jan 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes Xoc, thanks. I have a response.redirect (I liked the FBI idea ;) but I just redirected it to a dummy directory with a blank page) and then disallowed the directory from the "good bots" in robots.txt.

This particular bot (Fresh Bot DOT) doesn't even fetch robots.txt, so it's pointless putting it in there. I think it's a spam bot, as it just keeps going to a guestbook page.

Thanks for all your help guys. I have the script working now, so I'll see if it works or causes any adverse effects.

Suzy

SuzyUK

7:58 pm on Feb 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, it all seems to be working, in that it's causing no adverse effects, but I still see my banned bots hitting the index page in question. Although they are no longer following any of the other links, so I presume it's the script that's sending them away.

However the e-mail addresses I want to protect from being harvested are on this default page..

1. Can they still collect the addresses from this quick visit?

2. Shouldn't the logs show them fetching the "trap" page too, if they're being redirected there?

3. Is there any way I could write some form of confirmation that I could view, to let me know they are indeed going where I want them to?

Suzy

Xoc

6:41 am on Feb 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah... you always have to be careful with Response.Redirect. It is a directive to the client browser or bot to go to the new URL. However, if you still send down the text in the body of the response, the bot could ignore the redirect and still index the page. The way around that is to put a Response.End immediately after the Response.Redirect.
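As a sketch, with blnBanned standing in for whatever check is already in place (the trap path is also just a placeholder):

<%
If blnBanned Then
  Response.Redirect "/trap/blank.asp" ' or wherever the trap page lives
  Response.End ' stop processing so no page body follows the redirect
End If
%>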

SuzyUK

4:37 pm on Feb 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Xoc, will try that... but I'd thought that as the script is in the global.asa, it would redirect them before they even got to the page.

What about if I use a response buffer on the page in question, or indeed all pages? Would that help anything?

Suzy

RossWal

5:34 pm on Feb 5, 2003 (gmt 0)

10+ Year Member



Suzy,
I'm thinking the page is 'touched' by IIS before global.asa is called. My reasoning is that an invalid page request would probably not fire off the global.asa code, so IIS probably first retrieves the page, then goes to global.asa.

Ross