Forum Moderators: goodroi


Looking for advice on blocking IPs

Beginners guide needed


Vimes

5:21 am on Mar 10, 2006 (gmt 0)

10+ Year Member



Hi,

OK, the situation I find myself in is this:

I've created a bot trap, blocked it in the robots.txt file, then placed a link to it on my homepage. It's been running for a week now, and I've got a list of IP addresses. Some have a homepage referrer and a browser/user-agent; others have no referrer and just a user-agent/browser string.
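For anyone setting up a similar trap, a minimal sketch of the robots.txt side (the /bot-trap/ path is just a placeholder; use whatever directory your trap script lives in):

```apache
# robots.txt -- tell compliant crawlers to stay out of the trap.
# Any visitor that requests /bot-trap/ anyway has ignored this file.
User-agent: *
Disallow: /bot-trap/
```

The homepage link to /bot-trap/ is typically made invisible to humans (e.g. a 1x1 image or a link hidden with CSS), so only crawlers that ignore robots.txt follow it.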

So I ran WHOIS on the IP addresses and got back a mixed bag of results.

Most return ISP information, some have websites on them, and some are blacklisted.

Now I'm unsure of the next course of action. I'd like to find out more about these before I start blocking IP addresses. Could any of the more experienced members give this noob a basic step-by-step list of the checks they run before blocking an IP address? Or should I be recording more data in the trap? Am I missing a crucial piece of the pie?
Obviously I don't want to be blocking half the world.

Vimes.

Pfui

10:45 pm on Mar 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You could always research the bots themselves. In time, you'll become very familiar with the bad ones by UA (User-agent), also the bad IP addresses and/or host names (ISPs). I just compiled and posted [webmasterworld.com] a list of my favorite, most reliable 'robot research' sites so perhaps that will come in handy for you.

Thing is, since you know certain bots ignored your robots.txt instructions, I'm unsure why you're not blocking them right now. I mean, that's the purpose of a bad-bot trap -- to catch bad bots :) So if a UA is obviously a bad bot, give it the boot.
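If the server is Apache, a UA-based block can be sketched in .htaccess along these lines (the UA substrings below are placeholders only; substitute the bad bots from your own trap logs):

```apache
# .htaccess sketch: flag requests whose User-Agent matches a known bad bot
# (mod_setenvif), then deny flagged requests (mod_access).
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "WebZIP"      bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Matching is by substring, so keep the patterns specific enough that you don't catch legitimate browsers.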

Iczer

2:29 pm on Apr 7, 2006 (gmt 0)

10+ Year Member



If you know it is a bot, you can ban it. The problem now is that if someone is using Google Web Accelerator or Fasterfox and is prefetching, they will trigger your bot trap. So checking against a bad-bot and bad-domains list is a good idea. I realize this isn't much help, but banning everything in your list might ban some innocent IPs.
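On the prefetch problem: Fasterfox and Google Web Accelerator reportedly send an "X-moz: prefetch" request header, so one commonly posted mitigation is to refuse prefetch requests before they ever reach the trap. A sketch for Apache with mod_rewrite (the bot-trap/ path is a placeholder):

```apache
# .htaccess sketch: return 403 Forbidden for prefetch requests to the trap,
# so accelerator users aren't banned for pages they never asked to see.
RewriteEngine On
RewriteCond %{HTTP:X-moz} prefetch [NC]
RewriteRule ^bot-trap/ - [F]
```

A real visitor who later clicks the link sends a normal request without that header, so this only filters the speculative fetches.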

dmje

3:50 am on Apr 15, 2006 (gmt 0)

10+ Year Member



Question for anyone: I see the following IP, 68.36.15.126, in my server logs many, many times. It's trying to crawl pages but doing so incorrectly, which generates tons of 404 errors.

To give an example: it looks for /public_html/myitem.html, which does not exist; the item is actually at /public_html/somedirectory/myitem.html. I've done a reverse lookup on the IP, and all I can determine is that it is coming from Comcast cable.

Is this something I need to be concerned about? If so how do I block that IP using the robots.txt file?

Pfui

3:13 pm on Apr 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, but you can't block or ban IPs via robots.txt at all. You can only Disallow User-agents (UAs), and even then only the ones that respect robots.txt.

You can block the IP via .htaccess in a couple of different ways depending on your server, etc. We'd need more info from you about what you can do to be able to help you out.
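Assuming Apache with .htaccess overrides enabled, the simplest form of an IP block looks like this (mod_access on Apache 1.3, mod_authz_host on 2.x):

```apache
# .htaccess sketch: allow everyone except the one problem IP.
Order Allow,Deny
Allow from all
Deny from 68.36.15.126
```

You can also deny a whole range (e.g. "Deny from 68.36.15.") but, as noted above in the thread, be careful: that's a Comcast cable range, so a wide block will catch innocent users too.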

(If you're not sure what server software you're running, do a WHOIS on yourself using http://domaintools.com [domaintools.com], formerly whois-dot-sc.)