Forum Moderators: goodroi
Ok, the situation I find myself in is this:
I've created a bot trap, blocked it in the robots.txt file, then placed a link to it on my homepage. It's been running for a week now, and I've got a list of IP addresses: some have a homepage referrer and a browser/user-agent, and others have no referrer, just a user-agent/browser string.
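For reference, the setup amounts to something like this (the /bot-trap/ path is just an example, not my real one):

```
# robots.txt -- polite bots will read this and stay out of the trap
User-agent: *
Disallow: /bot-trap/
```

```html
<!-- hidden link on the homepage; humans won't see or follow it,
     so anything requesting /bot-trap/ has ignored robots.txt -->
<a href="/bot-trap/" style="display:none">&nbsp;</a>
</imports>
```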
So I whois'd the IP addresses and got back a mixed bunch of results.
Most return ISP information, some point to websites, and some are blacklisted.
Now I'm unsure of the next course of action to take. I'd like to find out more about these before I start blocking IP addresses. Could any of the more experienced members give this noob a basic step-by-step list of the checks they run before blocking an IP address? Or should I be recording more data in the trap -- am I missing a crucial piece of the pie?
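One way to do a first triage pass is to pull the trap hits out of the access log and line up IP, referrer, and user-agent side by side before deciding who to block. A minimal sketch in Python, assuming Apache "combined" log format and the example path /bot-trap/ (both assumptions, adjust to your own setup):

```python
import re

# Matches Apache "combined" log lines: ip, request, referrer, user-agent.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<req>[^"]*)" \d+ \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def trap_hits(lines, trap_path="/bot-trap/"):
    """Return (ip, referrer, user-agent) tuples for requests to the trap."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and trap_path in m.group("req"):
            hits.append((m.group("ip"), m.group("referrer"), m.group("agent")))
    return hits

# Made-up sample lines for illustration only.
sample = [
    '1.2.3.4 - - [01/Jan/2007:00:00:01 +0000] "GET /bot-trap/ HTTP/1.1" 200 43 '
    '"-" "BadBot/1.0"',
    '5.6.7.8 - - [01/Jan/2007:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0"',
]
print(trap_hits(sample))  # → [('1.2.3.4', '-', 'BadBot/1.0')]
```

From there you can whois or reverse-lookup just the IPs that actually hit the trap, rather than everything in the log.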
Obviously I don’t want to be blocking half the world.
Vimes.
Thing is, seeing as how you know certain bots ignored your robots.txt instructions, I'm unsure as to why you're not blocking them right now. I mean, that's the purpose of a bad bot trap -- to catch bad bots:) So if a UA was obviously a bad bot, give it the boot.
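If the UA string is the giveaway, you don't even need to block by IP. On Apache, a sketch along these lines in .htaccess will do it ("BadBot" is a made-up UA substring for illustration; Apache 2.2-style access control assumed):

```
# Flag any request whose User-Agent contains "BadBot" (case-insensitive),
# then deny flagged requests.
SetEnvIfNoCase User-Agent "BadBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```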
To give an example: it looks for /public_html/myitem.html, which does not exist; the item is actually at public_html/somedirectory/myitem.html. I have done a reverse lookup on the IP, and all I can determine is that it is coming from Comcast cable.
Is this something I need to be concerned about? If so, how do I block that IP using the robots.txt file?
You can't block an IP with robots.txt -- it's only an advisory file that well-behaved bots choose to obey. You can block the IP via .htaccess in a couple of different ways, depending on your server, etc. We'd need more info from you about your setup to be able to help you out.
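Assuming Apache with .htaccess enabled, the simplest form looks something like this (the addresses below are placeholders, and this is the Apache 2.2-style syntax; 2.4 uses `Require not ip` instead):

```
# .htaccess -- deny one IP and one partial range, allow everyone else
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 198.51.100.
```

A trailing-dot entry like `198.51.100.` blocks the whole range, which is why you want to be careful -- that's how you end up blocking half the world.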
(If you're not sure what server software you're running, do a WHOIS on yourself using http://domaintools.com, formerly whois-dot-sc.)