This just hit one of our sites and asked for robots.txt and the home page. Trouble is, the bot shouldn't have asked for anything after robots.txt, as ours simply disallows all crawlers:
User-agent: *
Disallow: /
From their info page:
MultiCrawler honors the Robots Exclusion Protocol.
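For anyone who wants to double-check what those two robots.txt lines mean for a compliant bot, here is a minimal sketch using Python's standard urllib.robotparser. The host www.example.com and the helper name may_fetch are placeholders I made up for illustration; only the robots.txt content and the bot name come from this thread.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above: every crawler is disallowed everywhere.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this UA may request the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# www.example.com is a placeholder host, not the real site.
print(may_fetch("MultiCrawler", "http://www.example.com/"))  # prints False
```

So a bot that really honors the protocol should stop after reading robots.txt and never request the home page.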
Also, I prefer to block IPs rather than user agents where possible, as bots often change UA to try to look like browsers.
Mokita,
You can use either one, or a combination of both, effectively.
Example:
The bot comes from a valid IP range of a widely used provider, where denying the whole range would affect too many innocent visitors.
Then blocking by UA, or by UA with a condition on the IP, is more focused.
Don
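A rough sketch of the "UA with a condition of IP" idea, written in Python just to show the logic (on a real server this would normally live in the web server config). The network 192.0.2.0/24 is a reserved documentation range standing in for the provider's range, and the UA substring and function name are assumptions for illustration, not values taken from MultiCrawler's actual requests.

```python
import ipaddress

# Hypothetical values: a documentation-only range and an assumed UA substring.
SUSPECT_NETWORK = ipaddress.ip_network("192.0.2.0/24")
BLOCKED_UA_SUBSTRING = "multicrawler"

def should_block(remote_ip: str, user_agent: str) -> bool:
    """Block only when the UA matches AND the request comes from the suspect range."""
    in_range = ipaddress.ip_address(remote_ip) in SUSPECT_NETWORK
    ua_match = BLOCKED_UA_SUBSTRING in user_agent.lower()
    return in_range and ua_match

# Same range, ordinary browser UA: the innocent visitor is not blocked.
print(should_block("192.0.2.10", "Mozilla/5.0"))   # prints False
# Same range, bot UA: blocked.
print(should_block("192.0.2.10", "MultiCrawler"))  # prints True
```

Requiring both conditions keeps other customers of the same provider unaffected, at the cost of missing the bot if it changes its UA.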
You guys are overly harsh on this one. It's an actual semantic web research project running in a university.
Don't you want the next Google killer to escape the labs? ;)
According to kiki, they are the 2nd coming ;)
It's a difficult task, in morals and personal preferences, to keep many third-party orgs within a delicate balance between what benefits webmasters (and their sites) and the tendency of the third party to leave their bodily excretions on the webmaster's doorstep ;)