
tools request, ignore robots.txt

10:52 pm on Jan 6, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1

The org's aims are honorable, but their tools dishonor robots.txt. Every. Single. Time.

Examples from the past few days, where the robots.txt they GET is always, and only --

User-agent: *
Disallow: /

-- and is immediately and repeatedly ignored:

BOIA-Scan-Agent/LC 1.0 (www.boia.org)
06:55:17 /robots.txt
06:55:18 /homepage.html

BOIA-Scan-Agent/LC 1.0 (www.boia.org)
06:12:25 /robots.txt
06:12:26 /homepage.html

LinkChecker/7.3 (+http://linkchecker.sourceforge.net/)

01/05 15:55:44 /robots.txt
01/05 15:55:45 /homepage.html
01/05 15:55:52 /robots.txt
01/05 16:09:58 /robots.txt
01/05 16:09:59 /homepage.html
01/05 16:10:00 /robots.txt

Note hits from both Hosts:

= Mendon Cox Communications

= Cumberland Cox Communications

Apparent referrers (by registered users?) are typically .edu, and also repetitive. But I seriously doubt individuals are sitting there entering my site's home page into boia.org's 'free scan' box over and over and over again, let alone for months on end.

Bottom Line:

Regardless of Host/IP, UA, and/or REF, robots.txt is always ignored.
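
For comparison, here is a minimal sketch of what a compliant client would conclude from those two lines of robots.txt. It uses Python's standard urllib.robotparser; the helper name is_allowed is made up for illustration, and nothing here is taken from the BOIA or LinkChecker code.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above: everything is off limits to every bot.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

def is_allowed(rules: str, user_agent: str, path: str) -> bool:
    """Hypothetical helper: would a well-behaved bot be permitted to fetch path?"""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse the rules text directly, no network fetch
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    for ua in ("BOIA-Scan-Agent/LC 1.0 (www.boia.org)",
               "LinkChecker/7.3 (+http://linkchecker.sourceforge.net/)"):
        # Both user agents fall under the wildcard record, so this reports False.
        print(ua, is_allowed(ROBOTS_TXT, ua, "/homepage.html"))
```

Under these rules a bot that honors robots.txt would stop after the first request, so the /homepage.html hit one second later is the part that should never appear in the log.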

7:06 am on Jan 7, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 1, 2011
posts: 192
votes: 0

The organizations that think they have an "honorable" purpose are the worst offenders of all.

Info Trackers and Mark Scanners, as I usually call them.
The Mark Scanners especially, which are out hunting for trademark abuse and stolen copyrighted images and content.

They think that because they serve a "righteous" purpose they have the right to rip off whole websites, and every image on a site, to check it all. Over and over.

MarkMonitor is one of the worst of all, although there are quite a few of them: Cyveillance and Name Intelligence (which also owns Domain Tools [dot] com), just to mention a couple.

Plus, MarkMonitor has now added at least one shell company (recently named "Brand Certified", with different IP ranges) to hide some of their activities behind.

I have added all the IP ranges I know for them to my DNSBL blocks as policy blocks. I do not want to see them, and they are met with nothing but 403s.
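
A rough sketch of that kind of policy block, for anyone who wants to try it. This is a plain in-process CIDR check with Python's ipaddress module, not an actual DNSBL lookup, and the ranges shown are reserved documentation addresses standing in for the real ones, which are not listed here. The names BLOCKED_RANGES and status_for are made up for the example.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT the real scanner ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def status_for(remote_ip: str) -> int:
    """Hypothetical helper: HTTP status to answer with for a given client IP."""
    addr = ipaddress.ip_address(remote_ip)
    # Membership test against each network; a version mismatch (IPv6 vs IPv4) is simply False.
    if any(addr in net for net in BLOCKED_RANGES):
        return 403  # policy block: Forbidden
    return 200

if __name__ == "__main__":
    for ip in ("192.0.2.77", "198.51.100.200", "203.0.113.5"):
        print(ip, status_for(ip))
```

In practice the same list would more likely live in the web server or firewall configuration, so blocked requests never reach the application at all.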

I have absolutely no stolen content, and they are welcome to "investigate" by hand using normal human visitors.
But just as I would not let all these "private detectives" into my house to rifle through all my closets and drawers to check if I might own counterfeit Nikes or fancy handbags, or have a drawer full of stolen gold watches, without the police and a court order at their side, I will not have them steal my server and network bandwidth either, "just in case I might be a thief". They are really abusive, running much faster than any normal bot.

Having a "righteous" cause does not make network and server bandwidth theft legal.
By Texas law, I could haul them off to court for hacking and illegal access. The same goes for many other states that have similar laws against "unauthorized" access to a network or server.
