the good, the bad and the above the law?

10:03 pm on May 12, 2011 (gmt 0)

lucy24 (Senior Member, US)

#1 The Good
A while back I had a visit from an exquisitely well-behaved robot. Before anything else, it picked up robots.txt and assimilated its contents. It then went around reading all pages and following all "<a href" links, carefully omitting anything inside a Disallowed directory. It preceded each GET with a HEAD, and spaced its visits an average of 5 seconds apart. I was so gratified that I didn't even investigate its credentials. They could be the most evil people in the world; they've got a polite robot and it's welcome any time.
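For anyone who wants to see what that behaviour amounts to, here is a minimal sketch in Python. It is not the robot's actual code, which I never saw. The robots.txt content, the `ExampleBot` name, the `example.com` host and the `plan_requests` helper are all made up for illustration; only the order of operations (robots.txt first, skip Disallowed paths, HEAD before GET, about 5 seconds between visits) comes from what I observed. It only plans the requests and never touches the network.

```python
import urllib.robotparser

# Hypothetical robots.txt content; the real file is not shown in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

CRAWL_DELAY_SECONDS = 5  # the robot spaced its visits about 5 seconds apart


def plan_requests(robots_txt, paths, agent="ExampleBot", base="http://example.com"):
    """Return (delay_before, method, url) tuples for the paths robots.txt allows."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # read the rules before anything else
    plan = []
    for path in paths:
        url = base + path
        if not parser.can_fetch(agent, url):
            continue  # omit anything inside a Disallowed directory
        # HEAD first, then GET, with the pause before each new page
        plan.append((CRAWL_DELAY_SECONDS, "HEAD", url))
        plan.append((0, "GET", url))
    return plan
```

A real crawler would send these requests and sleep between them; the point here is just that the robots.txt check happens before any page is requested.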

(Aside: It did mystify me by trying to find a batch of nonexistent index pages, but this turned out to be my fault. I'd recently added links in one directory and, er, goofed in the addresses. Thanks, robot!)

Timing wasn't ideal, though, because just a few days later I changed my mind about one directory, disallowed it in robots.txt and added "nofollow" to all its links. (Belt and suspenders principle.)

#2 The Bad
Within hours, an unrelated robot drifted by and picked up a few random pages from the now-disallowed directory. Later still, it picked up the revised robots.txt. Ten hours later it came by again and picked up nothing but four pages in the disallowed directory.
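The check I did by eye on the logs can be sketched roughly like this. The log format is simplified to (timestamp, path) pairs and the timestamps in the usage example are invented; the `/silence/` directory name is the one that appears in my rewrite rule. The `violations` helper is hypothetical. It flags only requests made after the robot has fetched robots.txt, since before that it could plausibly claim not to know.

```python
from datetime import datetime


def violations(entries, disallowed_prefix):
    """Return requests for disallowed paths made after robots.txt was fetched."""
    robots_seen = None
    flagged = []
    for when, path in sorted(entries):  # process in time order
        if path == "/robots.txt":
            robots_seen = when
        elif robots_seen is not None and path.startswith(disallowed_prefix):
            flagged.append((when, path))
    return flagged


# Invented timestamps, for illustration only.
log = [
    (datetime(2011, 5, 10, 8, 0), "/silence/one.html"),   # before robots.txt fetch
    (datetime(2011, 5, 10, 9, 0), "/robots.txt"),
    (datetime(2011, 5, 10, 19, 0), "/silence/two.html"),  # after: a violation
]
print(violations(log, "/silence/"))
```

With these made-up entries, only the last request is flagged.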

My .htaccess file now contains these lines (obfuscation done manually because my text editor's rotate-13 is broken):

# Return 403 for the obfuscated user agent, or any 72.14.x.x address, on anything under silence/
RewriteCond %{HTTP_USER_AGENT} Tbbtyrobg [OR]
RewriteCond %{REMOTE_ADDR} ^72\.14\.\d+\.\d+$
RewriteRule silence/ - [F]

(The syntax of the rule looks wrong to me, but it's the only thing I could find that works.)
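As far as I can tell, the rule only looks odd. In .htaccess the RewriteRule pattern is matched against the path relative to the directory, so there is no leading slash; "-" means "no substitution"; and [F] sends a 403. Below is a rough Python model of that logic, not mod_rewrite's actual implementation. The `is_forbidden` helper and the sample addresses are hypothetical. It also shows why anchoring the address pattern with ^ matters: without the anchor, an address such as 172.14.x.x matches too.

```python
import re

UA_PATTERN = re.compile(r"Tbbtyrobg")              # obfuscated name, as in the post
ADDR_UNANCHORED = re.compile(r"72\.14\.\d+\.\d+")  # matches anywhere in the string
ADDR_ANCHORED = re.compile(r"^72\.14\.\d+\.\d+$")  # must match the whole address
PATH_PATTERN = re.compile(r"silence/")


def is_forbidden(user_agent, remote_addr, path, addr_pattern=ADDR_ANCHORED):
    """Approximate the two OR'ed RewriteCond lines guarding the RewriteRule."""
    # In .htaccess context the leading slash is stripped before matching.
    relative = path.lstrip("/")
    if not PATH_PATTERN.search(relative):
        return False  # rule pattern does not match, so conditions are irrelevant
    return bool(UA_PATTERN.search(user_agent) or addr_pattern.search(remote_addr))


print(is_forbidden("Other", "172.14.199.5", "/silence/page.html"))                   # anchored
print(is_forbidden("Other", "172.14.199.5", "/silence/page.html", ADDR_UNANCHORED))  # unanchored
```

The first call reports the request as allowed and the second as forbidden, which is the difference the anchor makes.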

#3 The above-the-law
Had it been any other robot, it would have gone straight into the "Deny from" list. Evidently they are outsourcing their robots.txt handling, rather than processing it on the spot like the Good Robot above. Grr.
