jdMorgan - 6:08 am on Jun 13, 2006 (gmt 0)
That 'attention-getting' tactic just isn't likely to work. I'm willing to accept that Yahoo! and all the other major search engines make a good-faith effort to comply with robots.txt, and also that coding errors, bugs, database disconnects, and misunderstandings of the 'protocol' do happen.

The only reason I ban any major 'bot from any page, or cloak any page, is to keep that page out of the index. And the only reasons I do that are:

Bottom line is that I'm a realist and a pragmatist; this is business. So I don't ban anybody out of malice or spite. I just decide whether I need their traffic or not, and if not, 403.

If Yahoo! were to publish a statement that they intended to disregard robots.txt in the future, I still wouldn't ban them. But they'd be seeing a heckuva lot more in the Vary: User-agent class... ;)

I posted the exact structure of the robots.txt that Slurp China is choking on above, with the URLs obscured to comply with the WebmasterWorld TOS and my own desire for privacy. Other than those changes, the example is a letter-perfect rendition of my actual code. I think Yahoo! can easily test it themselves, if they're so inclined.

Also, the problem is most likely in parsing User-agent names. Anybody could do a 'less risky' test by disallowing just a single URL-path to Slurp China if they wanted to. I suspect they'd see the same failure I did.

Jim
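The 'less risky' single-path test Jim describes can be sketched with Python's standard-library `urllib.robotparser`. This is only a reference parser and makes no claim about how Slurp China actually parses robots.txt. The tokens `Yahoo! Slurp China` and `Yahoo! Slurp` and the path `/private-test/` are assumptions chosen for illustration. If the record is read as written, only Slurp China should be kept out of that one path; Yahoo! Slurp and every other crawler stay unaffected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one test path to Slurp China only,
# and allow everything for every other crawler.
# "Slurp China" and "/private-test/" are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: Slurp China
Disallow: /private-test/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    test_url = "http://www.example.com/private-test/page.html"
    other_url = "http://www.example.com/index.html"
    # Assumed product tokens; real crawlers send a longer User-Agent header.
    for agent in ("Yahoo! Slurp China", "Yahoo! Slurp", "Googlebot"):
        print(agent, rp.can_fetch(agent, test_url), rp.can_fetch(agent, other_url))
```

Only Slurp China should report `False`, and only for the test path. One caveat on User-agent name parsing: this particular parser treats a record's token as a case-insensitive substring of the crawler's name. A record for plain `Slurp` would therefore also capture `Yahoo! Slurp China`. Whether Yahoo!'s own parser behaves that way is not something this sketch can show.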
If *all* the webmasters who read here at WebmasterWorld banned *all* Yahoo user-agents...
Yahoo probably wouldn't notice.
Brought to you by WebmasterWorld: http://www.webmasterworld.com