Forum Moderators: open
Google visits regularly and honors the robots.txt, as do the other major search engines. The file validates so there seems to be no logical reason for the indexing.
(My spider links page will be generating another 50 or 100 thousand bogus URLs to follow later today. Should keep some errant bot busy for a few extra minutes.)
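For anyone curious how a trap page like that works, here's a minimal sketch. The function name, link format, and the /trap/ directory are all hypothetical; the idea is just that every request serves a fresh batch of meaningless links inside the disallowed directory, so a bot that ignores robots.txt keeps finding "new" pages forever:

```python
# Hypothetical spider-trap page generator. Any bot that ignores the
# Disallow on /trap/ will keep requesting these pages indefinitely,
# since each one links to more pages that don't really exist.
import random

def trap_page(n_links=50, seed=None):
    """Return an HTML page containing n_links bogus links under /trap/."""
    rng = random.Random(seed)  # seed only to make the demo repeatable
    links = [
        '<a href="/trap/{:08x}.html">page</a>'.format(rng.getrandbits(32))
        for _ in range(n_links)
    ]
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>"

# Demo: a small page with five bogus links
print(trap_page(5, seed=1))
```

In practice you'd serve something like this from a CGI script or rewrite rule so the trap URLs resolve, and log which user-agents take the bait.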
JeevesGuy:
Is there a current email address for reporting a problem with your robot? I sent a message via your site's "feedback" form. (Again.)
What are the odds of getting all references to a particular site removed from your index?
Wayne
Was briefly 'live' as a would-be shopping site and was linked, again briefly, through only a couple of shared files from several of my content sites. It's more "on hold" now than "under construction."
The spider trap directory was always disallowed and for some time now, the whole site is. (Ever since Ask Jeeves first spidered the trap directory last April.)
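For reference, the two successive versions of the file looked roughly like this (these are two separate files, not one; /trap/ is a placeholder for the real directory name, which I'm not posting):

```
# Earlier version: only the trap directory off limits
User-agent: *
Disallow: /trap/
```

```
# Current version: the entire site disallowed
User-agent: *
Disallow: /
```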
I'm not one to suspect Jeeves has any motive here. I have the same directory on my other sites and have never seen them fall into the trap on any of them. That's why I looked again at my robots.txt file.
Wayne
When I ran the file through the robots.txt validation tool, I found that later in the file there was a second set of Disallow entries (different from those above). A cut-and-paste screw-up, I guess. The file still validated, but the tool warned me about the duplicates. In effect, I was rewriting the file halfway through, and the bot seems to have restarted parsing at the point where it saw the second User-agent: *.
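That's the danger with duplicate groups: the spec doesn't really say what a second User-agent: * means, so every parser picks its own behavior. As a quick illustration (the paths are made up, and this only shows how one particular parser behaves, not how any given bot does), Python's standard-library robotparser silently keeps the first * group and drops the second, which is the opposite of a bot that restarts at the second group:

```python
# Sketch: how Python's urllib.robotparser handles a robots.txt with
# two "User-agent: *" groups. It keeps only the FIRST default group;
# the second group's Disallow: / is silently ignored. Other bots may
# merge the groups, or restart at the second one, as mine seems to have.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.modified()  # mark the file as "read" so can_fetch gives real answers
rp.parse(ROBOTS_TXT.splitlines())

# First group applies: the trap directory is blocked
print(rp.can_fetch("AnyBot", "http://example.com/trap/page.html"))
# Second group is discarded: the rest of the site still looks fetchable
print(rp.can_fetch("AnyBot", "http://example.com/index.html"))
```

Moral: one User-agent: * group per file, with all its Disallow lines together, is the only arrangement every bot agrees on.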
Checked all my other robots.txt files to make sure they're ok.
Thanks to JeevesGuy for contacting me via stickymail.
Wayne