Apologies if this has been discussed here already; I couldn't find an existing thread about it.
Recently I wrote a little Perl utility called <widget.cgi> to help me administer a site. Of course I never linked to it from anywhere, and my logfiles and log reports aren't visible on the web without a password. I also don't have any toolbars installed in my browser.
Well, just today I saw that this file is showing up in a search for my domain name, because of the output it generates. (Yeah, I should have blocked any IP besides my own from running it, but that's another matter.) I wondered how Yahoo could have found it.
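For anyone curious what that kind of IP block might look like, here is a minimal sketch. It is in Python purely for illustration (the utility in question is Perl), and the names ALLOWED_IPS, is_allowed and respond are made up for the example. The only real piece is REMOTE_ADDR, the standard CGI variable holding the client's address; 203.0.113.7 is a reserved documentation address standing in for your own IP.

```python
import ipaddress

# Hypothetical allow-list; replace the documentation address with your own IP.
ALLOWED_IPS = {ipaddress.ip_address("203.0.113.7")}


def is_allowed(environ):
    """Return True only if the CGI REMOTE_ADDR is on the allow-list."""
    try:
        client = ipaddress.ip_address(environ.get("REMOTE_ADDR", ""))
    except ValueError:
        # Missing or malformed address: refuse rather than guess.
        return False
    return client in ALLOWED_IPS


def respond(environ):
    """Build a CGI response: 403 for strangers, the real output otherwise."""
    if not is_allowed(environ):
        return "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nForbidden\n"
    return "Status: 200 OK\r\nContent-Type: text/plain\r\n\r\nadmin output here\n"
```

In a real CGI script you would pass os.environ to respond() and print the result. A check like this only keeps other visitors (and crawlers) from running the script; it does nothing to hide the file name in a directory listing.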
I did a <link:page> command to see what page linked to the CGI file, and the one result that came back was <domain.com/directory>, which is just a directory listing, since </directory> has no <index.html> file. There have never been any links to </directory> itself, just to the things inside it. I've since added an empty <index.html> file to prevent casual viewing of that directory.
So it looks to me as though Yahoo will crawl every file it can find in a directory, if it's allowed to see the listing, even if those files aren't linked from anywhere.
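To show why an open directory listing is enough for this to happen, here is a rough Python sketch of how any crawler could pull file URLs out of an auto-generated index page. This is only a guess at the mechanism, not how Yahoo's crawler is actually written. The class and function names, the example.com URL and the sample HTML are all invented for the illustration, and the sample assumes an Apache-style listing with "?..." sort links and a "../" parent link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingLinks(HTMLParser):
    """Collect absolute URLs for the entries of a directory-listing page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip column-sort links ("?C=N;O=D") and the parent-directory link.
        if href and not href.startswith("?") and href != "../":
            self.links.append(urljoin(self.base_url, href))


def files_in_listing(base_url, html):
    """Return every file URL exposed by the listing HTML."""
    parser = ListingLinks(base_url)
    parser.feed(html)
    return parser.links


# Invented sample of what a server-generated listing might contain.
SAMPLE = (
    '<a href="?C=N;O=D">Name</a> '
    '<a href="../">Parent Directory</a> '
    '<a href="widget.cgi">widget.cgi</a> '
    '<a href="notes.txt">notes.txt</a>'
)

found = files_in_listing("http://example.com/directory/", SAMPLE)
```

The point is that the listing page is itself a page of links: once a crawler reaches the directory URL, every file in it is effectively linked, whether or not you ever linked to it yourself.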
[edited by: martinibuster at 4:06 pm (utc) on Jan. 13, 2005]
[edit reason] widgetized [/edit]