Recently I wrote a little Perl utility called <widget.cgi> to help me administer a site. Of course I never linked to it from anywhere, and my logfiles and log reports aren't visible on the web without a password. I also don't have any toolbars installed in my browser.
Well, just today I noticed that this file is showing up in a Yahoo search for my domain name, because of the output it generates. (Yeah, I should have blocked any IP besides my own from running it, but that's another matter.) I wondered how Yahoo could have found it.
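For the IP restriction mentioned above, here is a minimal sketch, written in Python rather than the original Perl. It assumes the web server passes the client address in the standard CGI `REMOTE_ADDR` variable; `ALLOWED_IPS` and the address in it are hypothetical placeholders for your own IP. If you sit behind a proxy, `REMOTE_ADDR` will be the proxy's address instead.

```python
import ipaddress
import os
import sys

# Hypothetical allow-list: replace with your own address(es).
ALLOWED_IPS = {"203.0.113.10"}


def is_allowed(remote_addr, allowed=ALLOWED_IPS):
    """Return True only if remote_addr parses as an IP and is on the allow-list."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # missing or malformed address: refuse
    return any(addr == ipaddress.ip_address(a) for a in allowed)


def main():
    # CGI servers expose the client's address in the REMOTE_ADDR variable.
    if not is_allowed(os.environ.get("REMOTE_ADDR", "")):
        # A 403 with an empty body gives a crawler nothing to index.
        sys.stdout.write("Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n")
        return
    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write("admin output goes here\n")


if __name__ == "__main__":
    main()
```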
I did a <link:page> command to see which page linked to the CGI file, and the one result that came back was <domain.com/directory>, which is just a directory listing, since </directory> has no <index.html> file. There have never been any links to </directory> itself, just to the things inside it. I've since added an empty <index.html> file to prevent casual viewing of that directory.
So it looks to me as though Yahoo looks for every file in any directory it can find, as long as it's allowed to see them, even if they're not linked.
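Once a crawler reaches an autoindex page, everything in the directory is effectively linked, because the listing is ordinary HTML with one `<a href>` per entry. The Python sketch below pulls those links out. `SAMPLE_LISTING` is a simplified, hypothetical stand-in for real mod_autoindex output, and `example.com` replaces the real domain. The base URL needs its trailing slash so relative names resolve inside the directory.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, simplified autoindex page for /directory/.
SAMPLE_LISTING = """<html><head><title>Index of /directory</title></head><body>
<h1>Index of /directory</h1>
<ul><li><a href="/"> Parent Directory</a></li>
<li><a href="widget.cgi"> widget.cgi</a></li>
<li><a href="notes.txt"> notes.txt</a></li></ul>
</body></html>"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def links_in_listing(base_url, html):
    """Return every absolute URL a crawler would see on the listing page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for url in links_in_listing("http://example.com/directory/", SAMPLE_LISTING):
        print(url)
```

Nothing in this depends on widget.cgi being linked from anywhere else; appearing in the listing is enough.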
[edited by: martinibuster at 4:06 pm (utc) on Jan. 13, 2005]
[edit reason] widgetized [/edit]
Turn off the autoindexing option on your server.
Yep, but what about programs like Nessus? How do they locate directories that aren't linked anywhere and aren't autoindexed because they contain an index.html?
I don't list forbidden dirs in robots.txt (so as not to reveal them), and I block these directories within a <Directory> container. How can they scan/browse for my dirs?
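One common way scanners find unlinked directories is plain guessing: request a list of likely names and record anything that doesn't come back 404. This is a generic sketch of that technique, not a description of Nessus internals. `COMMON_NAMES` is a hypothetical wordlist, and the HTTP call is passed in as `fetch_status` so the logic can be exercised offline. A 403 from a blocked `<Directory>` usually still tells the scanner that the directory exists.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urljoin

# Hypothetical wordlist; real scanners ship much larger ones.
COMMON_NAMES = ["admin", "backup", "cgi-bin", "directory", "logs", "stats"]


def probe_directories(
    base_url: str,
    names: Iterable[str],
    fetch_status: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Request base_url + name + '/' for each guess; keep anything that isn't a 404."""
    hits = []
    for name in names:
        url = urljoin(base_url, name + "/")
        status = fetch_status(url)
        if status != 404:  # 200, 401, 403 ... all reveal that something is there
            hits.append((url, status))
    return hits


def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

To check your own site live, pass `http_status` as `fetch_status`.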
Thanks, xcomm
Place an Options -Indexes statement if your server is Apache.
If your server is something other than Apache, read the fine manual; you're on your own. Good luck.
Relevant Options setting:
[+|-]Indexes
If a URL which maps to a directory is requested, and there is no DirectoryIndex (e.g., index.html) in that directory, then mod_autoindex will return a formatted listing of the directory.
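The decision described above can be modeled in a few lines of Python. This is a simplified illustration, not Apache source: it ignores access control, rewrites, and `IndexIgnore`, and treats `DirectoryIndex` as a tuple of file names to try in order. With `-Indexes` and no index file, Apache answers 403 Forbidden, which is what the last branch returns.

```python
import os
import tempfile


def respond_to_directory(path, directory_index=("index.html",), indexes_enabled=True):
    """Simplified model (not Apache source) of a request that maps to a directory.

    Returns (status, body): the first DirectoryIndex file that exists, else a
    generated listing when Indexes is on, else 403 Forbidden.
    """
    # DirectoryIndex: try each configured file name in order.
    for name in directory_index:
        candidate = os.path.join(path, name)
        if os.path.isfile(candidate):
            return 200, candidate
    # No index file: mod_autoindex lists every entry if Indexes is enabled.
    if indexes_enabled:
        return 200, sorted(os.listdir(path))
    # Options -Indexes and no index file: the request is refused.
    return 403, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "widget.cgi"), "w").close()
        print(respond_to_directory(d))                         # (200, ['widget.cgi'])
        print(respond_to_directory(d, indexes_enabled=False))  # (403, None)
```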
For instance:
The so-called "highlight" bug in phpBB... now why did both the phpBB folks and the PHP folks release fixes aimed at it?
Interesting, being a nobody... oh well, back to the drawing board.
"Recently I wrote a little Perl utility called <widget.cgi>"
Is it possible that the widget.cgi URL somehow got out into the "wild" via your own browser sending it as the referring page to a website, which in turn created a link to it through a system that publishes referrers and referring URLs?
There are tons of such tracking systems out there.
Internet-connected systems leak information like a supertanker split in half.
Yahoo's bot may also be chopping off the tail end of URLs as it does its work... maybe a buffer (hunk of memory) somewhere is too small...
In any event, if the web server itself isn't producing an index for the directory, then some path through _code on your system_ is allowing such a listing to be produced.
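If a crawler does trim URLs, the mechanism could be as simple as walking up the path of a URL it already knows. That would fit the original post, where only files inside `/directory/` were ever linked. The sketch below derives the parent directory URLs a crawler might try; whether Yahoo's bot actually does this is speculation, and `example.com` is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url):
    """Return each ancestor directory URL of `url`, nearest first.

    http://example.com/a/b/page.html ->
        http://example.com/a/b/, http://example.com/a/, http://example.com/
    Query strings and fragments are dropped.
    """
    parts = urlsplit(url)
    # Strip a trailing slash so a directory URL doesn't list itself as its own parent.
    segments = parts.path.rstrip("/").split("/")
    result = []
    # segments[0] is '' (the leading slash); drop one trailing segment per step.
    for i in range(len(segments) - 1, 0, -1):
        path = "/".join(segments[:i]) + "/"
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result


if __name__ == "__main__":
    for parent in parent_directories("http://example.com/directory/widget.cgi"):
        print(parent)
```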
Is it possible that the widget.cgi URL somehow got out into the "wild" via your own browser sending it as the referring page to a website, which in turn created a link to it through a system that publishes referrers and referring URLs?
Nope. My program didn't access other pages, and I never went to another external page from it.
Now, is it really Yahoo that is hitting the directory, or something pretending to be Yahoo?
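One way to tell the real crawler from an impostor is forward-confirmed reverse DNS: look up the host name for the visiting IP, check that it belongs to the search engine's domain, then resolve that name back and make sure the original IP is among the results. A sketch in Python follows. The `crawl.yahoo.net` suffix in the usage note is my assumption about Yahoo Slurp's host names, so check the operator's own documentation for the current domains. The resolvers are parameters only so the logic can be tested without network access.

```python
import socket


def is_verified_crawler(ip, allowed_suffixes,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    1. Reverse-resolve the IP to a host name.
    2. Require that name to be in one of the operator's domains.
    3. Forward-resolve the name and require the original IP among its
       addresses, so a faked PTR record alone is not enough.
    Note: gethostbyname_ex only returns IPv4 addresses.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.lower().rstrip(".")
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```

Usage, with an IP taken from your access log: `is_verified_crawler(ip, ["crawl.yahoo.net"])`.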
If they're faking Yahoo then they're doing a pretty good job, since that means they also got the unlinked directory into Yahoo's database (which was the point of my original post).