Yahoo finding secret, unlinked pages? - (deprecated) Yahoo SE and Directory forum at WebmasterWorld - WebmasterWorld

Yahoo seems to try to find all files in a directory

MichaelBluejay

8:40 pm on Jan 10, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Apologies if this has been discussed here already; I couldn't find anything already posted about it.

Recently I wrote a little Perl utility called <widget.cgi> to help me administer a site. Of course I never linked to it from anywhere, and my logfiles and log reports aren't visible on the web without a password. I also don't have any toolbars installed in my browser.

Well, just today I see that this file is showing up in a search for my domain name, because of the output it generates. (Yeah, I should have blocked any IP besides my own from running it, but that's another matter.) I wondered how Yahoo could have found it.

I ran a <link:page> command to see what page linked to the CGI file, and the one result that came back was <domain.com/directory>, which is just a directory listing, since </directory> has no <index.html> file. There have never been any links to </directory> itself, just to the things inside it. I've since added an empty <index.html> file to prevent casual viewing of that directory.

So it looks to me as though Yahoo is looking for every file in a directory it can find, if it's allowed to see them, even if they're not linked.

[edited by: martinibuster at 4:06 pm (utc) on Jan. 13, 2005]
[edit reason] widgetized [/edit]
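
The hypothesis in the post above can be sketched in a few lines of Python. This is only an illustration of how a crawler could behave, not a description of what Yahoo's crawler actually does: starting from one linked file, it derives the parent directory URLs, and if the server returns an autoindex listing for one of them, every file in that listing becomes discoverable. The file name page.html, the sample listing, and the function names are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


def parent_directories(url):
    """Return the ancestor directory URLs of `url` (deepest first, root excluded)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop the final segment (the file name) unless the URL is already a directory.
    if not parts.path.endswith("/"):
        segments = segments[:-1]
    dirs = []
    while segments:
        path = "/" + "/".join(segments) + "/"
        dirs.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()
    return dirs


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def files_in_listing(directory_url, listing_html):
    """Return absolute URLs for the entries of an autoindex-style listing."""
    collector = LinkCollector()
    collector.feed(listing_html)
    found = []
    for href in collector.hrefs:
        # Skip column-sort links ("?C=N;O=D") and links that leave the directory.
        if href.startswith(("?", "/", "..")):
            continue
        found.append(urljoin(directory_url, href))
    return found


# Hypothetical listing, shaped like the pages Apache's mod_autoindex generates.
SAMPLE_LISTING = """
<html><head><title>Index of /directory</title></head><body>
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="page.html">page.html</a>
<a href="widget.cgi">widget.cgi</a>
</body></html>
"""

if __name__ == "__main__":
    for directory in parent_directories("http://domain.com/directory/page.html"):
        print(directory, files_in_listing(directory, SAMPLE_LISTING))
```

The point of the sketch is that no link to the directory or to widget.cgi is needed: a single link to any file inside the directory is enough to reach the listing.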

theBear

3:22 pm on Jan 13, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Turn off the autoindexing option on your server.

johnlim

3:34 pm on Jan 14, 2005 (gmt 0)

10+ Year Member

How to "Turn off the autoindexing option on your server"?

xcomm

10:43 pm on Jan 14, 2005 (gmt 0)

10+ Year Member

Hi theBear,

"Turn off the autoindexing option on your server."

Yep, but what about programs like Nessus? How do they locate directories that aren't linked anywhere and aren't autoindexed because they contain an index.html?
I don't list my forbidden dirs in robots.txt, so as to keep them hidden, and I block these directories within the <Directory> container. How can they scan/browse for my dirs?

Thanks, xcomm

theBear

7:46 pm on Jan 19, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

In the proper directory container:

Place an Options -Indexes statement if the server is Apache.

If your server is something other than Apache, read the fine manual; you are on your own, good luck.

Options statement options:

[+|-]Indexes

If a URL which maps to a directory is requested, and there is no DirectoryIndex (e.g., index.html) in that directory, then mod_autoindex will return a formatted listing of the directory.
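
One way to confirm that the setting took effect is to request the directory URL and look at what comes back. The Python below is a rough sketch under stated assumptions: it relies on the heuristic that an Apache-generated listing carries a title beginning with "Index of /", and it treats any HTTP error (such as the 403 Forbidden Apache returns for a directory once Indexes is off and no index file exists) as "not listed". The function names are made up for this example.

```python
import re
import urllib.error
import urllib.request

# Heuristic: mod_autoindex pages are titled "Index of /some/path".
LISTING_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html):
    """Return True if the HTML looks like an autoindex-generated listing."""
    return LISTING_TITLE.search(html) is not None


def directory_is_listed(url, timeout=10):
    """Fetch `url` and report whether the server answered with a listing."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read(65536).decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        # The server refused or could not serve the directory, so no listing.
        return False
    # Network failures (urllib.error.URLError) are deliberately not caught here.
    return looks_like_directory_listing(body)
```

Calling directory_is_listed on the directory URL before and after the change should flip from True to False if the listing was the only thing being served there.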

theBear

7:52 pm on Jan 19, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I haven't a clue about that particular scanning tool. However, it uses known holes, and several products have been known to allow ordinary directory reading through such holes, as opposed to through the server's own methods.

For instance:

The so-called "highlight" bug in phpBB ... now why did both the phpBB folks and the PHP folks release fixes aimed at this ...

Interesting being a nobody .... oh well back to the drawing board.

MichaelBluejay

9:59 am on Jan 21, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

None of this really has anything to do with my post. I can prevent Yahoo from getting to stuff I don't want it to get to in various ways. The issue was that it looks like Yahoo is scanning directories, even if the directory itself isn't linked from anywhere (just something IN the directory is linked).

theBear

3:54 pm on Jan 21, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

You said:

"Recently I wrote a little Perl utility called <widget.cgi>"

Is it possible that the widget.cgi URL somehow got out into the "wild" via your very own browser sending it as the referring page to a website, which in turn created a link to it through a system they run that shows referrers and referring URLs?

There are tons of such tracking systems out there.

Internet-connected systems leak information like a supertanker split in half.

theBear

4:03 pm on Jan 21, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Now, is it really Yahoo that is hitting the directory, or something faking being Yahoo?

Yahoo's bot may also be chopping off the tail end of URLs as it does its work .... like maybe a buffer (hunk of memory) somewhere is too small ....

In any event, if the webserver isn't producing an index for the directory, then a path exists through _code on your system_ that allows such a directory listing to be produced.
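
The "is it really Yahoo" question can be checked from the IP address in the access log rather than from the user-agent string, which anyone can forge. A common approach is a reverse DNS lookup followed by a forward lookup that must return the same IP. The Python below is a minimal sketch of that check. The host-name suffixes are assumptions that should be confirmed against Yahoo's own crawler documentation, and the function name and injectable lookup parameters are inventions for this example (they let the logic be exercised without touching the network).

```python
import socket

# Assumption: suffixes of host names that genuine Yahoo crawlers resolve to.
TRUSTED_SUFFIXES = (".crawl.yahoo.net", ".inktomisearch.com")


def is_genuine_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                       reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a trusted host name that
    forward-resolves back to the same IP."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No reverse DNS record: cannot be confirmed as the crawler.
        return False

    if not host.lower().endswith(tuple(suffixes)):
        return False

    try:
        # The forward lookup guards against a forged reverse DNS record.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A request that claims to be Yahoo in its user-agent but fails this check came from something else.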

MichaelBluejay

11:22 am on Jan 26, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

"Is it possible that the widget.cgi URL somehow got out into the "wild" via your very own browser sending it as the referring page to a website, which in turn created a link to it through a system they run that shows referrers and referring URLs?"

Nope. My program didn't access other pages, and I never went to another external page from it.

"Now, is it really Yahoo that is hitting the directory, or something faking being Yahoo?"

If they're faking Yahoo then they're doing a pretty good job, since that means they also got the unlinked directory into Yahoo's database (which was the point of my original post).