Forum Library, Charter, Moderators: martinibuster

Yahoo Search Engine and Directory Forum

    
Yahoo finding secret, unlinked pages?
Yahoo seems to try to find all files in a directory
MichaelBluejay




msg:831932
 8:40 pm on Jan 10, 2005 (gmt 0)

Apologies if this has been discussed here already; I couldn't find anything already posted about it.

Recently I wrote a little Perl utility called <widget.cgi> to help me administer a site. Of course I never linked to it from anywhere, and my logfiles and log reports aren't visible on the web without a password. I also don't have any toolbars installed in my browser.

Well, just today I see that this file is showing up in a search for my domain name, because of the output it generates. (Yeah, I should have blocked any IP besides my own from running it, but that's another matter.) I wondered how Yahoo could have found it.

I did a <link:page> command to see what page linked to the CGI file, and the one result that came back was <domain.com/directory>, which is just a directory listing, since </directory> has no <index.html> file. There have never been any links to </directory> itself, just to the things inside it. I've since added an empty <index.html> file to prevent casual viewing of that directory.

So it looks to me as though Yahoo is looking for every file in a directory it can find, if it's allowed to see them, even if they're not linked.
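
For reference, the IP restriction mentioned above could be sketched roughly as follows on an Apache server. This is only an illustration under assumptions: it presumes Apache 2.4 or later (where access control uses the Require directive; older versions used Order/Allow/Deny instead), and the address 203.0.113.5 is a placeholder from the documentation range, not a real admin IP.

```apache
# Hypothetical sketch: allow only one address to request the admin script.
# 203.0.113.5 is a placeholder; substitute your own IP address.
<Files "widget.cgi">
    Require ip 203.0.113.5
</Files>
```

Requests for the script from any other address would then be refused with a 403, whether or not a crawler has discovered the URL.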

[edited by: martinibuster at 4:06 pm (utc) on Jan. 13, 2005]
[edit reason] widgetized [/edit]

 

theBear




msg:831933
 3:22 pm on Jan 13, 2005 (gmt 0)

Turn off the autoindexing option on your server.

johnlim




msg:831934
 3:34 pm on Jan 14, 2005 (gmt 0)

How to "Turn off the autoindexing option on your server"?

xcomm




msg:831935
 10:43 pm on Jan 14, 2005 (gmt 0)

Hi theBear,

"Turn off the autoindexing option on your server."

Yep, but what about programs like Nessus? How do they locate directories that are not linked anywhere and not autoindexed because an index.html is present?
I deliberately do not list forbidden directories in robots.txt, so as not to reveal them, and I block these directories within the <Directory> container. How can they scan or browse for my directories?

Thanks, xcomm

theBear




msg:831936
 7:46 pm on Jan 19, 2005 (gmt 0)

In the proper directory container:

Place an Options -Indexes statement if the server is Apache.

If your server is something other than Apache, read the fine manual; you are on your own. Good luck.

Options directive values:

[+|-]Indexes

If a URL which maps to a directory is requested, and there is no DirectoryIndex (e.g., index.html) in that directory, then mod_autoindex will return a formatted listing of the directory.
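
As a minimal sketch of what that looks like in the Apache configuration (the filesystem path below is a made-up placeholder; use whatever path your site's directory actually maps to):

```apache
# Hypothetical path; replace with the real location of the directory.
<Directory "/var/www/html/directory">
    # Turn off automatic directory listings for this directory tree.
    Options -Indexes
</Directory>
```

With the listing disabled, a request for the bare directory URL should get a 403 Forbidden response instead of a generated index, provided no DirectoryIndex file exists there. The same Options line can also go in a .htaccess file if the server's AllowOverride setting permits it.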

theBear




msg:831937
 7:52 pm on Jan 19, 2005 (gmt 0)

I haven't a clue about that particular scanning tool. However, it uses known holes, and several products have been known to allow ordinary directory reading through those holes, as opposed to through the server's own methods.

For instance:

The so-called "highlight" bug in phpBB ... now why did both the phpBB folks and the PHP folks release fixes aimed at this ...

Interesting being a nobody .... oh well back to the drawing board.

MichaelBluejay




msg:831938
 9:59 am on Jan 21, 2005 (gmt 0)

None of this really has anything to do with my post. I can prevent Yahoo from getting to stuff I don't want it to get to in various ways. The issue was, it looks like Yahoo is scanning directories, even if the directory itself isn't linked from somewhere (just something IN the directory is linked).

theBear




msg:831939
 3:54 pm on Jan 21, 2005 (gmt 0)

You said:

"Recently I wrote a little Perl utility called <widget.cgi>"

Is it possible that somehow the widget.cgi URL got out into the "wild" via your very own browser sending it as the referring page to a website, which in turn created a link to it through a system it runs that shows referrers and referring URLs?

There are tons of such tracking systems out there.

Internet-connected systems leak information like a supertanker split in half.


theBear




msg:831940
 4:03 pm on Jan 21, 2005 (gmt 0)

Now is it really Yahoo that is hitting the directory, or something faking being Yahoo?

Yahoo's bot may also be chopping off the tail end of URLs as it does its work .... like maybe a buffer (hunk of memory) somewhere is too small ....

In any event, if the webserver isn't producing an index for the directory, a path exists through _code on your system_ that allows such a directory listing to be produced.

MichaelBluejay




msg:831941
 11:22 am on Jan 26, 2005 (gmt 0)

"Is it possible that somehow the widget.cgi URL got out into the "wild" via your very own browser sending it as the referring page to a website, which in turn created a link to it through a system it runs that shows referrers and referring URLs?"

Nope. My program didn't access other pages, and I never went to another external page from it.

"Now is it really Yahoo that is hitting the directory, or something faking being Yahoo?"

If they're faking Yahoo then they're doing a pretty good job, since that means they also got the unlinked directory into Yahoo's database (which was the point of my original post).
