Forum Moderators: phranque


Stopping directory access


reggy

12:57 pm on Jun 20, 2007 (gmt 0)

10+ Year Member



Hi,
When I look at the number of pages Google has indexed for my site, I'm seeing a lot of my directories being listed.

Example:

/images/
/images1/
/bluewidgets/

How can I stop this from happening?

If, say, I was to disallow them in the robots.txt file, won't it stop the .html files inside the directory from being indexed?

Thanks

jdMorgan

1:20 pm on Jun 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If say i was to disallow them in the robots.txt file, wont it stop the .html files inside the directory from being indexed?

Yes, it would, because robots.txt uses prefix matching: if the URL to be requested matches the prefix specified in a Disallow directive, then that URL won't be fetched.
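As an illustration of that prefix matching (using the directories from the original post), a robots.txt along these lines would block the directories:

```
User-agent: *
Disallow: /images/
Disallow: /images1/
Disallow: /bluewidgets/
```

But because each Disallow is a prefix match, /images/photo.html and every other URL under those directories would be blocked as well - which is exactly why robots.txt is the wrong tool if you only want to stop the directory listings themselves.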

The usual solution on Apache is to use


Options -Indexes

in httpd.conf or .htaccess to forbid directory listings. If an attempt is made to fetch a directory and no DirectoryIndex-defined document exists, then a 403 Forbidden response is returned.
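As a minimal sketch, the .htaccess fix is the single directive above; the comments summarize the resulting behavior:

```
# Disable mod_autoindex auto-generated directory listings.
# A request for /images/ with no DirectoryIndex file (e.g. index.html)
# present now returns 403 Forbidden, while /images/photo.html and
# other files inside the directory remain fetchable and indexable.
Options -Indexes
```

This assumes .htaccess overrides are permitted (AllowOverride Options, or AllowOverride All, in the server config); otherwise the directive belongs in httpd.conf.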

However, the question remains: how did Google find URLs pointing to your directories? I have recently seen Yahoo! Slurp attempting to spider unlinked directories, but I don't recall Google ever attempting to fetch unlinked URLs (with the obvious exceptions of robots.txt and sitemap.xml).

Jim

reggy

3:52 pm on Jun 21, 2007 (gmt 0)

10+ Year Member



Thanks jdMorgan - I believe your solution has worked fine. I guess Google must have picked up the directories in an old sitemap file that I had.