Forum Moderators: Robert Charlton & goodroi


Category index pages don't exist - but are indexed in google


Rasputin

2:44 pm on Aug 25, 2011 (gmt 0)

10+ Year Member



I have raised this question on the Google Webmaster blog, but I think it will interest members of this forum as well.

We all know it is good practice to put an empty index.php file (or similar) in each directory of a site, so that people can't see a list of all the files in that directory just by typing
www.domain.com/category/ into a browser.

I had overlooked this for some directories that are generated automatically by an image gallery I use, which has hundreds of subdirectories like www.domain.com/category/section1, section2, etc.
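With hundreds of generated subdirectories, adding blank index files by hand is tedious. A minimal Python sketch that walks a directory tree and creates an empty index.html wherever no index file exists; the function name add_blank_index_files and the set of index names it checks are assumptions (they should match whatever your server's DirectoryIndex setting actually uses).

```python
import os
import sys

# Assumed index file names; adjust to match your server's DirectoryIndex list.
EXISTING_INDEX_NAMES = {"index.html", "index.htm", "index.php"}


def add_blank_index_files(root, filename="index.html"):
    """Create an empty index file in every directory under root that has none.

    Returns the list of paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip directories that already contain some kind of index file.
        if EXISTING_INDEX_NAMES.intersection(filenames):
            continue
        path = os.path.join(dirpath, filename)
        # Exclusive-create mode ("x") never overwrites an existing file.
        with open(path, "x", encoding="utf-8"):
            pass
        created.append(path)
    return created


if __name__ == "__main__":
    # Hypothetical usage: python add_index_files.py /path/to/gallery
    for p in add_blank_index_files(sys.argv[1]):
        print("created", p)
```

The gallery may create new subdirectories later, so a script like this would need to be re-run (or the server-side options discussed in this thread used instead).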

The issue is that Google indexes these lists as if they were pages, but of course they are not really site pages at all; they are just browser-generated lists of the files that exist in the directory.

You can see this for any other site that hasn't put blank index files in every directory by searching for site:www.domain.com/ index (the word "index" is needed because each of these indexed pages is titled 'Index of /directory name').
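If you have already fetched a page's HTML, you can check whether it is one of these auto-generated listings by looking at its title. A minimal sketch, assuming the server uses Apache's default "Index of /path" title format; the function name listing_directory is hypothetical, and customised listing templates would not be detected.

```python
import re

# Assumes Apache's default listing title, e.g. <title>Index of /category/section1</title>.
_TITLE_RE = re.compile(r"<title>\s*Index of\s+(/[^<]*)</title>", re.IGNORECASE)


def listing_directory(html):
    """Return the directory path if html looks like an auto-generated listing, else None."""
    match = _TITLE_RE.search(html)
    return match.group(1).strip() if match else None
```

For example, listing_directory("<title>Index of /category/section1</title>") gives "/category/section1", while a normal page title gives None.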

So I have about 500 completely meaningless pages indexed by Google. I'm wondering whether this could cause me any problems, particularly with Panda, which looks at the quality of pages across the whole site. (These pages are a very poor user experience!)

I can't help thinking it is also a bad idea to let others find lists of all the files on a site this way (where directories don't have the index file added).

Any idea whether I should worry, or how to get the pages removed from the index?

Thanks

deadsea

5:01 pm on Aug 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They are NOT browser-generated. Your web server generates the list of files as a page. Browsers and crawlers get back a valid HTML document at that URL.

Rather than putting a blank index document in each directory, I typically redirect the directory back to the root. In .htaccess you do so by redirecting the index document:

Redirect permanent /images/index.html http://example.com/

Then users can still access image files such as
http://example.com/images/pic.jpg
but
http://example.com/images/
redirects to the home page.
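To confirm from the outside that the change works, you can request the directory URL without following redirects and look at the status code and Location header. A minimal standard-library sketch; the names check_directory_url and _NoRedirect are hypothetical, and example.com stands in for your own domain.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect; the 3xx
    # response is then raised as an HTTPError that can be inspected.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def check_directory_url(url, timeout=10):
    """Return (status, Location header or None) for url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx (not followed) and 4xx/5xx responses end up here.
        return err.code, err.headers.get("Location")
```

With the redirect in place, check_directory_url("http://example.com/images/") would be expected to return (301, "http://example.com/"); a 200 status would suggest the listing is still being served.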

You can also configure your web server so that it doesn't generate pages for directory listings. On Apache this is controlled by the Indexes option. It can be turned off in .htaccess (if AllowOverride permits Options there) or in the main config file with:
Options -Indexes

Apache then returns a 403 Forbidden response instead of the directory listing.
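Options -Indexes is specific to Apache. Purely to illustrate the same behaviour on a local test server, here is a sketch using Python's built-in http.server, whose SimpleHTTPRequestHandler also generates listings for directories without an index file. The names NoListingHandler and make_server are hypothetical; this is an analogue for experimenting, not a substitute for the Apache setting.

```python
import functools
import http.server
import sys
from http import HTTPStatus


class NoListingHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that refuses to generate directory listings.

    Files are served normally and index.html/index.htm are still used when
    present, but a directory without one gets 403 Forbidden instead of a
    generated listing page.
    """

    def list_directory(self, path):
        # SimpleHTTPRequestHandler calls this only when a directory has no
        # index file; send the error and return None (no body to copy).
        self.send_error(HTTPStatus.FORBIDDEN, "Directory listing denied")
        return None


def make_server(root, port=8000):
    handler = functools.partial(NoListingHandler, directory=root)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Hypothetical usage: python no_listing_server.py /path/to/site
    make_server(sys.argv[1]).serve_forever()
```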

Rasputin

5:22 pm on Aug 25, 2011 (gmt 0)

10+ Year Member



Thanks for the explanation, I'll change the .htaccess file.

Do you think having all these meaningless files indexed is a problem?