Just wondering what would be considered better practice for directories that have no index page: a noindexed (robots meta) index page with links to relevant pages elsewhere on the site, or just leaving them with no index page and forbidding directory browsing in .htaccess? Yes, I know this arises from poor design; I just wondered what would work best for Google from this point.
No, it's not a question of browsing; it's a question of how to treat what are clearly low-grade pages to Google when the site is being spidered. Doing nothing means it gets a directory listing, which Google itself indexes as an index HTML page. Clearly bad. If I turn off directory browsing, Google gets hundreds of 403 (forbidden) responses, so it doesn't know what is being forbidden. If I add an index page and put noindex in the meta, then it sees hundreds of noindexed pages, which will be about 10% of the site. I wonder if that's a problem, given the thinking that such pages, if there are a significant number of them, MAY hurt the site overall.
A noindex meta tag in a subdirectory's index.html file should not have any effect on other pages in the same directory, BUT that assumes some other pages in your navigation link to those pages and that they are in your sitemap. Or you could make the index pages more useful and let them be indexed as /directoryname/. If the pages you want indexed have inbound links, are in your sitemap, and don't have a meta noindex tag, they should get crawled and indexed just fine without the index.html pages being indexed or affecting other pages.
I recently needed to do something similar for a site where the folders previously did not contain any HTML pages. An urgent site redo left me with pages I want indexed in those subdirectories, where I had had a noindexed index.html for years. I'm slowly working the site into proper structure, but those index pages were being shown in the sitemap until I changed them to index.php files. Now they sit there doing nothing except covering lists of files from nosey things. When I'm done they will be indexed pages with the URL /subdirectoryname/.
Theoretically if there are no references to the root of the directory, it won't have any real side effect - or at least not one worth worrying about. That said, I've seen hungry Googlebot try directory roots just to see what happens, so best practice would suggest that you need those URLs to do something.
If you create any content at the URL, then you run the risk of creating additional, low-quality URLs with no particular purpose, so I would say either refuse the requests or, if there's somewhere appropriate, redirect them. I wouldn't serve anything with a 200, robots excluded or otherwise; that just creates new content to be evaluated.
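For the redirect option, a rough .htaccess sketch might look like the following. The directory name /downloads/ and the target /products/ are made up for illustration; swap in whatever pairing actually makes sense on the site.

```apache
# Hypothetical example: send requests for one directory root to a relevant page.
# /downloads/ and /products/ are placeholder paths, not taken from the site discussed here.
# The pattern is anchored so only the directory root matches, not the files inside it.
RedirectMatch 301 ^/downloads/?$ /products/
```

This only makes sense where a genuinely relevant target exists; otherwise refusing the request is the simpler choice.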
You can put the same index.php in every directory, one that just returns a 404 or 403 header when the directory is requested.
But that creates additional files that require additional management. Options -Indexes in .htaccess will 403 every directory root that doesn't contain an index file, which seems like a more elegant solution if a 4xx response is desired.
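As a minimal sketch, assuming the server config allows Options to be overridden from .htaccess (AllowOverride Options or All), that is a single line in the site root's .htaccess:

```apache
# Turn off mod_autoindex directory listings for this directory and everything below it.
# A directory root with no index file then returns 403 Forbidden instead of a file list.
Options -Indexes
```

Directories that do contain an index file are served as before.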
I think the 404 route sounds like the best idea. It ensures the index pages cannot be counted as content and prevents directory browsing. I just don't like the idea of hundreds of forbidden requests where once there was indeed a page. I will probably make the headers return a 410.
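If a 410 is wanted without a per-directory file, one possible approach is mod_rewrite in the root .htaccess. This is an untested sketch that assumes mod_rewrite is enabled and that index.html and index.php are the only index filenames in use; adjust the conditions to match the real DirectoryIndex setting.

```apache
RewriteEngine On
# Only act when the request maps to a real directory...
RewriteCond %{REQUEST_FILENAME} -d
# ...that has no index file (assumed names: index.html, index.php).
RewriteCond %{REQUEST_FILENAME}/index.html !-f
RewriteCond %{REQUEST_FILENAME}/index.php !-f
# The G flag answers with 410 Gone and leaves the URL untouched.
RewriteRule ^ - [G]
```

Files inside those directories are unaffected, since the first condition only matches directories.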