Do I need a blank index page in every directory to avoid 403 errors?

         

AndyA

11:28 pm on Jun 26, 2010 (gmt 0)

10+ Year Member



I have directory indexing on my server set to return a 403, as I don't want people seeing the Apache directory listings. Some directories on my site have index pages, and I link to them using /directory/, not /directory/index.html.

Now I have errors showing in Google Webmaster Tools (WMT). One is a 403 error for:

example.com/directory/

I have files in that directory that are linked to, but the nature of that particular directory makes an index.html page unnecessary.

In addition, I now have "Not Found" errors listed for real pages on my site, except that they are reported as being inside the directory that has been returning the 403, i.e.,

example.com/directory/realpage1.html
example.com/directory/realpage2.html

Those pages do exist, but they are located at root:

example.com/realpage1.html etc.

What is the best way to deal with this? Putting a blank index page in every single directory seems like a waste of time, especially when the directory only has images in it that are linked to from other pages on my site. I also have one directory that is an archive of old pages. Links go directly to those pages, and an index page in that directory would be an extra chore to maintain.

What's the best way of dealing with this? Thanks.

jdMorgan

3:28 am on Jun 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For the 403 errors on directories that do not have an index page and for which you do not wish to have an index page, the best way of dealing with it is to ignore the 'error' in GWT, because the error is in GWT itself and not on your site.

Do be sure you're not accidentally linking to those directory URLs, but otherwise, take the GWT errors with a grain of salt.

I'm not sure what you're saying about the example.com/directory/realpage1.html URLs. If those pages actually reside at example.com/realpage1.html and if you have no links to example.com/directory/realpage1.html anywhere on your site then again, that is an error in GWT and not on your site.

I guess you should also check to make sure you don't have any mis-coded redirects or internal rewrites that could be confusing the 'bot.
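One way to run that check, sketched here in Python under some assumptions: the helper name `links_into_directory`, the page URL, and the sample HTML are all hypothetical, and this only inspects one page's markup rather than crawling a site. It resolves each link the way a crawler would and reports any that land inside a given directory. A relative link on a page that lives inside /directory/ is one possible (unconfirmed) way a root-level page could end up being requested as /directory/realpage1.html.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_into_directory(page_url, html, directory_path):
    """Return absolute URLs linked from `html` whose path starts with `directory_path`."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL, as a crawler would.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).path.startswith(directory_path):
            found.append(absolute)
    return found


# Hypothetical page inside /directory/ with a relative link to a page that
# really lives at the site root, a root-relative link, and a bare directory link.
sample_html = (
    '<a href="realpage1.html">one</a> '
    '<a href="/realpage2.html">two</a> '
    '<a href="./">up</a>'
)
result = links_into_directory(
    "http://example.com/directory/old.html", sample_html, "/directory/"
)
print(result)
```

With this sample, the relative link and the "./" link both resolve into /directory/, while the root-relative link does not. If a check like this finds nothing on the real pages, that supports the view that the reported URLs did not come from the site's own links.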

It's good to take errors reported in GWT seriously, but on the other hand, Google is far from infallible. For example, due to the complex nature of the robots.txt file on one of my sites, GWT reports that Googlebot is denied from the site. It isn't denied at all, it's just that Google uses a different (and defective) robots.txt parser for GWT than they use for the real Googlebot (big mistake). It 'sees' the robots.txt file differently and thinks Googlebot is denied when it isn't.

On the other hand, the real Googlebot parses that robots.txt file quite correctly, goes where it should, and does not go where it shouldn't. The result is a bunch of #1 and #2 listings for relevant keywords and phrases, indented listings ("sitelinks"), etc. for a site that GWT reports as un-indexable... So the bottom line is that GWT has bugs, and you just may have found another one.

Jim

bwnbwn

4:40 am on Jun 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



AndyA, why can't you just 301 the index page to the first page in the directory that you want to be displayed?

AndyA

1:11 pm on Jun 27, 2010 (gmt 0)

10+ Year Member



@jdMorgan:
I think what you said is exactly what's happening. Google is responsible for the errors. I'm pretty certain there are no links going to that directory without a specific page in that directory being referenced. I checked my Sitemap to be certain, and it just lists links to pages in the directory, but not the directory itself. (I do panic a bit when I see lots of errors listed.)

@bwnbwn:
Normally I would do that, but with the archived blog pages, for instance, there isn't one particular page that I want displayed. I suppose I could 301 back to the current month's blog page, which is not in the archive directory, but it has links to all the older pages on it.

While thinking about what to do with this situation, it did occur to me that there might be some benefit to having a customized index page in each directory, at which point I would have the opportunity to manually direct people where they might want to go. Seems like a lot of work, though, for the possible benefit of a couple visitors a month. I could always tell the SEs to not index via robots, which could be easily undone if I decided to use them at some point down the road.

Thanks for the advice!

jdMorgan

4:17 pm on Jun 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> "I suppose I could 301 back to the current months blog page" [emphasis mine]

A 301 redirect means "Moved Permanently," and the meaning of 302 is 'polluted': it is a confused mixture of "Moved Temporarily" in HTTP/1.0 and an ill-defined "Found" in HTTP/1.1, with the "Moved Temporarily" meaning now taken over by 307. For those reasons, I'd recommend a 303-See Other in this case, should you decide to redirect.

Just as a 404 says, "I can't find a resource to associate with that requested URL, but I'm not saying why," 303 says, "See this other URL, and I'm not saying why." There is no Moved/Permanent/Temporary meaning in it, just "Please use this other URL."

In this way, you avoid the problem of saying that the never-changing directory-index URLs have been permanently moved to an ever-changing "current month's" blog page URL.
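To make the response itself concrete, here is a minimal sketch using Python's standard library rather than Apache configuration. The paths in `ARCHIVE_INDEX` and `CURRENT_BLOG_PAGE` and the names `redirect_for` and `ArchiveHandler` are hypothetical, chosen only for illustration; on a real Apache site the same response would be set up in the server configuration instead.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical URLs, for illustration only.
ARCHIVE_INDEX = "/archive/"
CURRENT_BLOG_PAGE = "/blog/current.html"


def redirect_for(path):
    """Return (status, location) for the bare directory URL, or None otherwise."""
    if path == ARCHIVE_INDEX:
        # 303 See Other: "use this other URL", with no claim that anything moved.
        return HTTPStatus.SEE_OTHER, CURRENT_BLOG_PAGE
    return None


class ArchiveHandler(BaseHTTPRequestHandler):
    """Answer the directory-index URL with a 303; everything else gets a 404 here."""

    def do_GET(self):
        decision = redirect_for(self.path)
        if decision is None:
            self.send_error(HTTPStatus.NOT_FOUND)
            return
        status, location = decision
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()
```

Because the target lives in a single constant, pointing the directory URL at a different page each month only changes where the 303 sends visitors; the directory URL itself never claims to have moved.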

Jim

AndyA

6:36 pm on Jun 27, 2010 (gmt 0)

10+ Year Member



That's a great idea, Jim. I'll do that instead, because if I ever decide to add an index in that archive directory, it will be easy enough to do. I've never used a 303 before, so I'll see what happens.

Thanks again!