Forum Moderators: Robert Charlton & goodroi

Issue with index.html extensions

realmaverick

6:17 pm on Apr 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A new one for me today in WMT (Webmaster Tools). Google is reporting a 404 for /widgets/ and lists the pages that link to it. The actual page is /widgets/index.html.

This is the only URL on my website that uses the index.html filename, simply because that's the way it's always been. I've never seen the point in redirecting it.

I checked ALL of the pages that apparently link to /widgets/, and not one of them does. They all link to /widgets/index.html.

index.html is normally the default file for a folder, so it isn't really required in the URL. But the way my site is set up, /widgets/ doesn't work.

Other than renaming index.html to widgets.html, what can I do? It seems strange that Google would assume a link to /widgets/index.html was a link to /widgets/, especially as many sites still use index.html in their URLs. Fortunately this is only a single page, but still.

g1smd

10:29 pm on Apr 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Redirect the index URL to the one ending with a slash, using a RewriteRule with the [R=301,L] flags.

Add the DirectoryIndex index.html directive to the .htaccess file in that folder, so that a request for /widgets/ serves index.html.

The above assumes you use Apache. :)
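A minimal sketch of those two steps, assuming Apache with mod_dir and mod_rewrite enabled, an AllowOverride setting that permits these directives in .htaccess, and that the rules live in /widgets/.htaccess. The hostname www.example.com is a placeholder for your own domain. The THE_REQUEST condition tests the request line the browser or bot actually sent, so only an external request for /widgets/index.html gets redirected; the internal lookup DirectoryIndex performs when /widgets/ is requested is left alone, which is what prevents a redirect loop.

```apache
# /widgets/.htaccess (sketch; assumes Apache with mod_dir and mod_rewrite,
# and that AllowOverride allows these directives here)

# Serve index.html when the folder URL /widgets/ is requested.
DirectoryIndex index.html

RewriteEngine On

# Match only what the client literally asked for, e.g.
# "GET /widgets/index.html HTTP/1.1", not the internal DirectoryIndex
# lookup of index.html, so the redirect cannot loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /widgets/index\.html[?\ ]
# www.example.com is a placeholder hostname; replace it with your own.
RewriteRule ^index\.html$ http://www.example.com/widgets/ [R=301,L]
```

With this in place, /widgets/ should return the page and /widgets/index.html should answer with a 301 pointing to /widgets/. One caveat: if the parent folder's .htaccess also contains mod_rewrite rules, by default the rules in this subfolder file take over for /widgets/ unless RewriteOptions Inherit is used.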

deadsea

11:35 am on Apr 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You are probably the only webmaster on the internet who has set things up so that /widgets/ doesn't work but /widgets/index.html does. It sounds like Google is making an assumption that is true almost all the time but doesn't hold for you.

I've never liked the way that Google handles default documents, www vs. non-www, etc. It seems like they should crawl the site, identify pages that are identical, and merge them into the "best" url for pagerank and indexing purposes once they have made the determination of identicalness. Instead it requires canonicalization and settings in Webmaster Tools, and it would appear that they sometimes make faulty assumptions.

Robert Charlton

8:14 pm on Apr 20, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



deadsea wrote: "It seems like they should crawl the site, identify pages that are identical, and merge them into the 'best' url for pagerank and indexing purposes once they have made the determination of identicalness."

In a way, Google does "merge" them. If you don't make a choice, Google filters out the duplicate versions. But depending on your inbound links, the URL Google decides is "best" might not be the one you would have chosen.

That's why it's ideal to use 301 redirects if you can, to ensure that only one URL is available for others on the web to link to. 301s also consolidate PageRank from existing links to the different URLs, so your link votes aren't split up.

An alternative approach, not as good as 301s, is to use the canonical link element. One concern with the canonical element is that we don't know whether Google combines PageRank or not.

See this Matt Cutts video introducing the canonical link element, which may help explain the issues:

Canonical Link Element
http://www.youtube.com/watch?v=Cm9onOGTgeM

Again, as Matt points out, 301s are preferable to the canonical link element if you can use them.