Msg#: 4275843 posted 11:26 am on Mar 3, 2011 (gmt 0)
I have an issue with one of my websites. A page of the form http://www.example.com/page.html has been indexed several times by Google under URLs like http://www.example.com/page.html/page1/ and so on. Basically, for some reason Google is indexing this .html page with a slash (/) at the end. As a result, all the internal links on the page become altered (page.html/page1/page3/page5/ and so on), producing a large number of apparently different pages that all show the content of the original page.html, only unstyled. All of these pages are indexed by Google, and I am worried that they may be seen as duplicate content. I couldn't find any errors in the HTML code of page.html.
Does anyone have any idea what could be causing this issue? Or why, when a slash is added at the end, page.html loads fine instead of returning a 404 error?
Msg#: 4275843 posted 2:38 pm on Mar 3, 2011 (gmt 0)
Sounds like the resource (file) exists and is not a directory, so the trailing slash [httpd.apache.org] issue is unlikely. Next I would check one or both of these directives: AcceptPathInfo [httpd.apache.org] and MultiViews [httpd.apache.org].
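For reference, a minimal sketch of how those two settings might look in an .htaccess file. It assumes the host allows them to be overridden there (AllowOverride FileInfo for AcceptPathInfo, AllowOverride Options for MultiViews), which has not been confirmed in this thread, so treat it as something to try rather than a known fix:

```apache
# .htaccess sketch (assumption: the host permits these overrides)

# Reject requests that carry extra path text after a real file,
# so /page.html/page1/ should return 404 instead of serving page.html.
AcceptPathInfo Off

# Turn off content negotiation, so Apache does not pick a file
# based on a partial name match.
Options -MultiViews
```

If the server answers with a 500 error after adding either line, that override is probably not permitted and the line should be removed again.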
Msg#: 4275843 posted 7:50 pm on Mar 3, 2011 (gmt 0)
This does sound like at least a two-part problem. One is the server-side part, as coopster pointed out. The other is the HTML part:
As a result, all internal links on this page become altered
This you can address by not using page-relative URLs in your internal linking. A quick fix for that can be to use the <base> element in the page's head area: 4. Path information: the BASE element [w3.org]
Msg#: 4275843 posted 8:00 pm on Mar 3, 2011 (gmt 0)
Begin internal links with a slash or add the base tag.
There's a serious flaw in the site's coding. If you are using RewriteRule you will need to check this code carefully. You also need to check how your script checks if URLs are valid and generates a custom 404 error message for those that are not.
Msg#: 4275843 posted 1:52 pm on Mar 7, 2011 (gmt 0)
Thank you for the helpful replies. I'm still having this issue. While your answers give good advice, I have to decipher them, since I am not a very techy person and have little understanding of all these things, especially when it comes to server settings. One thing I noticed is that when I remove the PHP handler for .html documents from my .htaccess file, I no longer see this error, and adding a trailing slash to .html pages results in a 404 page. However, in that case my PHP scripts stop working (they are integrated into the .html pages and retrieve data from a MySQL DB).
This is the PHP handler I am using:
php_flag allow_url_fopen on
AddType application/x-httpd-php .html .php .htm .shtml