Home / Forums Index / Code, Content, and Presentation / HTML

HTML Forum

    
Same page erroneously indexed several times
vivalasvegas

5+ Year Member



 
Msg#: 4275843 posted 11:26 am on Mar 3, 2011 (gmt 0)

I have an issue with one of my websites. One of the pages, of the format http://www.example.com/page.html, is indexed several times by Google like this: http://www.example.com/page.html/page1/ and so on. Basically, for some reason Google is indexing this .html page with a trailing slash (/) at the end. As a result, all internal links on this page become altered (page.html/page1/page3/page5/ and so on), resulting in a high number of apparently different pages showing the content of the original page.html, only not styled. All these pages are indexed by Google and I am worried that they may be seen as duplicate content. I couldn't find any errors in the HTML code on page.html.

Does anyone have any idea what could be causing this issue? Or why, instead of showing a 404 error when a slash is added at the end, the page.html file loads fine?

Thanks.

 

coopster

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4275843 posted 2:38 pm on Mar 3, 2011 (gmt 0)

Sounds like the resource (file) exists and is not a directory, so the trailing slash [httpd.apache.org] issue is unlikely. Next I would check one or both of these directives:
AcceptPathInfo [httpd.apache.org]
MultiViews [httpd.apache.org]
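
As a rough illustration of what checking those two directives could look like, here is a minimal .htaccess sketch. It is not taken from the poster's server: it assumes Apache 2.x, a .htaccess file in the site's document root, and an AllowOverride setting that permits FileInfo and Options overrides.

```apache
# Hypothetical .htaccess fragment; the placement and the AllowOverride
# permissions it relies on are assumptions, not details from this thread.

# Accept a request only if it maps to a path that really exists.
# With this set, /page.html/page1/ should return 404 instead of
# serving page.html with "/page1/" treated as trailing path information.
AcceptPathInfo Off

# Disable content negotiation, so Apache does not look for files
# named foo.* when a request for /foo matches no exact file.
Options -MultiViews
```

If any script on the site depends on trailing path information (URLs of the form script.php/extra/path), AcceptPathInfo Off would break it, so this is worth trying on a test copy first.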

tedster

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4275843 posted 7:50 pm on Mar 3, 2011 (gmt 0)

This does sound like at least a two-part problem. One is the server-side part, as coopster pointed out. The other is the HTML part:

As a result, all internal links on this page become altered

This you can address by not using page-relative URLs in your internal linking. A quick fix for that can be to use the <base> element in the page's head area: 4. Path information: the BASE element [w3.org]

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4275843 posted 8:00 pm on Mar 3, 2011 (gmt 0)

Begin internal links with a slash or add the base tag.

There's a serious flaw in the site's coding. If you are using RewriteRule you will need to check this code carefully. You also need to check how your script checks if URLs are valid and generates a custom 404 error message for those that are not.

coopster

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4275843 posted 11:17 pm on Mar 3, 2011 (gmt 0)

You also need to check how your script checks if URLs are valid and generates a custom 404 error message for those that are not.


Sage advice. I often forget to mention this point and it is crucial.

vivalasvegas

5+ Year Member



 
Msg#: 4275843 posted 1:52 pm on Mar 7, 2011 (gmt 0)

Thank you for the helpful replies. I'm still having this issue. While your answers give good advice, I must decipher them since I am not a very techy person and I have little understanding of all these things, especially when it comes to server settings. One thing I noticed is that when I remove the PHP handler for .html documents from my .htaccess file, I no longer see this error, and adding a trailing slash to .html pages results in a 404 page. However, in this case my .php scripts stop working (scripts integrated into .html pages, retrieving data from a MySQL DB).

This is the php handler I am using:
php_flag allow_url_fopen on
AddType application/x-httpd-php .html .php .htm .shtml

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved