| 1:38 pm on Jun 17, 2013 (gmt 0)|
Are you using WordPress? If you are, it would be very easy to add canonical links through a plug-in like Yoast.
If you're not using WordPress then you need to add a canonical link somewhere in the header of both pages, with both pointing to the same correct page.
<link rel="canonical" href="http://www.example.com/WebmasterWorld/google-seo/no-specifics/">
For more info:-
BTW on a technical level if only one page actually exists it seems there are some probs with your htaccess file - but a quick fix would be to just add canonical.
| 1:56 pm on Jun 17, 2013 (gmt 0)|
Many thanks Savanadry.
Yes, there is just one file, i.e. index.html.
They are both pointing to the same file. Hope that helps to clarify things a little better. What should I do to correct this, and why is Google thinking that they are two pages? :)
[edited by: Robert_Charlton at 9:04 pm (utc) on Jun 17, 2013]
[edit reason] examplified domain [/edit]
| 1:57 pm on Jun 17, 2013 (gmt 0)|
Meant to say, no, not Wordpress.
| 2:03 pm on Jun 17, 2013 (gmt 0)|
Google thinks they're two URIs (not pages) because they ARE two URIs.
Think about it - in the bath directory, you could have an index.html, an index.htm and an index.php and they could all be very different. Which file is the default just depends on how your webserver is configured (and Google won't know that). So it takes everything it finds and indexes it separately.
I would probably make sure that /bath/index.html redirects to /bath/, and likewise for every other directory URI. I don't want any index.anything in Google.
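To illustrate the point about server configuration: on Apache, the default file for a directory is chosen by the DirectoryIndex directive. The line below is only a hypothetical example (the list and its order are assumptions, not the poster's actual setup), but it shows why /bath/ and /bath/index.html can return the same content while still being two separate URIs.

```apache
# Hypothetical example, not taken from the site under discussion.
# When /bath/ is requested, Apache tries these file names in order
# and serves the first one that exists in the directory.
DirectoryIndex index.html index.htm index.php
```

A crawler only sees the URIs and the responses, not this directive, so it cannot know in advance that the two addresses are meant to be the same page.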
| 2:12 pm on Jun 17, 2013 (gmt 0)|
Thanks. I put this in the .htaccess file last week:
Would that not fix it? It's much easier for me to link to the files as index.html. I really appreciate your thoughts here.
| 2:34 pm on Jun 17, 2013 (gmt 0)|
I would never link to the files as index.html. Never never never.
And no, I don't think that .htaccess line will do it. If you're using Apache, best to ask in the Apache forum.
| 2:38 pm on Jun 17, 2013 (gmt 0)|
>I would never link to the files as index.html Never never never.
Why do you say that? I've done it for ten years and get around 3 million uvs a month, so it never seemed to do any harm. This is a new thing that has suddenly started happening, after I've done a bit of restructuring. If there is a good reason not to do it, then I'll remove the index.htmls.
| 2:58 pm on Jun 17, 2013 (gmt 0)|
I really do appreciate your help here and would love to get this sorted. If I take out all links to index.html - do you think that this will resolve this problem over time?
| 4:41 pm on Jun 17, 2013 (gmt 0)|
Read this. Twice.
| 9:18 pm on Jun 17, 2013 (gmt 0)|
After you read that one twice, give this one a shot as well....
Duplicate content from index.html
|WebmasterWorld Information: We don't allow specifics |
And a mod's note: I also recommend that you reread the Google Forum Charter [webmasterworld.com], which explains our linking policy. We do not offer public site reviews. Use example.com instead of your own domain.
I suggest you read the Charter two or three times as well.
| 6:22 am on Jun 18, 2013 (gmt 0)|
Thank you. This is all very helpful. I have removed all links on the site to index.html now. Will this resolve the problem in time, naturally, or should I put something in the .htaccess file or similar? If so, what would you recommend?
[edited by: mtreasure at 6:25 am (utc) on Jun 18, 2013]
| 6:24 am on Jun 18, 2013 (gmt 0)|
I should have said that the link to index.html was on around 12,000 pages. I'm guessing it will sort itself out in time, but what about other sites linking to us as a ../index.html link?
| 7:10 am on Jun 18, 2013 (gmt 0)|
you need to add a rule to your configuration that redirects all requests for the default directory index document to the directory itself with a 301 status code. (i.e. a url with a trailing slash)
that will provide the proper response to handle referred requests that include the index document file name in the url as well as search engines requesting to crawl these legacy urls.
linking internally to the canonical urls is a good signal of quality and intent but as long as requests for the index.html url resolve to a 200 OK response then your google index problem won't resolve itself.
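A rule of the kind described above might look like the following on Apache. This is a minimal sketch, not a tested drop-in: it assumes mod_rewrite is enabled, that the rules live in the .htaccess file in the document root, that index.html is the only index document in use, and that http://www.example.com is the canonical host (a placeholder). Try it on a test copy first.

```apache
# Sketch only; host name and index file name are assumptions.
RewriteEngine On

# Act only when the client itself asked for .../index.html.
# THE_REQUEST holds the original request line (e.g. "GET /bath/index.html HTTP/1.1"),
# so the internal lookup Apache does for /bath/ does not trigger a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/([^/]+/)*index\.html[?\s]

# 301-redirect /index.html to / and /bath/index.html to /bath/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
```

With something like this in place, a request for /bath/index.html should get a 301 response pointing at /bath/ rather than a 200 OK, which covers both old external links and search engines recrawling the legacy URLs.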
| 7:17 am on Jun 18, 2013 (gmt 0)|
Thank you for that. I suspected something like that would be necessary. Please would you mind explaining exactly what I need to do? I'm not sure where to start with this exactly. Your patience here is much appreciated!
| 8:21 am on Jun 18, 2013 (gmt 0)|
btw welcome to WebmasterWorld, Martin!
as netmeg posted earlier:
|If you're using Apache, best to ask in the Apache forum. |
this thread would be a good place to start.
.htaccess redirects for index.html, index.php and index.htm to /:
try some code - if it doesn't work, report what you tried and your results and you will get some help in the Apache forum.