|301 directory to its own index page|
Dealing with Google duplicate pages
Hi all, long time member, first time poster.
I'm facing a peculiar situation. I just found the 'HTML Suggestions' report in Google Webmaster Tools showing a massive increase in duplicate page titles and meta descriptions (around 250 overnight). As far as I can tell, Google is suddenly treating EVERY directory and its index page as separate URLs.
e.g. http://www.example.com/directory/ and
are showing as dupes.
I know that technically they are unique URLs, but it's never been an issue before AND all index pages have self-referencing canonical tags. Just to make it clear, there is no page at /directory/. All my pages have actual page URLs like /directory/index.htm, /directory/page.htm, etc.
I'm guessing the way to handle this is to 301 all '/directory/' requests to the actual page (e.g. '/directory/index.htm').
I've tried creating htaccess redirects like
Redirect 301 /directory/ http://www.example.com/directory/index.htm
and similar rewrite rules but end up either in an endless loop or getting a 403 forbidden error.
I know how to redirect a specific page like /directory/index.htm to /directory/, but can't get it to work the other way around.
Can anyone offer an easy way to do this (or an alternative to a 301, if that's not the best way to approach this)? Just to be clear, I don't want to get into an SEO discussion over whether it's better to redirect everything to the directory. I just want to deal with this flippin' Google mess and make sure all requests go to my actual pages!
Thanks very much for any suggestions!
EDIT: Drat! Almost forgot one of the most important bits. Not all pages use /index.htm. Some use /index.php and some use /page1.htm or similar (don't ask). So I don't think a site-wide htaccess or rewrite solution will work. I suspect I'll have to set up htaccess in each directory and specify what index url to redirect to from there.
The canonical URL for a folder and for the index page in that folder is one that ends with a trailing slash and does not mention the actual name of the index page document if one exists.
You should redirect all index.* and any other such forms to the canonical URL. This is a simple job using a RewriteCond to detect that the request came from the web and a RewriteRule to issue the 301 redirect.
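A minimal sketch of that kind of rule, assuming it sits in the .htaccess file at the site root, that mod_rewrite is enabled, and that the index documents are named index.htm, index.html or index.php. The extension list and the example.com hostname are placeholders to adapt, not something taken from the poster's site:

```apache
RewriteEngine On

# Act only when the client literally asked for an index document.
# THE_REQUEST holds the original request line (e.g. "GET /directory/index.htm HTTP/1.1"),
# so the internal index lookup that serves /directory/ does not trigger the rule.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)(\?[^\ ]*)?\ HTTP/
# Redirect to the folder URL; $1 is the folder path (empty for the site root).
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
```

With this in place a request for /directory/index.htm should get a 301 to /directory/, while /directory/ itself is still served from its index file without redirecting again.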
There are several hundred threads in this very forum with basic code that can easily be adapted to handle multiple alternative extensions. It's a question that's asked several times each month.
I really appreciate you taking the time to reply.
I may not have made myself clear. When I mentioned canonical urls I meant that all my pages use a self-referencing <link rel="canonical" href="http://www.example.com/directory/page.htm" /> tag in the header, so there should be no confusion over duplicate URLs.
I do not wish to redirect '/directory/index.htm' to '/directory/', rather I want to let Google know that I have ONE page, and it's at '/directory/index.htm'. I assume that is best done with a 301 redirect to the actual page. I have a huge website, and all my internal links are to the actual html page not to the directory, and I have thousands of external links to the actual urls, NOT to the /directory/ url.
Gosh, I hope I'm explaining myself clearly. If anyone has ideas how to redirect any requests for '/directory/' to '/directory/whatever.htm' I'd sure appreciate the help.
|I do not wish to redirect '/directory/index.htm' to '/directory/', rather I want to let Google know that I have ONE page, and it's at '/directory/index.htm'. I assume that is best done with a 301 redirect to the actual page. I have a huge website, and all my internal links are to the actual html page not to the directory, and I have thousands of external links to the actual urls, NOT to the /directory/ url. |
Gosh, I hope I'm explaining myself clearly.
Well, not really, since you first say you don't want to redirect and then two sentences later you talk about a 301 redirect. You can't redirect without redirecting.
On closer reading it sounds as if what you want to do is the precise opposite of normal behavior. You're talking about redirecting requests for
This is obviously just as easy as redirecting in the other direction-- easier, in fact, since you don't need to look at THE_REQUEST and/or append a [NS] flag. But it's so flatly and dramatically wrong that it puts us into "Just show him how to use the ### gun" territory.
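For reference, a sketch of the folder-to-index direction with mod_rewrite, assuming the rules go in the root .htaccess and that most folders use index.htm. The /php-section/ folder is a hypothetical stand-in for any folder whose index document has a different name; such exceptions would need to be listed before the general rule:

```apache
RewriteEngine On

# Hypothetical exception: this folder keeps its index in index.php.
RewriteRule ^php-section/$ http://www.example.com/php-section/index.php [R=301,L]

# General case: any folder URL (a path ending in a slash, or the bare root)
# is redirected to index.htm inside that folder.
RewriteRule ^(([^/]+/)*)$ http://www.example.com/$1index.htm [R=301,L]
```

No THE_REQUEST test is needed here because the redirect target never ends in a slash, so it cannot match the pattern a second time. If a sub-folder has its own .htaccess with rewrite rules, those replace the root rules by default rather than adding to them, so the redirect would have to be repeated there.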
I don't think I said I didn't want to redirect at all. If that's how it came across I apologise.
You are absolutely correct; I DO want to redirect from /directory/ to /directory/index.htm, but only because Google is flagging all my directory and index pages as duplicates. I understand that most folks here are of the opinion that that's backwards. All I really want [puts on best nooby innocent look] is to make sure Google knows there is only 1 url, not 2, and it seems a 301 is the best way to do that (correct me if I'm wrong).
I do get it that people here don't like this idea, and think I should be redirecting the index page to the /directory/. However, pages like /directory/index.htm are the actual pages, with internal and external links. It's a huge website, with thousands of links to the actual pages such as /directory/index.htm, so I'm really reluctant to redirect away from the actual page to the directory. Is there no way to do it the other way? So far all I've done is get into an endless loop trying things like this within a directory:
Redirect 301 ^/ http://www.example.com/directory/index.htm
Please say that was a typo. You can only use mod_alias (Redirect by that name) if you don't use mod_rewrite at all. Not because the server will explode, but because there are likely to be unintended consequences. In Apache, you do not want unintended consequences.
Even within mod_alias, you would need to use the RedirectMatch format so you can use an end-anchored Regular Expression.
:: detour to refresh memory on mod_alias syntax, with pause for shudder at "(.*)\.gif" locution ::
RedirectMatch ^(.+/)?$ http://www.example.com/$1index.htm
But don't quote me. Does mod_alias use a leading slash in htaccess? I can't remember, and the docs are uninformative. If yes, replace the above with ^(.*/)$. I've used a non-final .* or .+ because it's to be followed by only one character. But a [^.] would probably be better, since it gets rid of non-page requests a little sooner.
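One way to sidestep the leading-slash question is to spell the path out in full. mod_alias compares against the whole URL-path, which begins with a slash even in .htaccess, and a plain Redirect matches by prefix, so a rule for /directory/ also catches /directory/index.htm. That would explain the endless loop reported earlier in the thread. A hedged sketch, assuming mod_rewrite is not used elsewhere on the site; /other/ is a hypothetical second folder added only to show a differently named index document:

```apache
# 301 only the bare folder URL; the $ anchor keeps /directory/index.htm from matching.
RedirectMatch 301 ^/directory/$ http://www.example.com/directory/index.htm

# Hypothetical second folder whose index document has a different name.
RedirectMatch 301 ^/other/$ http://www.example.com/other/page1.htm
```

Because each line names its own target, folders with index.php or page1.htm can be handled one by one from a single .htaccess at the site root rather than one file per directory.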
|Redirect 301 ^/ http://www.example.com/directory/index.htm |
:: heavy edit here because I tried to work out what the rule would do in real life, and just got a headache ::
If you have a large site with thousands of directory links, it can't possibly all be static, hand-rolled html. The names of linked directories come from somewhere, and it's the "somewhere" that needs to be changed globally. You are right that your internal links would also need to be changed, because search engines understandably don't much like internal links that consistently lead to a 301.
Besides, what happens if you ever change your extensions from .htm to something else? You'd still have to change all the directory links, or else do some gratuitous rewriting.
Wow, thanks for the detailed response, lucy24. I really appreciate it. I honestly didn't understand half of the mod_alias reference. Assume I don't know much and you won't be too far off :-)
>>it can't possibly all be static, hand-rolled html<<
Well, actually ...
that's not far off reality either! We have a rather kludged-together home-made set of admin scripts (I won't glorify it by calling it a CMS).
I'll have a go at some of your rewrite suggestions in the morning. It's getting late where I am, and my head's hurting too now!
Thanks for your help,
Although you can redirect folder URLs to index page URLs or index page URLs to folder URLs, the latter is what you should be doing. The former can create one whole big mess that you really want to avoid. Yes, you'll need to also adjust internal links to agree with that, but the end result will be much easier to maintain.
You sure do not want to be redirecting requests for example.com/ to example.com/index.php or for any such similar requests at lower levels.