Forum Moderators: phranque
The Background:
It's a 10-year-old non-profit site. The main page (the homepage) has always been called keyword.htm. We have never had a page called index.htm.
So the main page of the site has always been:
http://www.example.org/keyword.htm
Ten years ago it seemed a waste to call the main page "index", so we named it after a domain-related keyword instead.
We have always had the following in the root .htaccess file:
Options -Indexes
ErrorDocument 404 /404.html
DirectoryIndex keyword.htm
So errors go to our brief custom 404 page, which contains a short apology and a link to our search page.
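For reference, here are the same three directives with comments on what each one does, going by Apache's documented behaviour. The key point for this thread is the last one: on its own, DirectoryIndex serves the contents of keyword.htm at "/" without any redirect, which is how one page can end up with two URLs.

```apache
# Don't generate an automatic file listing for directories
# that have no index page
Options -Indexes

# A local path makes Apache serve /404.html with a genuine 404 status.
# A full URL here (http://...) would instead make Apache send a redirect
# to the error page, and robots would no longer see the 404.
ErrorDocument 404 /404.html

# A request for "/" is answered with the contents of keyword.htm,
# with no redirect, so "/" and "/keyword.htm" serve the same page.
DirectoryIndex keyword.htm
```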
We use no cloaking, meta-refresh or redirect tricks.
The site is plain and flat, and all of it is valid HTML 4.01 Strict.
The site uses root-relative links of the form a href="/page.htm", and each page has its own full URL as its base href in the head.
The .htaccess file also contains the following (along with other, unrelated RewriteCond directives):
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]
RewriteRule ^$ http://www.example.org/keyword.htm [R=301,L]
The Problem:
I don't know why, but Google and Yahoo! list our main page under both:
http://www.example.org/
and
http://www.example.org/keyword.htm
We obviously want only the second (longer) URL listed, but both search engines (SEs) seem to prefer the first (shorter) one, even though most external links point to the longer URL, and the shorter URL has zero PR.
This means they see the main page as a duplicate, which must be harming the site. That said, whichever version they show in the SERPs, it keeps the same ranking position. G shows only one version at a time, whereas Y! usually shows both in the same SERP.
Google's preference switches between the two as it adjusts its algorithm each month.
We've always had the www version selected as the preferred domain in Google Webmaster Tools.
The Fix:
What would you do?
Ditch the keyword.htm page and create an index.htm page, change all internal links to point to that, and 301 all external keyword.htm links to index.htm?
Like this:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]
RewriteRule ^$ http://www.example.org/index.htm [R=301,L]
Regarding internal links: I understand that best practice is not to include the index.htm part in internal links, but rather let the server "find it".
I'd actually prefer, if possible, to keep the keyword.htm page and get the SEs to recognize that we have never had any other homepage, so that they drop their ghost version.
I appreciate your expert advice.
It makes no difference what the filename of this page is; what matters is the URL. And obviously, Google and Yahoo! are showing a strong preference for the URL to be "/".
What would I do? I'd delete the second rule above, change all on-site home page links to "/" and over time, try to get all inbound links to the home page changed to "/" as well.
After three months or so, I'd then replace that second rule with this one:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /keyword\.htm\ HTTP/
RewriteRule ^keyword\.htm$ http://www.example.org/ [R=301,L]
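The THE_REQUEST condition is what keeps this rule from looping. The sketch below annotates the same two lines; the commented-out rule is there only to illustrate the loop and should not be enabled.

```apache
# Without the condition, this rule alone would loop: a request for "/"
# is internally mapped to /keyword.htm by DirectoryIndex, the .htaccess
# rules run again for that internal request, the rule matches, and the
# client is redirected back to "/" endlessly.
#RewriteRule ^keyword\.htm$ http://www.example.org/ [R=301,L]

# THE_REQUEST is the raw request line the client sent, for example
# "GET /keyword.htm HTTP/1.1". Internal redirects don't change it, so
# this only fires when the client itself asked for /keyword.htm.
# As written, a request with a query string (/keyword.htm?x=1) won't match.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /keyword\.htm\ HTTP/
RewriteRule ^keyword\.htm$ http://www.example.org/ [R=301,L]
```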
Overall, you'll have to decide if the small keyword-in-URL ranking advantage of using "www.example.org/keyword.htm" is worth continuing to confuse the robots and continuing to risk duplicate content causing a 'split' in your home page PR.
[added] Also, when replacing that second rule, put the replacement above the domain redirect. Most-specific redirects go first, followed by less-specific redirects, then internal rewrites, again in most-specific to least-specific order. [/added]
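A sketch of the resulting order in the www .htaccess, with comments showing why the specific rule goes first (the hop counts assume no other redirects are involved):

```apache
Options +FollowSymLinks
RewriteEngine on

# 1. Most specific: /keyword.htm -> "/".
#    A request for http://example.org/keyword.htm goes straight to
#    http://www.example.org/ in a single redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /keyword\.htm\ HTTP/
RewriteRule ^keyword\.htm$ http://www.example.org/ [R=301,L]

# 2. Less specific: non-www -> www for everything else.
#    If this rule came first, the same request would take two hops:
#    example.org/keyword.htm -> www.example.org/keyword.htm -> www.example.org/
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]

# 3. Internal rewrites (none are used in this thread) would follow here,
#    again ordered from most to least specific.
```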
Jim
[edited by: jdMorgan at 2:34 am (utc) on Dec. 3, 2007]
There's already a split PR, so I have no real choice.
I'll certainly follow your advice.
Please confirm I've understood correctly before I make any changes.
In my www .htaccess file change this:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]
RewriteRule ^$ http://www.example.org/keyword.htm [R=301,L]
To this:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]
Simultaneously change all internal "/keyword.htm" links to just "/".
Try to get all inbound links to keyword.htm changed to "/".
Wait three months or so for the SEs to digest that, then swap the above rules for these:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /keyword\.htm\ HTTP/
RewriteRule ^keyword\.htm$ http://www.example.org/ [R=301,L]
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]
Is that the procedure you advise, Jim? (At my own risk, of course :)
Can you clarify what you expect the SEs to do during this changeover?
They currently have both these urls in their databases, showing the identical homepage:
http://www.example.org/keyword.htm
http://www.example.org/
Will they automatically drop the longer URL, or should I try to carefully remove it using the G Removal Tool in some deft way? G seems to "hold onto" old URLs for years, often "coming back" even after the Removal Tool has been used.
I'd expect the SEs to begin listing the "/" URL more often, and to start moving the PR from keyword.htm to "/", because you've removed the initial redirect and an increasing number of links point to "/".
After they've figured that out, we add the new "reversed" redirect to force the issue when/if needed, and to stop people from copying the URL out of their browser address bar and linking to keyword.htm instead of "/".
Don't use the removal tool unless there are lawyers making you do it (it's just dangerous unless your typing is always perfect). The redirect rule will suffice here.
Jim
That domain also has a subdomain with the same history and problem.
[keyword2.example.org...] is the main page
and
[keyword2.example.org...] is a duplicate in the SEs.
So how do we switch to the shorter URL?
The subdomain has its own .htaccess, currently containing only this code, nothing else:
ErrorDocument 404 /404.html
DirectoryIndex keyword3.htm
Trying to apply Jim's advice to our subdomain, would this be correct?
Keep the main page filename and the DirectoryIndex the same.
Change all internal links (and hopefully inbound links) to the subdomain main page from:
[keyword2.example.org...]
To:
[keyword2.example.org...]
Wait three months or so for the SEs to digest that, then use this code in the subdomain .htaccess file:
ErrorDocument 404 /404.html
DirectoryIndex keyword3.htm
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /keyword3\.htm\ HTTP/
RewriteRule ^keyword3\.htm$ [keyword2.example.org...] [R=301,L]
Is that correct?
Note: I should add that my virtual host creates subdomains using DS-IRM (Domain Stack-IRM, or Subdomain).
They say a DS-IRM is a subdomain that works like an IRM. For example, you might want to set up a separate section of your site called [store.example.com...] which would actually point to http://www.example.com/store/.
I'll use this redirect in the www .htaccess:
redirect 301 /keyword2/ [keyword2.example.org...]
That seems simpler than using mod_rewrite.
I'll work on the subdomain first, and see what happens, then do the main www after the Christmas and New Year rush.
Thank you both again. I think this thread will help others remove duplicate homepages from the search engines.