Forum Moderators: phranque
I have already solved the www versus non-www redirect with the redirect rules in my .htaccess file, so that is not the problem.
My sitemap generator program on one of my sites found 30 pages, where only 15 exist. But on further investigation, 30 do exist, but 15 are straight duplicates.
My principal site - and preferred domain - in this case is
www.example.co.uk/ - a small site with 15 pages.
This site has a PR of 3 on the home page
There is also a site out there called
www.example.co.uk./ - with an extra dot after the UK and before the last forward slash. I did not create this site
This "extra" site has another 15 files, and the index page has a PR of zero.
Both sites are visible on a Google search, and both sites come up cleanly within a browser, with identical content.
My understanding of the Google-Voodoo is that this represents duplicate content, and that the duplicate-content filter would be applied, thus reducing PR etc.
I have checked quite a few sites now, and those that appear to be hosted on Apache servers all appear to have the same problem.
Those sites hosted on windows servers don't display the same characteristic.
Example in my local area www.quux-foo.com (Windows based)
When you enter www.quux-foo.com./ (with an extra dot) a clean site is served as www.quux-foo.com (without the offending dot)
I have raised this question on Google's webmaster forum but no one seems to want to take the issue or discussion on.
I have read that Apache and other servers create various aliases for internal purposes and shorthand processing work. This is one of the reasons that www.example.com and example.com exist side by side, and we need to adjust through the .htaccess file for one preferred domain, as I understand it. On another site I am told that the HELM management system creates aliases by default.
I have assumed that this extra set of files with the extra dot has come from the same server source for technical purposes ...
So here we are in an Apache forum - where I assume people have a more specific Apache server knowledge and experience than on a general Google forum. So if anyone wants to respond, some questions -
- does anyone know why this happens?
- Is it a feature of Apache servers? (maybe I'm wrong)
- Am I right in saying that a penalty exists because there IS duplicate content created?
- How do I fix it? - and
- do I need to fix it - or are the extra files irrelevant, a mirage, and SEs REALLY don't take them into account.
[edited by: jdMorgan at 8:35 pm (utc) on April 15, 2007]
[edit reason] No URLs or specifics, please. See TOS. [/edit]
- Is it a feature of Apache servers? (maybe I'm wrong)
No, it's a feature of DNS and server configuration options. It has nothing to do with Apache vs. IIS.
Hosting companies typically configure the domain and its www subdomain to resolve to the same resources (files, scripts, etc.) on the server. Left like that, you run the risk of duplicate content. Why do they do it? Because some of their customers want to use the www.example.com subdomain, some want to use the example.com domain, and the hosting companies don't want to be bothered with this once the account is activated, so they set up both and leave it to you to deal with (if you're even aware of it).
- Am I right in saying that a penalty exists because there IS duplicate content created?
Not a penalty, but rather a diluting effect: you spread your link popularity and PageRank across two or more URLs.
- How do I fix it? - and
Do a search here on WebmasterWorld for "canonical domains" and "canonical URLs", and implement the suggestions found on an as-needed basis. Redirect all non-canonical URLs to the canonical equivalent. Examples are:
example.com/
www.example.com/
xyz.example.com/
example.com/index.html
www.example.com/index.html
xyz.example.com/index.html
That's six URLs -- all pointing to the same file. Most SEO-savvy members will recommend redirecting them all to either example.com/ or www.example.com/ -- your choice, but pick one, link to it consistently, and redirect all of the others.
- do I need to fix it - or are the extra files irrelevant, a mirage, and SEs REALLY don't take them into account.
Yes. Search engines will be happiest (and you will, too) if every resource in your domain has one and only one URL. If alternates exist, they should be 301-redirected to the proper URL.
The good news is that for new, simple sites, most of these issues can be precluded/handled with only four directives -- all previously posted here.
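A minimal sketch of what such a rule set could look like in a top-level .htaccess file, assuming www.example.com is the chosen canonical hostname and index.html is the directory index file (both are placeholders, and these are not necessarily the four directives referred to above):

```apache
RewriteEngine On

# Redirect direct client requests for index.html (or index.htm) to the bare directory URL.
# THE_REQUEST is tested so that internal DirectoryIndex lookups do not cause a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]

# Redirect every hostname variant other than the canonical one
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

The index rule comes first so that a request such as example.com/index.html is corrected in a single redirect rather than two.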
Jim
Most of what you have said seems to tie up with my own suppositions.
All the aliases that you have listed I have covered with my standard .htaccess treatment.
I still have a problem with this "trailing dot" domain, ie
www.domain.com./ (final dot before the slash)
Can you direct me to a source to explain the syntax to redirect this one?
I can do the front end redirects, and the /index.html's etc. This dot before the slash is giving me problems
Mod rewrite seems to have special rules with the main host name up to the forward slash - the folders and files after that are relatively easy to deal with.
But a dot after the ".com" (ie .com./) or a dot after the ".co.uk" (.co.uk./) is very confusing
I really appreciate your input
bryan
My sitemap generator program on one of my sites found 30 pages, where only 15 exist. But on further investigation, 30 do exist, but 15 are straight duplicates.
My principal site - and preferred domain - in this case is
www.example.co.uk/ - a small site with 15 pages.
This site has a PR of 3 on the home page
There is also a site out there called
www.example.co.uk./ - with an extra dot after the UK and before the last forward slash. I did not create this site
Example in my local area www.quux-foo.com (Windows based)
When you enter www.quux-foo.com./ (with an extra dot) a clean site is served as www.quux-foo.com (without the offending dot)
[microsoft.com....] works perfectly from here. And they run it on Microsoft-IIS/6.0. ;)
The problem is not specific to a type of webserver. A hostname with a final dot is valid, in the sense that it resolves when a DNS server is queried, and there is nothing wrong with it.
Now, for a solution to your situation. First, check your pages for any links whose hostname has a final dot.
Second, you can configure Apache to redirect www.example.co.uk./ to www.example.co.uk/:
RewriteCond %{HTTP_HOST} ^.*example\.co\.uk\.$
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301,L]
# Redirect to remove trailing port number or period (or both) from hostname
RewriteCond %{HTTP_HOST} ^www\.example\.co\.uk(:[0-9]+|\.|\.:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.co.uk/$1 [R=301,L]
# Redirect all non-canonical domain variants to canonical domain
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk$
RewriteRule (.*) http://www.example.co.uk/$1 [R=301,L]
Jim
IIS is case insensitive, so your index page on an IIS server could be accessed using... index.html, index.htmL, index.htMl, index.htML, index.hTml, index.hTmL, index.hTMl, ... and all other permutations up to INDEX.HtML, INDEX.HTml, INDEX.HTmL, INDEX.HTMl, and INDEX.HTML.
Now that is a problem!
I have fixed the bad link which was causing mysitemapbuilder to recognise the extra dot series of files (as Achernar pointed out), then used your second rule suggestion
RewriteCond %{HTTP_HOST} !^www\.domain\.co\.uk$
RewriteRule (.*) [domain.co.uk...] [R=301,L]
which I hadn't thought of doing - ie if the domain is NOT written like this, then re-write it - much cleaner and covers nearly every situation I was worried about. (I think I understood that right didn't I?)
Thanks loads
B
In the FrontPage directory of _vti_bin, plus the subdirectories of _vti_adm and _vti_aut I have added a line with
Options +FollowSymlinks
And all works well (I hope).
No technical issues with this are there?
Again thanks for your help
B
I'm surprised you found that "on another site," since it was apparently first-reported by WebmasterWorld member "chopin2256" here [webmasterworld.com], and credited to member "Bumpski". :)
See post #1496336 in that thread for details.
Jim
I'll spend more time looking through WebmasterWorld next time.
B
As for the trailing dot - what a great find, and one I am surprised has not been discussed here before. (At least I hadn't noticed it.) It points out how important it is to keep up with changes that may at first glance appear not to be related to your site.
Auto-linking has become popular, and I'd never imagined that this flaw in auto-linking software would exist so widely and have this impact.
Add "trailing dot removal" to the list of "must have" rewrites!
I can offer a bit of insight as to why the "." is being accepted by browsers in the first place: a "." is, indeed, legal at the end of a domain name. By adding the final ".", the domain name becomes a "fully-qualified domain name", or FQDN. This indicates (normally, to the local operating system and/or network) that the domain name should not be further suffixed with a default domain name.
It's a bit of arcana that is unknown to and ignored by 99+% of Internet users. There are very few situations where the average user would ever need to add the "." for the domain name to resolve properly. (Although the people at news.com - which is really news.com.com - might have a use internally... :) )
I really think this is a browser flaw, as well as an obvious flaw in the auto-linking software. I think browsers should remove the trailing dot from the "Host" header that they send to the website.
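The trailing dot can also be stripped without naming the site, by capturing the hostname from the request. This is only a sketch for a top-level .htaccess file, and it assumes the site is served over plain http on the default port, because any port number in the Host header is dropped:

```apache
RewriteEngine On

# Match any requested hostname that ends in a dot, optionally followed by a port.
# %1 holds the hostname without the final dot.
RewriteCond %{HTTP_HOST} ^(.+)\.(:[0-9]+)?$
RewriteRule (.*) http://%1/$1 [R=301,L]
```

Because it redirects to whatever hostname the client sent, a rule with the canonical hostname written out in full is usually the safer choice.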
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$
RewriteRule (.*) [mysite\.com$1...] [R=301,L]
Note that there is no / before $1 in the RewriteRule.
I was getting in IE [mysite.com...] until I got rid of the /
Does this seem correct?
P.S. [webmasterworld.com....] does not redirect
to [webmasterworld.com...]
P.P.S. If you look it up in Google, it does not seem to be indexing site:http://www.webmasterworld.com./
For use in server config files, you can also use:
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
Or, to make the code "portable" between the config files and the top-level .htaccess file,
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
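To illustrate the difference: in a server config or virtual host context the path tested by RewriteRule begins with a slash, while in .htaccess the leading slash has already been stripped. A minimal sketch of a complete canonical-host redirect using the portable pattern, with www.example.com as a placeholder hostname:

```apache
RewriteEngine On

# Applies when the requested hostname is anything other than the canonical one
RewriteCond %{HTTP_HOST} !^www\.example\.com$
# ^/? accepts the path with or without a leading slash, so $1 never starts with one
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
```

Since $1 never carries the leading slash here, the slash written after the hostname in the substitution is always needed.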
> P.S. [webmasterworld.com....] does not redirect
...which is why I used it as an example... :)
Jim