Forum Moderators: Robert Charlton & goodroi


Is this a duplicate problem?


driller41

7:31 pm on Oct 18, 2008 (gmt 0)

10+ Year Member



Hi, I have been reading about duplicate problems caused by www v non www names.

But I do not understand if this applies to my site.

Here we go.

I have www.example.com and can browse the site using this format

and when I remove the www and go to example.com I can still browse the site as example.com, then say example.com/category.aspx - then perhaps I will visit a URL I have added myself and go back to www.example.com/wherever.aspx

So I kind of have two versions.

Is this the kind of duplicate problem that needs fixing?

Robert Charlton

7:59 pm on Oct 18, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is this the kind of duplicate problem that needs fixing?

In a word, yes. Any time you have the same page displayed under two different URLs, you have a duplicate content problem, and you should not let it stand. For more information, take a look at the Hot Topics [webmasterworld.com] section, pinned to the top of the Google Search forum home page, and look at the Duplicate Content section.

The following threads might be particularly helpful, but I suggest you take a look at all the articles, as chances are very good that you have other duplicate content issues as well....

Why Does Google Treat "www" & "no-www" As Different?
[webmasterworld.com...]

Good summary threads:
Duplicate content in forum and articles - will I get penalized?
[webmasterworld.com...]

Canonical URL Issues - including some new ones
[webmasterworld.com...]

Robert Charlton

8:18 pm on Oct 18, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A PS to the above. The situation you describe sounds like a problem waiting to happen rather than one that's already been indexed, but clearly there's no setup on your server to serve up only one version for users and Google.

It would probably take a non-www inbound link to cause you problems, but there's a pretty good possibility of that happening.

driller41

9:25 am on Oct 19, 2008 (gmt 0)

10+ Year Member



Thanks, very interesting. I have browsed the threads, and from what I understand there may not be a problem?

Even though I myself can manually remove the www from my URL and still reach the site, does that mean the SEs have a problem?

I have checked site:example.com -inurl:www and this test produces no results, so I guess Google has just the www versions in its database.

Also, when I look in my DNS records I see:

example.com IP Address 3600 A Record
*.example.com IP Address 3600 A Record
www.example.com IP Address 3600 A Record

Which I think means that any visiting spider will be sent to the www version only.

The only problem I foresee is if there is a link in the site that uses the non-www version - that would make the spider go to the wrong version.

Am I understanding this correctly?

Robert Charlton

6:12 pm on Oct 19, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Which I think means that any visiting spider will be sent to the www version only.

The only problem I foresee is if there is a link in the site that uses the non-www version - that would make the spider go to the wrong version.

Am I understanding this correctly?

What the DNS records mean is that an inbound spider can be sent to any of the above... ie, to...

- example.com with no www
- example.com with a www subdomain
- or to any wildcard subdomain.

This is a fine DNS setup, because it ensures your site will respond to a likely variety of requests, but DNS is only half of what you need to take care of.

You must also set up the proper 301 rewrites on your web hosting server (ie, where your site is hosted), so that only one canonical version of your site is served. That way, if there's a link to either a non-www version, or to some subdomain that you don't specifically want, the request for that link will be rewritten to a request for the desired version of your domain, and you won't have a duplicate content problem.
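On an Apache server, for example, this is typically done with a few lines of mod_rewrite in .htaccess - a minimal sketch, assuming www.example.com is the version you want to keep:

```apache
RewriteEngine On
# Redirect any host other than www.example.com (including bare example.com
# and any wildcard subdomain) to the canonical www version with a 301
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Since your URLs end in .aspx, your site may well be on IIS rather than Apache; there the same effect is achieved with the URL Rewrite module or a redirect in code, but the principle is identical - every request for a non-canonical host gets a 301 to the canonical one.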

Right now, you're very open to errors and mischief from others.

g1smd

8:18 pm on Oct 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** the request for that link will be rewritten to a request for ***

Just to be consistent with the terminology we use in the Apache forum, that scenario is an "external redirect" not a rewrite. Apologies for jumping in, but the terminology is forever getting mixed up.

Robert Charlton

8:50 pm on Oct 19, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



g1smd - And I'm one of those forever mixing them up. If you can clarify the distinction here, it would help more people than just me. Thanks.

g1smd

10:03 pm on Oct 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A redirect forces the browser to make a new request for a new URL.

A rewrite connects the external URL request to an internal server filepath that is different from the one suggested by the path shown in the URL.

So, the browser asks the server for "A" and the server responds with a 301 redirect that tells the browser to make a new request for "B". The browser asks for "B" and the server gives it the content.

For a rewrite, you ask for "X" and the server fetches the content of file "Y" but does not reveal the fact that the path was internally changed.
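In Apache mod_rewrite terms (a hypothetical sketch - the R flag is what makes the difference):

```apache
# External redirect: the server answers 301 and the browser makes a
# second request for /B; the URL in the address bar changes
RewriteRule ^A$ /B [R=301,L]

# Internal rewrite: the server silently serves the content of /Y for a
# request to /X; the browser never sees /Y
RewriteRule ^X$ /Y [L]
```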

Receptional Andy

10:06 pm on Oct 19, 2008 (gmt 0)



Also when I look in my dns records I see:-

example.com IP Address 3600 A Record
*.example.com IP Address 3600 A Record
www.example.com IP Address 3600 A Record

Your DNS records are unrelated to how your web server responds to requests for your site (I doubt you want a wildcard DNS subdomain entry in there, incidentally - depending on your web server configuration, that could mean [anything].example.com would return your site!).

All those records mean is that the web server at [IP address] will respond to requests for those (sub) domains - that web server decides what happens when the requests get there.

If you have the same content available on different URLs (any difference whatsoever, even a single character), then you are trusting the search engine's duplicate handling not to affect your performance.

Google's handling of dupes may not necessarily work in your favour - it could result in effects like lost links or devalued content. Here's one simplified scenario showing how this can affect a site's performance:

  • www.example.com/widget is linked from internal navigation and has several external links. example.com/widget has one external link
  • the search engine finds that www.example.com/widget is a copy of example.com/widget
  • the duplicate selection process chooses example.com/widget as the "best" URL, based on other criteria, and drops www.example.com/widget from its main results
  • the search engine doesn't count the links to www.example.com/widget towards example.com/widget
  • example.com/widget does not perform well in SERPs because it has insufficient links

This isn't something that should really be a concern for the average person with a site, but unfortunately, website configurations (from a technical perspective) are often far from ideal, and it ends up being the webmaster who has to address the issues.