I don't believe it is a complex issue at all. :o
A heart transplant is a complex issue. What google wants you to think is not a complex issue.
Who has ever heard or watched these google representatives truly resolve a problem? They are vague at best; they are skilful in sublime vagueness and have turned it into an art form.
How often, if ever, has Matt, Adam or any other misleadingly named google representative told you that the www in your domain is in fact a sub-domain? And that google has a definite dislike for sub-domains?
The only thing the www is doing is ZERO. Nothing. It is a problem staring webmasters in the face, and it remains a problem because these representatives want you to keep being puzzled.
I've never come across a host provider that asks me how I want the www treated. They all either don't know or don't care. How many webmasters have ever been asked by a host why they are hosting their site in a way that is so vulnerable to google?
Canonical issues, duplicate content issues and a myriad of other problems do not exist on their own; they manifest because google creates the environment for them.
These representatives have never informed webmasters properly about anything. Adam ventured into a powerful thread recently to proclaim the existence of peanut butter and sightings of Elvis eating hamburgers to deflect webmasters from the valid points being discussed.
If a webmaster continues to think there is any value in the www without knowing why it is there, other than believing it stands for World Wide Web, which it does not, then we will never progress and will continue to play a game set out by google to keep webmasters guessing. The www was a gimmick the pioneers of the internet needed for other purposes.
I've never heard these representatives give anything away other than cryptic clues.
Millions of websites are in exactly the state I am explaining here, and for anybody to say it is the webmaster's fault would be an unfair comment.
Let us assume a simple website is to be created like a fortress against duplicate content. Firstly, there is no such thing as duplicate content. Google creates it. So you are safe before you start. Use that safety to your advantage. Don't just read google's misleading webmaster guidelines.
When purchasing a name for this simple fortress-type website, remember first that the registrar has little or no knowledge of search technology. He only wants your money.
A standard and arcane method is used to sell domain names, often by ill-equipped registrars.
When you purchase a domain for this simple website as the example here, the responsibility is now yours to protect the domain from google. Google will cause your domain duplicate content and canonical issues if you present this domain to google as it is.
ANSWER IS SIMPLE
How will google find this domain to mistreat it? Google has to query the global DNS first to ask whether the domain exists. Since it exists, google is then told that the name is parked at cheapskateregistrars' nameservers. Now, this is the place to make sure that google is never given the opportunity to apply duplicate content penalties or canonical issues to your website.
Here you can merge the two versions of the domain together so that they resolve to one only. A far better choice is to get rid of the www subdomain altogether, since google has shown a dislike for subdomains and its algo is forever changing. Leave the fewest possible variants to minimize problems in the future.
WWW is nothing but a subdomain. That is why it is near impossible to create a subdomain such as mysite.www.simple-site.com. So if you only had simple-site.com and you wanted a subdomain for ware-wolf-woes, you would get ware-wolf-woes.simple-site.com; and if you abbreviated that to its initials, you would have www.simple-site.com. We are now back to the www, full circle.
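The point that www is just another label can be seen by splitting a hostname into its DNS labels. A minimal sketch in Python, using only the hypothetical domain names from this post:

```python
# DNS hostnames are dot-separated labels, read right to left.
# The leftmost label of www.simple-site.com is "www" -- structurally
# no different from any other subdomain label.

def leftmost_label(hostname):
    """Return the leftmost DNS label, i.e. the subdomain."""
    return hostname.split(".")[0]

print(leftmost_label("www.simple-site.com"))             # www
print(leftmost_label("ware-wolf-woes.simple-site.com"))  # ware-wolf-woes
```

Both names have exactly the same shape: a label in front of simple-site.com.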
KILL THE PROBLEM
Get rid of the www subdomain and be left with simple-site.com only in the A records. Now it becomes totally impossible for google to create a problem regarding www versus non-www.
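As a sketch only (the IP address and the BIND-style zone syntax here are hypothetical, not taken from any real setup), an A-record configuration that answers to the bare domain and deliberately defines no www at all might look like this:

```text
; Hypothetical BIND-style zone fragment for simple-site.com.
; Only the bare (apex) name gets an A record; no www line exists,
; so www.simple-site.com simply does not resolve.
simple-site.com.    IN  A   203.0.113.10
; www               IN  A   203.0.113.10   <- deliberately omitted
```

With no www record, there is only one name that any bot can ever receive an answer for.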
No agent, browser, crawler, spider or harvesting bot can go wrong. Your website now answers to one name only. A mix-up is impossible, and the risks and problems google might throw your way are eradicated at source. Regarding www and non-www, that is.
These google representatives are employees of google. They should ooze with technical replies and foster confidence in webmasters but they are trained only to keep you guessing.
Google does not tell you that if your website does not answer to one name only, it will give you penalties. It simply goes ahead and does it. Nor does google own up to the fact that it is its own harvesting crawlers that pick up damaging links to your website. In fact, google misleads you by saying another website cannot harm your site.
LET US HAVE A CLOSE LOOK AT THIS MISLEADING CLAIM.
We have, say:
[simple-site.com...] just like a usual website. Unprotected.
It ranks nicely after your efforts and you are pleased with the looks and the hits to your website. You have, say, 50 pages of that site nicely crawled by google and all indexed. Some whiz-kid webmaster in a remote Chinese village creates a scraper site and uses a php-based automated linking process that points to your website. He leaves out the www. Ahhh, he has actually left out the subdomain. Don't forget.
A link is now visible at the whiz kid's site pointing to [simple-site.com...] The whiz kid used a cheap computer and uploaded the site in between feeding the chickens and pigs on his land, unintentionally creating the killer link that google says is impossible.
NOW WE HAVE A MAJOR VULNERABILITY ISSUE.
And google is the potential killer. Its harvesting bot has detected the killer link. Potentially, this link can create 50 duplicate pages of your website.
Pagerank is going to be split. Duplicate content penalties will abound. Untold problems will beset your website, because in google's eyes you are going to be caught cheating.
The harvesting bot informs google's database that a new link exists and is put aside for later processing by deepcrawl bots. A time bomb is ticking away and it is going to explode.
Google instructs a deepcrawl bot to GET [simple-site.com...] The bot must first ask DNS whether the domain exists; it is told that it exists on nameserver NS01CHEAPSKATES, and the killer bot goes there and sees that it points to the server the simple site is on. At the poorly configured server, hosted by a one-man band who knows nothing about search technology, the bot makes a request. BANG.... It is given a 200 OK, because this is the very first time that hostname has ever been served out. Now the bot takes the contents of the index page, full of relative links, back to google's notoriously ill-equipped algo. A trained eye would spot in the same raw logs that two minutes earlier a google deepcrawl bot had requested [simple-site.com...] and been given a 304 Not Modified. Here is the evidence: two websites exist, and that is cheating in the eyes of google.
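The 200-versus-304 behaviour in those raw logs comes from HTTP conditional GETs. Here is a minimal sketch of the server-side logic in Python; the date and function names are illustrative only, not from any real server:

```python
from email.utils import parsedate_to_datetime

# Last time the page actually changed on the server (illustrative date).
LAST_MODIFIED = "Mon, 25 Sep 2006 10:00:00 GMT"

def respond(if_modified_since):
    """Return the HTTP status code for a conditional GET.

    A crawler that has fetched the page before sends If-Modified-Since;
    if the page has not changed since then, the server answers
    304 Not Modified.  A first-time request carries no such header,
    as when a never-before-seen hostname (the non-www variant) is hit,
    and always gets a full 200 OK.
    """
    if if_modified_since is None:
        return 200  # first request for this name: full page served
    if parsedate_to_datetime(if_modified_since) >= parsedate_to_datetime(LAST_MODIFIED):
        return 304  # nothing changed since the crawler's last visit
    return 200      # page changed: serve the new copy

print(respond(None))                             # 200
print(respond("Mon, 25 Sep 2006 10:00:00 GMT"))  # 304
```

The unprotected non-www hostname always looks brand new to the crawler, so it always draws a 200 and a full copy of the content.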
The process continues until all 50 pages are in google without the www alongside all 50 pages with the www, and 50 duplications are handed to the duplicate content algo. Red flag after red flag is raised against the unprotected site.
WHAT HAPPENS NEXT
Sensational names are given to google's updates. Bourbon is the name webmasters chose to celebrate the event. The owner of www.simple-site.com makes a first post: MY SITE HAS TANKED. He is frustrated. What happened? Who can help me?
If the A records said that the site answers to one name only, then no duplication would result. No canonical issues. No vulnerability. The near-useless host provider would be academic.
Now we know that another webmaster can indeed tank your website.
Base refs etc. are untidy and dangerous for a novice. Server-side redirects and many other things can be done, but that is another story.
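For completeness, the usual server-side fix is a permanent 301 redirect from the www name to the bare one. A sketch in Apache .htaccess style, assuming mod_rewrite is available and using the hypothetical domain from this post:

```apache
# Hypothetical Apache .htaccess fragment: permanently redirect the
# www subdomain to the bare domain, so only one name ever answers.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.simple-site\.com$ [NC]
RewriteRule ^(.*)$ http://simple-site.com/$1 [R=301,L]
```

Unlike the DNS approach above, this keeps the www name resolving but tells every bot that the bare domain is the only real one.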
Please do not use this example as a basis to fix your website. I'm simply making a point about how webmasters are misled by google.
Sorry to have made such a long post.
[edited by: AlgorithmGuy at 1:39 am (utc) on Sep. 30, 2006]