Why should they be the same site?
If home.domain.com exists, what should that be the same as?
Should www.home.domain.com then also be presumed to exist?
No. The permutations are too complex.
Herd the bot by using the 301 redirect.
Technically the www denotes a subdomain, and it can point to different content. In some cases it actually IS different content. So Google -- any search engine for that matter -- must not make assumptions and their code must account for as many technical possibilities as possible. Just because something is often the case doesn't mean that it must be the case.
|...why does Google want to treat the "www" and non-"www" versions of a website as different sites? Isn't it pretty obvious that they are one site? |
Or am I missing something?
The blame really rests on your domain name registrar.
Little if any information is provided to unsuspecting customers. Registrars provide an extremely poor service at best. They should inform buyers that if they are going to use the domain for a website and for ranking purposes, search engines may see two websites instead of one and duplicate content penalties may result.
They are selling a product with no ingredients listed and no sell-by date. And with little or no aftersales customer service.
It is a known fact that google and other search engines apply penalties to duplicate content created by what you describe.
Google has at last applied a way for you to ask google to treat both versions as one. You will find the process in google's sitemaps.
How efficient this process is remains to be seen, since very little that google claims to do well is actually efficient. Some element of google misleading webmasters must always be in the back of your mind. For instance, google claims that no harm can come to your site from other webmasters. That is a lot of absolute rubbish. A registrar simply selling you a domain can have catastrophic effects on your website. You could easily purchase a duplicate content penalty from a registrar. But google would deny that too.
Alternatively, you can kill the root problem via your ANAME records, again provided the registrar is properly equipped to allow you access to the records. You can point mysite123.com to www.mysite123.com; this way the problem is solved permanently. All other methods, including a server-side 301, are lesser options.
If you run your own apache based server, a simple container within your configuration files can hold the non www version. Insert in a folder with the same name a .htaccess file that gives a 301 permanent to the www version.
> A registrar simply selling you a domain can have catastrophic effects on your website.
That's stretching things a bit.
A registrar sells domains. Every domain comes with a practically-unlimited number of possible subdomains and sub-sub domains, etc. The DNS record and hosting configuration determine what subdomains are "defined" in DNS and how they are hosted on the server.
It's true that both registrars and hosting companies could make things clearer and provide tutorials for new Webmasters, and in fact, many do. But 99% of all their customers don't bother to read "all that complicated stuff" and thus get into trouble.
The other problem is the bare-bones registrars and hosters. They exist because of demand for $7.95/year domains and $3.99/month hosting. Is it any wonder they have no tutorials or tech support?
Bottom line: If you want to rank well and profit from the Web, be prepared to either read and learn a lot of technical stuff, or to pay someone who's already done so. Like most other things in life, there are no shortcuts, and you get what you pay for.
|Bottom line: If you want to rank well and profit from the Web, be prepared to either read and learn a lot of technical stuff, or to pay someone who's already done so. Like most other things in life, there are no shortcuts, and you get what you pay for. |
I agree with what you say, but only on a level where you and I and webmasters that are aware of this are concerned.
We are also aware that mysite123.com is indeed www.mysite123.com and the original question at the top of the thread is asking why can't google understand this? If we can keep things in perspective the question is relating to this known fact.
I'm sure I have read an above average amount, and I am sure you have too. And I still keep reading, and reading and reading and even the fine small print does not escape me. Yet only recently have I come across google's clandestine and miserly effort to address this issue.
Google provides an option to click a button for it to merge the two domains in its database. The button is in the most obscure place and out of reach of any living creature other than a bloodhound that can sniff it out.
I did not come across that by reading, and very few highly paid webmasters know of its existence. So I think if the guy paid out lots of dollars and became a bookworm like you suggested, he would still come here and ask the same question, minus a few dollars in his pocket, and probably more frustrated.
Matt is a representative of google, I doubt if he has ever written anything that has not been filtered by the PR department before the contents of what he mentions becomes public. He has never been subjected to a bombardment of random questions by learned webmasters where he is on the spot. He speaks from a port hole, nudged and hinted at as to what and what not to disclose.
No, I don't think it is about reading or paying. I think it is about common sense. There can be no doubt about it. An infant can understand that the two domains are one and the same.
I prefer to work to this logic. You buy the rights to use:
and then you set up services on it that you require:
The fact that web server software usually assumes that you'll have a website directly at domain.com is a quirk that you can easily correct. You could just as easily have the mail server there or something else, or nothing.
If this www vs. non-www issue is really an issue with google, this point needs to be emphasised in google's official webmaster guidelines, and preferably with a working example of .htaccess code.
I studied the guidelines, had no clue, and got burned.
I did not knowingly create
|multiple pages, subdomains, or domains with substantially duplicate content. |
Also, a detail about what to do if you change the url for a page needs to be there too.
msnbot has (now) included those in theirs [search.msn.com] albeit too late for me to escape reproach.
[edited by: tedster at 2:17 pm (utc) on Sep. 24, 2006]
"We are also aware that mysite123.com is indeed www.mysite123.com and the original question at the top of the thread is asking why can't google understand this? If we can keep things in perspective the question is relating to this known fact."
This statement is in fact false.
The www subdomain of the mysite123.com domain is the recommended subdomain to serve web pages from. That in no way equates to the www subdomain being the same as the domain.
Nor is it a requirement that what is on port 80 of a subdomain be a web page server. I could put my ssh server there if I so choose.
Be very careful in making assumptions. In an automated system one side working on incorrect assumptions and another working according to actual standards are going to be at cross purposes.
[edited by: theBear at 2:06 pm (utc) on Sep. 24, 2006]
|The www subdomain of the mysite123.com domain is the recommended subdomain to serve web pages from. That in no way equates to the www subdomain being the same as the domain. |
Nor is it a requirement that what is on port 80 of a subdomain be a web page server. I could put my ssh server there if I so choose.
Your knowledge regarding these issues is well respected and I am certainly not going to question what you suggest. In fact I take note of your freely given help.
But the real question is that nobody has any idea exactly how google itself interprets the canonical issue.
I know exactly how they treat it, and according to Google search I have stated how they treat it more than a thousand times in the last two years. :-(
Suppose i do the redirects for the subfolder/index.html pages to the subfolder/ url format instead. How will that go down in the SERPs? Why is this good, and how does it work if these pages ( the subfolder index.htmls ) are already indexed and come up just fine as they are?
The domain root redirect i think i understand. The subfolder index redirect... is that really necessary? How or why does it work to have them redirected? Or is this for precaution only...
The internal navigation is and has since launch pointed at the /index.htmls in the subfolders. Nowhere on the site do i have subfolder/ links. ( though others may link to them from other sites that way... and that's why i need this... or am i mistaken with this? )
[edited by: photopassjapan at 5:11 pm (utc) on Sep. 24, 2006]
www.domain.com/anypage.html and domain.com/anypage.html are duplicates.
www.domain.com/index.html and domain.com/index.html are duplicates.
www.domain.com/ and domain.com/ are duplicates.
www.domain.com/ and www.domain.com/index.html are duplicates.
www.domain.com/folder/ and www.domain.com/folder/index.html are duplicates.
The first three cases are fixed with a non-www to www redirect.
The last two cases are fixed with a */index.html to */ redirect.
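To make the two redirect rules above concrete, here is a rough Python sketch of the mapping they perform. It is only an illustration of the logic, not server configuration: the names `CANONICAL_HOST`, `BARE_HOST`, `INDEX_FILE` and `canonical_url` are made up for this example, and it assumes www.domain.com is the preferred host and index.html is the directory index filename.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.domain.com"  # assumed preferred host
BARE_HOST = "domain.com"           # assumed non-www alias of the same site
INDEX_FILE = "index.html"          # assumed directory index filename


def canonical_url(url):
    """Return the single URL that a 301 redirect should send this URL to."""
    scheme, host, path, query, fragment = urlsplit(url)
    # First three cases: fold the bare domain into the www host.
    if host.lower() == BARE_HOST:
        host = CANONICAL_HOST
    # Last two cases: drop the index filename so only the folder URL remains.
    if path.endswith("/" + INDEX_FILE):
        path = path[: -len(INDEX_FILE)]
    # A URL with no path at all is the same resource as "/".
    if path == "":
        path = "/"
    return urlunsplit((scheme, host, path, query, fragment))
```

If `canonical_url(u)` differs from `u`, then `u` is one of the duplicates listed above and should answer with a 301 to the returned URL. A URL that maps to itself is already the one you want indexed.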
If you only link to /folders/index.html internally, then you might get away with it. However, one day it will catch up with you.
There is a very good reason for not including the index file filename in the link. You can change your technology from index.html to index.php or index.cfm or default.asp at any time without having to do any rewrites, or change any internal links at all.
You don't even lose your inbound links from other sites! You keep all rankings the same, and nobody could ever tell that you even made a change. How's that for future-proofing a site, and reducing your workload at the same time too?
While g1smd is talking about future proofing I would like to bring up a very old bit of advice from the W3C folks.
On page [w3.org...] you can find the old but valid advice about (cool) uris never changing and other very valid and good advice.
It is worth reading.
As for .htaccess routines that are correct, well that all depends upon what you need to do and what you already have in your .htaccess file and elsewhere.
I for one hate to post code to forums because, along with the normal your mileage may vary because of other things about your system that aren't the same as my system and the possibility of the forum software getting into the act, my solution may not work for you or it would but what you did the c&p from isn't what I did the c&p from.
So the redirects are there.
Should i change the internal navigation links too, or does the redirect solve all these problems...:
My example is the previously mentioned... thousands of individual html pages ( for each picture on the site ). I have about 100 folders, one for each album, where the index.html is the photo index page... and is linked to as album/index.html as of now.
Also i have a link to the homepage in the top navigation with the link pointing to www.mysite.com/ ...but another at the bottom with www.mysite.com/index.html only with the text "home" instead of the other text at the top.
My checklist for today :)
- redirects: non-www to www, index.html to / - ALL DONE
- subfolder/index.html links to subfolder/ in internal navigation - NOT YET
- no secondary HOME link, especially not with www.mysite.com/index.html as the link - NOT YET
Sorry for double checking every single step i take.
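For the "NOT YET" items in the checklist, a batch edit of the static pages could look roughly like the Python sketch below. It is a starting point only, under some assumptions: links are written as double-quoted `href` attributes, the index file is always called index.html, and links carrying a `#fragment` or `?query` after the filename are left alone. The names `INDEX_LINK` and `strip_index_links` are invented for this example.

```python
import re

# Matches href="...index.html" where index.html is the whole last path segment.
# Assumes double-quoted attributes; other quoting styles are not touched.
INDEX_LINK = re.compile(r'(href=")([^"#?]*/)?index\.html(")')


def strip_index_links(html):
    """Point links at the folder URL instead of the folder's index.html."""
    def fix(match):
        # A bare "index.html" link means "this folder", written as "./".
        folder = match.group(2) or "./"
        return match.group(1) + folder + match.group(3)
    return INDEX_LINK.sub(fix, html)
```

Run it over a copy of the site first and compare a few pages by hand before touching the live files. The same pass also takes care of a second HOME link that points at www.mysite.com/index.html, since it becomes www.mysite.com/ like the first one.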
Heh, there's also a parallel thread over at: [webmasterworld.com...] discussing some valid points.
Actually half of the answers you gave there were useful for my situation as well,... thanks :)
The only thing i still can't figure out, and if i remember correctly miki99 had the same problem, is whether or not two home links on each page ( with different link text, top and bottom nav. ) hurt the site or not. ( sorry for being a pest, but this is the last point i'd like to clear before going to batch edit the few thousand static htmls for this round... which is a lovely task. )
|I know exactly how they treat it, and according to Google search I have stated how they treat it more than a thousand times in the last two years. :-( |
Hmm, that is very interesting since not 4 months ago I got a personal, not an automated reply, a personal reply from google that this is an issue they are working on. Msn replied nearly the same.
If you updated your explanation, could you guide us to your updated version please. This could be of great benefit to us all.
A Google search for "301 redirect" and/or "canonical" and/or "duplicate content" and my user name might help.
Some interesting replies. Obviously not such a stupid question :) I'm fully aware of how it's treated with redirects, and fully aware that you can have subdomains with different content.
What I find odd is that you are expected by Google to be knowledgeable in SEO in order to avoid a duplicate content penalty. Yes you can blame it on the registrar I guess, but I think that's missing the point.
The post above saying you have to learn techie stuff to get ranked is missing the point totally. If I am searching for Dr Spocks Expert Analysis On Klingon Widgets and Dr Spock has a website but doesn't know the first thing about SEO, Google will probably take me to someone who is regurgitating second-hand info but knows about SEO. I'd rather read from the acclaimed source but he's been penalised for being normal (-ish!).
With "www", I personally think it is "obvious" that this is the standard routing and if there is nada prior to the TLD.com, then fine, just assume one is synonymous with the other. Then the SEO-wise can simply use other subdomain names and bingo, no-one gets penalised for not being at least part-geek oriented. 95% of website owners have probably never even heard of "canonical", let alone know what it means, so why does Google penalise them for not knowing is my original question re-worded?
[edited by: Simsi at 7:42 pm (utc) on Sep. 24, 2006]
Actually it makes no difference as to how you reword the question.
Some folks seem to think that Google can or is going to guess what is what.
The web is a very technical thing (extreme technical terminology will now be used).
The web is quite simply a name space where each name was assigned by a website owner/operator and is completely under their control (or lack thereof).
A server (of some type) will eventually return a response when the name is uttered. The response is also under the control of the namer provided the portion that is known as the domain resolves to one of their servers.
This response is the named object along with header information.
The simple fact that a large number of non techies run websites is not germane to what happens. If you wish to do something it is in your best interest to learn how it all works.
There is far more to running a web site than cranking out words bracketed by markup.
[edited by: theBear at 9:48 pm (utc) on Sep. 24, 2006]
I was just about to post that running a website is a lot more than just "writing content" and "chasing links" - but you said it so much better.
> The post above saying you have to learn techie stuff to get ranked is missing the point totally.
Well, based on supporting evidence, I rest my case that if you want to be competitively-ranked, you do need to know some techie stuff. In some sectors it may make little difference, but in others, it's quite important. We'd be doing Webmasters a disservice to say otherwise.
Rather than counting on some mysterious back-end canonicalization algorithm at Google, or using their SiteMaps function (which won't help with Yahoo! or MSN/Live), a simple set of redirects will solve this problem, or, if installed prior to taking a site live, prevent it entirely.
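To show what such a redirect amounts to at the HTTP level, here is a small Python WSGI sketch. It is a stand-in for illustration, not the Apache setup discussed in this thread: the hostname in `CANONICAL_HOST` and the function name `app` are assumptions, and it only handles the non-www to www case over plain http.

```python
CANONICAL_HOST = "www.mysite123.com"  # assumed preferred hostname


def app(environ, start_response):
    """Minimal WSGI app: answer any other hostname with a 301 to the www host."""
    # Strip an optional ":port" suffix and ignore case when comparing hosts.
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")
    if host != CANONICAL_HOST:
        # Keep the requested path and query so every page maps one-to-one.
        location = "http://" + CANONICAL_HOST + path
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"canonical host\n"]
```

The point is that the request for the non-www name never returns page content at all. It returns a status of 301 and a Location header, so no search engine has to guess which name is the real one.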
I don't dispute those facts at all. The above 3 posts are coming from a "webmaster" angle...I'm coming from a user angle here. That's who Google is trying to reach.
Google (arguably) demotes a site providing top notch or definitive/highly valuable information because the guy who wrote it doesn't know how to fix a canonical URL issue. Doesn't seem right to me.
Surely the purpose of the web is to assist Jo/Joe Public to find the best possible information relevant to his/her requirement. It's not about whether "you want to be competitively ranked" or not. And it shouldn't be about whether the "source" is also technically aware. It's quality of content the user wants to see.
[edited by: Simsi at 11:00 pm (utc) on Sep. 24, 2006]
"Surely the purpose of the web is to assist Jo/Joe Public to find the best possible information relevant to his/her requirement."
Not really, that is the job that the search engines took upon themselves. Part of that is a requirement of having to provide a means of ranking. Google's method was a vote by links and link text.
Oh, and just so you know I'm not a content person. I build the glue that you shouldn't even notice if you were to visit the sites I do work on.
[edited by: theBear at 11:39 pm (utc) on Sep. 24, 2006]
I probably use Google (and several other search engines) more as a user than as someone who works on sites.
Google (and the others as well) has(have) a way to go, however, you are tunneling toward a particular result without seeing the implied requirements in what the search engines have taken on.
It has become a fight for the search engines, one that has required ever more things to be taken into account to provide something that resembles a decent index.
Please note the use of weasel words.
Separating the good from the also rans is a real problem these days.
In any endeavor those that have a message to get out need to be certain that they aren't being sloppy in how they get the message out there. Having multiple names for the same thing is a bad way to start, having conflicting names for the same thing is even worse.
I can call doors windows, but all that does is confuse the reader so I shouldn't call doors windows.
1. google might presume the content on www or without www is the same, isn't it?
i cannot put different content on them....
2. 6 years ago, i got my first domain, and the host told me, it is reachable through www and without.
That's fine, i was thinking, no problems with that, so i didn't care how inbound links were done....
Of course i learned quick, but i think google made a big mistake here
"i cannot put different content on them...."
You may not be able to, but it sure can be setup to do that. That is the problem, there is no _certain_ way to tell.
Come up with a way that works all of the time and sell it to the search engines, and yes I think they all have issues with this.
In recent weeks I have seen two sites that did have different content at domain.com/* and at www.domain.com/* and I am sure that it is not rare.
For one site www had their entire site on it, and non-www just had a few pages of adverts and links.
For the other site non-www had their product catalogue, and the www had their forum on it.
Is it not possible that 2 or more launch pads can be at the root. Depending how you enter. If the server is set up to recognise a request for 123456.com, www.123456.com etc. Then surely, two websites can exist side by side. Even linking to each other and to exchange links with each other. They can even join up forces internally. Share pages etc Yes, No?
In theory, if kept totally separate through navigation, it is possible. But someone from outside may mess things up with an inappropriate link.
Disregarding duplicate content if sharing. Is it possible?