Does this mean that Google have finally recognised that the www.domain.com is normally the same as domain.com?
Not necessarily. We have three servers, www.widget.com, foo.widget.com, and bar.widget.com. If you try to go to plain widget.com with your browser, you're told that the address doesn't exist. When you search for site:widget.com, Google gives hits from all three servers.
However, there are LOTS of sites that actually host the same site on both those domains (most of them don't even know it).
So, if Google finds out there is a site at both those locations, it will probably check if they are the same. If they are, one will be `banned` and the other will be listed as if it were the only one.
This used to be a problem, but fortunately not anymore. :)
On my sites, *.domain.com is directed to the same path, but I use server-side scripting to make sure the site is displayed with the domain I want.
So if someone forgets the www., I give a 301 to www.domain.com.
Google seems to like it, as do other SE's.
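For anyone wondering what that kind of server-side check might look like, here is a minimal sketch in Python (WSGI). It is not the script described above, just one way to do the same thing. The names CANONICAL_HOST, canonical_location and canonical_host_middleware are made up for illustration, and it assumes plain http and a single preferred host.

```python
from typing import Optional

# Assumption: the one host name you want the search engines to index.
CANONICAL_HOST = "www.domain.com"


def canonical_location(host: str, path: str = "/", query: str = "") -> Optional[str]:
    """Return the URL to 301 to, or None if the request already uses the preferred host."""
    # Host headers are case-insensitive and may carry a port or trailing dot.
    hostname = host.strip().lower().split(":", 1)[0].rstrip(".")
    if hostname == CANONICAL_HOST:
        return None
    # Assumption: the site is served over plain http.
    url = "http://" + CANONICAL_HOST + (path or "/")
    if query:
        url += "?" + query
    return url


def canonical_host_middleware(app):
    """Wrap a WSGI app so requests for any other host get a permanent redirect."""
    def wrapped(environ, start_response):
        location = canonical_location(
            environ.get("HTTP_HOST", ""),
            environ.get("PATH_INFO", "/"),
            environ.get("QUERY_STRING", ""),
        )
        if location is None:
            # Already on the preferred host: serve the page as normal.
            return app(environ, start_response)
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]
    return wrapped
```

The path and query string are carried over, so a deep link to the "wrong" host lands on the same page under the preferred host rather than on the index page.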
It appears that only Google doesn't know that www.domain.com and domain.com are the same.
Some may argue that www could have different content, but really, how many sites bother to have different content for www?
It's basically a given that for the 'majority' of sites, if not nearly all, www.domain.com and domain.com are the same. If we want a subdomain we could name it something else, such as sub1.domain.com or even www2.domain.com, without having to mess around with www, which often is nothing but a 'symbolic link' to domain.com, a carry-over from the old internet addressing.
I tend to agree that this is some sort of bug, which also effectively bloats the database and could end up in some duplicate penalty type of thing.
For instance, search for site:www.domain.com and site:domain.com and you get different results, which could lead to problems with PR and possibly duplicate content.
As for 301s, which are the most popular advice here, I never use them; instead I use 404 error handling and just redirect everything to the index page.
Not many at all. However, Googlebot has to allow for every possibility. Technically, root and the www subdomain are separate sites. Any webmaster who has problems with this needs to learn how to do server side redirects.
You have got me worried! I redirect domain.com to www.domain.com. Is there any evidence this could cause problems?
Yes, yes, we all agree, how many times does this have to be re-stated?!
Fact is, technical matters aside, the proportion of sites that actually use these as separate subdomains is tiny.
So let's leave that old sausage in its pedantic pantry.
Regarding 301s causing pages to be dropped:
I can only speak from experience, and I'm afraid this is true. HOWEVER, only because G is so slow to update its database of URLs. If you have a 301 already in place - great - it will work fine, and protect you from a double listing. I envy you :)
Trouble is, if you have a current double listing e.g. not-www and www, and then you put in a re-direct, for example, from not-www to www (as GG appears to recommend), you will find that rather than your pages being corrected on the fly, they will be dropped as if orphaned.
It all depends on when G decides to update its records, but in the case of one of my sites I went from 1000 pages listed to 2 - and eventually pulled the 301 because I couldn't stand it any longer.
Within 12 hours all my pages were back, listed not-www! Exactly as before. I had waited 3 weeks, and lost handfuls of hair, with the net result of zilch!
I suggest getting a 301 in place right from the start. But I can assure you, it can be a painful experience to try to correct it later :(
p.s. did you know that technically a tomato is a fruit? So it really shouldn't be in the vegetable section etc. etc. ... :)
p.p.s. GG has mentioned in the past that there is a 'stack' of redirects, as I recall, and that from time to time Google updates them. Why can't Google simply tell us approximately when this update is about to occur? Then we can fix this old, and frankly silly, problem with the minimum of damage by putting in a 301 at the appropriate time.
Inktomi did have a problem with 404's at the beginning of last year, but they are having problems now with 301's (which Yahoo Mike assures us will be corrected within the next 4 weeks).
The usefulness of the redirect is that it still
pulls in visitors to your pages who are following links
to the "wrong" domain from other sites, and from their
personal bookmarks, and emails, and so on.
But, the redirect would not even have to be there
if it were not for the lameness of the spider. Many
sites run with one as the alias of the other for
non-www and www. Without a redirect, the sites
would still work fine. It is only the perception
of the need to avoid a possible dupe content
penalty that the redirects are even used.
If the spider took into account that of two
hosts, one is the alias of the other, then
this whole discussion would be moot.
After all these threads, you would think that
some PhD would have passed a note to GG to
announce that they had rejigged the crawlers
to accept non and www as one and the same,
just like a human would. IF and ONLY IF the
alias and the canonical resolved to the same
ip AND the content is the same, AND the parent
domain is the same, then no penalty accrues.
GUARANTEED.
Do we have that? No.....
And I ain't holdin' my breath.
BTW, technically non and www are only two
separate hosts if each has an A record in
the zone. If an admin does not know how
to set up one as an alias of the other, then
that is an entirely separate problem.
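To make the "IF and ONLY IF" test above concrete, here is a rough sketch in Python of the three checks (same parent domain, same IP, same content). It says nothing about how any real spider works. The function names are invented for illustration, the parent-domain check is deliberately naive, and "same IP" is read here as "the two hosts share at least one address".

```python
import hashlib
import socket
from typing import Callable, Set


def resolve_ips(host: str) -> Set[str]:
    """Look up the IPv4 addresses for a host (needs working DNS)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(host)
    return set(addresses)


def parent_domain(host: str) -> str:
    """Naive parent domain: the last two labels. Wrong for names like example.co.uk."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def looks_like_alias(
    host_a: str,
    host_b: str,
    fetch: Callable[[str], bytes],
    resolve: Callable[[str], Set[str]] = resolve_ips,
) -> bool:
    """True only if both hosts share a parent domain, an IP address, and page content.

    fetch is supplied by the caller and should return the page body for a host.
    """
    if parent_domain(host_a) != parent_domain(host_b):
        return False
    if not (resolve(host_a) & resolve(host_b)):
        return False  # no address in common
    digest_a = hashlib.sha256(fetch(host_a)).digest()
    digest_b = hashlib.sha256(fetch(host_b)).digest()
    return digest_a == digest_b
```

The fetch and resolve steps are passed in as functions so the logic can be tried out against canned data without touching the network.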
+++
>> Do we have that? No..... <<
>> And I ain't holdin' my breath. <<
Umm, Google is already aware of this, and they do have a fix in place. They have a database that says that this site is the same as that site, and to combine the results. Googleguy confirmed it only a few weeks ago in a WebmasterWorld thread.
That database does get updated every few months, and when it does, the PR and backlinks list for all versions of your site will be identical. Before the update they are all treated as separate sites, with separate PR and backlinks.
I have seen the update at work for several sites. It can take 6 months for the combination to happen (in my case it was combining the .net results with the .com results).
The acid test of whether Google have fixed the problem would be to compare the results of the following searches:
link:domain.com
link:www.domain.com
If these produced identical results, then we might consider the problem fixed. Definitely a premature post on my behalf - sorry.
Kaled.
>> That database does get updated every few months, and when it does, the PR and backlinks list for all versions of your site will be identical <<
As I understand it, the criterion for getting into the database is whether the cache for www.domain/ and domain/ is the same. If you regularly update your index page, the cached versions will often be out of sync.
Generally speaking, 301's work and give no problems - they only produce the desired result of one URL variant being dropped and another being indexed instead.
Now, there are exceptions to this rule:
1) The database lag time (from the moment you introduce a 301 until it is reflected in SERPs): This will typically be around a month or a little more. During this month nothing usually happens, but all kinds of odd things may happen, depending on the specific situation. It is not a general rule that all pages are dropped from the index; rather, that is one exception. The typical scenario is that some old URLs will remain under the snippets for longer than others.
The "six months" case is an extreme exception, but then again this was not a switch within a second-level domain; instead it was a merger of two different top-level domains, and extra precaution should be taken in such a case.
2) Errors of all kinds: The receiving address may give a 404, the forwarding or receiving address may previously have been a 404, the robots.txt may need fine tuning, conflicting redirects may be in place, DNS or server setup might be incompatible with the redirect, etc. All of this may lead to weird results that take place at the same time as the 301 redirect, but are not caused by it.
3) Google bugs: These do happen, as (a) Google is not error free, and (b) they think as a SE, not as a webmaster. One particularly nasty redirect bug that appeared around six months ago had to do with 301's (wrongfully) being interpreted as one-to-one relationships only; if you 301 redirected two or more URLs to one new URL you would risk unpredictable results in the SERPS (including, but not limited to "ghost URLs" (*) and de-listing).
This was a bug, there's no other explanation, apart from an outright error. Of course it is possible for a web publisher to merge the contents of two or more documents (pages/URLs) into one new document. Permanently, even.
The solution was to use 302 redirects for all cases that were not one-to-one (i.e. all mergers of two or more URLs, not just relocations of one page). This worked, but of course it is not the proper way to do things (as a 302 is temporary and not permanent). I'm not sure if it's fixed yet, but I think so. Recently (within the last 1-2 months) I've made a 10-domains-to-one 301 redirect and I have experienced no side effects.
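Since conflicting redirects came up under point 2, here is a small Python sketch of how a many-to-one redirect table could be sanity-checked before it goes live. It is only an illustration: the URLs and the names REDIRECTS, final_target and problem_sources are hypothetical, and it only looks at your own table, not at what any search engine does with it.

```python
from typing import Dict, List, Optional

# Hypothetical many-to-one map: several old URLs merged into one new URL.
REDIRECTS: Dict[str, str] = {
    "http://old-one.example/": "http://www.example.com/",
    "http://old-two.example/": "http://www.example.com/",
    "http://example.com/": "http://www.example.com/",
}


def final_target(url: str, redirects: Dict[str, str], max_hops: int = 10) -> Optional[str]:
    """Follow the table from url. Return the final URL, or None on a loop or overlong chain."""
    seen = {url}
    hops = 0
    while url in redirects:
        if hops == max_hops:
            return None  # chain is longer than we are willing to follow
        url = redirects[url]
        hops += 1
        if url in seen:
            return None  # conflicting redirects: back at a URL already visited
        seen.add(url)
    return url


def problem_sources(redirects: Dict[str, str]) -> List[str]:
    """List sources whose redirect loops or needs more than one hop to settle."""
    bad = []
    for source, target in redirects.items():
        end = final_target(source, redirects)
        if end is None or end != target:
            bad.append(source)
    return sorted(bad)
```

An empty list from problem_sources(REDIRECTS) means every old URL reaches its final address in a single hop, which is the situation described above as giving no side effects.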