Have Google fixed WWW problem?

kaled

9:59 pm on Apr 1, 2004 (gmt 0)

I just noticed that site:mydomain.com works. A week or so ago this would certainly have failed, and I would have had to type site:www.mydomain.com.

Does this mean that Google have finally recognised that the www.domain.com is normally the same as domain.com?

Kaled.

webnewton

9:15 am on Apr 2, 2004 (gmt 0)

Results are different with site:www.domain.com and site:domain.com

MikeBeverley

9:22 am on Apr 2, 2004 (gmt 0)

The results are only different on Google.com; country searches like Google.co.uk are showing the same results for www and non-www searches.

All my sites are showing fewer pages for the www search. Is this the same for everyone here?

kaled

11:41 am on Apr 2, 2004 (gmt 0)

Just had a quick play - the following searches look identical.

site:www.bbc.tv
site:bbc.tv

Generally, the BBC uses the URL www.bbc.co.uk, and Google doesn't recognise that bbc.tv is the same site, but it does appear to recognise that bbc.tv is the same as www.bbc.tv.

Kaled.

Spica

12:50 pm on Apr 2, 2004 (gmt 0)

Another way to check which pages Google considers the same or not is to check PR. The index page of my site now displays the same PR with or without the www. It used to be PR0 if you didn't type the www. However, all other pages are still PR0 without the www.

jtbell

12:52 pm on Apr 2, 2004 (gmt 0)

>Does this mean that Google have finally recognised that the www.domain.com is normally the same as domain.com?

Not necessarily. We have three servers, www.widget.com, foo.widget.com, and bar.widget.com. If you try to go to plain widget.com with your browser, you're told that the address doesn't exist. When you search for site:widget.com, Google gives hits from all three servers.
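
The behaviour jtbell describes is consistent with site: matching on the domain suffix rather than on one exact hostname. A minimal Python sketch of that idea follows. `matches_site` is a hypothetical helper, and this only illustrates the observed results; it is not a description of how Google implements the operator.

```python
def matches_site(host, site):
    """True if host is the site itself or any subdomain of it (assumed rule)."""
    host = host.lower().rstrip(".")
    site = site.lower().rstrip(".")
    # Require a dot before the suffix so "notwidget.com" does not match "widget.com".
    return host == site or host.endswith("." + site)


hosts = ["www.widget.com", "foo.widget.com", "bar.widget.com", "widget.org"]
print([h for h in hosts if matches_site(h, "widget.com")])
print([h for h in hosts if matches_site(h, "www.widget.com")])
```

Under this rule the first query picks up all three widget.com servers, while the second is limited to the www host.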

DoppyNL

12:54 pm on Apr 2, 2004 (gmt 0)

For Google, domain.com and www.domain.com are two completely different domains.

However,

there are lots of sites that actually host the same site on both of those domains (most of their owners don't even know it).
So, if Google finds a site at both locations, it will probably check whether they are the same. If they are, one will be `banned` and the other will be listed as if it were the only one.

This used to be a problem, but fortunately not anymore. :)

On my sites, *.domain.com is directed to the same path, but I use server-side scripting to make sure the site is displayed under the domain I want.
So if someone forgets the www., I give a 301 to www.domain.com.
Google seems to like it, as do other SEs.
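
The server-side check described above could look roughly like the following Python sketch. It assumes a WSGI setup and uses www.domain.com as the preferred hostname; `canonical_redirect` and `app` are illustrative names, not the poster's actual script, and the query string is ignored for brevity.

```python
def canonical_redirect(host, path, canonical_host="www.domain.com"):
    """Return (status, headers) for a 301 if host is not canonical, else None."""
    # Drop an optional port and normalise case before comparing hostnames.
    hostname = host.split(":", 1)[0].lower()
    if hostname == canonical_host:
        return None  # already on the preferred hostname; serve the page normally
    return 301, [("Location", "http://" + canonical_host + path)]


def app(environ, start_response):
    """Minimal WSGI app: redirect any non-canonical host, otherwise serve a page."""
    redirect = canonical_redirect(environ.get("HTTP_HOST", ""),
                                  environ.get("PATH_INFO", "/"))
    if redirect is not None:
        _status, headers = redirect
        start_response("301 Moved Permanently", headers)
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page content\n"]
```

Because the check runs on every request, a visitor or crawler arriving at any alias of the site is sent to the same path on the preferred hostname.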

rfgdxm1

8:59 pm on Apr 2, 2004 (gmt 0)

>Does this mean that Google have finally recognised that the www.domain.com is normally the same as domain.com?

Google shouldn't assume anything. If you wish to make this clear to Googlebot, redirect root to www, or vice-versa.

SyntheticUpper

9:10 pm on Apr 2, 2004 (gmt 0)

Oh for heaven's sake drop it. This has been going on for months!

It's a Google bug, because they don't give a monkeys.

A 301 redirect leads to your pages being dropped.

This is fact.

Net_Wizard

10:49 pm on Apr 2, 2004 (gmt 0)

I agree with SyntheticUpper...

It appears that only Google doesn't know that www.domain.com and domain.com are the same.

Some may argue that www could have different content, but really, how many sites bother to have different content for www?

It's basically a given that for the majority of sites, if not nearly all, www.domain.com and domain.com are the same. If we want a subdomain, we can name it something else, such as sub1.domain.com or even www2.domain.com, without having to mess around with www, which is often nothing but a 'symbolic link' to domain.com, a carry-over from the old internet addressing.

I tend to agree that this is some sort of bug, which also effectively bloats the database and could end up triggering some kind of duplicate penalty.

For instance, search for site:www.domain.com and site:domain.com: the results differ, which could lead to problems with PR and possibly duplicate content.

As for 301s, which are the most popular advice here, I never use them; instead I use 404 error handling and just redirect everything to the index page.

rfgdxm1

11:41 pm on Apr 2, 2004 (gmt 0)

>Some may argue that www could have a different content, but really, how many sites that really bother to have different content for www.

Not many at all. However, Googlebot has to allow for every possibility. Technically, root and the www subdomain are separate sites. Any webmaster who has problems with this needs to learn how to do server side redirects.

HarryM

2:42 am on Apr 3, 2004 (gmt 0)

>A 301 redirect leads to your pages being dropped

You have got me worried! I redirect domain.com to www.domain.com. Is there any evidence this could cause problems?

4serendipity

3:40 am on Apr 3, 2004 (gmt 0)

>You have got me worried! I redirect domain.com to www.domain.com. Is there any evidence this could cause problems?

I've been doing the exact same thing on a couple sites for about 2 years and haven't experienced any problems.

DoppyNL

6:50 am on Apr 3, 2004 (gmt 0)

>You have got me worried! I redirect domain.com to www.domain.com. Is there any evidence this could cause problems?

When you redirect from domain.com to www.domain.com, domain.com will eventually be dropped, but www.domain.com will stay in the listing.
domain.com is dropped for the simple reason that the crawler doesn't get any pages from it ;)

SyntheticUpper

9:05 am on Apr 3, 2004 (gmt 0)

>> Technically they are different sub-domains

Yes, yes, we all agree, how many times does this have to be re-stated?!

Fact is, technical matters aside, the proportion of sites that actually use these as separate subdomains is tiny.

So let's leave that old sausage in its pedantic pantry.

Regarding 301s causing pages to be dropped:

I can only speak from experience, and I'm afraid this is true. HOWEVER, only because G is so slow to update its database of URLs. If you have a 301 already in place - great - it will work fine, and protect you from a double listing. I envy you :)

Trouble is, if you have a current double listing e.g. not-www and www, and then you put in a re-direct, for example, from not-www to www (as GG appears to recommend), you will find that rather than your pages being corrected on the fly, they will be dropped as if orphaned.

It all depends on when G decides to update its records, but in the case of one of my sites I went from 1000 pages listed to 2 - and eventually pulled the 301 because I couldn't stand it any longer.

Within 12 hours all my pages were back, listed not-www! Exactly as before. I had waited 3 weeks, and lost handfuls of hair, with the net result of zilch!

I suggest getting a 301 in place right from the start. I can assure you, it can be a painful experience to try to correct it later :(

p.s. did you know that technically a tomato is a fruit? So it really shouldn't be in the vegetable section etc. etc. ... :)

p.p.s. GG has mentioned in the past that there is a 'stack' of redirects, as I recall, and that from time to time Google updates them. Why can't Google simply tell us approximately when this update is about to occur? Then we can fix this old, and frankly silly, problem with the minimum of damage by putting in a 301 at the appropriate time.

MikeBeverley

9:29 am on Apr 3, 2004 (gmt 0)

Would a 404 not solve the problem without annoying Google?
I've used 404s redirecting to my main page for some time now, and I've never had duplicate listings or any of my pages indexed without the www.

Inktomi did have a problem with 404's at the beginning of last year, but they are having problems now with 301's (which Yahoo Mike assures us will be corrected within the next 4 weeks).

HarryM

7:06 pm on Apr 3, 2004 (gmt 0)

SyntheticUpper,

Thanks, you have made the situation clear.

I promise not to mention the matter again.

Until I forget of course... :)

g1smd

10:57 pm on Apr 3, 2004 (gmt 0)

The usefulness of the redirect is that it still pulls in visitors to your pages who are following links to the "wrong" domain from other sites, and from their personal bookmarks, and emails, and so on.

plumsauce

9:25 am on Apr 4, 2004 (gmt 0)

>The usefulness of the redirect is that it still
pulls in visitors to your pages who are following links
to the "wrong" domain from other sites, and from their
personal bookmarks, and emails, and so on.

But, the redirect would not even have to be there
if it were not for the lameness of the spider. Many
sites run with one as the alias of the other for
non-www and www. Without a redirect, the sites
would still work fine. It is only the perception
of the need to avoid a possible dupe content
penalty that the redirects are even used.

If the spider took into account that of two
hosts, one is the alias of the other, then
this whole discussion would be moot.

After all these threads, you would think that
some PhD would have passed a note to GG to
announce that they had rejigged the crawlers
to accept non and www as one and the same,
just like a human would. IF and ONLY IF the
alias and the canonical resolved to the same
ip AND the content is the same, AND the parent
domain is the same, then no penalty accrues.
GUARANTEED.

Do we have that? No.....

And I ain't holdin' my breath.

BTW, technically non and www are only two
separate hosts if each has an A record in
the zone. If an admin does not know how
to setup one as an alias of the other, then
that is an entirely separate problem.
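
The three conditions proposed above (same parent domain, same IP, same content) can be sketched as a single check. This is only an illustration of the proposal, not anything Google is known to do. `same_site`, `parent_domain` and `resolve_ips` are hypothetical names, "same IP" is read here as "at least one shared address", and stripping a leading "www." is a deliberately naive stand-in for real domain parsing.

```python
import hashlib
import socket


def resolve_ips(host):
    """Return the set of IPv4 addresses a hostname resolves to (needs network)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(host)
    return set(addresses)


def parent_domain(host):
    """Strip a leading 'www.' label; naive, but enough for the www/non-www case."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def same_site(host_a, host_b, body_a, body_b, resolve=resolve_ips):
    """Apply the three proposed conditions to two hosts and their page bodies."""
    if parent_domain(host_a) != parent_domain(host_b):
        return False  # different parent domain
    if not (resolve(host_a) & resolve(host_b)):
        return False  # no IP address in common
    # Compare content by digest so large pages need not be held side by side.
    return hashlib.sha256(body_a).digest() == hashlib.sha256(body_b).digest()
```

The `resolve` parameter exists so the check can be tried against a fixed table of addresses instead of live DNS.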

+++

g1smd

9:40 am on Apr 4, 2004 (gmt 0)

>> After all these threads, you would think that
some PhD would have passed a note to GG to
announce that they had rejigged the crawlers
to accept non and www as one and the same,
just like a human would. <<

>> Do we have that? No..... <<

>> And I ain't holdin' my breath. <<

Umm, Google is already aware of this, and they do have a fix in place. They have a database that says that this site is the same as that site, and to combine the results. Googleguy confirmed it only a few weeks ago in a WebmasterWorld thread.

That database does get updated every few months, and when it does, the PR and backlinks list for all versions of your site will be identical. Before the update they are all treated as separate sites, with separate PR and backlinks.

I have seen the update at work for several sites. It can take 6 months for the combination to happen (in my case it was combining the .net results with the .com results).

kaled

10:07 am on Apr 4, 2004 (gmt 0)

In answer to my original question, clearly Google have not fixed the problem. Having considered the matter, I think it is fair to say that the premise of my original post was flawed.

The acid test of whether Google have fixed the problem would be to compare the results of the following searches :

link:domain.com
link:www.domain.com

If these produced identical results, then we might consider the problem fixed. Definitely a premature post on my part - sorry.
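
The comparison itself is simple once the two result lists have been collected. The Python sketch below assumes the backlink URLs from each link: search have been copied into lists by hand; `compare_backlinks` is a hypothetical helper and no search API is implied.

```python
def compare_backlinks(root_links, www_links):
    """Return (identical, only_in_root, only_in_www) for two backlink lists."""
    root, www = set(root_links), set(www_links)
    # Order and duplicates are ignored; only the set of URLs matters.
    return root == www, sorted(root - www), sorted(www - root)


identical, only_root, only_www = compare_backlinks(
    ["http://a.example/", "http://b.example/"],
    ["http://b.example/", "http://c.example/"],
)
print(identical, only_root, only_www)
```

Any URL that appears in only one of the two lists is evidence that the two hostnames are still being treated separately.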

Kaled.

g1smd

10:27 am on Apr 4, 2004 (gmt 0)

Once the site has been in the Google listings for about 6 months or so, then the results will usually be combined, as I explained just above.

HarryM

11:33 am on Apr 4, 2004 (gmt 0)

>That database does get updated every few months, and when it does, the PR and backlinks list for all versions of your site will be identical

As I understand it, the criterion for getting into the database is whether the cache for www.domain/ and domain/ is the same. If you regularly update your index page, the cached versions will often be out of sync.

No5needinput

11:59 am on Apr 4, 2004 (gmt 0)

I posted a similar question a couple of weeks ago. You can see GG's response here [webmasterworld.com...]

HarryM

12:13 pm on Apr 4, 2004 (gmt 0)

GG said "We try to guess for a domain whether domain.com is the same as www.domain.com".

I'm sure they don't do this manually, which is why I suggested they compare the cache.

claus

1:03 pm on Apr 4, 2004 (gmt 0)

My two cents (All this is of course afaik, fwiw, imho etc.):

Generally speaking, 301s work and cause no problems - they only produce the desired result of one URL variant being dropped and another being indexed instead.

Now, there are exceptions to this rule:

1) The database lag time (from the moment you introduce a 301 until it is reflected in SERPs): This will typically be around a month or a little more. During this month nothing usually happens, but all kinds of odd things may happen, depending on the specific situation - it is not a general rule that all pages are dropped from the index, rather that is one exception. The typical scenario is that some old URLs will remain under the snippets for longer than others.

The "six months" case is an extreme exception, but then again this was not a switch within a second level domain; instead it was a merger of two different top level domains, and extra precaution should be taken in such a case.

2) Errors of all kinds: The receiving address may give a 404, the forwarding or receiving address may previously have been a 404, the robots.txt may need fine tuning, conflicting redirects may be in place, DNS or server setup might be incompatible with the redirect, etc. All of this may lead to weird results that appear at the same time as the 301 redirect, but are not caused by it.

3) Google bugs: These do happen, as (a) Google is not error free, and (b) they think as a SE, not as a webmaster. One particularly nasty redirect bug that appeared around six months ago had to do with 301s (wrongly) being interpreted as one-to-one relationships only: if you 301 redirected two or more URLs to one new URL you would risk unpredictable results in the SERPS (including, but not limited to, "ghost URLs" (*) and de-listing).

This was a bug, there's no other explanation, apart from an outright error. Of course it is possible for a web publisher to merge the contents of two or more documents (pages/URLs) into one new document. Permanently, even.

The solution was to use 302 redirects for all cases that were not one-to-one (i.e. all mergers of two or more URLs, not just relocations of one page). This worked, but of course it is not the proper way to do things (as a 302 is temporary and not permanent). I'm not sure if it's fixed yet, but I think so. Recently (within the last 1-2 months) I've made a 10-domains-to-one 301 redirect and I have experienced no side effects.

(*) Ghost URLs: URL only in SERPS, no snippet, no cache. The 301 redirect bug is not the only cause of this phenomenon; it may also appear due to waking the dead (reviving 404s), to not-yet-spidered links, to reinclusion/respidering of links from past indexes, and a multitude of other weird things.
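
Two of the points above, conflicting redirects and many-to-one mergers, can be checked before a redirect set goes live. The Python sketch below assumes the redirects are kept in a simple table mapping each old URL to a status code and a new URL; `check_redirects` and `merged_targets` are hypothetical helpers, and the example URLs are placeholders.

```python
def check_redirects(rules):
    """rules maps old URL -> (status, new URL). Return a list of problems found."""
    problems = []
    for old, (status, new) in rules.items():
        if status not in (301, 302):
            problems.append(f"{old}: unexpected status {status}")
        if new == old:
            problems.append(f"{old}: redirects to itself")
        elif new in rules:
            # The target is redirected again, so this is a chain or a loop.
            problems.append(f"{old}: target {new} is itself redirected")
    return problems


def merged_targets(rules):
    """Return targets that receive two or more old URLs (many-to-one mergers)."""
    targets = {}
    for old, (_status, new) in rules.items():
        targets.setdefault(new, []).append(old)
    return {new: sorted(olds) for new, olds in targets.items() if len(olds) > 1}


rules = {
    "http://a.example/": (301, "http://www.example.com/"),
    "http://b.example/": (301, "http://www.example.com/"),
}
print(check_redirects(rules))
print(merged_targets(rules))
```

The second function simply lists the mergers, which are the cases the one-to-one bug described above would have affected.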