Forum Moderators: Robert Charlton & goodroi

"Domain Association" in Google.

How does that relate to Duplicate Content issues?

         

g1smd

11:21 pm on Jul 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For my 6000th WebmasterWorld post. Crikey!

If you have a website that responds with a "200 OK" at both domain.com and at www.domain.com then you have Duplicate Content. In effect you have two sites competing against each other in the SERPs. This has been covered many times before.

You will find that random pages from each "site" are listed in the SERPs, and that many are marked as Supplemental Results. When listing pages using a site: search, you will find that site:www.domain.com and site:domain.com -inurl:www give completely different listings.

Additionally, the Pagerank for domain.com/somepage.html and for www.domain.com/somepage.html are likely to be completely different for almost every sample of somepage, even for the root index page.

Finally, the results for link:www.domain.com and for link:domain.com are also likely to give completely different results too.

These are things that we have known about for years.

.

In the distant past, Google tried hard to "associate" related sites (such as www and non-www) as being "one site".

Some three or four years ago they used to run a process over their database, several times per year, to fix these associations. However, they stopped doing that long ago.

Duplicate Content issues really started to bite around the time Google introduced the Supplemental Index. It was at that point that using a 301 redirect from non-www to www (or vice versa) started to become essential. It made sure that all pages of a site were listed as www (or as non-www if the redirect was reversed).
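The canonicalization that the 301 performs can be sketched in a few lines. This is not from the thread — a minimal Python illustration, with `example.com` / `www.example.com` as placeholder hostnames; a real site would do this in its server configuration.

```python
# A minimal sketch of the non-www to www canonicalization logic that a
# site-wide 301 redirect performs. Hostnames here are placeholders.

from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"   # the hostname you want indexed
ALIAS_HOSTS = {"example.com"}        # hostnames that should 301 to it

def canonicalize(url):
    """Return (status, location): a 301 plus the canonical URL for an
    alias host, or a 200 plus the URL unchanged for the canonical host."""
    parts = urlsplit(url)
    if parts.hostname in ALIAS_HOSTS:
        fixed = parts._replace(netloc=CANONICAL_HOST)
        return 301, urlunsplit(fixed)
    return 200, url

# The alias host is redirected, with path preserved:
print(canonicalize("http://example.com/somepage.html"))
# -> (301, 'http://www.example.com/somepage.html')

# The canonical host is served directly:
print(canonicalize("http://www.example.com/somepage.html"))
# -> (200, 'http://www.example.com/somepage.html')
```

Reversing the redirect (www to non-www) is just a matter of swapping which hostname is canonical; the point is that only one hostname ever answers "200 OK".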

That process has been discussed many times here too.

.

Once Google associates those two "sites" as being the same site, they then list the same backlinks, whether you ask for link:domain.com or for link:www.domain.com. This occurs even though the links only really go to one particular domain, and there are none pointing at the "ghosted" site(s).

.

Additionally, if a site has a .com domain and a .co.uk domain, or any other combination of domains, such as a main site and some common mis-spellings, the same 301 redirects are also required to avoid all of the Duplicate Content issues discussed above.

.

Obviously, once Google makes those additional associations, a request for link:anydomain.com for any of the related domains will give exactly the same list of backlinks, even though all of the links actually only go to one particular domain.

.

There is an interesting question as to how the internal mechanism works when you request the backlinks list for one of the "associated" sites.

For example, when Google associates domain.co.uk with the main site www.domain.com and starts to "list" the same backlinks for both searches in the SERPs, do they:

1. Copy all the data for www.domain.com (the main site) to a separate "file" for domain.co.uk and show that file when link:domain.co.uk is requested, OR

2. When people request data for domain.co.uk simply show the data for www.domain.com (the main site) instead.

The results for two searches are the same; but are you looking at a separate copy of the data, or are you just being redirected to the original backlinks list as held for the main site?
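The copy-versus-redirect question can be made concrete with two toy data models. This is a hypothetical illustration, not a claim about Google's actual internals; all domain names and backlink entries are invented.

```python
# Two toy models of how "associated" domains might share one backlink
# list. All names and data are hypothetical illustrations.

backlinks = {"www.example.com": ["site-a.com/page", "site-b.org/links"]}

# Model 1: copy the data into a separate record for the associated domain.
copied = dict(backlinks)
copied["example.co.uk"] = list(backlinks["www.example.com"])

# Model 2: keep only an alias table and resolve it at query time.
aliases = {"example.co.uk": "www.example.com"}

def lookup(domain):
    """Resolve any alias, then return that domain's backlink list."""
    return backlinks.get(aliases.get(domain, domain), [])

# Both models answer identically while the association exists...
assert copied["example.co.uk"] == lookup("example.co.uk")

# ...but they diverge the moment the association is removed:
del aliases["example.co.uk"]
print(lookup("example.co.uk"))    # model 2 breaks instantly: []
print(copied["example.co.uk"])    # model 1 keeps serving the stale copy
```

That divergence is exactly what the next point is about: a stale copy would keep showing the "main site" backlinks after the 301 is removed, whereas a query-time alias would break as soon as the association is dropped.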

.

This also has important repercussions as to how backlinks listed for the associated sites are treated if the 301 redirect that completes the association is ever removed.

For a while, Google will be showing an incorrect backlink list, which may lead to some confusion if you weren't aware that a redirect had previously been in place.

How long does it take Google to realise that the redirect has gone?

How long does it take for the backlink list to be recompiled for the "associated" site, or for the association to the "main site" backlink list to be broken?

Even if you see something listed in the public SERPs, I still don't believe that it is always the same data that Google is using internally, or that a site is necessarily getting the benefit you think it is from what you think you see.

.

Discuss.

g1smd

3:56 pm on Jul 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK. It went kinda quiet all of a sudden.

Is this one too difficult, too obvious, too technical, too controversial, too political or what?

I've been told, elsewhere, that I'm talking rubbish with this one.

rainborick

4:28 pm on Jul 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've kept an eye on people experiencing the www. problem for several years now, and I still see evidence of Google eventually canonicalizing these URLs for indexing and ranking purposes. So I was surprised to see you say they stopped doing it. I'll be keeping a closer watch on it.

What I have also seen Google doing is modifying the special operators in ways that make them less accurate for dissecting SEO issues. That is, a couple of years ago, if I saw weird results from site: and link: I'd be much more inclined to consider them reliable indicators of trouble than I am today. Today, I'd use them only to justify further investigation. This has been true since Google's Big Daddy infrastructure changes, which I think limited those special operators to methods designed either to enhance their speed or to limit their impact on the system. Ultimately, it has all made them less accurate for SEO issues like this.

CainIV

5:34 pm on Jul 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting post.

Here is a case study that recently happened which might shed some light.

A client of mine had a website that was rebuilt from non-friendly URLs to friendly URLs. The development was done without further consultation on my part.

Previously, I had had the development team implement a site-wide 301 redirect from non-www to www. The site was ranking well.

Four weeks ago, during testing and implementation, the development team forgot to re-implement the 301. Coming back from holidays, to my surprise, the redirect was gone.

Luckily, I checked the website when I got back and asked them to put the 301 back immediately.

Using site:, the website was now a mess, where previously it was fine. There were unequal numbers of non-www URLs and www URLs in the query results.

Here are my observations from looking at server log traffic, hits etc:

1. It took only about one week for Google to index the incorrect URLs. Days later, the non-www and www URLs were both listed.

2. Within the next update of the SERPs, the site went from the second page to not in the top 1000 for its search terms.

3. Putting the 301 back has caused a jump of about 100 positions per day back towards where the site previously sat in the SERPs.

This doesn't explain how Google counts backlinks in queries for each 'page', but it does show that Google certainly views them as different things, and that with the 301 in place it can (within a week, give or take, on a well-linked site) associate backlinks with the correct pages.

Drew_Black

1:43 am on Jul 21, 2007 (gmt 0)

10+ Year Member



Isn't setting the preferred domain in Webmaster Tools supposed to take care of the www/non-www issue?

CainIV

3:18 am on Jul 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, that setting only tells Google how you prefer the site to be shown. You need the 301 redirect to make sure nothing is left to chance.

g1smd

7:05 pm on Jul 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anyone want to discuss the "Domain Association" part of my post, and subsequent information?

Marcia

9:06 pm on Jul 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Going back to inclusion in the "anchors file" in the original documents (if that's still valid), the question is how that's handled in the case of associated domains or TLDs.

Miamacs

9:43 pm on Jul 21, 2007 (gmt 0)

10+ Year Member



...

DISCLAIMER: My brain is over ten degrees above its nominal temperature. It's far too hot for me to think properly, so expect some redundancy.

But it's interesting so I gave it a shot.

I say it's a redirected query, not a copy of the profile.
Meaning that when asking for data on a domain that has been merged into another, I get the data for the recipient side, and that's that (though only for domain-to-domain redirects). And in the background, Google keeps track of the now-redirected domain too.

The final destination of the 301s slowly picks up all the data, which is recorded to its profile one by one as Google crawls the web (as long as the redirects are in place), but the redirected domains' profiles are frozen from day one, and queries for them are forwarded to their target's data (which will soon include theirs, but not vice versa).

From the perspective of links, the new site will not get them copied to its profile either. It seems to me that Google picks them up one by one, simply skipping the in-between domain, and adds them to the final destination's profile. Until it does, these links sit in the now-inaccessible profile of the redirected domain, but since the page listings are turned off (the site: command won't show old pages), they don't really matter. Once a link is picked up, it is (obviously) erased from that list, as it's taken for granted that it now has a new target... and it can't have two. So there's a temporary blackout.

Should you pull the redirect in the midst of the process, it will not negate the effect in an instant, but will reverse the above practice, with the target domain losing and the old domain re-gaining the links as Google finds out that they have a new target (yet again). Except that with every such move your TrustRank clock gets reset. You lose all the benefits of link age, which is... need I explain? A bad move.
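The one-by-one pickup and its reversal described above can be sketched as a tiny simulation. This is purely illustrative of the claimed behaviour, not of Google's actual mechanism; the profiles and link names are invented.

```python
# Toy simulation of links being re-attributed one crawl at a time from
# a redirected domain's frozen profile to the redirect target's profile,
# and of the process reversing when the redirect is pulled.
# All profile contents are hypothetical.

old_profile = ["link1", "link2", "link3"]   # frozen when the 301 goes up
new_profile = []                            # the target gains links slowly

def crawl_step(redirect_in_place=True):
    """Move one link between profiles as Googlebot re-finds it."""
    if redirect_in_place and old_profile:
        new_profile.append(old_profile.pop(0))
    elif not redirect_in_place and new_profile:
        old_profile.append(new_profile.pop(0))

crawl_step()
crawl_step()                          # two crawls with the 301 in place
print(old_profile, new_profile)       # partway through: the "blackout"

crawl_step(redirect_in_place=False)   # redirect pulled: process reverses
print(old_profile, new_profile)       # old domain starts re-gaining links
```

The point of the sketch: at no moment does a link live in both profiles, which matches the "they can't have two targets" observation, and pulling the redirect mid-process just runs the same migration in the other direction.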

If you look closely at the SERPs for the info:example.co.uk domain (which, let's say, has been redirected to example.com), you'll notice that soon after the full domain redirect is picked up (root-level, full-scale redirect), Google will list the options for the domain as follows:

query is info:example.co.uk

This will list the domain like this:

TITLE of EXAMPLE.COM
The regular snippet found for example.com, let's say
the META description tag, nice and tidy.
www.example.com/

Show Google's cache of example.co.uk - (is really a link to cache:www.example.com)
Find web pages that are similar to example.co.uk - (is really a link to related:www.example.com)
Find web pages that link to example.co.uk - (is really a link to link:www.example.com)
Find web pages from the site example.co.uk - (this however really is site:example.co.uk, 0 results)
Find web pages that contain the term "example.co.uk"

...

From the time Google learns of the domain redirect...
The link: command will show 0 results.
the site: command will show 0 results.

The data in their profile seems to fade away at a similar pace to when Google can't reach a page for a longer period. Technically, it can't reach them. The difference is that while Googlebot will try to crawl pages again and again, it will do so on the new domain. Links it can follow will be recorded, and links it can't will be reported, but for the new domain. Which means one link less for the old one, every time. If the redirect is lifted, the same thing occurs, only in reverse: it will drop the links it recorded for the new domain and start accessing them at the old location. But it won't look for pages on the resurrected domain that only the new domain had.

...

Google seems to think differently of a domain-to-domain redirect and page-by-page redirects. If it sees, at the root level, that example.co.uk is now example.com, with no further rules set, that breaks the ice very fast. After the "merger" is completed, the old domain name will default to the new one, simply skipping the process of evaluating the 301. They just don't seem to care anymore.

...

But if you look at the issue from another angle...

Let's assume that the data for a domain that's now redirected to another (whether it's example.com to www.example.com or example.co.uk to example.com) is a copy of its target, and not just a redirected query in the database.

... isn't that what the 302 hijack was all about?

Put up a redirect, wait until the database copies the profile of the target into the profile of the source page, then pull down the redirect. It sti(n)cked. But that has been fixed now, hasn't it?

...

As for the time it takes for Google to notice that the redirect is gone (or back), it seems to me that the timeframe can be anything from 4 days up to 2 or 3 weeks.

Anyway these are but my observations...

youfoundjake

9:49 pm on Jul 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



g1smd
For example, when Google associates domain.co.uk with the main site www.domain.com and starts to "list" the same backlinks for both searches in the SERPs, do they:

Is that association based on a 301 in place or is it based on how the pages are linked to in the anchor tag?

I don't know, but if it's based on the anchor text, then Google would see it as a completely different domain regardless of the 301, wouldn't they? I may have touched on this in a previous post about the actual domain "example.com" and how there are subdirectories:

[webmasterworld.com...]

Google has a list of URLs for example.com, regardless of whether they exist or not, just because at some point those URLs were referred to in an anchor tag.

As for my site:
no www, 1-184 repeat is 142, total is 142
www 1-185, repeat is 143, total is 143

what is that one page difference? Heeh.

domain.com -inurl:www yields no results for my site.

g1smd

10:29 pm on Jul 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My example was for a domain that has a site-wide 301 redirect in place to some other domain.

You can take that as .co.uk and .com, or as mis-spellings of the main domain, or whatever you want.