Forum Moderators: open
If the URLs were example.com and www.example.com, or example.com/foo.html and example.org/bar.html, then this would still happen.
I can't think of a domain that deliberately serves different content on www.example.com and example.com, so I can't check whether they have also started to assume that those two URLs can be merged.
Depending on how you code your website, you could easily do things that slow the merging down. For example, if your code generates absolute URLs from whatever URL the current page was requested at, the server will return different pages for different domains.
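To make that concrete, here is a minimal sketch of the pattern being described. The function name and HTML are made up for illustration; the point is only that deriving absolute links from the incoming Host header makes the two hostnames return different bytes, so they no longer look like exact duplicates:

```python
def render_page(request_host):
    # Hypothetical template: absolute internal links are built from
    # whatever Host header the request arrived with.
    return f'<a href="http://{request_host}/about.html">About</a>'

# Googlebot fetching the "same" page via the two hosts gets different
# HTML, which reads as two distinct pages rather than one duplicate.
www_version = render_page("www.example.com")
bare_version = render_page("example.com")
assert www_version != bare_version
```

Using relative links (or hard-coding one canonical hostname) avoids this, because the page body is then byte-identical regardless of which hostname was requested.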
And the other curious thing is:
domain.com has PR6 as usual
www.domain.com has PR0
very weird thing, isn't it?
Any conclusions, or suggestions on what to do to get this problem fixed? You know, having PR0 on www.dom.com is not very desirable!
www.domain.com
domain.com
www.domain.com/%1F
www.domain.com/?tracking=code
I have seen this happen once or twice a year for the last few years and it usually takes "more than days and less than months" until G realizes all of the pages are the same and merges them. During this process I usually see a loss of internal PR and lose 90% or more of the keywords I rank for.
This time is no exception and appears to have started sometime around the end of May: the main page was gone from the SERPs completely and internal pages fell several spots. Every time this has happened, once G resolves the issue, all rankings return to normal, +/- 1 or 2 spots.
There is absolutely no reason why someone should not have a different page on www.example.com and example.com - do we think that Google should fail to list one of those perfectly valid sites? I hope not.
Google is excellent at dealing with duplicate content - it merges the two URLs and credits whichever one stays with the backlinks (and PR) of both.
If we have two URLs with different content (or content that changes between the times Googlebot asks for the two URLs) then we should not be surprised to see each one listed.
Would any members here deliberately deliver different content?
It may be legal to do so but I suspect the answer is 'no'.
Kaled.
I've seen you mention several times that you are a programmer. When you open a file, do you check for an error?
What you are suggesting is worse than assuming that there is no error when you open a file.
example.com, www.example.com, and ftp.example.com are in fact different domains, even though in most cases they will point to the same machine. Google should, and does, assume that they are different until they find enough evidence that they are the same.
It is still quite common at large institutions for example.com and www.example.com to be on different IP addresses, with the web server on the non-www host simply issuing a 301, or the router rerouting HTTP requests to the other IP address.
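What that non-www server's 301 amounts to can be sketched in a few lines. This is only an illustration (the hostnames are placeholders): given a request host and path, it computes the Location header a canonicalizing redirect would send, or nothing if the host is already canonical.

```python
from urllib.parse import urlunsplit

def canonical_location(host, path, canonical_host="www.example.com"):
    """Return the 301 Location for a non-canonical host, else None."""
    if host == canonical_host:
        return None  # already on the canonical hostname, serve the page
    return urlunsplit(("http", canonical_host, path, "", ""))

# A request for http://example.com/about.html would be answered with:
#   301 Moved Permanently
#   Location: http://www.example.com/about.html
```

Because the redirect is permanent (301 rather than 302), a crawler has an explicit signal that the two hostnames are the same site, rather than having to infer it from identical content.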
In the case of the BBC, where they are in fact different servers, shouldn't Google be assuming they are different until there is sufficient proof otherwise?
If that is considered acceptable by Google, I can see no reason for treating links to www.domain.com differently to domain.com. Remember, all Google does is follow links and index the pages it finds.
How Google treats www. is irrelevant to the operation of the internet, but Google might function better with this problem resolved. Having said that, it seems the problem has been at least partially fixed, so perhaps Google engineers agree with me.
Kaled.
I think I am correct in saying that links to two unique pages dir/index.html and dir/index.shtml will both be treated as links to dir/.
If they do, they shouldn't. There is an index.html in my website's root, but it is not the default page. It is the old "coming soon" splash screen that was up for about 2 weeks while we worked on getting things uploaded 3 years ago. If someone is linking to that page, then they mean to link to that page instead of the default home.php page.
Google can take the name as a clue that they are probably the same, just as they can take the DNS settings as a clue to whether two domains might be the same.
And as many of us pointed out, it doesn't appear that the google engineers have changed anything at all. They have been merging those domain names for years when the evidence they have points to them being the same. It just takes longer for some situations.
Just because it has now merged your domains, does not mean that they changed anything, it just means that they have amassed enough evidence to convince them that they should be merged.
Merging IS a good thing, but merging based on simplistic assumptions is a bad thing. Those simplistic assumptions can be taken into account, but they should not be the sole basis for the merge.
Merging IS a good thing, but merging based on simplistic assumptions is a bad thing.
I rather doubt that Google performs massively complex analysis to determine if domain merging is valid. A simple algo is likely to be right 99.9% of the time. However, the algo isn't as simple as it might be.
As for my domain - well the last time I checked this (months ago), I checked a load of other big sites and most were wrong, now when I check most are right. Perhaps this has been creeping up on us or perhaps Google have changed something, I don't know (and I can't remember which sites I checked before).
Kaled.
I guess my point is, google obviously has enough evidence that all of these pages are the same, but on occasion it forgets, I suppose. The content has NEVER been different, but when google "forgets" I lose my rankings; once they "remember", all rankings return to normal. Also, during this "forgetting" period, the results for the keywords I used to show up for are of poor quality and include pages that have nothing or very little to do with the search term. The last time this happened it even listed completely blank pages, and one page from an expired domain that resulted in a page not found. Strange how this is not widely accepted as something needing to be fixed.
I'm quite sure google does not want to return results with completely blank pages and websites that no longer exist. I will say that the two examples I point out, used to have relevant content months before they returned to the search.
Also, google still has some pages listed for allinurl:www.mysite.com that have not existed for over a year and a few cached versions of pages from early 2003.
I understand there are measures that can be taken to rewrite the URLs so that they are easier to merge, but in my case Google is indexing duplicates of many of my pages that are tracking URLs, or results from other search engines, with query strings that are beyond my control. How can I mod_rewrite for every imaginable query string? Nobody can guess every possible query string generated by other search engines or advertising systems and mod_rewrite them all.
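One way around "you can't enumerate every tracking string" is to invert the problem: instead of blacklisting unknown parameters, whitelist the query parameters your pages actually use and redirect anything else to the stripped URL. A sketch of that normalization (the parameter names here are invented examples, not from any real site):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters the site genuinely uses; everything else gets dropped.
KNOWN_PARAMS = {"page", "cat"}  # hypothetical names

def canonicalize(url):
    """Strip any query parameters not on the whitelist."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KNOWN_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

# A tracking URL collapses back to the canonical page:
# canonicalize("http://www.example.com/?tracking=code")
#   -> "http://www.example.com/"
```

If the canonicalized URL differs from the requested one, the server can 301 to it, so every tracking variant resolves to a single indexable URL without having to predict the tracking parameters in advance.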
If this isn't something that needs to be fixed, what is it and how can I stop google from indexing dozens of urls for the same page?
Replace sitedomain with your own domain:

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^sitedomain\.com
RewriteRule (.*) http://www.sitedomain.com/$1 [R=301,L]
Thanks for asking, posting this here hopefully will reduce the number of stickies I'm getting with this question, hehehe.
Sid