Forum Moderators: open
If the URLs were example.com and www.example.com, or example.com/foo.html and example.org/bar.html, then this would still happen.
I can't think of a domain that deliberately serves different content on www.example.com and example.com, so I can't check whether they have also started to assume that those two URLs can be merged.
Depending on how you code your website, you could easily do things that slow the merging down. For example, if your code generates absolute URLs from whatever URL the current page was requested at, the server will return different pages for different domains.
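To make that concrete, here is a minimal sketch of the pattern being described. The function name and HTML are made up for illustration; the point is only that deriving absolute links from the incoming Host header makes the two hostnames return different bytes, so they no longer look like exact duplicates:

```python
def render_page(request_host):
    # Hypothetical template: absolute internal links are built from
    # whatever Host header the request arrived with.
    return f'<a href="http://{request_host}/about.html">About</a>'

# Googlebot fetching the "same" page via the two hosts gets different
# HTML, which reads as two distinct pages rather than one duplicate.
www_version = render_page("www.example.com")
bare_version = render_page("example.com")
assert www_version != bare_version
```

Using relative links (or hard-coding one canonical hostname) avoids this, because the page body is then byte-identical regardless of which hostname was requested.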
And the other curious thing is:
domain.com has PR6 as usual
www.domain.com has PR0
very weird thing, isn't it?
Any conclusions, or suggestions on what to do to get this problem fixed? You know, having PR0 on www.dom.com is not very desirable!
www.domain.com
domain.com
www.domain.com/%1F
www.domain.com/?tracking=code
I have seen this happen once or twice a year for the last few years and it usually takes "more than days and less than months" until G realizes all of the pages are the same and merges them. During this process I usually see a loss of internal PR and lose 90% or more of the keywords I rank for.
This time is no exception and appears to have started sometime around the end of May: the main page was gone from the SERPs completely and internal pages fell several spots. Every time this has happened, once G resolves the issue, all rankings return to normal, +/- 1 or 2 spots.
There is absolutely no reason why someone should not have a different page on www.example.com and example.com - do we think that Google should fail to list one of those perfectly valid sites? I hope not.
Google is excellent at dealing with duplicate content - it merges the two URLs and credits whichever one stays with the backlinks (and PR) of both.
If we have two URLs with different content (or content that changes between the times Googlebot asks for the two URLs) then we should not be surprised to see each one listed.
Would any members here deliberately deliver different content?
It may be legal to do so but I suspect the answer is 'no'.
Kaled.
I've seen you mention several times that you are a programmer. When you open a file, do you check for an error?
What you are suggesting is worse than assuming that there is no error when you open a file.
example.com, www.example.com, and ftp.example.com are in fact different domains, even though in most cases they will point to the same machine. Google should, and does, assume that they are different until they find enough evidence that they are the same.
It is still quite common at large institutions for example.com and www.example.com to be on different IP addresses, with the web server on the non-www host simply issuing a 301, or the router rerouting HTTP requests to the other IP address.
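What that non-www server's 301 amounts to can be sketched in a few lines. This is only an illustration (the hostnames are placeholders): given a request host and path, it computes the Location header a canonicalizing redirect would send, or nothing if the host is already canonical.

```python
from urllib.parse import urlunsplit

def canonical_location(host, path, canonical_host="www.example.com"):
    """Return the 301 Location for a non-canonical host, else None."""
    if host == canonical_host:
        return None  # already on the canonical hostname, serve the page
    return urlunsplit(("http", canonical_host, path, "", ""))

# A request for http://example.com/about.html would be answered with:
#   301 Moved Permanently
#   Location: http://www.example.com/about.html
```

Because the redirect is permanent (301 rather than 302), a crawler has an explicit signal that the two hostnames are the same site, rather than having to infer it from identical content.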
In the case of the BBC, where they are in fact different servers, shouldn't Google be assuming they are different until there is sufficient proof otherwise?
If that is considered acceptable by Google, I can see no reason for treating links to www.domain.com differently to domain.com. Remember, all Google does is follow links and index the pages it finds.
How Google treats www. is irrelevant to the operation of the internet, but Google might function better with this problem resolved. Having said that, it seems the problem has been at least partially fixed, so perhaps Google engineers agree with me.
Kaled.
I think I am correct in saying that links to two unique pages dir/index.html and dir/index.shtml will both be treated as links to dir/.
If they do, they shouldn't. There is an index.html in my website's root, but it is not the default page. It is the old "coming soon" splash screen that was up for about 2 weeks while we worked on getting things uploaded 3 years ago. If someone is linking to that page, then they mean to link to that page instead of the default home.php page.
Google can take the name as a clue that they are probably the same, just as they can take the DNS settings as a clue to whether two domains might be the same.
And as many of us pointed out, it doesn't appear that the google engineers have changed anything at all. They have been merging those domain names for years when the evidence they have points to them being the same. It just takes longer for some situations.
Just because it has now merged your domains, does not mean that they changed anything, it just means that they have amassed enough evidence to convince them that they should be merged.
Merging IS a good thing, but merging based on simplistic assumptions is a bad thing. Those simplistic assumptions can be taken into account, but they should not be the sole basis for the merge.
Merging IS a good thing, but merging based on simplistic assumptions is a bad thing.
I rather doubt that Google performs massively complex analysis to determine if domain merging is valid. A simple algo is likely to be right 99.9% of the time. However, the algo isn't as simple as it might be.
As for my domain - well the last time I checked this (months ago), I checked a load of other big sites and most were wrong, now when I check most are right. Perhaps this has been creeping up on us or perhaps Google have changed something, I don't know (and I can't remember which sites I checked before).
Kaled.
I guess my point is, google obviously has enough evidence that all of these pages are the same, but on occasion it forgets, I suppose. The content has NEVER been different, but when google "forgets" I lose my rankings; once they "remember", all rankings return to normal. Also, during this "forgetting" period, the results for the keywords I used to show up for are of poor quality and include pages that have nothing or very little to do with the search term. The last time this happened it even listed completely blank pages, and one page from an expired domain that resulted in a page not found. Strange how this is not widely accepted as something needing to be fixed.
I'm quite sure google does not want to return results with completely blank pages and websites that no longer exist. I will say that the two examples I point out, used to have relevant content months before they returned to the search.
Also, google still has some pages listed for allinurl:www.mysite.com that have not existed for over a year and a few cached versions of pages from early 2003.
I understand there are measures that can be taken to rewrite the URLs so that they are easier to merge, but in my case Google is indexing duplicates of many of my pages that are tracking URLs, or results from other search engines, with query strings that are beyond my control. How can I mod_rewrite for every imaginable query string? Nobody can guess every possible query string generated by other search engines or advertising systems and mod_rewrite them all.
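One way around "you can't enumerate every tracking string" is to invert the problem: instead of blacklisting unknown parameters, whitelist the query parameters your pages actually use and redirect anything else to the stripped URL. A sketch of that normalization (the parameter names here are invented examples, not from any real site):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters the site genuinely uses; everything else gets dropped.
KNOWN_PARAMS = {"page", "cat"}  # hypothetical names

def canonicalize(url):
    """Strip any query parameters not on the whitelist."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KNOWN_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

# A tracking URL collapses back to the canonical page:
# canonicalize("http://www.example.com/?tracking=code")
#   -> "http://www.example.com/"
```

If the canonicalized URL differs from the requested one, the server can 301 to it, so every tracking variant resolves to a single indexable URL without having to predict the tracking parameters in advance.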
If this isn't something that needs to be fixed, what is it and how can I stop google from indexing dozens of urls for the same page?
Replace sitedomain with your own domain:

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^sitedomain\.com
RewriteRule (.*) http://www.sitedomain.com/$1 [R=301,L]
Thanks for asking, posting this here hopefully will reduce the number of stickies I'm getting with this question, hehehe.
Sid