It appears that once Google is able to index some pages under the non-www and the www the whole site is split.
At one stage I thought that it was only the urls that Google crawled that got split - but it does appear to be the whole site - even for the urls that are not crawled!
E.g. take a look at MC's site, which suffers from canonical issues but has not suffered a downranking at this stage - possibly due to the number of links going to the /blog page.
mattcutts.com/blog/2005/10/ is not indexed under the non-www and if you query all the DCs you will see that it references the www on a "mattcutts.com/blog/2005/10/" search as the page.
So you would probably think that the canonicalization process is working correctly - and we would assume that Google thinks that mattcutts.com/blog/2005/10/ is the same page as www.mattcutts.com/blog/2005/10/
But if you do a PR check on the non-www and the www pages you will start to see that they are in fact split - the non-www has no PR while the www has PR across the DCs.
Also do a cache on the non-www and no page is returned.
To me it just looks like Google thinks there are two sites:-
mattcutts.com and www.mattcutts.com - the page mattcutts.com/blog/2005/10/ has not been crawled under the non-www version so has no cache or PR - but it is recognized as different from the www, as Google knows that a site at mattcutts.com exists.
Try this on sites which don't have canonical issues and you will see that if you query the non-www you will get the PR of the www and the cache of the www.
Matt's site seems to have survived the site split - but as we know it has done a lot of damage to a large number of sites on the web.
Whether this fix that MC has talked about will correct this problem we will see.
Previously as I said I always thought it was just the odd page that had canonical errors. (Most typically homepages as that is where the link goes in)
But it deffo seems to be the whole non-www site that gets split away, so on the non-www Google allocates PR/cache to the pages it has crawled and puts the rest of the non-www site to a PR0 - as they have not been crawled/listed.
So a site:domain.com -www does not really show all the pages that have the problem - as effectively it is the whole site - and that search only shows the pages it has crawled from that non-www site.
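One way to see how far a split has gone is to take whatever list of listed URLs you can collect (from site: searches, logs, or a crawl) and sort the paths by host. The sketch below is only an illustration in Python; the function name `split_by_host` and the example.com URLs are made up for this example, and it only reports what is in the list you feed it, so pages Google has not crawled under the non-www will not show up here either.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_by_host(urls, bare_domain):
    """Group URL paths by whether they appear under www or the bare domain.

    `bare_domain` is the domain without the www prefix, e.g. "example.com".
    URLs on any other host are ignored.
    """
    www_host = "www." + bare_domain
    paths = defaultdict(set)
    for url in urls:
        # Accept entries pasted without a scheme, e.g. "example.com/page".
        parts = urlsplit(url if "://" in url else "http://" + url)
        host = parts.hostname or ""
        if host in (bare_domain, www_host):
            paths[host].add(parts.path or "/")
    both = paths[bare_domain] & paths[www_host]
    return {
        "non_www_only": sorted(paths[bare_domain] - both),
        "www_only": sorted(paths[www_host] - both),
        "both": sorted(both),
    }


# Hypothetical list of listed URLs for a hypothetical site.
listed = [
    "http://example.com/",
    "http://www.example.com/",
    "www.example.com/blog/",
    "example.com/sitemap.html",
]
report = split_by_host(listed, "example.com")
```

Paths in `both` are listed under the two hosts at once; paths in `non_www_only` are the ones a site:domain.com -www search would typically surface.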
We had a chat via sticky. I have been looking into this and it would seem there are issues with canonicalization.
How do I know this?
Remember a couple of months back, lots of sites were in supplemental hell (Big Daddy). I was one, but I kept an eye on 3 competitors who were also in the supplementals. 4 other competitors were not. Incidentally, myself and the 3 other guys in supp hell did not have htaccess in place; I was at that point moving to another server that supported it. After the pages came back into the index after the BD update, about 2 weeks later, bang! Myself and the 3 supp sites dropped quicker than homesick moles. The 4 guys who weren't in supp hell survived. Google's little trick to sort the base URL out didn't work, at least not in my case or for the 3 other guys.
It was literally one page on my site that got cached with non-www, but as you say, it trickled down through the site, fed zero PR to the sitemap and voila, I am now staring at the butt end of my competitors.
Tried to get in touch with Google, zip. 2 things make Google great: the product itself and the sites it lists for users. Maybe they could take time to help those who really want to know what they SHOULDN'T be doing, to filter out the guys who just want to know what they SHOULD be doing to get good rankings.
Yes, which is why on some DCs your sitemap has a PR0 on the non-www despite the non-www page not being listed in Google - because effectively that page now belongs to another site (site as in non-www) and the PR calculation for Google has gone wrong.
Everyone was going blah blah blah about the supplementals but all those sites had a split site problem and the supplemental problem was caused by Canonical url issues.
The Canonical url issues are still there for lots of sites....sigh after so flippin long too.
Still awaiting the Mozilla Googlebot PR calculation etc to see if Google can make any progress.
And after applying the htaccess redirect, how long a time frame are you waiting to see if things are straightened out?
From my experience it's 6 to 12 months before all links that are on the web are pointed correctly at your absolute URL.
Sometimes it is best to operate on what is best for the website, and not what is best for the website with Google considered in the equation.
While Google seems to be having some issues it is going to fix itself and if you are not taking the steps now to fix things...then when Google does get straight you'll be even farther out of the loop...
By any chance would you happen to know if this "site split" can apply to a situation like domain 1 301'd to domain 3, domain 2 301'd to domain 3, and then domain 3 301'd to www. only? I am seeing some really odd characteristics that tell me that Google is showing the www. canonical issues correctly for the site I was having www. issues with, but is getting confused on a lot of the pages that they see as domain 1 or domain 2 instead of domain 3.
One thing to note is that I am webmaster for a site that was experiencing canonical errors and yet has no supplementals.
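The setup described in that question is a two-hop chain: the old domains redirect to the bare domain 3, which then redirects to its www host. A rough Python sketch of tracing such a chain from a plain table of redirects follows; the hostnames and the names `follow_redirects` and `flatten` are hypothetical, and nothing here says how Google actually treats chained 301s.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow a table of 301 redirects and return every host visited, in order."""
    chain = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in chain:
            raise ValueError("redirect loop at " + current)
        chain.append(current)
        if len(chain) - 1 > max_hops:
            raise ValueError("too many hops from " + start)
    return chain


def flatten(redirects):
    """Point every source straight at its final destination (one hop each)."""
    return {src: follow_redirects(src, redirects)[-1] for src in redirects}


# Hypothetical hosts standing in for domain 1, domain 2 and domain 3.
redirects = {
    "domain1.example": "domain3.example",
    "domain2.example": "domain3.example",
    "domain3.example": "www.domain3.example",
}
chain = follow_redirects("domain1.example", redirects)
direct = flatten(redirects)
```

Here `chain` lists all three hosts a request for domain 1 passes through, while `direct` shows the same table with each old domain pointed at the final www host in a single hop.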
Both my non www and www were indexed about 3 months back (shortly before Jagger) Since then, the cache date of the non www page has reverted to March 2005, and a search for the non www page shows a supplemental. None of the inner pages show anymore for a non-www site search.
I have had a 301 in place since update Jagger.
If anyone wants to sticky me, go for it, but in short I think we found 2 major reasons why we went from number 1 to number 40 overnight. These are not concrete conclusions, but hell, the figures stack up.
Multiple 302 hijacks of the sitemap pages. Dayo, thanks for leading me down that road. I know Google are sorting this out now; if anyone has lost rank, take heart, Google doesn't follow the redirects any longer. We were being hijacked to pull in punters for their AdSense campaign.
Then I got into checking the intitle: command, and I always try to look at various other competitors up and down the rankings to formulate a more informed guess. The intitle: command showed me that a couple of copy cats recently took my title word for word and slapped it into their pages... so much weight was given to them that running the intitle brings my own homepage (the originator of that title) in at number 38. Nice one Google, might be a good idea to check domain age as opposed to how many times it appears in a site to give it relevancy points.
These two points along with the canonical issue bring me to the conclusion that we are suffering right now because of three intrinsic errors in the way Google 'decides' what is important and what isn't.
I'm outta here. We are good enough for top spot on Yahoo and MSN; whatever we get from Google now is a bonus, but I ain't gonna try and keep one step ahead of the copy cats because the billion dollar machine can't do it for me.
Within weeks of adding the redirect, ALL of the non-www pages were fully listed and showing a snippet, but it took many months for all of the www pages to be delisted.
A few weeks later dozens of www pages reappeared as Supplemental Results and still remain listed nearly a year later.
PR was originally split all over the place, but I haven't checked it again for many months.
Little evidence of it.
MC saying that Google are going to fix canonical problems every 3 months and then nothing happening does not mean a fix is going to happen IMO.
Just hope they know what the problem is and what damage it has caused to sites - if the fix is just simply showing one url for a domain.com/www.domain.com and they don't fix the split ranking issues that have resulted from this bug then that is not a fix.
I had a site that was all supplemental and pages started coming back - now I notice that pages are starting to disappear again.
Supplemental problem still not fixed as canonical issues are not fixed - why can't Google fix the problem that has destroyed their index? Why can't MC/GG give us progress reports - e.g. with the supplemental issue they asked for sites and gave an update on the situation.
... and found something of interest.
Dayo, I'm surprised you've only just discovered this.
Some sites without the .htaccess 301 in place have been affected worse than others. (It is my suspicion that sites on Windows servers and FP sites on Apache were the worst affected... but that is largely a guesstimate based on my many sites which are a mixture of all sorts). But it's always been the whole site that was affected by poor SERPs (suggesting whole site was split for PR and link rep).
Of the sites that were affected some who initiated the .htaccess redirect recovered their rankings (you must know of EFV's posts, at least). Some sites that did not implement the redirect also recovered their ranking i.e. Google was improving some canonical problems even before BD - Mattcutts may be an example but I see examples among big known, well linked sites. However, since BD there have been a lot more sites that have recovered from the "whole site" malaise of non-redirect dup content affecting their SERPs position. Or so it seems.
While I appreciate it may be frustrating for the owners of those sites that haven't recovered I have to, for once, agree that Google seems to at last be doing something about this problem. And I wish they get around to your site/s soon.
Yay, I was probably looking at it slightly the wrong way. E.g. as I only had the non-www homepage listed in Google, I assumed that it was just that one page which had the problem.
I therefore thought that as the homepage was affected the rest of the site suffered, rather than that all pages within the site had been split.
>>>While I appreciate it may be frustrating for the owners of those sites that haven't recovered I have to, for once, agree that Google seems to at last be doing something about this problem. And I wish they get around to your site/s soon.
This has been the most frustrating thing I have ever experienced - I really do hope a fix is forthcoming for all sites.
If Google had banned my site (I would have disagreed) at least I would know where I stood - all of this "yes, we are going to fix it" message coming out of G every few months with no fix (for still a large number of sites IMO) has made it even more frustrating - it is worse than a penalty.
Google has canonicalized to: www.mattcutts.com/blog/2005/10/
I do not see both versions (www and non-www) listed. Only one version of the page exists in the index from what I can tell. If the non-www version has no PR value and the www version is assigned PR, then how is this a canonical problem?
Sorry if I do not understand your logic Dayo, or if I am just not getting it.
If you do a link:mattcutts.com/blog/2005/10/ or cache:mattcutts.com/blog/2005/10/ it should return the www results if the page was truly Canonicalized. Try it with correctly Canonicalized pages from other sites.
It should also show the www PR if it was correctly Canonicalized - the page acts as an uncrawled page from the non-www part of MC site.
They have updated again, and my site is down to its correct number <1000 so now I can see what they have put into supplemental and if there are duplicates between domain.com and www.domain.com.
There are no duplicate URLs. The PR of www.domain.com is the same as the PR of domain.com. So I don't believe it's a split domain problem.
What is clear though, is they have many of my main product pages in the supplemental index. The more product variations I have in that range, the more likely it is to go into supplementary, with the top product range completely unsearchable in Google.
I think it was a mistake to reduce our pages last year for Google Search, we even had to remove the landing pages for Froogle since they contained a duplicate of each product, laid out especially for Froogle visitors.
It gutted our online business without any counter benefit from Google. I think we'll go back to splitting the ranges across pages again so each product can be searched for separately on MSN, and repeating the seasonal items on their own seasonal pages. It makes for a good search result, even if Google thinks otherwise and we can do without the last 5 visitors a day we get from Google.
"It makes for a good search result, even if Google thinks otherwise and we can do without the last 5 visitors a day we get from Google." panlus
So true and easily forgotten
Google sends about 50% of the traffic to a website.
Anyone know where the other 50% comes from?
And there is nothing to say that Google sends a better targeted buyer. Most often PPC conversion shows Yahoo converts to sales at a higher percentage than Google.
In one of the most popular spammed-out categories online, many would think Google to be the most loved, but actually it is Yahoo traffic the webmasters strive for.
Is purposely redirecting to non-www considered bad form? I have always disliked those pointless and space wasting w's and their requisite period in the URL (www.) so for 7 years I've left them out of all addresses in articles I've written and distributed, email sigs, forum posts, and marketing materials.
Now I have a half dozen sites that have never used the www. in the URL and want to keep it that way. Can anyone point to a quick how-to on redirecting to non-www? Is it possible? Is it wise? My sites didn't suffer in BD - one of them improved.
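Redirecting to non-www works the same way as redirecting to www, just with the preferred host flipped. On Apache this is normally done with rewrite rules in .htaccess; the sketch below only shows the decision logic in Python so it can be checked by hand. The function name `canonical_redirect` and the example.com URLs are illustrative assumptions, it expects absolute URLs, and it drops any user:password part of the address.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, prefer_www=False):
    """Return (301, target) if `url` is on the non-preferred host, else None.

    With prefer_www=False the bare domain is canonical and www requests
    are redirected to it; with prefer_www=True it is the other way round.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    bare = host[4:] if host.startswith("www.") else host
    wanted = "www." + bare if prefer_www else bare
    if host == wanted:
        return None  # already on the canonical host, serve the page
    # Keep an explicit port if the request had one.
    netloc = wanted if parts.port is None else f"{wanted}:{parts.port}"
    target = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return 301, target


# Hypothetical requests against a site that prefers the non-www form.
to_bare = canonical_redirect("http://www.example.com/blog/?p=1")
already_ok = canonical_redirect("http://example.com/")
to_www = canonical_redirect("http://example.com/a", prefer_www=True)
```

The path and query string are carried over unchanged, so every www page maps to exactly one non-www page rather than everything landing on the homepage.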
I lost all G rank back on Dec 27th'ish and regained it on March 7th, to lose it all again last week. Seemed like G had figured it out for a short while but regressed.
I have been hesitant to implement 301's from the non-www to www because I didn't want to give in to the problem G was having, since it wasn't a problem for my site.
But in the end, they have the traffic I need. Should I 301 my non-www directory to www?