Matt did mention, more than once, that he intentionally did not redirect non-www to www so that his site could be a "test case" for this kind of canonical problem. Hope Google gets this sorted. It flies in the face of the way many webhosts set up a domain by default.
This slightly changes how I see the problem.
Previously, as I said, I always thought it was just the odd page that had canonical errors (most typically homepages, as that is where most inbound links point).
But it definitely seems to be the whole non-www site that gets split away: on the non-www side, Google allocates PR/cache to the pages it has crawled and puts the rest of the non-www site at PR0, as those pages have not been crawled/listed.
So a site:domain.com -www search does not really show all the pages that have the problem. Effectively it is the whole site, and that search only shows the pages Google has crawled from the non-www site.
Spot on Dayo
We had a chat via sticky. I have been looking into this and it would seem there are issues with canonicalization.
How do I know this?
Remember a couple of months back, lots of sites were in supplemental hell (Big Daddy). I was one, but I kept an eye on 3 competitors who were also in the supplementals. 4 other competitors were not. Incidentally, the 3 other guys in supp hell and I did not have an .htaccess redirect in place; I was only then moving to another server that supported it. After the pages came back into the index after the BD update, about 2 weeks later, bang! The 3 supp sites and mine dropped quicker than homesick moles. The 4 guys who weren't in supp hell survived. Google's little trick to sort out the base URL didn't work, at least not in my case or for the 3 other guys.
It was literally one page on my site that got cached as non-www, but as you say, it trickled down through the site, fed zero PR to the sitemap and voila, I am now staring at the butt end of my competitors.
Tried to get in touch with Google: zip. 2 things make Google great: the product itself and the sites it lists for users. Maybe they could take the time to help those who really want to know what they SHOULDN'T be doing, and so filter them out from the guys who just want to know what they SHOULD be doing to get good rankings.
Yes, which is why on some DCs your sitemap has a PR0 on the non-www despite the non-www page not being listed in Google - because effectively that page now belongs to another site (site as in non-www) and the PR calculation for Google has gone wrong.
Everyone was going blah blah blah about the supplementals, but all those sites had a split site problem, and the supplemental problem was caused by canonical URL issues.
The canonical URL issues are still there for lots of sites....sigh, after so flippin' long too.
Still awaiting the Mozilla Googlebot PR calculation etc to see if Google can make any progress.
And after applying the .htaccess redirect, how long a time frame are you waiting to see if things are straightened out?
In my experience it's 6 to 12 months before all the links out on the web are pointed correctly at your absolute URL.
Sometimes it is best to operate on what is best for the website, and not what is best for the website with Google considered in the equation.
While Google seems to be having some issues, it is going to fix itself, and if you are not taking the steps now to fix things...then when Google does get straight you'll be even farther out of the loop...
By any chance would you happen to know if this "site split" can apply to a situation like this: domain 1 301'd to domain 3, domain 2 301'd to domain 3, then domain 3 301'd to its www. version only? I am seeing some really odd characteristics that tell me that Google is handling the www. canonical issues correctly for the site I was having www. issues with, but is getting confused on a lot of the pages that it sees as domain 1 or domain 2 instead of domain 3.
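To make the setup in that question concrete, here is a minimal Python sketch that follows host-level 301s through a hypothetical redirect map (domain1.com, domain2.com and domain3.com are placeholders, and `redirect_chain` is an illustrative helper, not any real tool). It only makes the hop count visible: domain 1 and domain 2 reach www.domain3.com after two hops, domain 3 after one. It says nothing about how Google treats chains.

```python
# Hypothetical redirect map: each key 301s to its value (placeholder hostnames).
REDIRECTS = {
    "domain1.com": "domain3.com",
    "domain2.com": "domain3.com",
    "domain3.com": "www.domain3.com",
}


def redirect_chain(host, redirects, max_hops=10):
    """Follow host-level 301s and return every host visited, in order."""
    chain = [host]
    while chain[-1] in redirects:
        # Guard against loops (a -> b -> a) or absurdly long chains.
        if len(chain) > max_hops:
            raise ValueError("redirect loop or overly long chain: " + " -> ".join(chain))
        chain.append(redirects[chain[-1]])
    return chain


if __name__ == "__main__":
    for start in ("domain1.com", "domain2.com", "domain3.com"):
        chain = redirect_chain(start, REDIRECTS)
        print(f"{start}: {len(chain) - 1} hop(s): {' -> '.join(chain)}")
```

With this map, pointing domain 1 and domain 2 straight at www.domain3.com would make every chain a single hop.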
I would agree, Dayo. Further problems occur when pages are linked with relative URLs, because that one non-www page can then reference other pages as non-www.
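A quick way to see why relative links spread the problem: absent a `<base>` element, a relative href is resolved against the URL the page was fetched from, so the host of every resolved link follows the host of the page. A minimal sketch with Python's `urllib.parse.urljoin`; example.com and the paths are placeholders.

```python
from urllib.parse import urljoin

# Hypothetical copies of the same page; only the hostname differs.
NON_WWW_PAGE = "http://example.com/products/index.html"
WWW_PAGE = "http://www.example.com/products/index.html"

# A relative link as it might appear in the page's HTML.
RELATIVE_HREF = "../about.html"

if __name__ == "__main__":
    for base in (NON_WWW_PAGE, WWW_PAGE):
        # The resolved link inherits the host of the page it was found on.
        print(urljoin(base, RELATIVE_HREF))

    # An absolute href ignores the base, so it always points at the chosen host.
    print(urljoin(NON_WWW_PAGE, "http://www.example.com/about.html"))
```

Once one non-www copy of a page is crawled, every relative link on it resolves to another non-www URL; absolute links do not have that effect.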
One thing to note is that I do webmaster work for a site that was experiencing canonical errors and yet has no supplementals.
Both my non-www and www were indexed about 3 months back (shortly before Jagger). Since then, the cache date of the non-www page has reverted to March 2005, and a search for the non-www page shows a supplemental. None of the inner pages show anymore in a non-www site: search.
I have had a 301 in place since update Jagger.
Dayo, you have no idea what you just unearthed!
If anyone wants to sticky me, go for it, but in short I think we found 2 major reasons why we went from number 1 to number 40 overnight. These are not concrete conclusions, but hell, the figures stack up.
Multiple 302 hijacks of the sitemap pages. Dayo, thanks for leading me down that road. I know Google are sorting this out now; if anyone has lost rank, take heart, Google doesn't follow the redirects any longer. We were being hijacked to pull in punters for their AdSense campaign.
Then I got into checking the intitle: command. I always try to look at various other competitors up and down the rankings to formulate a more informed guess. The intitle: command showed me that a couple of copycats recently took my title word for word and slapped it into their pages... so much weight was given to them that running the intitle: search puts my own homepage (the originator of that title) at number 38. Nice one, Google. It might be a good idea to check domain age, as opposed to how many times the title appears on a site, when handing out relevancy points.
These two points along with the canonical issue bring me to the conclusion that we are suffering right now because of three intrinsic errors in the way Google 'decides' what is important and what isn't.
I'm outta here. We are good enough for top spot on Yahoo and MSN; whatever we get from Google now is a bonus, but I ain't gonna try and keep one step ahead of the copycats because the billion-dollar machine can't do it for me.
Yes, for a 150 page site with no redirect in place from www to non-www there were about 110 non-www pages indexed and about 70 www pages indexed, but half of what was indexed was either showing as URL-only (no title or snippet), or as Supplemental Results, or both.
Within weeks of adding the redirect, ALL of the non-www pages were fully listed and showing a snippet, but it took many months for all of the www pages to be delisted.
A few weeks later dozens of www pages reappeared as Supplemental Results and still remain listed nearly a year later.
PR was originally split all over the place, but I haven't checked it again for many months.
A few people listen to you ;->
>>>I know google are sorting this out now
Little evidence of it.
MC saying that Google are going to fix canonical problems every 3 months and then nothing happening does not mean a fix is going to happen IMO.
Just hope they know what the problem is and what damage it has caused to sites. If the fix is simply showing one URL for domain.com/www.domain.com and they don't fix the split ranking issues that have resulted from this bug, then that is not a fix.
I had a site that was all supplemental and pages started coming back - now I notice that pages are starting to disappear again.
The Supplemental problem is still not fixed, as the canonical issues are not fixed. Why can't Google fix the problem that has destroyed their index? Why can't MC/GG give us progress reports? E.g., with the Supplemental issue they asked for sites and gave an update on the situation.
|... and found something of interest. |
Dayo, I'm surprised you've only just discovered this.
Some sites without the .htaccess 301 in place have been affected worse than others. (It is my suspicion that sites on Windows servers and FP sites on Apache were the worst affected... but that is largely a guesstimate based on my many sites which are a mixture of all sorts). But it's always been the whole site that was affected by poor SERPs (suggesting whole site was split for PR and link rep).
Of the sites that were affected some who initiated the .htaccess redirect recovered their rankings (you must know of EFV's posts, at least). Some sites that did not implement the redirect also recovered their ranking i.e. Google was improving some canonical problems even before BD - Mattcutts may be an example but I see examples among big known, well linked sites. However, since BD there have been a lot more sites that have recovered from the "whole site" malaise of non-redirect dup content affecting their SERPs position. Or so it seems.
While I appreciate it may be frustrating for the owners of those sites that haven't recovered I have to, for once, agree that Google seems to at last be doing something about this problem. And I wish they get around to your site/s soon.
>>>Dayo, I'm surprised you've only just discovered this.
Yeah, I was probably looking at it slightly the wrong way. E.g., as I only had the non-www homepage listed in Google, I assumed that it was just that one page which had the problem.
I therefore thought that, as the homepage was affected, the rest of the site suffered, rather than all pages within the site having been split.
>>>While I appreciate it may be frustrating for the owners of those sites that haven't recovered I have to, for once, agree that Google seems to at last be doing something about this problem. And I wish they get around to your site/s soon.
This has been the most frustrating thing I have ever experienced - I really do hope a fix is forthcoming for all sites.
If Google had banned my site, I would have disagreed, but at least I would know where I stood. All of this "yes, we are going to fix it" messaging coming out of G every few months with no fix (still for a large number of sites IMO) has made it even more frustrating. It is worse than a penalty.
Google has canonicalized to: www.mattcutts.com/blog/2005/10/
I do not see both versions (www and non-www) listed. Only one version of the page exists in the index from what I can tell. If the non-www version has no PR value and the www version is assigned PR, then how is this a canonical problem?
Sorry if I do not understand your logic, Dayo, or if I am just not getting it.
Google still treats it as a separate page; that is why it is a split site/canonical problem.
If you do a link:mattcutts.com/blog/2005/10/ or cache:mattcutts.com/blog/2005/10/, it should return the www results if the page was truly canonicalized. Try it with correctly canonicalized pages from other sites.
It should also show the www PR if it was correctly canonicalized; instead, the page acts as an uncrawled page from the non-www part of MC's site.
Anyone done a header check lately?
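For anyone who hasn't: a header check just means looking at the raw status code and `Location` header the server returns. Below is a minimal Python sketch using the standard library's `http.client`, assuming plain HTTP on port 80. `head` and `is_canonical_redirect` are hypothetical helper names, and example.com stands in for your own domain. The helper treats only a 301 pointing at the chosen host as correct, so a 302 or a plain 200 on the non-www host shows up as a failure.

```python
import http.client
from urllib.parse import urlsplit


def head(host, path="/", timeout=10):
    """Send a HEAD request over plain HTTP; return (status, Location header or None)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_canonical_redirect(status, location, canonical_host):
    """True only for a 301 whose Location points at canonical_host."""
    if status != 301 or not location:
        return False
    # urlsplit lowercases .hostname; a relative Location gives None and fails.
    return urlsplit(location).hostname == canonical_host.lower()
```

Usage needs network access, e.g. `status, loc = head("example.com", "/blog/")` followed by `is_canonical_redirect(status, loc, "www.example.com")`.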
I do wonder if they are even interested in trying to fix this issue?
They are not even trying by the looks of it.
Dayo_UK, regarding the suspected duplicate problem for the affected site, I've just gone through and compared the supplemental pages to the non supplementals.
They have updated again, and my site is down to its correct number <1000 so now I can see what they have put into supplemental and if there are duplicates between domain.com and www.domain.com.
There are no duplicate URLs. The PR of www.domain.com is the same as the PR of domain.com. So I don't believe it's a split domain problem.
What is clear, though, is that they have many of my main product pages in the supplemental index. The more product variations I have in a range, the more likely that page is to go into the supplemental index, with the top product range completely unsearchable in Google.
I think it was a mistake to cut down our pages last year for Google Search; we even had to remove the landing pages for Froogle, since they contained a duplicate of each product, laid out especially for Froogle visitors.
It gutted our online business without any counter benefit from Google. I think we'll go back to splitting the ranges across pages again so each product can be searched for separately on MSN, and repeating the seasonal items on their own seasonal pages. It makes for a good search result, even if Google thinks otherwise and we can do without the last 5 visitors a day we get from Google.
|There are no duplicate URLs. The PR of www.domain.com is the same as the PR of domain.com. So I don't believe it's a split domain problem. |
To further complicate the issue - and on some sites only - Google shows PR correctly but splits the link reputation. Or so I believe. :(
"It makes for a good search result, even if Google thinks otherwise and we can do without the last 5 visitors a day we get from Google." panlus
So true and easily forgotten
Google sends about 50% of the traffic to websites.
Anyone know where the other 50% comes from?
And there is nothing to say that Google sends a better targeted buyer. Most often PPC conversion shows Yahoo converts to sales at a higher percentage than Google.
In one of the most popular spammed-out categories online, many would think Google traffic is the most loved, but actually it is Yahoo traffic the webmasters thrive on.
I've seen a couple of posts reference sites that explain how to do a 301 from a non-www to www, but have seen no reverse instructions. Is it possible to 301 so that all www pages redirect to non-www pages? (Sorry if it should be obvious, but it appeared not to work in reverse from the instructions I saw)
Is purposely redirecting to non-www considered bad form? I have always disliked those pointless and space wasting w's and their requisite period in the URL (www.) so for 7 years I've left them out of all addresses in articles I've written and distributed, email sigs, forum posts, and marketing materials.
Now I have a half dozen sites that have never used the www. in the URL and want to keep it that way. Can anyone point to a quick how-to on redirecting to non-www? Is it possible? Is it wise? My sites didn't suffer in BD - one of them improved.
Do a search on Apache rewrite. Many ways to do many things you may not realize.
Next word of advice.
Don't fix what isn't broken
To redirect the other way, using instructions in the .htaccess file, just add www where there was none, and take out the www where it existed before.
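The same www-to-non-www logic as the .htaccess approach described above, sketched as Python WSGI middleware purely to show what the redirect has to do: match the www host, keep the path and query string, and answer with a 301 to the bare host. This is an illustration, not a replacement for the Apache config; `www_to_non_www`, `hello_app` and example.com are hypothetical placeholders.

```python
from urllib.parse import quote

CANONICAL_HOST = "example.com"  # hypothetical: the non-www host you want to keep


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def www_to_non_www(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests for www.<canonical_host> get a 301 to canonical_host."""
    def wrapper(environ, start_response):
        # Host header without any :port suffix, lowercased for comparison.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == "www." + canonical_host:
            # WSGI hands over the path as latin-1-decoded text; re-encode it the same way.
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""),
                         encoding="latin-1") or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"http://{canonical_host}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        # Any other host (including the canonical one) is served normally.
        return app(environ, start_response)
    return wrapper


application = www_to_non_www(hello_app)
```

Going the other way (non-www to www) only means swapping which host is matched and which one goes into `Location`.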
Ok, so a question for the forum. I'm seriously getting hurt by this canonical issue within Google, and Dayo's post made me realize just how big an issue it is.
I lost all G rank back on Dec 27th'ish and regained it on March 7th, to lose it all again last week. Seemed like G had figured it out for a short while but regressed.
I have been hesitant to implement 301s from the non-www to www because I didn't want to give in to the problem G was having, since it wasn't a problem with my site.
But in the end, they have the traffic I need. Should I 301 my non-www directory to www?
In the long run Phil it would be best.
How do most inbound links to your site look?
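By pointing to the www you will align all links out there to the absolute URL and settle the PR.
Plus, rather than individual Redirect 301 lines, use Apache RewriteCond/RewriteRule (with the [R=301] flag, so it is still a 301) to redirect the whole host.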
Hope this helps
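A toy illustration of the "align all links" point: counting hypothetical inbound links per URL before and after folding example.com and www.example.com onto one host. It is only a count, not PageRank or anything Google has described. It just shows links that were spread over two hostnames being credited to one URL once they resolve to the same place. `canonicalize` and the link list are made up for illustration, and the sketch drops any port number.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url, canonical_host="www.example.com"):
    """Map the bare and www forms of the (hypothetical) domain onto one host."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host in (bare, canonical_host):
        host = canonical_host
    # Treat an empty path as "/" so http://host and http://host/ count together.
    return urlunsplit((parts.scheme.lower(), host, parts.path or "/", parts.query, ""))


# Hypothetical inbound links as other sites might have written them.
INBOUND = [
    "http://example.com/",
    "http://www.example.com/",
    "http://www.example.com",
    "http://example.com/products/",
    "http://www.example.com/products/",
]

by_raw = Counter(INBOUND)
by_canonical = Counter(canonicalize(u) for u in INBOUND)

if __name__ == "__main__":
    print(by_raw.most_common())
    print(by_canonical.most_common())
```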