You mean the non-www to www redirect?
If the site has been around for a long time without it, I would instead use the rel="canonical" tag here.
Make sure you register both the www and non-www URLs in Google Webmaster Tools. Look at the crawl stats and crawl errors reports for both. Likewise, look at the internal linking and incoming external links reports. They are separate for www and for non-www.
Yes, I mean the mysite.com to www.mysite.com redirect.
I was going to do something in the .htaccess file to handle the redirect. Can I use a tag that way? Why use a rel tag?
Regarding adding both the www and non-www versions to Google Webmaster Tools, may I ask what the benefit of that would be? Since it is a vBulletin forum, each page is dynamically generated, and there should be no difference between the www and non-www pages, in either number or content.
I am quite excited because mysite.com/forum/ is PR4 and www.mysite.com/forum/ is also PR4. So, perhaps if I get this canonical situation sorted out, it might go to PR5 or 6?
|Regarding adding both www and non-www versions to Google Webmaster Tools, may I ask what the benefit of that would be? |
Go look at the reports for your site. The answer will be obvious soon enough. :)
Fixing canonical problems from the very first day a site goes live is best done with a redirect. When it's an old site, a redirect can sometimes mess with the analytics in several ways. The rel="canonical" tag prevents that.
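For anyone weighing the tag option, here is a minimal Python sketch of the markup a page template would emit on every request, assuming www.mysite.com is the preferred host. The function name `canonical_link_tag` and the example thread URL are hypothetical; vBulletin has its own template system, so this only shows what the resulting tag looks like, not how to wire it in.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: www.mysite.com is the preferred (canonical) host name.
CANONICAL_HOST = "www.mysite.com"

def canonical_link_tag(requested_url: str) -> str:
    """Build a <link rel="canonical"> element pointing at the www version
    of the requested URL, keeping the path and query string unchanged."""
    parts = urlsplit(requested_url)
    canonical = urlunsplit((parts.scheme or "http", CANONICAL_HOST,
                            parts.path or "/", parts.query, ""))
    # Escape &, quotes etc. so the URL is safe inside an HTML attribute.
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)

# Example (hypothetical thread URL requested on the non-www host):
print(canonical_link_tag("http://mysite.com/forum/showthread.php?t=123"))
```

Both the www and non-www copies of a page would carry the same tag, which is what lets Google fold them together without an actual redirect.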
Although your site is dynamically generated, you'll find that Google fetches the same page on different days for www and for non-www, and the revisit rate is completely different. That means the two versions will often appear completely different in the SERPs.
It's hard to measure how much of an effect implementing canonicalization has. I've done it on a couple of sites, and I'd say the sites usually gain 5-10% traffic over the next couple of months.
It can have the biggest impact when googlebot is spending so much time crawling non-canonical pages that it doesn't get to all of your content. I'd guess that 1 million posts is at least 100,000 forum topics. That is a lot of content for a PR4 site. I'd imagine that googlebot's crawl budget isn't enough to regularly crawl all of your content. Increased crawl rates could lead to better indexing in the long tail for you.
I've added the mysite.com version to Webmaster Tools now and am waiting for the data to become visible.
Am I likely to get a PR boost and/or a jump in all my rankings, due to stronger root authority?
I wouldn't count on it.
I thought that was one of the main reasons for redirecting to www.mysite.com, together with removing any penalty for dupe content. Or am I missing something?
A while back, Google did have lots of trouble with canonical variations. Even today in certain "edge situations" they still can - so attention to canonical issues is definitely a best practice.
But when it comes to common platforms such as WordPress and vBulletin, Google seems to have already adapted. As earlier posts said, some sites may see a 10% boost over time, but others can't even be sure that anything changed.
If both versions of the URLs have the same toolbar PageRank, then in my experience Google has already figured it out. In that case most of the advantage will come from more efficient crawling. If you see toolbar differences, then you may get more of a boost over time by taking care of any canonical issue.
By the way, with-www or no-www is far from the only canonical challenge. Check out Canonical URL Issues - including some new ones [webmasterworld.com] for something like 40 that have been noticed in the wild.
So effectively, you are saying for most sites that even though links might be pointing to a mysite.com version, the www.mysite.com version still gets the credit?
Several years ago, googlebot was not very smart about common cases of non-canonical urls. www vs no-www, index.html, session id parameters, and such caused real crawling and ranking problems. Canonicalization was a solution, and it worked.
I always thought it was silly that googlebot couldn't seem to deal with www and no-www returning the same content. Today, googlebot seems much smarter to me about some of the basic cases.
It also used to be the case that how you linked your site together internally mattered much more. Back in the days when pagerank sculpting worked wonders, canonicalization was a way of getting every last drop of pagerank your site had coming to it. A few years ago, Google implemented some algorithms that made sure that sites that are not well sculpted or have some canonicalization issues aren't at a disadvantage. As a result, it is no longer necessary to canonicalize as much as in the past for ranking boosts.
There are certainly cases in which Googlebot still doesn't identify non-canonical content properly and canonicalization can help. You can usually tell this is the case by looking in your logs and seeing if Googlebot is spending time crawling two versions of the same url.
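A rough Python sketch of that log check, assuming Apache's vhost_combined-style format (first field `host:port`, request line and user agent in double quotes) and the placeholder hosts mysite.com and www.mysite.com. The function `paths_crawled_on_both_hosts` is hypothetical. It matches on the user-agent string only; user agents can be faked, and Google's documented way to confirm real Googlebot traffic is a reverse DNS lookup, which this sketch skips.

```python
import re
from collections import defaultdict

# Assumption: vhost_combined-style lines, e.g.
# mysite.com:80 1.2.3.4 - - [date] "GET /path HTTP/1.1" 200 123 "ref" "agent"
LINE_RE = re.compile(
    r'^(?P<host>[^:\s]+)(?::\d+)?\s.*?"(?:GET|HEAD) (?P<path>\S+) [^"]*"'
    r'.*"(?P<agent>[^"]*)"\s*$'
)

def paths_crawled_on_both_hosts(lines, bare="mysite.com", www="www.mysite.com"):
    """Return the set of paths that Googlebot requested under BOTH host names."""
    seen = defaultdict(set)  # path -> set of hosts it was fetched on
    for line in lines:
        m = LINE_RE.match(line)
        # Skip unparseable lines and anything not claiming to be Googlebot.
        if not m or "Googlebot" not in m.group("agent"):
            continue
        host = m.group("host").lower()
        if host in (bare, www):
            seen[m.group("path")].add(host)
    return {path for path, hosts in seen.items() if len(hosts) == 2}
```

Usage: `paths_crawled_on_both_hosts(open("access.log"))` (log path hypothetical). A long result means Googlebot is spending crawl budget on duplicate copies.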
|You can usually tell this is the case by looking in your logs and seeing if Googlebot is spending time crawling two versions of the same url. |
.. or by looking at the "crawl stats" tools in WMT.
Ok, I have some data to share. Here are the results of comparing mysite.com and www.mysite.com in Google Webmaster Tools.
Links to Your Site
Crawl errors
www.mysite.com Not found: 1869
www.mysite.com Soft 404: 459
mysite.com Not found: 0
mysite.com Soft 404: 0
Crawl stats
www.mysite.com pages crawled per day: 54,499
mysite.com pages crawled per day: 2,298
The first thing that stands out is the incoming links stat: about 99% of the links use www.mysite.com, unless I need to wait longer for more results to come in.
Also, it seems that I have some crawl error issues restricted to www.mysite.com.
Google is mostly crawling the www version of the site, which tells you that you should redirect all non-www requests to the www version of the URL.
The fix here is not on Google's side; it is on your side.
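To put a number on "mostly", the daily crawl figures posted above can be turned into a share of Googlebot's activity. This is plain arithmetic on the two reported numbers, treating the two Webmaster Tools properties as simply additive, which is itself an assumption.

```python
# Daily crawl figures from the Webmaster Tools crawl stats posted above.
www_per_day = 54_499
bare_per_day = 2_298

total_per_day = www_per_day + bare_per_day
bare_share = bare_per_day / total_per_day

# Share of Googlebot's daily fetches spent on the non-www host.
print(f"{total_per_day} pages/day in total, {bare_share:.1%} on mysite.com")
```

That prints a total of 56797 pages per day, about 4.0% of them on mysite.com.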
> in the htaccess file to do the redirect.
That is the final answer. Don't even let Google think about anything else. Hard redirect domain.com to www.domain.com regardless of where it came from.
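The rule being described boils down to a small mapping: any request whose host is the bare domain gets a 301 to the same path and query on www. Below is a Python sketch of that mapping, assuming mysite.com and www.mysite.com as the hosts; `redirect_for` is a hypothetical name, and it drops any explicit port. On Apache the real rule would live in .htaccess as a mod_rewrite RewriteCond on %{HTTP_HOST} plus a RewriteRule with the R=301 flag; the sketch only makes it easy to check the intended behaviour against sample URLs.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumption: the bare host redirects to the www host.
BARE_HOST = "mysite.com"
WWW_HOST = "www.mysite.com"

def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) for any request to the bare host, else None.

    Path and query string are carried over unchanged, so every
    mysite.com URL lands on the matching www.mysite.com URL.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname is case-insensitive
    if host != BARE_HOST:
        return None  # already canonical (or some other host): no redirect
    location = urlunsplit((parts.scheme, WWW_HOST, parts.path or "/",
                           parts.query, ""))
    return 301, location
```

Using a permanent (301) status rather than a temporary one is what tells Google to consolidate the two host names.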
|www.mysite.com Not found: 1869 |
www.mysite.com Soft 404: 459
mysite.com Not found: 0
mysite.com Soft 404: 0
That's an odd package. You'd certainly expect higher numbers for with-www, but not outright zeros on the other side. It makes it seem as if every last link to the "wrong" sitename is present and accounted for. (That's good, if true, but it looks strange.)
How does google identify a soft 404? That is, I know what it is, and I know how (not) to code it. But how can google tell? Especially when, like here, there are plenty of "real" 404s in the mix.
Google counts redirects to the home page as "soft 404" as well as 200 code error pages with nothing but an error message. Usually when there is a mix, the redirects to the home page are causing the soft 404s.
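A minimal Python sketch of the two patterns just described, useful for spot-checking your own responses. This is a hypothetical heuristic, not Google's classifier: the function name, the error phrases and the 2,000-character threshold are all assumptions.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit

# Hypothetical heuristic, NOT Google's actual soft-404 classifier.
# Assumed error phrases and size threshold; adjust for your own templates.
ERROR_PHRASES = ("page not found", "invalid thread specified", "no longer exists")
MAX_ERROR_PAGE_CHARS = 2000

def looks_like_soft_404(status: int, location: Optional[str], body: str,
                        home_paths: Sequence[str] = ("", "/")) -> bool:
    """Flag a response that matches either soft-404 pattern from the thread."""
    if 300 <= status < 400 and location is not None:
        # Pattern 1: a redirect whose target is the home page.
        return urlsplit(location).path in home_paths
    if status == 200:
        # Pattern 2: a short 200 page that is little more than an error message.
        text = body.lower()
        has_error = any(phrase in text for phrase in ERROR_PHRASES)
        return has_error and len(text) < MAX_ERROR_PAGE_CHARS
    # Real 404/410 responses and everything else are not *soft* 404s.
    return False
```

Pass `home_paths=("/", "/forum/")` if the forum index is what deleted threads redirect to.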
Google also reports soft 404s if you have many pages with very thin but similar content. I have a case of an image gallery that opens as a popup in a smaller window, where the popup page's HTML contains only two iframes. The first iframe shows the big photo and the second contains a thumbnail strip. The page itself is set to noindex (but allows follow) so that Google can get to the big image.
Google is, however, reporting a soft 404 for every one of the pages that contain the two iframes.