

Canonical URL trouble - why does it only affect some sites?


MadeWillis

3:51 pm on Feb 28, 2008 (gmt 0)

10+ Year Member



I've read a lot about this in older threads but couldn't really find a good answer to my question. Why do canonical URL problems only affect some sites? Is it solely based on the IBLs (inbound links) to the site, some with www and others without? I assume the same goes for internal linking.

- We currently do not redirect example.com to www.example.com.
- We currently use a relative internal linking structure.
- We do not use subdomains, but may in the future.
- Current TBPR shows the same for both www and non-www version.

Should we still consider implementing the 301 redirect? I've always followed the rule "If it ain't broke, don't fix it." Could we see any improvement from making the change, even if it doesn't appear we are being hurt in any way by our current method?

If we were to implement this and down the road decide to add a subdomain, what complications would occur? Would we need to switch our internal links to absolute links? What else should be considered?

Thanks

tedster

7:30 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has been working for several years to counteract various types of canonical problems, with varying degrees of success. For instance, it has become less common to see both the with-www and the no-www versions of a url in a site: operator search. Likewise the index.html versus domain root issue, the http versus https issue and so on.

But to say that this means canonical issues are not affecting any particular "unfixed" site requires a leap of faith that I am not willing to make. We cannot see Google's map of web links to be sure that PR and link juice are flowing as we intended. We cannot verify that canonical issues are not splitting up the potential of our urls into "separate piles".

The best practice is still to give Google (and all search engines) a helping hand with the 301 redirect fix. It leaves no ambiguity, no wiggle room as to what you intend. The technology of web urls does open up this ambiguity and it's not a trivial issue for a search engine to fix on its own.

If we were to implement this and down the road decide to add a subdomain, what complications would occur?

No complications or differences that I can see. With or without the canonical fix, links from the root domain to the subdomain will still need to point to http://subdomain.example.com -- that is, there's no such thing as a relative link from the root domain to a subdomain, or vice versa. But do make sure that www.subdomain.example.com (with www) does not also resolve with a 200 status; it should either 301 redirect or return a 404.
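To make the relative-link point concrete, here is a small Python sketch using the standard library's urljoin, which resolves links the way browsers and crawlers do. The page paths are placeholders and subdomain.example.com is the hypothetical subdomain from the post. It shows why relative internal links are harmless once the 301 is in place (they inherit the host of the page they sit on, so a page still reachable without www keeps producing no-www links until it redirects) and why a link to a subdomain has to carry the full host name.

```python
from urllib.parse import urljoin


def resolve(page_url: str, href: str) -> str:
    """Resolve an href against the URL of the page it appears on, as a browser or crawler would."""
    return urljoin(page_url, href)


page = "http://www.example.com/widgets/index.html"

# Relative links stay on whatever host served the page...
print(resolve(page, "blue.html"))   # http://www.example.com/widgets/blue.html
print(resolve(page, "/results/"))   # http://www.example.com/results/

# ...so the same page fetched without www yields no-www links.
print(resolve("http://example.com/widgets/index.html", "/results/"))  # http://example.com/results/

# A link to a subdomain has to spell out the host.
print(resolve(page, "http://subdomain.example.com/"))  # http://subdomain.example.com/
```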

MadeWillis

8:38 pm on Feb 28, 2008 (gmt 0)

10+ Year Member



Thanks for the great response tedster.

If I were to set up the 301 redirect to the canonical URL, would this need to be done site-wide? In other words, would every page on our site need to redirect to the canonical version of that URL, or would a simple line of code do the trick?

Currently we are on a Windows server running IIS & Coldfusion, but will be moving to Linux server with Apache.

Do you foresee any problems doing this before the switch, or should I wait, to try and avoid any other problems? The time needed to complete this project could play a role in which I choose.

tedster

9:12 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On either server, you can set up a sitewide "page by page" redirect with a few simple steps. On Apache it takes just a few lines in .htaccess, and on IIS you set it up in Internet Services Manager.
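For a site served by a Python application rather than directly by Apache or IIS, the same sitewide, page-by-page redirect can live in one place as WSGI middleware. This is a minimal sketch, assuming www.example.com is the preferred host and the site runs over plain http; CANONICAL_HOST and canonical_host_middleware are hypothetical names, not part of any framework. Every request on another host gets a 301 to the same path and query string, which is what a single server-level rule achieves.

```python
from typing import Callable, Iterable
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # assumption: the with-www host is the one you want indexed


def canonical_host_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so a request on any other host gets a 301 to the same
    path and query string on CANONICAL_HOST. One wrapper covers every page."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Drop any :port suffix and normalise case before comparing.
        # (A request with no Host header at all is also redirected.)
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == CANONICAL_HOST:
            return app(environ, start_response)
        path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
        query = environ.get("QUERY_STRING", "")
        location = f"http://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Type", "text/plain")])
        return [b"Moved Permanently\n"]

    return wrapper


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page content\n"]


app = canonical_host_middleware(site)
```

Because the check runs before the site code on every request, nothing has to be added to individual pages, which avoids the "call it on every page" fragility discussed later in the thread.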

Here's an excellent thread, no matter which server you are using:
Why Does Google Treat "www" & "no-www" As Different? [webmasterworld.com]

Also, jdMorgan recently posted an awesome set of instructions in our Apache Forum to fix all manner of issues, including the www canonical issue: A guide to fixing duplicate content & URL issues on Apache [webmasterworld.com]

In general, whatever url issues you run into, Apache is often a lot easier to work with than Windows, so your move sounds like a good plan to me. For anyone who is running Windows IIS, the canonical problem can be fixed like this:

"non-www" to "with-www"
Permanent 301 redirect in IIS

1. In Internet Services Manager, set up both www.example.com
(with-www) and example.com (no-www) as websites.

2. Select the example.com website (no-www) in Internet Services
Manager and go into the properties.

3. In the Home Directory tab, change the option button
"When connecting to this resource the content should
come from" to be "A redirection to a URL".

4. In the "Redirect to" box, enter http://www.example.com$S$Q
(A note about the variables used here:
$S retains the requested URL's full filepath
$Q retains any query string present in the request.)

5. Check the checkbox that says "A permanent redirection
for this resource." This is a key step, or else you will
create a 302 redirect rather than a 301.
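To check the result of steps 4 and 5, the following Python sketch approximates what http://www.example.com$S$Q should produce for a given request and flags anything other than a 301 to that exact URL (for example a 302, a dropped query string, or a doubled slash). It models $S and $Q from the descriptions in step 4; it does not reproduce IIS's own substitution logic, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

CANONICAL_BASE = "http://www.example.com"  # the text typed in front of $S$Q in step 4


def expected_target(requested_url: str) -> str:
    """Approximate what http://www.example.com$S$Q should produce for a request."""
    parts = urlsplit(requested_url)
    s = parts.path or "/"                           # stands in for $S: the full file path
    q = "?" + parts.query if parts.query else ""    # stands in for $Q: the query string
    return CANONICAL_BASE + s + q


def is_good_canonical_redirect(requested_url: str, status: int, location: str) -> bool:
    """True only for a permanent (301) redirect to the same path and query on the www host."""
    return status == 301 and location == expected_target(requested_url)


print(expected_target("http://example.com/results/?categoryID=5"))
# http://www.example.com/results/?categoryID=5
print(is_good_canonical_redirect("http://example.com/", 302, "http://www.example.com/"))   # False
print(is_good_canonical_redirect("http://example.com/", 301, "http://www.example.com//"))  # False
print(is_good_canonical_redirect("http://example.com/", 301, "http://www.example.com/"))   # True
```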

However, other kinds of URL issues on IIS can be a real bear, unless you install a 3rd-party module like ISAPI Rewrite, which essentially gives you Apache's .htaccess functionality, but on a Windows server.

MadeWillis

9:21 pm on Feb 28, 2008 (gmt 0)

10+ Year Member



This helps a lot, thanks!

youfoundjake

3:16 am on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



However, other kinds of URL issues on IIS can be a real bear, unless you install a 3rd-party module like ISAPI Rewrite, which essentially gives you Apache's .htaccess functionality, but on a Windows server.

In particular, a redirect from default.aspx to / can cause an infinite loop, which does a website no good.
Apart from ISAPI Rewrite, which I understand to be a good tool (I don't use it, since I'm on Apache), I have seen a 301 redirect written in VB in a couple of places: one for non-www to www, and the other for www to non-www, whichever you prefer.
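The default.aspx loop usually comes from a rule that tests the path after the server has mapped "/" to its default document, so the redirect target matches the rule again. The Python model below is a hypothetical simplification, not IIS internals: it contrasts a rule that checks the internally mapped path with one that checks only what the client requested. On Apache, the common way to get the latter is to test %{THE_REQUEST}, the raw request line, instead of the rewritten URL.

```python
from typing import Callable, List, Optional

DEFAULT_DOCUMENT = "/default.aspx"  # assumption: the server serves "/" from this file


def internal_path(requested_path: str) -> str:
    """Model the default-document mapping: a request for "/" is served by DEFAULT_DOCUMENT."""
    return DEFAULT_DOCUMENT if requested_path == "/" else requested_path


def naive_rule(requested_path: str) -> Optional[str]:
    # BUG: tests the path *after* the default document has been applied,
    # so a request for "/" also matches and is redirected back to "/".
    if internal_path(requested_path) == DEFAULT_DOCUMENT:
        return "/"
    return None


def safe_rule(requested_path: str) -> Optional[str]:
    # Tests only what the client actually asked for.
    if requested_path == DEFAULT_DOCUMENT:
        return "/"
    return None


def follow(rule: Callable[[str], Optional[str]], path: str, max_hops: int = 5) -> List[str]:
    """Follow the redirects a rule produces, giving up after max_hops so a loop is visible."""
    hops = [path]
    for _ in range(max_hops):
        target = rule(hops[-1])
        if target is None:
            break
        hops.append(target)
    return hops


print(follow(naive_rule, "/"))             # ['/', '/', '/', '/', '/', '/']  (a loop)
print(follow(safe_rule, "/default.aspx"))  # ['/default.aspx', '/']
print(follow(safe_rule, "/"))              # ['/']
```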

tedster

4:19 am on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, if you don't have admin access, you can accomplish the same thing with VBScript, but you must make sure to call that script every time on every page (many more possible points of failure):

[webmasterworld.com...] - see the 6th post

MadeWillis

6:45 pm on Apr 23, 2008 (gmt 0)

10+ Year Member



tedster,

I have implemented the IIS fix you mentioned above. If I am on my homepage the redirect works great.

example.com redirects to www.example.com/

I am having a slight problem with my category pages. My problem is this:

I am using the Helicon ISAPI Rewrite, which happens before the redirect to the canonical URL. So if I am on this category page

http://www.example.com/some-URL-rewrite-text-here/results/?categoryID=5

and I remove the www and hit enter, I get this as a result

http://www.example.com/match.cfm?categoryID=5

So now I have a duplicate URL problem. Does the URL rewrite need to happen after the canonical URL redirect? If so, using IIS, how can I make this happen?

There is a check box that says “The exact URL entered above”. I have checked this box.

On a different site, where I’d like to implement the same fix and which uses the same URL rewrite, I have left the check box unchecked. This causes a double slash, like this:

example.com redirects to www.example.com//

However, my category pages look correct and I do not have any problems with the URL rewrite when I leave the check box unchecked.

Any idea what is causing these problems?

tedster

7:02 pm on Apr 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It sounds like a configuration problem in IIS - are you using $S to retain the requested URL's full filepath (step 4 above)?

If you haven't already done so, I'd also suggest upgrading to version 3.0 of ISAPI Rewrite. It uses the same syntax as Apache's .htaccess, and that makes everything a lot easier. You can do your canonical fixes through the ISAPI Rewrite step and not mess with the Internet Services Manager settings in IIS.

Even with version 2.0, you can execute all kinds of canonical fixes without going into the IIS admin panel, though not with the .htaccess syntax. See this thread for example code: ISAPI Rewrite 2.0 [webmasterworld.com]

MadeWillis

7:34 pm on Apr 23, 2008 (gmt 0)

10+ Year Member



Thanks tedster. I think we've actually made the fix using ISAPI Rewrite. The only issue we may have now is a 301 to a 301. This would only occur if someone was directed to the non-www version and had different text where the URL rewrite happens. For example:

good URL
http://www.example.com/the-correct-text-here/results/?categoryID=5

a URL like this

http://example.com/the-correct-text-here/results/?categoryID=5

redirects to the proper URL with a single 301. What if they have this:

http://example.com/some-other-incorrect-text-here/results/?categoryID=5

In this case, they would get a 301 to the with-www version and also a 301 to the correct URL rewrite. Will this cause any issues? Since there are two redirects going on what will Google think?

I don't think this situation is likely to happen, but I do want to have the fix in place just in case. Should I do this differently?

[edited by: MadeWillis at 7:35 pm (utc) on April 23, 2008]

tedster

8:08 pm on Apr 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Shouldn't they end up with a 404 in that case?

g1smd

12:15 am on Apr 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Redirects should come before rewrites, and you should try to avoid having any sort of redirection chain, where the output of one redirect becomes the input of another redirect.

There's much more in the Apache forum, once you get there. I run away, very very fast, from IIS stuff.

jdMorgan

1:43 am on Apr 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The same principles hold true on IIS and Apache: first do your external redirects, starting with the most-specific URL patterns and ending with the least-specific, then do your internal rewrites, again starting with the most-specific and ending with the least.

In this case, the more-specific rule to redirect example.com/incorrect-text-here/results/?categoryID=5 to www.example.com/correct-text-here/results/?categoryID=5 should come first, before the rule that redirects example.com/correct-text-here/results/?categoryID=5 to www.example.com/correct-text-here/results/?categoryID=5.

In other words, a 'more-specific pattern' is associated with a 'more-incorrect URL' -- the more flawed a requested URL is, the sooner it should be handled in your rules. The pattern in the rule is more specific because it is seeking to detect and correct multiple errors.

Jim
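Jim's ordering can be modelled in a few lines of Python. When one function can see the whole URL, "most-specific first" reduces to: compute the fully corrected URL, 301 to it if the request differs in any way, and rewrite internally to match.cfm only when no redirect is needed. The regex, the CORRECT_TEXT table and the match.cfm mapping are assumptions modelled on MadeWillis's example URLs, not his actual ISAPI Rewrite rules; unknown category IDs get a 404, in line with tedster's question.

```python
import re
from typing import Tuple

CANONICAL_HOST = "www.example.com"
# Hypothetical lookup table: the correct URL text for each category ID.
CORRECT_TEXT = {"5": "the-correct-text-here"}

URL_PATTERN = re.compile(
    r"^http://(?P<host>[^/]+)/(?P<text>[^/]+)/results/\?categoryID=(?P<cat>\d+)$"
)


def handle(url: str) -> Tuple[str, str]:
    """Return ("301", target), ("rewrite", internal_url) or ("404", url)."""
    m = URL_PATTERN.match(url)
    if not m or m["cat"] not in CORRECT_TEXT:
        return ("404", url)

    cat = m["cat"]
    final = f"http://{CANONICAL_HOST}/{CORRECT_TEXT[cat]}/results/?categoryID={cat}"

    # Step 1: external redirect. Every flaw (wrong host, wrong text, or both)
    # is fixed in one hop because the target is the fully corrected URL.
    if url != final:
        return ("301", final)

    # Step 2: internal rewrite, only for a request that is already canonical.
    # The client never sees this URL.
    return ("rewrite", f"/match.cfm?categoryID={cat}")


print(handle("http://example.com/some-other-incorrect-text-here/results/?categoryID=5"))
print(handle("http://www.example.com/the-correct-text-here/results/?categoryID=5"))
```

The first request needs only one 301, and following it leads straight to the internal rewrite, so there is no redirect chain.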

MadeWillis

2:42 pm on Apr 28, 2008 (gmt 0)

10+ Year Member



Thanks, all, for the tips!

trader

3:02 pm on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



With all my websites, the first thing I do is make an .htaccess file like the example below. It appears to have helped my sites nicely since I started implementing it some time ago. In particular, I like the effect of concentrating my SEO work on the non-www version; not using the www also makes the URLs visually more appealing and shorter, plus gives more concise anchor text for links, etc.

This error code method, and my 404 error code handling in particular, is probably not the official or recommended way to go and likely does not follow strict webmaster guidelines, but for me it is the easiest way to handle it. That's especially so if you do not know what your error code traffic is looking for or the names of the missing pages (which is usually my situation), or even whether you are getting any 404 traffic.

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

ErrorDocument 401 /index.html
ErrorDocument 400 /index.html
ErrorDocument 404 /index.html
ErrorDocument 402 /index.html

g1smd

6:38 pm on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** With all my websites the first thing I do is make an .htaccess file like the example. ***

That's often the first thing that I work on.

*** likely does not follow strict webmaster guidelines ***

Worse than that, I'd say having an error page that looks exactly the same as your root index page is an accident waiting to happen.

trader

11:11 pm on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi G1, what is wrong with doing that, especially since I am referring to using it mostly with new websites (which may or may not even have much error code traffic anyway)?

BTW, I already confessed it may not be the correct way and also may not follow webmaster quality guidelines but for me it works very nicely.

P.S. What kind of accident could happen?

g1smd

12:41 am on Apr 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google tests URLs that don't exist, specifically to find out various things about your site's 404 response.

I just don't think it looks good that "root" and "404" are identical. I can envision several things going wrong.
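The kind of probe g1smd describes can be imitated by requesting a random URL that cannot exist and checking two things: that the status code is 404, and that the body is not simply the home page. This Python sketch runs WSGI apps in-process; it is not Google's actual test, and both demo apps are hypothetical.

```python
import secrets
from typing import Callable, Dict, Tuple

HOME = b"<html><body>Welcome to example.com</body></html>"


def fetch(app: Callable, path: str) -> Tuple[int, bytes]:
    """Call a WSGI app in-process and return (status code, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        return lambda data: None  # WSGI requires a write() callable

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path,
               "QUERY_STRING": "", "HTTP_HOST": "www.example.com"}
    body = b"".join(app(environ, start_response))
    return int(captured["status"].split()[0]), body


def check_missing_page_handling(app: Callable) -> Dict[str, object]:
    """Request a random URL that should not exist, the way a crawler might probe a site."""
    probe = "/" + secrets.token_hex(12) + ".html"
    status, body = fetch(app, probe)
    _, home = fetch(app, "/")
    return {
        "probe": probe,
        "status_is_404": status == 404,
        "differs_from_home": body != home,
    }


def soft_404_app(environ, start_response):
    # Anti-pattern: every URL returns the home page with 200 OK.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [HOME]


def proper_404_app(environ, start_response):
    if environ["PATH_INFO"] == "/":
        start_response("200 OK", [("Content-Type", "text/html")])
        return [HOME]
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [b"<html><body>Sorry, that page was not found.</body></html>"]


print(check_missing_page_handling(soft_404_app))
print(check_missing_page_handling(proper_404_app))
```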

trader

3:29 am on Apr 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you think G is somehow penalizing sites which fail to use a correct 404 error page? Does G police websites to make sure they follow proper webmaster quality coding procedures?

If that is being actively done, I will of course quickly re-evaluate this issue and take steps to do it correctly, even though it will involve extra time and work.

In fact, I can start working on new htaccess files now, so I was wondering if you have an example of a good 404 page I could copy, with modifications. I am looking for a really good one as far as keeping the visitor, so he proceeds to the website after reading the error message. The 100% visitor retention was one of the major benefits I liked about the index page forwarding.

tedster

4:47 am on Apr 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google actively tests 404 handling and will not let a site verify their Webmaster Tools account if they fail the tests. I can't say for sure what all the repercussions may be, but it's clear that inadequate error handling makes indexing a lot more problematic.

trader

1:25 am on Apr 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for all your feedback g1smd and tedster, most appreciated.

I now realize the error of my ways in the past involving the error code redirect to index page when it should be going to a 404 page.

Now I am in the process of fixing all the old ones I did wrong in favor of a real 404 page using the htaccess file I earlier posted.

I managed to make a fairly good one after getting some very good ideas from how Yahoo does it, i.e. [yahoo.com...]

Thanks again as I am on the road to doing things the right way thanks to this forum and the members who posted about this subject both in this thread and others.

trader

3:23 pm on Apr 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you clicked the Yahoo link above, you would see that after 10 seconds they automatically redirect the 404 traffic to their home page. Is that OK to do as far as proper webmaster procedures go?

Since it kicks in after 10 seconds (possibly not enough time to read the 404 page and do something else), I don't see much difference between doing that and what I was doing myself when I posted my old htaccess code. Any comments?

tedster

5:14 pm on Apr 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As long as the first page actually shows a 404 status in the server header (and Yahoo does), that approach should be fine. Content that has a 404 status will not be indexed.
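A minimal sketch of tedster's point as a Python WSGI handler: the custom page can carry a Yahoo-style timed refresh to the home page, as long as the HTTP status line itself says 404. HOME_URL and the 10-second delay are placeholders; the meta refresh is a browser convenience and does not change the status a crawler sees.

```python
from typing import Callable, Iterable

HOME_URL = "http://www.example.com/"  # hypothetical absolute URL of the home page
REFRESH_SECONDS = 10                  # delay used in the Yahoo example


def not_found_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve a custom error page whose HTTP status is still 404."""
    body = (
        "<!DOCTYPE html>\n<html><head>\n"
        "<title>404 File Not Found</title>\n"
        f'<meta http-equiv="refresh" content="{REFRESH_SECONDS}; url={HOME_URL}">\n'
        "</head><body>\n"
        "<p>Sorry, we couldn't find that page. You'll be taken to the "
        f'<a href="{HOME_URL}">home page</a> in {REFRESH_SECONDS} seconds.</p>\n'
        "</body></html>\n"
    ).encode("utf-8")
    # The status line is what search engines act on, whatever the page says.
    start_response("404 Not Found", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```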

trader

11:28 pm on May 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks tedster, I finally figured it out thanks to you and g1smd so now my 404 pages confirm a 404 status when checked. In fact, I am really excited by my new 404 page which I designed in-part inspired by the interesting way Yahoo does it, combined with feedback here.

Now I'm hard at work changing lots of sites so they handle 404s the correct way, instead of sending the 404 traffic to my home page as in the past. BTW, I do have several sites which get a lot of 404 traffic and others with lesser amounts.

One question I have (I apologize, it's probably not too important): assuming the meta tags, title, description (and a bit of error-related content and links) on my new 404 pages are SEO friendly, i.e. relevant to the site's content and keywords, is it possible the 404 pages could rank in the SEs and add an optimized, indexed page?

I realize SEO is not a consideration when setting up 404 pages but just wondering about any possible side-benefit.

Robert Charlton

3:07 am on May 2, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I always have meta robots noindex,nofollow on my custom 404 pages, with a title something like "404 File Not Found." I don't think you should want it to rank for anything.

Make sure there's a non-technical message about why a visitor might have reached this page, and include prominent navigation on the page to get the visitor to the main parts of the site.

Include an email link or a form for visitors to report broken links.

Also, use absolute URLs in all of your custom 404 navigation.

PS... In fact, use absolute paths for images, etc., as well.

[edited by: Robert_Charlton at 3:09 am (utc) on May 2, 2008]
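Robert's checklist can be sketched as a small Python function that builds the page body: noindex,nofollow, a plain title, a non-technical message, navigation, a way to report broken links, and absolute URLs throughout. SITE_ROOT, the section paths, the logo path and the email address are all hypothetical. Absolute URLs matter because the error page is served at whatever address was requested, so a relative link such as products/ on a request for /a/b/missing.html would resolve to /a/b/products/. The result should still be sent with a 404 status, not a 200.

```python
from html import escape
from urllib.parse import urljoin

SITE_ROOT = "http://www.example.com/"  # hypothetical canonical root
# Hypothetical main sections of the site: (link text, path).
NAV = (("Home", "/"), ("Products", "/products/"), ("Contact us", "/contact/"))


def build_404_page(site_root: str = SITE_ROOT, nav=NAV,
                   report_email: str = "webmaster@example.com",
                   logo: str = "/images/logo.gif") -> str:
    """Return HTML for a custom 404 page with absolute links and a noindex robots tag."""
    links = "\n".join(
        f'<li><a href="{escape(urljoin(site_root, href))}">{escape(label)}</a></li>'
        for label, href in nav
    )
    return f"""<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex,nofollow">
<title>404 File Not Found</title>
</head><body>
<img src="{escape(urljoin(site_root, logo))}" alt="Example logo">
<h1>Sorry, we can't find that page</h1>
<p>The link you followed may be out of date, or the address may have been mistyped.
Try one of these sections instead:</p>
<ul>
{links}
</ul>
<p>Found a broken link? <a href="mailto:{escape(report_email)}">Let us know</a>.</p>
</body></html>
"""


page = build_404_page()
```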