Canonical page, relative vs absolute internal linking - Google Search and SEO forum at WebmasterWorld - WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

Canonical page, relative vs absolute internal linking

matt621

6:40 am on Sep 18, 2014 (gmt 0)

10+ Year Member

Google says:

-----------------

Indicate the preferred URL with the rel="canonical" link element

Suppose you want http://blog.example.com/dresses/green-dresses-are-awesome/ to be the preferred URL, even though a variety of URLs can access this content. You can indicate this to search engines as follows:

Mark up the canonical page and any other variants with a rel="canonical" link element.
Add a <link> element with the attribute rel="canonical" to the <head> section of these pages:

<link rel="canonical" href="http://blog.example.com/dresses/green-dresses-are-awesome" />

This indicates the preferred URL to use to access the green dress post, so that the search results will be more likely to show users that URL structure. (Note: We attempt to respect this, but cannot guarantee this in all cases.)

Avoid errors: use absolute paths rather than relative paths with the rel="canonical" link element.

Use this structure: http://www.example.com/dresses/green/greendress.html
Not this structure: /dresses/green/greendress.html

at [support.google.com...]

--------------------------------------

But I'm confused. I do not understand what a canonical page is, and are they saying we have to put that html in the body or header of every page? For our shopping cart I think that's impossible. I'm not sure if it's possible anyway.

Google clearly says not to use relative links. This is really bad because it makes maintaining the site much harder and the portability of the code much more restrictive.

Can anyone shed some light on this issue of "canonical page"?

Are they saying to put this code:

<link rel="canonical" href="http://mydomain.com/somepage.html" />

in the page somepage.html on our site "mydomain.com"?

and if so should it go in the head or body?

And do that for every page on our site?

Thanks

aakk9999

8:45 am on Sep 18, 2014 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Let's take your questions one by one:

I do not understand what a canonical page is

If the same page content can be accessed using different URLs, then the canonical page is the "main" URL you wish to appear in the index. Here are two examples:

1) 10 products on the page, sorted by price hi->lo and lo->hi. The sort criterion is a parameter in the URL, e.g. www.example.com/products.php?sort=hilo and www.example.com/products.php?sort=lohi. In this case the content is the same; it is just arranged a bit differently on the page. You would pick one URL to be the "canonical" one (the URL you wish to have indexed).

2) A page where an additional parameter can be added that has no impact on the content. This happens often with CMS systems. For example: www.example.com/somepage.php?param1=a&some-unwanted-param=b. In this case the canonical URL would be the one without the second parameter, as it is irrelevant: www.example.com/somepage.php?param1=a
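To make the two examples above concrete, here is a minimal Python sketch of how a canonical URL might be derived by dropping parameters that do not change the content. The `CONTENT_PARAMS` set and the `canonical_url` name are assumptions for illustration only; which parameters matter depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the only query parameters that change page content.
# Anything else (sort order, tracking parameters, ...) is dropped.
CONTENT_PARAMS = {"param1"}


def canonical_url(url: str) -> str:
    """Return the URL with every non-content query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CONTENT_PARAMS
    ]
    # Rebuild the URL without a fragment and with only the kept parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("http://www.example.com/somepage.php?param1=a&some-unwanted-param=b"))
print(canonical_url("http://www.example.com/products.php?sort=hilo"))
```

With this whitelist, both sort variants of products.php collapse to the same URL, and that URL is what would go into the canonical link element.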

are they saying we have to put that html in the body or header of every page

It must go in the <head> section. If it is in the body section, or if Google encounters two canonical link element declarations, Google will disregard the canonical completely on that page.

For our shopping cart I think that's impossible. I'm not sure if it's possible anyway.

This depends on the shopping cart solution. Often the shopping cart software does this automatically. If you use URL rewriting (search engine friendly URLs), then the rewritten URL would normally be your canonical URL. Alternatively, there may be a field on your product page where you can enter the canonical URL. It is best to discuss this with your shopping cart provider.

Google clearly says not to use relative links. This is really bad because it makes maintaining the site much harder and the portability of the code much more restrictive.

The canonical link element should not automatically set the domain name to the current domain. In fact, ideally, there should be a parameter in the database somewhere that specifies which domain name to use when creating the canonical link element. That way, if your test site leaks (Google gains access to it), the canonical link element, which would point to your live site, should ensure there is no content duplication. Portability of the site should not be a problem for canonical if it is implemented properly.
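As a rough sketch of that idea in Python: the domain comes from a setting rather than from the incoming request, so the same code emits the same canonical on a test host and on the live host. `CANONICAL_ORIGIN` and `canonical_link` are made-up names, and the value shown is only a placeholder for whatever your configuration or database would hold.

```python
from html import escape

# Assumed setting: in a real shop this would be read from config or the database,
# never taken from the Host header of the current request.
CANONICAL_ORIGIN = "http://www.example.com"


def canonical_link(path: str, origin: str = CANONICAL_ORIGIN) -> str:
    """Build an absolute rel=canonical element for a root-relative path."""
    if not path.startswith("/"):
        path = "/" + path
    href = origin.rstrip("/") + path
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'


# Same output whether this runs on the test site or the live site.
print(canonical_link("/somepage.html"))
```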

Can anyone shed some light on this issue of "canonical page"?

Are they saying to put this code:

<link rel="canonical" href="http://mydomain.com/somepage.html" />

in the page somepage.html on our site "mydomain.com"?

Yes. It should go in the <head> section, and I normally put it high up there.

WARNING: An incorrectly implemented canonical link element can tank the site. You need to fully understand what you are doing before you implement it on your website.

It is also worthwhile reading this thread:

Common Mistakes With rel=canonical
Apr 10, 2013
http://www.webmasterworld.com/google/4563444.htm

Atomic

3:20 pm on Sep 18, 2014 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

I have another canonical tag and absolute linking question regarding a site I have begun working on.

This company uses hundreds of domains as landing pages for various campaigns. I recently discovered that each domain, some of which were only created to hold a single landing page, has a complete copy of the company's main website due to a CMS error. The result is hundreds of copies of the entire website.

I was wondering what the best course of action would be with regard to the canonical tags and internal links.

Is it better to have canonical tags taking the form:

1. <link rel="canonical" href="http://example.com/directory/page1" />

or

2. <link rel="canonical" href="http://domainvariable.com/directory/page1" />

And which strategy would be best for internal links, say from main navigation?

1. <a href="//example.com/directory">Directory</a>

or

2. <a href="//domainvariable.com/directory">Directory</a>

matt621

5:28 pm on Sep 18, 2014 (gmt 0)

10+ Year Member

To AAKK,

Thanks. That does help. I will read the link. I do have to say this whole idea runs counter to the basic premise of the internet: to be globally accessible no matter what. I know I'm screaming into the hurricane, but Google is making the internet less usable, not more, with these restrictions.

To Atomic, what is // ? I am not aware of that syntax.

Thanks

Atomic

5:49 pm on Sep 18, 2014 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

@matt621
// is protocol-relative, which means the link uses whichever protocol (http or https) the current page was loaded with
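A quick way to see this is to resolve the same // link against an http page and an https page. The sketch below uses Python's standard `urljoin`; the page URLs and the `resolve` helper are placeholders for illustration.

```python
from urllib.parse import urljoin

# The protocol-relative link from the navigation example above.
NAV_LINK = "//example.com/directory"


def resolve(page_url: str, href: str = NAV_LINK) -> str:
    """Resolve a link the way a browser would, relative to the page it is on."""
    return urljoin(page_url, href)


# Only the scheme is inherited from the page; host and path come from the link.
print(resolve("http://example.com/some/page.html"))
print(resolve("https://example.com/some/page.html"))
```

The first call resolves to an http URL and the second to an https URL, with the same host and path in both cases.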

matt621

6:03 pm on Sep 18, 2014 (gmt 0)

10+ Year Member

Wow. Thanks much. So much to know. That really helps and pretty much solves my problem. We are transitioning over to all ssl pages and that would take care of a lot of issues.

I'm going to try it. :)

Robert Charlton

6:43 pm on Sep 18, 2014 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

I'm going to try it.

I think I'd wait for more feedback on the protocol relative url before trying it. I understand that there can be some problems with them.

For general reference on relative vs absolute, take a look at this thread....

Internal Linking - Better to Use Absolute or Relative Links?
http://www.webmasterworld.com/google/4186860.htm

matt621

7:06 pm on Sep 18, 2014 (gmt 0)

10+ Year Member

Then I recommend that you make things as simple as possible for spiders. I recommend absolute links instead of relative links, because there's less chance for a spider (not just Google, but any spider) to get confused.

(from the other thread)

This illustrates my point.

Google says to make it easier for their spider and other spiders. But we make webpages for PEOPLE not spiders. What's happening is that people are chasing the train when they should be focused on the people.

Yes, I know, reality. Google... serps, traffic. I get that. But my point is that GOOGLE needs to change not us. Google says it's all about the User experience. And I agree. So why then is google telling everyone we need to compromise the site for people for the sake of their spider?

Google is the one that has the problem. Every one of these posts from GG or others from within Google or other SEs starts with "sometimes spiders have problems..." etc. That's their problem, not ours. We are creating websites and pages for people, and relative URLs make for a more consistent user experience and fewer errors.

Okay, I'm done. I'll go back to updating my links to kiss the king's ring.

matt621

7:35 pm on Sep 18, 2014 (gmt 0)

10+ Year Member

I think I'd wait for more feedback on the protocol relative url before trying it. I understand that there can be some problems with them.

it's never easy is it? ;-)

how do you even search for that?

aakk9999

7:39 pm on Sep 18, 2014 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

@matt621
I would do the following:

a) Leave internal links root relative. This would allow it to work on your test domain(s), i.e. allowing portability.

b) rel=canonical should use absolute links, including the protocol. http/https is still duplication, hence it is better to specify the protocol for canonical. There will be no problem running the same code on different domains: the canonical tells Google which URL you prefer it to index when the same content is available on several URLs. It also tells Google to combine all link juice from these other URLs onto your preferred (canonical) URL.

There is another option, which is not to use the canonical link element at all and leave Google to decide which URL to index. Google will pick one and filter out the others. But it will not combine link juice.
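To illustrate points a) and b) side by side, here is a small Python sketch. The hostnames (test.example.com, www.example.com) and the names `LIVE_ORIGIN`, `internal_link_target` and `canonical_href` are assumptions made up for the example.

```python
from urllib.parse import urljoin

# Assumed live origin, protocol included, as suggested in b).
LIVE_ORIGIN = "https://www.example.com"


def internal_link_target(page_url: str, root_relative_href: str) -> str:
    """a) A root-relative link stays on whatever host served the page."""
    return urljoin(page_url, root_relative_href)


def canonical_href(path: str) -> str:
    """b) The canonical always names the live origin, whichever host served the page."""
    return LIVE_ORIGIN + path


for page in ("http://test.example.com/shop/item.html",
             "https://www.example.com/shop/item.html"):
    print(internal_link_target(page, "/directory/page1"), "|", canonical_href("/shop/item.html"))
```

The internal link follows the host it is served from, which keeps the test site usable, while the canonical is identical in both cases.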

@Atomic

Since you are creating a campaign landing page on a different domain, if this landing page has an equivalent page on the main domain, you should implement canonical to point to the equivalent page on the main domain, and all other "internal" links should go to your main site (they are not internal links in that case). As it is a campaign landing page, it is probably not meant to be in the Google index anyway.

On your main domain I would use root-relative links for internal linking, as this is easier for portability between the test site and production. You may instead use absolute links, inclusive of protocol, but then you must be careful when moving the code to production.

The canonical link element should use the protocol and domain name of the main domain. This would ensure that if some URL leaks to Google, the canonical link element takes care of it.

Atomic

7:55 pm on Sep 18, 2014 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Thank you aakk9999! This is exactly how we decided to deal with this issue. Even so, I've been worried Google might discover the site copies and think it was part of some linking scheme.

You have put my mind at ease.

matt621

8:12 pm on Sep 18, 2014 (gmt 0)

10+ Year Member

Yes, thanks AAKK. That helps. :)

aakk9999

8:34 pm on Sep 18, 2014 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

@Atomic

Even so, I've been worried Google might discover the site copies and think it was part of some linking scheme.

If you are worried, you can add <meta name="robots" content="nofollow"> to the <head> section of the campaign site(s) or alternatively add rel="nofollow" to individual links going back to your main site from your campaign site(s). It will do no harm. Just make sure this is only done on your campaign site(s) and not on the main site!

Another possibility is to block the landing page site(s) in robots.txt, but if your setup is such that the main site and the campaign site(s) point to the same webspace, then you would have to dynamically serve a different robots.txt depending on which site the request is for. Also, if a coding mistake is made here, it could tank the main site if the wrong robots.txt is served.
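A minimal Python sketch of that host-based choice, assuming the web server passes the request's Host header to a function like this one. The hostnames in `CAMPAIGN_HOSTS` and the name `robots_txt_for` are invented for the example.

```python
# Hypothetical campaign hostnames; replace with the real list.
CAMPAIGN_HOSTS = {"promo.example.net", "deals.example.org"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(host_header: str) -> str:
    """Pick the robots.txt body for the host that was requested."""
    # Normalise: drop any port, ignore case and a trailing dot.
    hostname = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    if hostname in CAMPAIGN_HOSTS:
        return BLOCK_ALL
    # Default to allowing crawling, so an unknown or mistyped host
    # can never block the main site by accident.
    return ALLOW_ALL


print(robots_txt_for("promo.example.net"))
print(robots_txt_for("www.example.com"))
```

Blocking only hosts that are explicitly listed, and allowing everything else, is a deliberate choice here: a missing entry leaves a campaign site crawlable, which is far less damaging than serving the blocking file on the main site.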