
Home / Forums Index / Code, Content, and Presentation / HTML
Forum Library, Charter, Moderators: incrediBILL

HTML Forum

Should I encode ampersands in my canonical tags?

 5:29 pm on Feb 22, 2013 (gmt 0)

Hi Everyone, hope this isn't the wrong place to post this.

I looked but can't find a straight answer on this (closest I found is [webmasterworld.com...] but seems incomplete)

Should my canonical tags have encoded ampersands in them?

I thought we were supposed to, like we're supposed to in xml sitemaps, but on Google's support page itself, they don't:

<link rel="canonical" href="http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35653" />

<link rel="canonical" href="http://www.example.com/blah?hl=en&amp;answer=35653" /> (the encoded form I expected to see)

Would using encoded characters be wrong? Would it cause problems?



 9:48 pm on Feb 22, 2013 (gmt 0)

Just because Google doesn't do it properly doesn't make it correct ;-)

Seriously: you must encode & as &amp; in HTML (and in XML).


 11:07 pm on Feb 22, 2013 (gmt 0)

In this specific case it will work either way. Try it:

(raw &) [support.google.com...]

(encoded &amp;) [support.google.com...]

Encode it anyway. "The right way" or "a good habit" does not always translate to "the only way that will work".
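
For anyone who generates the tag from code rather than typing it by hand, here is a minimal Python sketch of the encoding step. The helper name `canonical_link` is made up for illustration; `html.escape` is the standard-library function that rewrites & as &amp; (with its default settings it also escapes <, > and both quote characters).

```python
from html import escape


def canonical_link(url: str) -> str:
    """Build a canonical <link> tag (hypothetical helper, not from the thread)."""
    # escape() rewrites & as &amp; and, by default, also escapes < > " and '
    return f'<link rel="canonical" href="{escape(url)}" />'


print(canonical_link("http://www.example.com/blah?hl=en&answer=35653"))
```

The URL is passed in raw, exactly as you would type it in an address bar, and is encoded only at the moment it is written into the HTML.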


 12:36 am on Feb 23, 2013 (gmt 0)

In every flavor of HTML except HTML5 (and XHTML5 is not an exception), you *have* to encode an & that appears in content.
Standards ...

HTML5 allows authors to write sloppier code than it needs to allow. You can get away with a raw & in that standard provided the characters following it do not look like an HTML entity. But HTML5 is now a "living" standard, so you do not know which entities will be invented in the future, and you cannot guarantee that your text will not start to "look like" an entity later on.

A useless thing for lazy authors, IMHO. But HTML5 is stuffed with that kind of thing.

So not encoding every content & in an HTML document as &amp; is a mistake, IMHO, on a par with leaving a < or > in the content instead of encoding it as &lt; or &gt; .

Regardless of standards, browsers can recover from the error in many cases, but let's assume you write:
<a href="http://www.example.com/file?a=1&copy=2">
It's an error, but which of these did you intend?
<a href="http://www.example.com/file?a=1&copy;=2"> (a copyright sign that is missing its semicolon?)
<a href="http://www.example.com/file?a=1&amp;copy=2"> (an unescaped ampersand?)

Hopefully validators will continue to flag it as an error.
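
To see the ambiguity in action, here is a small sketch using Python's standard `html.unescape`, which accepts some legacy entity names without a trailing semicolon. It is only an illustration of what a lenient decoder can do with such a URL; browsers may apply different rules inside attribute values, so do not read it as a description of any particular browser.

```python
from html import unescape

raw = "http://www.example.com/file?a=1&copy=2"          # ampersand left unencoded
encoded = "http://www.example.com/file?a=1&amp;copy=2"  # ampersand encoded as &amp;

decoded_raw = unescape(raw)
decoded_encoded = unescape(encoded)

print(decoded_raw)      # "&copy" is taken for an entity and becomes a copyright sign
print(decoded_encoded)  # only "&amp;" is decoded, so the "copy" parameter survives
```

With the encoded form there is nothing left to guess: the decoder sees &amp;, turns it into a single &, and leaves the rest alone.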


 9:09 pm on Feb 23, 2013 (gmt 0)

swa66: thanks for posting that.

It's knowledge sharing like this that keeps me coming back to read, years after the surge of new SEO-focused websites that appeared after WW opened its doors.


 12:29 am on Feb 24, 2013 (gmt 0)

insert a copyright sign, that's missing the semicolon ?

Matter of fact, that element of browser helpfulness has annoyed me for a long time. Is the trailing semicolon required or isn't it? If it isn't required, why use it? If it is required-- which makes far more sense because how else would you know when the entity is finished?-- then for pity's sake require it already :)


 1:03 am on Feb 24, 2013 (gmt 0)

Browser helpfulness only perpetuates lazy authors' errors. Unfortunately HTML5 has "approved" that as the right way instead of outlawing it once and for all.
(except for XHTML5)

That is one of the main reasons I aimed for valid XHTML1 in the past and aim for polyglot XHTML5 nowadays. The bigger reason is to have the XML toolset available to automate things if and when I need it.


 5:37 pm on Feb 26, 2013 (gmt 0)

Thanks for the comprehensive replies everyone, very helpful.

Why is it that a browser handles URLs containing &amp; correctly when they are clicked in a link, but not when they are pasted into an address bar? That's what caused the confusion in the first place.

I'm going to guess that it's because you're not meant to paste HTML code into an address bar... and expect anything :)


 10:42 pm on Feb 26, 2013 (gmt 0)

Your guess is actually quite right.

In HTML, you are supposed (exceptions aside) to encode any & as &amp; . So in an <a href=""> or similar, your encoding works because the browser knows it's reading HTML: it decodes the &amp; to & and then uses the result.

In your address bar there's no HTML, so no decoding of HTML entities takes place.
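
A rough Python sketch of that difference, if it helps. The helper `query_params` is made up for this post, and splitting the query string on raw & is a deliberate simplification of what a server does with the URL it receives.

```python
from html import unescape
from urllib.parse import urlsplit


def query_params(url: str) -> dict:
    """Split a URL's query string on raw '&' only (hypothetical, simplified helper)."""
    query = urlsplit(url).query
    # partition() keeps this safe even for a pair that has no "=" in it
    return {key: value for key, _, value in (pair.partition("=") for pair in query.split("&"))}


href = "http://www.example.com/blah?hl=en&amp;answer=35653"

pasted = query_params(href)             # address bar: the text is used as-is
clicked = query_params(unescape(href))  # link: HTML decoding happens first

print(pasted)
print(clicked)
```

Pasted as-is, the second parameter ends up named "amp;answer" instead of "answer", which is why the page you get is not the one you expected.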

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved