Forum Moderators: Robert Charlton & goodroi
weather ❄ hard winter predicted
Do you think UTF-8 is better than ISO-8859-1, increasing the chances that Google users' browsers will display the symbols properly?
Arial Unicode
Code2000 (third-party)
DejaVu Sans (also third-party)
Menlo
Quivira (third-party)
unifont
Zapf Dingbats (this sounds like a "Duh!" but isn't, because this is a new Unicode font, entirely separate from the legacy Dingbats font)
Are dingbat symbols like the snowflake (&#10052;, i.e. U+2744) "non-standard", and the other, more usual ones (Wikipedia link) "standard", in your eyes?
It should make absolutely no difference. In spite of the term "charset", the encoding of a page has no effect on the characters it is able to display.
Content is composed of a sequence of characters. Characters represent letters of the alphabet, punctuation, etc. But content is stored in a computer as a sequence of bytes, which are numeric values. Sometimes more than one byte is used to represent a single character. Like codes used in espionage, the way that the sequence of bytes is converted to characters depends on what key was used to encode the text. In this context, that key is called a character encoding.
--
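A quick sketch of the bytes-vs-characters point from the quoted passage, using the thread title as sample text (Python purely for illustration):

```python
text = "weather \u2744 hard winter predicted"   # U+2744 is the snowflake

# The same characters become different byte sequences depending on the encoding.
utf8_bytes = text.encode("utf-8")
print(len(text))        # 31 characters
print(len(utf8_bytes))  # 33 bytes: the snowflake alone takes three (e2 9d 84)

# ISO-8859-1 simply has no code for U+2744, so encoding fails outright:
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError:
    print("snowflake is not representable in ISO-8859-1")
```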
An HTML page can only be in one encoding. You cannot encode different parts of a document in different encodings.
A Unicode-based encoding such as UTF-8 can support many languages and can accommodate pages and forms in any mixture of those languages. Its use also eliminates the need for server-side logic to individually determine the character encoding for each page served or each incoming form submission. This significantly reduces the complexity of dealing with a multilingual site or application.
--
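The "any mixture of languages" point can be demonstrated in a few lines (a sketch; the sample string and codec names are mine, using standard Python codec aliases):

```python
mixed = "price, \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac, \u05e2\u05d1\u05e8\u05d9\u05ea"  # Latin + Greek + Hebrew

# UTF-8 holds the whole mixture in a single encoding:
utf8 = mixed.encode("utf-8")

# Each single-language legacy charset fails on the other script:
for charset in ("iso-8859-7", "iso-8859-8"):   # Greek, Hebrew
    try:
        mixed.encode(charset)
    except UnicodeEncodeError:
        print(charset, "cannot hold the whole string")
```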
Why does the browser still not recognize the encoding?
Let's say, for example, that you saved your data as UTF-8. Although you saved your data in the right encoding, and even if you declared in the page that the page encoding is UTF-8, your server may still be serving the page with an accompanying HTTP header that says it is something else.
http://www.w3.org/International/questions/qa-choosing-encodings
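What a browser does when a wrong HTTP header wins can be simulated by decoding UTF-8 bytes with the Latin-1 "key" (a sketch; the sample string is made up):

```python
# The bytes on the wire are UTF-8...
page_bytes = "h\u00e9llo \u2744".encode("utf-8")

# ...but the Content-Type header claims ISO-8859-1, so the browser decodes
# the same bytes with the wrong key and renders mojibake:
garbled = page_bytes.decode("iso-8859-1")
print(garbled)   # "hÃ©llo â" followed by two invisible control characters
```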
Example of this principle at work: say you have &theta; in your HTML, but the output is in Latin-1 (which, understandably, does not understand Greek), the following process will occur (assuming you've set the encoding correctly using %Core.Encoding):
The Encoder will transform the text from ISO 8859-1 to UTF-8 (note that the entity is preserved here since it doesn't actually use any non-ASCII characters): &theta;
The EntityParser will transform all named and numeric character entities to their corresponding raw UTF-8 equivalents: θ
HTML Purifier processes the code: θ
The Encoder now transforms the text back from UTF-8 to ISO 8859-1. Since Greek is not supported by ISO 8859-1, it will be either ignored or replaced with a question mark: ?
http://htmlpurifier.org/docs/enduser-utf8.html#whyutf8
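The four steps above can be mimicked with the standard library (a rough sketch, not HTML Purifier itself; html.unescape stands in for the EntityParser):

```python
import html

source = "&theta;"                       # what sits in the Latin-1 HTML

# 1. Encoder: Latin-1 -> UTF-8 (the entity is pure ASCII, so nothing changes)
text = source.encode("iso-8859-1").decode("iso-8859-1")

# 2. EntityParser: named/numeric entities -> raw UTF-8 characters
parsed = html.unescape(text)             # Greek small theta

# 3. (purification happens here; the text is untouched in this example)

# 4. Encoder: UTF-8 -> Latin-1; Greek has no slot, so it degrades
result = parsed.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
print(result)   # "?"
```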
Backtrack. Is there any evidence that Google knows what a dingbat is?
Since Greek is not supported by ISO 8859-1, it will be either ignored or replaced with a question mark: ?
Why does the browser still not recognize the encoding?
Let's say, for example, that you saved your data as UTF-8. Although you saved your data in the right encoding, and even if you declared in the page that the page encoding is UTF-8, your server may still be serving the page with an accompanying HTTP header that says it is something else.
They don't fit there, in our eyes.