Search engine support for national characters.

Engine dependent, but how and where? Is it a problem?

     
3:44 pm on Feb 13, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 11, 2000
posts:1477
votes: 0


Asmodean has raised an interesting question in this discussion [webmasterworld.com] in the Keywords forum. Should one write keywords with national characters in their original form, as typed from the keyboard, or as named or numbered HTML character entities?

Ö and Ø are letters common in Germanic languages. Ö can be written as either Ouml or #214 and Ø as Oslash or #216; in each case the entity must start with & and end with ; to have the desired effect. In French ë is fairly common, which can be written euml or #235, and in Spanish ñ is often used, i.e. ntilde or #241. And so on. There are over 200 of these, and those commonly used in languages are supported by the major browsers.
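To make the two notations concrete, here is a minimal Python sketch that prints the named and numbered entity for each of the letters mentioned above. It relies on the standard library's `html.entities.codepoint2name` table, which covers the HTML 4 entity names; the helper names `to_named_entity` and `to_numeric_entity` are my own and purely illustrative.

```python
import html
from html.entities import codepoint2name


def to_numeric_entity(ch: str) -> str:
    """Return the numbered entity for a single character, e.g. '&#214;'."""
    return f"&#{ord(ch)};"


def to_named_entity(ch: str) -> str:
    """Return the named entity if HTML 4 defines one, else the numbered form."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else to_numeric_entity(ch)


if __name__ == "__main__":
    for ch in "ÖØëñ":
        print(ch, to_named_entity(ch), to_numeric_entity(ch))

    # Both notations decode back to the same character.
    assert html.unescape("&Ouml;") == html.unescape("&#214;") == "Ö"
```

Either form decodes to the same letter in a browser, so the open question is only whether a given engine decodes it before indexing.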

But what about the search engines? We have all seen search results where the engine appears not to have supported the character typed. As far as I have seen, any decent HTML editor will save the letter as typed from the keyboard if it is in <HEAD>, but convert it to an HTML character entity if it is in <BODY>. But not all engines seem to support this, so a keyword that works in one engine may be completely lost in another.

Is this a big problem? Is it something that we should start paying attention to in the European forum? Perhaps even start noting the degree of foreign language support for each engine, and provide concrete advice on how to tackle the problem engine by engine? Your input is hereby invited.

6:33 pm on Feb 13, 2001 (gmt 0)

joined:Oct 25, 2000
posts:663
votes: 0


Further to this question (and this isn't an SE issue): is there really any reason any more to continue to use character entities instead of typing the characters directly? I mean, what browsers these days can't read accented characters?

(I just did a quick check on voila.fr and yahoo.fr, the two most important search tools in France; both seem to just strip out any accents. In other words, if you search for élève or eleve, you get the exact same results.)
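For anyone curious what "stripping accents" could look like in practice, here is a rough Python sketch. It is only one plausible approach (Unicode decomposition, then dropping the combining marks) and not a claim about how voila.fr, yahoo.fr or any other engine actually does it; the function name `strip_accents` is made up for the example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    # Accented and unaccented spellings collapse to the same search key.
    assert strip_accents("élève") == strip_accents("eleve") == "eleve"

    # Ø has no Unicode decomposition, so this approach leaves it untouched;
    # an engine would need an explicit mapping (e.g. Ø -> O) for such letters.
    assert strip_accents("Ø") == "Ø"
```

An engine that folds both the indexed page and the query this way would return the same results for both spellings, which matches what I saw.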

7:22 pm on Feb 13, 2001 (gmt 0)

joined:July 11, 2000
posts:1477
votes: 0


Interesting point. I think that all really good SEs have been stripping the accents from letters for a long time. I checked Västerås and Vasteras years ago in Altavista and made the same discovery as you relate above, i.e. the same results page.

But sometimes the meaning may change. Example: suède and suede, which will give different results on engines that index both French and English language pages if no language preference has been set.

On a more humorous note: the Swedish community of Mönsterås (pronounced meunsterause) is vigorously demanding that national characters be allowed in domain names. They don't relish their URL www.monsteras.se, since it means 'monster cadaver' in Swedish, and they feel very discriminated against.

7:52 pm on Feb 13, 2001 (gmt 0)

joined:Oct 25, 2000
posts:663
votes: 0


:-) !