[smh.com.au...]
"At present there are 37 possible characters that can be used in domain names, but if non-English letters are allowed, this number would rise to 50,000 or more, said Twomey."
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Moderator's Note: Folks, let's keep the dialogue far removed from "us versus them" or "English versus any other language or culture". The object of this thread is to raise awareness of the issues that attach to the domain name system status quo and to dialogue about the benefits or problems associated with changing the status quo.
Please, do not interject any version of "us versus them" into the thread. WebmasterWorld is NOT an us-versus-them place. WebmasterWorld is a "how do we get things to work for everyone in the world wide webmaster world" place. (Someday we'll even have language translation software that will make it a bit easier to post in 120+ languages. ;0) )
Thank you. Webwork, Domain Forum Moderator
[edited by: Webwork at 4:43 am (utc) on Nov. 27, 2006]
I really cannot see users losing; Chinese users will simply see 'their sites' and not western sites, and vice versa. The only potential losers - other than monopolists - are bilingual folk who may want to see every conceivable site - they may need new keyboards.
Several webs working in parallel, on the same system, with virtually no overlap, is an interesting concept.
English-speaking folk would find it hard to be fooled by a phish they could not read.
How many people would wade through a foreign language spam just to joyfully click on an 'apparently' English language domain name? :)
And I'm not sure how a domain name using non-'western' characters would appear like an English-language name. I thought the point of this thread was the use of other characters, e.g. African and Chinese, in domain names.
Do any of them look English? Can you give an example of how this would work? I'm just not seeing it. :(
[edited by: Quadrille at 3:51 pm (utc) on Nov. 26, 2006]
Surely we are talking about non-English characters; how would 'what looked like an English domain name' look that way using (e.g.) Chinese characters?
Show me that, and I'm 100% with ICANN, despite their profiteering and control freakery ;)
Sorry if I'm not clear; I've rephrased twice - and it looks a simple question to me!
He said that this could create problems where, for example, a character in Urdu looks identical to one in Arabic. This would confuse the system and make it difficult to direct users to the right website every time. Poor implementation of foreign domain names may also pose security risks, whereby fraud artists could create websites with names that appear identical to current English-language sites, but in fact replace some of the English characters with similar-looking foreign characters.
I agree that allowing 50,000 characters in domain names will complicate matters immensely. On the upside, it may force average Internet users to become more savvy about phishing and other issues.
Perhaps one way to deal with the ambiguities would be to force domain registrants to choose a character set for their domain, so they cannot mix and match character sets with the intent to deceive. Or at least it would be easy to have a client-side security alert that says "this domain is mixing character sets, probably with the intent to deceive you" or some such.
Of course there is always the problem of subdomains, which are not regulated at the higher levels of DNS.
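A client-side alert for domains that mix character sets, as suggested above, could start from something like the following Python sketch. It approximates each letter's script by the first word of its Unicode character name (e.g. `LATIN`, `CYRILLIC`, `GREEK`). That heuristic is an assumption made for illustration; it is not the Unicode Script property that real browsers consult. Digits and hyphens are ignored.

```python
import unicodedata

def letter_scripts(label: str) -> set:
    """Approximate the set of scripts used by the letters in one domain label.

    Heuristic (assumption for this sketch): the script is the first word of the
    character's Unicode name, e.g. 'LATIN SMALL LETTER A' -> 'LATIN' and
    'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'. Non-letters are skipped.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixes_scripts(domain: str) -> bool:
    """Return True if any single label of the domain mixes letter scripts."""
    return any(len(letter_scripts(label)) > 1 for label in domain.split("."))

print(mixes_scripts("paypal.com"))        # False
print(mixes_scripts("p\u0430ypal.com"))   # True: U+0430 is CYRILLIC SMALL LETTER A
```

One caveat on the design: legitimate Japanese names routinely mix kanji, hiragana and katakana, so a real policy would need a list of allowed script combinations rather than a blanket "one script only" rule.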
What I don't get is 'what looked like an English domain name'
Än éxāmplč ļs thģs sėnténēč. (OK, an exaggeration, but you get the idea). It affects not just accented western characters - which are the only ones I can use in a post here due to this board's charset - but a wide range of characters which resemble the ASCII characters currently in use.
The issue is called an "IDN homograph attack" or "Homograph spoofing attack" - a homograph being a character which closely resembles another. Some reading matter if you're interested in the details:
I think that no-one should be sticking their head in the sand with this issue - IDNs are an important step in the internationalization of the web. Why should someone who doesn't use the western alphabet have to use ASCII characters for their main identity? What's more, if progress is not made with IDNs, then there will be several markets (e.g. China, amongst others) that go forward with their own systems, fracturing the universal web at a stroke.
Homograph spoofing is a problem, but careful forethought will hopefully lead to a reasonably safe and standardized solution.
Do any of them look English? Can you give an example of how this would work? I'm just not seeing it.
[shmoo.com...] The latest browsers trap the issue by default.
I thought the issue here was the 'totally different' character sets.
I think that no-one should be sticking their head in the sand with this issue - IDNs are an important step in the internationalization of the web. Why should someone who doesn't use the western alphabet have to use ASCII characters for their main identity? What's more, if progress is not made with IDNs, then there will be several markets (e.g. China, amongst others) that go forward with their own systems, fracturing the universal web at a stroke.
This would be the parallel web usage I referred to in my first post (above). I'm not sure why this would be 'fracturing'; surely it would all still be there - just requiring the relevant browser/pc/keyboard settings?
I don't have them - but I don't need Mandarin!
Why should someone who doesn't use the western alphabet have to use ASCII characters for their main identity?
For the same reason air traffic controllers around the world communicate in English...
But, back to the main "story": I don't see any reason why, at a lower or higher level, a domain name in any other character set couldn't be translated, with one additional step, into the current valid domain-name character set (A-Z, 0-9 plus hyphen) and from there into a numeric IP address.
3 step mapping rather than 2 step.
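The "3 step mapping" above is essentially how IDNs work in practice: the native-script name is converted to an ASCII-compatible `xn--` form (Punycode, under the IDNA standard), and ordinary DNS then resolves that ASCII name to an IP address. A minimal Python sketch using the standard library's built-in `idna` codec, which implements the older IDNA 2003 rules; `bücher.example` is a placeholder name, and the resolution step is only defined here, not run.

```python
import socket

def to_ascii(hostname: str) -> str:
    """Step 1 -> 2: native-script name to its ASCII (LDH) form, e.g. 'xn--...'."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(ascii_hostname: str) -> str:
    """Reverse mapping, used when showing the name back to the user."""
    return ascii_hostname.encode("ascii").decode("idna")

def resolve(hostname: str) -> list:
    """Step 2 -> 3: the ASCII name is resolved to IP addresses by ordinary DNS."""
    infos = socket.getaddrinfo(to_ascii(hostname), None)
    return sorted({info[4][0] for info in infos})
```

For example, `to_ascii("bücher.example")` gives `xn--bcher-kva.example`, and `to_unicode` turns that back into `bücher.example`. The DNS itself never sees anything outside the A-Z, 0-9 and hyphen set.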
"...He said that this could create problems where, for example, a character in Urdu looks identical to one in Arabic."
"Looks like" and "are" are two different things.
At the code level, every character has a numeric value, e.g.:
Character #0065 = A
Character #0196 = Ä
Character #0197 = Å
For that reason, the argument of "confusion" does not hold water since the problem already exists in the current system.
chr(48) = "0" .... chr(57) = "9"
chr(65) = "A" .... chr(90) = "Z"
chr(97) = "a" .... chr(122)= "z"
chr(45) = "-"
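The character codes listed above (letters, digits and the hyphen, often called the LDH set) are the whole alphabet a traditional hostname label may use. Case-insensitively that is 26 + 10 + 1 = 37 characters, the figure quoted at the top of the thread. The following Python sketch builds the set and checks a label against the usual LDH rules (1 to 63 characters, no leading or trailing hyphen). It also shows that look-alikes such as "0" and "O" are still distinct codes.

```python
import string

# The LDH alphabet: a-z, A-Z, 0-9 and '-' (codes 97-122, 65-90, 48-57, 45).
LDH = set(string.ascii_letters + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """Check one hostname label against the traditional LDH rules."""
    return (
        1 <= len(label) <= 63
        and all(ch in LDH for ch in label)
        and not label.startswith("-")
        and not label.endswith("-")
    )

print(len({ch.lower() for ch in LDH}))           # 37
print(ord("0"), ord("O"))                        # 48 79
print(ord("1"), ord("l"))                        # 49 108
print(is_ldh_label("A0L"), is_ldh_label("AOL"))  # True True
```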
Take these two:
A0L.COM vs. AOL.COM
In the example on the left I used a "zero" for the second character; on the right it's an uppercase letter "O".
G00GLE.COM vs. GOOGLE.COM
YAHOO.COM vs. YAH00.COM
or:
lycos.com vs. 1ycos.com
(used lower case "L" on left, number "1" on right)...
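One common mitigation for both the ASCII look-alikes above and cross-script homographs is to compare names by a "skeleton": map every confusable character to a single representative, then flag two different names whose skeletons match. Unicode publishes real confusables data for this (UTS #39). The tiny mapping table below is a hypothetical, hand-picked subset for illustration only.

```python
# Hypothetical, hand-picked confusables table (illustration only; the real
# Unicode confusables data is far larger).
CONFUSABLES = {
    "0": "o",       # DIGIT ZERO looks like LATIN O
    "1": "l",       # DIGIT ONE looks like LATIN SMALL L
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}

def skeleton(domain: str) -> str:
    """Lowercase the name and replace each confusable character by its representative."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.lower())

def looks_alike(a: str, b: str) -> bool:
    """True if two *different* names would probably be confused by a reader."""
    return a.lower() != b.lower() and skeleton(a) == skeleton(b)

print(looks_alike("A0L.COM", "AOL.COM"))              # True
print(looks_alike("lycos.com", "1ycos.com"))          # True
print(looks_alike("p\u0430ypal.com", "paypal.com"))   # True
print(looks_alike("yahoo.com", "google.com"))         # False
```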
For the same reason air traffic controllers around the world communicate in English...
While I agree with your other points, this one is a bit off.
Air traffic controllers around the world communicate in English for safety reasons; having a pilot and a controller without a common language while the plane circles a busy runway could be, er, hazardous.
But of the 2,000,000,000 Chinese speakers, it's safe to estimate that approximately 99.9999999% have no need or desire to communicate with a non-Chinese speaker.
And for the 1,026 who do, it would be cheaper to buy them a second pc/browser/keyboard than re-equip (and re-educate) the whole nation.
Don't you think? ;)
It is fashionable to expect the world to fit itself around us and the language of Shakespeare, but I'm not sure the Chinese will play ball this time. :)
[edited by: Quadrille at 1:36 am (utc) on Nov. 27, 2006]
The fact that you may not have come across any may simply be because you search the English web - if you search in Japanese on the Japanese version of Yahoo!, IDN .jp and .com domains appear fairly regularly amongst the results. Large companies (including the largest advertising company in Japan) have started to acquire the IDN of their names to use in parallel with their earlier ascii domains.
It's just an after-the-fact attempt by ICANN to grab a bit of the action.
[edited by: Edwin at 2:21 am (utc) on Nov. 27, 2006]
IDN spoofing in Firefox led to white-listing specific ccTLDs as being IDN enabled [webmasterworld.com] (see also this CNET article [news.com.com]). The Firefox method of handling IDN issues is outlined here:
[mozilla.org...]
The example given in tests of homograph spoofing was a PayPal site using a Cyrillic "a" (which I can't display here for technical reasons). With IDN enabled, there was scant difference between the visible URL for paypal.com and the spoofed alternative. With IDN disabled in Firefox, the spoof domain is displayed as xn--pypal-4ve.com in the browser's address bar.
This is not just an issue of local alphabets and keyboards: the universal web as a concept means that every user can connect to others across the globe without relying on plugins, differing standards, alternative DNS roots, or proprietary or closed domain name systems. Even if you only speak one language, there should never be a barrier to, say, emailing someone in a different country whose email address contains non-ASCII characters.
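The Firefox behaviour described above (show the readable Unicode form only for whitelisted TLDs, otherwise fall back to the raw `xn--` form) can be sketched in Python as follows. The whitelist contents are hypothetical placeholders, not Mozilla's actual list, and real browsers apply further per-registry and per-script rules.

```python
# Hypothetical whitelist of TLDs whose registries are trusted to police
# homographs (placeholder values, not Mozilla's real list).
IDN_WHITELIST = {"jp", "kr", "de"}

def address_bar_text(hostname: str) -> str:
    """Return what the address bar would show for a hostname under this policy."""
    ascii_form = hostname.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1].lower()
    if tld in IDN_WHITELIST:
        return ascii_form.encode("ascii").decode("idna")  # readable Unicode form
    return ascii_form                                      # raw Punycode form

spoof = "p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A (U+0430)
print(address_bar_text(spoof))  # xn--pypal-4ve.com
```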
Homograph spoofing is one identifiable issue with IDNs that has had significant coverage, and a lot of effort has gone into mitigating the risks posed by homographs. But the web should no longer be seen as a predominantly English-language resource, and IDNs represent a huge step forward in making the web live up to that "universal" tag. ICANN does have a role to play, and as developers I believe we should always push for open standards that address the needs of the world's internet users, rather than fixating on a technical issue that can be overcome with careful implementation.
[[ note: I've been watching and reporting on the huge gap between the Korean and "English" web for years. it's STILL shocking how much ignorance there is on either side as to what is going on. (again, big case in point.... google has about 0.7% of the market here in Korea, arguably one of the most advanced as well as important markets on the web). ]]
guess what? there is a REAL WORLD EXAMPLE.
Did you know Korea (again, #1 in broadband internet penetration and usage, and a crystal ball for many of the most recent web trends) has been using a KOREAN CHARACTER domain system for YEARS? it works in parallel to the "normal" web. (ie: you can buy a regular domain name and a "Korean character" one.)
any problems on your end of the internet world? ever received a Korean domain name spam? if you did....did you know? (and yes, I'd go so far as to say the average Korean is more internet savvy than the ROTW... and Koreans spam insanely, to a globally significant degree. so yes, you would have noticed if it was an issue.)
nuff said.
still... you want paranoid? this smells more like English speakers not wanting to "lose control of the web" than any worries about fraud. a web "parallel" to the current one is a much more accurate analogy than the other break-the-web theories. gosh forbid that the English-speaking world should need to learn another language to access a big part of the "other web".
so sayeth GrendelKhan{TSU}
For the past few weeks I've been working on a site for a customer who uses the umlaut (&uuml;, &#252;, or simply "ü") in their product name.
Let's say they have a product called:
Blue Wüdgits
Using only [A-Z,0-9] they registered a name like:
bluewudgits.tld
Now, within their web pages I encode the umlaut using &uuml;, thinking it will not trip the "See English Language Only Results" filter (yes, we are back to SEO)...
Whadda ya think? Do the SE's consider a page containing an encoded non-English language character to be English or not?
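Whatever the search engines decide, the umlaut written as a named entity, as a numeric entity, or as a literal character decodes to the same character, U+00FC. So a crawler that decodes entities before analysing the text sees identical content either way. A quick check of that equivalence with Python's standard `html` module:

```python
import html

# Three ways to write the umlaut in a page: named entity, numeric entity, literal.
spellings = ["&uuml;", "&#252;", "\u00fc"]

decoded = [html.unescape(s) for s in spellings]
print(all(ch == "\u00fc" for ch in decoded))  # True: all decode to U+00FC

# Going the other way: produce an ASCII-only page fragment with numeric entities.
fragment = "Blue W\u00fcdgits".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(fragment)  # Blue W&#252;dgits
```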
A phishing email can use an IDN that appears to be English but actually contains non-English characters. The rest of the email would be in English, so unsuspecting recipients would be at risk of accepting it as legit.
How about a radical solution?
In the future, all email client software will block clicking of links, and perhaps won't even highlight the links in emails - i.e. you'll have to retype a URI you get in an email into your browser. (Note that the big banks, eBay, PayPal and friends have been recommending we do this for ages.)
You could perhaps imagine allowing copy-and-paste but with a warning ("You are copying characters from an alternative character set into IE's address bar - are you sure? Yes/No").
This would pretty much solve the problem at the expense of making life slightly harder for all the legitimate users. But hey, that's how airport security works too.
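The copy-and-paste warning proposed above could be prototyped roughly as follows. This Python sketch (the function name and message wording are hypothetical) extracts the hostname from a pasted URL with `urllib.parse.urlsplit`. It then reports any non-ASCII characters by code point and Unicode name, which is the information a user would need to spot a look-alike.

```python
import unicodedata
from typing import Optional
from urllib.parse import urlsplit

def paste_warning(pasted_url: str) -> Optional[str]:
    """Return a warning if the pasted URL's hostname contains non-ASCII
    characters, or None if the hostname is plain ASCII."""
    host = urlsplit(pasted_url).hostname or ""
    suspicious = [ch for ch in host if ord(ch) > 127]
    if not suspicious:
        return None
    details = ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in suspicious
    )
    return (f"You are pasting characters from another character set ({details}) "
            "into the address bar - are you sure?")

print(paste_warning("https://paypal.com/login"))  # None
```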
So I backtracked to find out how it's supported.
Got real close to home and found that the company providing the fonts is a "neighbor"... and they had a press release on their site about the deal to provide the fonts.
WOBURN, Mass., USA, Sep. 18, 2006 – Monotype Imaging Inc., a global leader in font and imaging technologies, has acquired China Type Design Limited, a typeface design and production company based in Hong Kong. As a wholly owned subsidiary of Monotype Imaging, China Type will help lead expansion into Asian consumer electronics and printer markets which require scalable, multilingual text solutions.
[katakanafonts.com...]
@GrendelKhan
Firstly, are you saying that you can type Hangul into the address bar?
Secondly, are Korean domain extensions available to foreign buyers?
Thirdly, my PC is configured for Chinese and Korean script (I am kinda studying them), so would I be able to visit a Hangul-named site?
cheers
1. yes.
2. yes.
3. yes.
woot!
Introducing non-English letters
You might accuse me of splitting hairs - but this is the first time that I've read about "English letters". I always thought those were Latin ones?
I'd certainly accuse you of that ;)
You are right, of course, but in this thread I think it's making a distinction between 'English language specific Latin characters' and 'Latin characters which also include those with accents etc.' as opposed to 'other' characters, such as Chinese and African.
I hope that's cleared that up :)