Domainmonster.com Domain Editorials
Internationalised Domain Names Considered by ICANN
Currently, only 37 characters can be used to make up domain names: the Latin letters A-Z, the Arabic numerals 0-9 and the hyphen. The problem is that this leaves a vast proportion of the world's population on the wrong side of the language gap, because many languages are written in non-Latin scripts such as Cyrillic, Arabic and Chinese.
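As a rough illustration of the 37-character rule, the Python sketch below builds the permitted character set and checks whether a single domain label (one dot-free part of a name) uses only those characters. The helper name `is_ldh_label` is hypothetical, and the extra checks it applies (no leading or trailing hyphen, at most 63 characters per label) are assumptions taken from the general DNS host-name conventions rather than from this article.

```python
import string

# The 37 permitted characters: Latin letters (case-insensitive), digits 0-9 and the hyphen.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """Hypothetical helper: True if `label` uses only the 37 'letters, digits,
    hyphen' characters and respects the usual hyphen and length limits."""
    if not 1 <= len(label) <= 63:                  # assumed per-label length limit
        return False
    if label.startswith("-") or label.endswith("-"):
        return False                               # assumed: no leading/trailing hyphen
    # Require plain ASCII first, so that no non-Latin character can slip through
    # by lower-casing into an ASCII letter.
    return all(ch.isascii() and ch.lower() in ALLOWED for ch in label)

print(len(ALLOWED))               # 37
print(is_ldh_label("example"))    # True
print(is_ldh_label("münchen"))    # False: ü is not among the 37
print(is_ldh_label("a#b"))        # False: # is not allowed either
```

Upper-case letters are accepted because domain names are not case-sensitive, which is why the set contains only the 26 lower-case letters.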
Almost from the moment the internet began to take off, there have been calls to break down this language barrier and permit non-Latin characters in domain names. Some commentators argue that the lack of support for other alphabets contributes to the so-called "digital divide", producing a Westernised internet dominated by the developed world. These critics claim that non-Western languages are being ignored, to the detriment of diversity and inclusiveness.
And it is not only entirely new scripts that campaigners want to see included: accented and special Latin characters such as é, î, ö and ß are also currently excluded, as are the additional vowels of the Scandinavian languages, such as ø. A UN panel reported in July 2005 that "insufficient progress has been made toward multilingualization." This is no small problem.
Since 2004, the administrators of the German top-level domain ".de" have allowed registrants to choose from 92 accented and other special characters, including the umlauted vowels common in many German names. Tests of non-Latin characters in domain names have also been carried out, many of them successfully, in the internet "sandbox" of the ".test" top-level domain. But this highlights the problem that most concerns campaigners for international characters: the top-level domains themselves are, and for the time being will remain, composed solely of the 26 letters of the Latin alphabet. The introduction of the .asia domain later this year may give Asian businesses and personal site owners an alternative to the ubiquitous Anglo-centric ".com", but it is still not written in the native script of the vast majority of Asian internet users.
While the motivation behind encouraging non-standard character sets for domain names is laudable, there is strong resistance to the proposal from several quarters. Part of the problem is that adding all the character sets that people are calling for could produce a character list of over 50,000 characters. The Chinese "alphabet", for example, does not consist simply of a set of letters but of a huge number of characters combining semantic and phonetic elements: a well-educated Chinese person probably recognises about 6,000 characters, but more than 40,000 exist, although many of these are not in common use.
Another concern is that dramatically increasing the number of available characters could lead to an explosion in fraudulent activity on the internet, a major motivation for ICANN's rejection of the idea in 2003. The Unicode character standard assigns a unique number, called a code point and conventionally written in hexadecimal, to each character: the Latin "a", for example, is U+0061 (97 in decimal). However, the Cyrillic letter "a", which appears on screen as exactly the same arrangement of pixels, is U+0430 (1072 in decimal).
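The numbers quoted above can be checked directly in Python, whose built-in `ord` function returns a character's Unicode code point; Unicode charts conventionally show the same value in hexadecimal. The helper name `code_point` is hypothetical and used only for this illustration.

```python
import unicodedata

def code_point(ch: str) -> str:
    """Format one character's Unicode code point in the usual U+XXXX hexadecimal notation."""
    return f"U+{ord(ch):04X}"

latin_a = "a"
cyrillic_a = "\u0430"   # looks identical to the Latin "a" in most fonts

for ch in (latin_a, cyrillic_a):
    # ord() gives the decimal value, code_point() the hexadecimal form,
    # and unicodedata.name() the character's official Unicode name.
    print(code_point(ch), ord(ch), unicodedata.name(ch))
# U+0061 97 LATIN SMALL LETTER A
# U+0430 1072 CYRILLIC SMALL LETTER A

print(latin_a == cyrillic_a)   # False: same appearance, different characters
```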
The problem is that having two or more numbers for what looks like the same character means that two domain names could be registered which look identical but lead to different websites. A fraudster could register "paypal.com" spelt with Cyrillic "a"s instead of Latin ones, and quietly siphon off the personal details of innocent visitors as they attempt to log into their PayPal accounts, a form of the fraud known as "phishing".
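One partial defence, sketched below under our own assumptions, is to flag any label whose letters come from more than one script, as the spoofed "paypal" does. The function names are hypothetical, and taking the first word of each letter's Unicode name (such as LATIN or CYRILLIC) is only a crude stand-in for a proper script lookup; it will not catch a name spelt entirely in look-alike letters from a single foreign script.

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    """Hypothetical helper: rough set of scripts used by the letters in `label`,
    taken from the first word of each letter's Unicode name (e.g. LATIN, CYRILLIC)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1

genuine = "paypal"
spoofed = "p\u0430yp\u0430l"   # both "a"s replaced with Cyrillic U+0430

print(genuine == spoofed)       # False
print(looks_mixed(genuine))     # False
print(looks_mixed(spoofed))     # True
```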
Even the infrastructure of the internet itself could be at risk. Paul Twomey, Chief Executive of ICANN, expressed concern at the end of last year that allowing non-Latin characters could "break the whole Internet", because the Domain Name System (DNS), which resolves alphanumeric domain names into numerical IP addresses, was designed with only these 37 characters in mind, leaving out even many of the characters on a standard Western keyboard, such as #, & and !.
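For context, one established way round this limitation, used for names such as the accented ".de" registrations, is to translate each non-Latin label into a string built only from the 37 permitted characters before it reaches the DNS. This scheme, IDNA (RFC 3490), marks converted labels with an "xn--" prefix. The sketch below relies on Python's built-in "idna" codec, which implements the 2003 version of that standard; the function names are hypothetical, and the example says nothing about what ICANN will eventually decide for top-level domains.

```python
def to_dns_form(domain: str) -> str:
    """Hypothetical helper: convert a possibly non-ASCII domain name to the
    ASCII-only form carried by the DNS, via Python's built-in 'idna' codec."""
    return domain.encode("idna").decode("ascii")

def from_dns_form(domain: str) -> str:
    """Hypothetical helper: reverse the conversion for display to users."""
    return domain.encode("ascii").decode("idna")

print(to_dns_form("example.com"))           # example.com (already plain ASCII)
print(to_dns_form("münchen.de"))            # xn--mnchen-3ya.de
print(from_dns_form("xn--mnchen-3ya.de"))   # münchen.de

# A look-alike name with Cyrillic "a"s becomes a completely different DNS name.
spoofed = "p\u0430yp\u0430l.com"
print(to_dns_form(spoofed).startswith("xn--"))   # True
```

The DNS itself only ever sees the ASCII form; converting back to the native script is left to browsers and other software, which is one reason browser support matters so much.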
Twomey said, "The Internet is like a fifteen storey building, and with international domain names what we're trying to do is change the bricks in the basement." How the system would cope with international characters is a serious concern: will the DNS be robust enough to handle the billions of possible character combinations that would suddenly become available if international character sets were permitted? Vint Cerf, ICANN chairman, sounded somewhat more optimistic, telling sceptics, "We're confident that none of the infrastructure is likely to encounter a problem but you really don't know until you are in the live environment."
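The scale of the jump is easy to estimate. The short calculation below counts every possible string of a given length over an alphabet of a given size (the size raised to the power of the length), comparing today's 37 characters with the figure of over 50,000 mentioned earlier. It assumes every character may appear in every position and ignores rules such as the hyphen restrictions, so the numbers are rough upper bounds.

```python
def possible_labels(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of exactly `length` characters drawn from an
    alphabet of `alphabet_size` symbols, repetition allowed: size ** length."""
    return alphabet_size ** length

for length in (1, 2, 3):
    today = possible_labels(37, length)
    expanded = possible_labels(50_000, length)
    print(f"length {length}: {today:,} with 37 characters, {expanded:,} with 50,000")
# length 1: 37 with 37 characters, 50,000 with 50,000
# length 2: 1,369 with 37 characters, 2,500,000,000 with 50,000
# length 3: 50,653 with 37 characters, 125,000,000,000,000 with 50,000
```

Even two-character names already run into the billions.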
There are various other concerns: browsers will need to support a huge variety of characters if users are to visit international websites, and typing domain names and email addresses could become much harder for anyone whose keyboard cannot easily produce the characters involved. There is also a risk of a newly segregated internet, no longer held together by a common language.
Nevertheless, ICANN is seriously considering introducing new character sets, and a decision is thought to be on the cards for 2008. It remains to be seen how many new characters will be introduced, and what effect they will have on the functioning of the internet.
By Natalie Catchpole

