According to the International Telecommunication Union (ITU), more than three billion people use the World Wide Web, increasingly in their native languages. This shift was made possible in part by the introduction of internationalised domain names (IDNs) in 2003. We’ll explain how IDNs work.

What is an internationalised domain name (IDN)?

The IETF (Internet Engineering Task Force) defines IDNs as domain names that contain characters outside the ASCII character set, such as letters with umlauts or accents, or characters from non-Latin scripts. However, the Domain Name System (DNS), which is responsible for translating domain names into IP addresses, cannot process these names directly. Host names in the DNS are restricted to a small subset of ASCII: the letters a to z, the digits 0 to 9 and the hyphen.

To make IDNs usable with the DNS and other internet protocols, the internet standard Internationalizing Domain Names in Applications (IDNA) was introduced in 2003. It defines a standardised translation from Unicode to ASCII and thereby enables the use of non-ASCII characters in domain names.

How does IDNA work?

Much of the internet’s infrastructure supports only the ASCII character set. To ensure that internationalised domain names can be processed, each IDN entered in Unicode is translated into an ASCII-compatible encoding (ACE) string. Users continue to see URLs with accented characters or umlauts, while servers process the addresses in their ASCII-compatible form. This procedure is specified in the IDNA2003 internet standard and in its revision, IDNA2008, which was approved in 2010. The translation from Unicode to ASCII takes place on the client side (in the browser, email program, etc.) and is based on a standardised encoding method called Punycode.
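
The client-side translation described above can be sketched in Python, whose standard library includes an `idna` codec. This is only an illustration under two assumptions: the codec follows the older IDNA2003 rules, and the domain `bücher.example` as well as the helper names are made up for the example.

```python
# Minimal sketch: convert a Unicode domain name to its ACE form and back,
# using Python's built-in "idna" codec (which follows IDNA2003).

def to_ace(domain: str) -> str:
    """Return the ASCII-compatible (ACE) form that is sent to the DNS."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Return the Unicode form that a client would display to the user."""
    return ace_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_ace("bücher.example")  # hypothetical example domain
    print(ace)                      # xn--bcher-kva.example
    print(to_unicode(ace))          # bücher.example
```

Only the label that contains a non-ASCII character is rewritten. The pure-ASCII label `example` passes through unchanged.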

Punycode

Punycode, standardised in RFC 3492, was developed to represent Unicode character strings as ASCII characters unambiguously and reversibly. All non-ASCII characters are removed from the name, the remaining ASCII characters are written out first, and the removed characters are appended in encoded form after a hyphen. This encoded sequence contains information about each Unicode character as well as its position in the name. In addition, each ACE string created in this way is marked with the prefix xn--. This signals to software that the character sequence is an IDN encoded according to the IDNA and Punycode standards. See our article on Punycode for a detailed explanation of the encoding process as well as some examples.
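
To make the structure of an ACE string more concrete, here is a small sketch that uses the `punycode` codec from Python's standard library on a single part of a domain name. The helper names and the sample word are assumptions for illustration, and the sketch deliberately performs no normalisation or validity checks.

```python
# Sketch of the Punycode step for one part of a domain name, using the
# stdlib "punycode" codec (RFC 3492). Helper names are hypothetical.

ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Encode one label; pure-ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    # The codec writes the ASCII characters first, then a hyphen, then the
    # encoded code points and positions of the non-ASCII characters.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(ace: str) -> str:
    """Reverse the encoding for labels that carry the ACE prefix."""
    if not ace.startswith(ACE_PREFIX):
        return ace
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(label_to_ace("bücher"))         # xn--bcher-kva
print(ace_to_label("xn--bcher-kva"))  # bücher
```

In `xn--bcher-kva`, the part `bcher` holds the ASCII characters that were kept, and `kva` encodes both the character `ü` and the position where it has to be reinserted.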

Tip

With an online IDN converter, you can convert IDNs to their corresponding ACE strings using Punycode.

Differences between IDNA2003 and IDNA2008

Under the original 2003 procedure, internationalised domain names were normalised before Punycode encoding using the nameprep method. This method converted capital letters into lowercase letters, removed control characters and mapped equivalent characters to a unified form. Nameprep was dropped when IDNA2008 was introduced. IDNA2008 itself no longer prescribes a normalisation step. Instead, it leaves the preparation of user input to the application and recommends a mapping that converts capital letters into lowercase ones.

This change also benefits users in the German-speaking world. Under IDNA2003, the Unicode character ‘ß’, which is common in German, was defined as equivalent to ‘ss’. Domains such as www.fußball-ergebnisse.de were therefore automatically normalised to www.fussball-ergebnisse.de by the nameprep process. This is no longer the case under IDNA2008. Since 2010, ‘ß’ has been treated as a character in its own right (‘Latin small letter sharp s’) and can be registered as part of an IDN.
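
The effect of nameprep on capital letters and on ‘ß’ can be reproduced with the `nameprep` function that ships with Python's IDNA2003 codec. The sketch below contrasts it with a plain Punycode encoding that leaves the input untouched. The helper names are hypothetical, and `unmapped_label` only imitates the IDNA2008 treatment of ‘ß’; it is not a complete IDNA2008 implementation and performs none of that standard's validity checks.

```python
# Sketch contrasting IDNA2003-style nameprep with an unmapped Punycode
# encoding of the same label. Helper names are hypothetical.
from encodings.idna import nameprep  # stdlib implementation of nameprep


def idna2003_label(label: str) -> str:
    """Apply nameprep first, as IDNA2003 requires, then encode if needed."""
    prepared = nameprep(label)
    if prepared.isascii():
        return prepared
    return "xn--" + prepared.encode("punycode").decode("ascii")


def unmapped_label(label: str) -> str:
    """Encode the label as typed, so that a sharp s is preserved."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


print(nameprep("BÜCHER"))         # bücher (capitals become lowercase)
print(idna2003_label("fußball"))  # fussball (sharp s was mapped to ss)
print(unmapped_label("fußball"))  # xn--fuball-cta (sharp s is preserved)
```

The two results for `fußball` are different names as far as the DNS is concerned, which is why the change in the standard matters in practice.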

In addition, around 8,000 characters that were permitted in domain names under IDNA2003 are no longer supported under IDNA2008. Four characters are interpreted differently since the standard was revised: ‘ß’, the Greek final sigma ‘ς’, and the zero-width joiner and non-joiner. For a detailed discussion of the differences between IDNA2003 and IDNA2008, see Unicode Technical Standard #46. The following table summarises the main differences:

IDNA2003 | IDNA2008
Nameprep procedure required | No normalisation specified
Valid for Unicode 3.2 | Valid for Unicode versions from 5.2 onwards
Strict rules for right-to-left scripts | Clearer rules for right-to-left scripts
Upper-case letters are converted to lower-case letters by nameprep | Upper-case letters are not valid in the protocol itself; converting them to lower case is left to the application
Many symbols are permitted | Many symbols are prohibited, e.g., graphic symbols that do not belong to any alphabet, as well as some punctuation
Some Unicode characters are remapped (e.g., ‘ß’ to ‘ss’) | ‘Remapping’ removed for some Unicode characters, as this could lead to irregularities

What problems are there with IDNs?

By now, all common internet applications should be able to handle IDNs. However, problems with internationalised domain names still occur because the switch from IDNA2003 to IDNA2008 has not been implemented consistently. One example that is problematic for German is the differing interpretation of ‘ß’. Because IDNA2003 always converts ‘ß’ to ‘ss’, ß domains registered under IDNA2008 often cannot be reached from systems that still convert according to the outdated standard. Instead, users are directed to the corresponding domain containing ‘ss’. Website operators can work around this problem by registering both variants and redirecting the secondary domain to the preferred spelling using a domain redirect.
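
As a rough aid for the workaround described above, the following sketch lists the ACE spellings that clients might end up requesting for a given name. It assumes lowercase input, uses Python's built-in IDNA2003 codec for the mapped view, and approximates the IDNA2008 view by encoding each label without remapping. The function name and the domain `fußball.example` are hypothetical, and the redirect itself would still have to be configured with the registrar or on the web server.

```python
# Sketch: list the ACE spellings a site operator might register so that
# clients following either standard reach the site. Assumes lowercase input.

def ace_variants(domain: str) -> list[str]:
    """Return the distinct ACE forms of a domain under both approaches."""
    # IDNA2003 view: Python's built-in "idna" codec maps sharp s to "ss".
    mapped = domain.encode("idna").decode("ascii")

    # IDNA2008-like view: encode each label as typed, without remapping.
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    unmapped = ".".join(labels)

    # Preserve order and drop the duplicate when both views agree.
    return list(dict.fromkeys([unmapped, mapped]))


for name in ace_variants("fußball.example"):
    print(name)
# xn--fuball-cta.example
# fussball.example
```

If the function returns a single entry, both approaches agree and no second registration is needed for that name.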
