Punycode is a standardised encoding method that maps Unicode characters onto a limited ASCII character set, so that internationalised domain names (IDNs) can contain non-ASCII characters such as umlauts.

How was the encoding method developed?

In 2003, the Internet Engineering Task Force (IETF) standardised Punycode as the encoding syntax for Internationalized Domain Names in Applications (IDNA). The IETF defines a domain name as an IDN if it contains characters outside the basic Latin alphabet, such as letters with diacritics (e.g., German umlauts) or characters from other scripts. Basic protocols such as the Domain Name System (DNS) cannot process such characters. Take a German domain name as an example: since the introduction of IDNs, müller-büromöbel (Müller’s office furniture) has been allowed under the top-level domain .de, but it can only be processed, for instance during name resolution, once its non-base characters have been encoded. Numerous internet protocols are based on English and therefore only support the limited ASCII character set.

In order to ensure com­pat­ib­il­ity between IDNs and older internet standards, the IETF has pre­scribed a method for encoding in­ter­na­tion­al­ised domain names using the char­ac­ters that were already permitted. This stand­ard­ised encoding procedure is known as Punycode.

Note

For email addresses, Punycode is only used for in­ter­na­tion­al­ised email domains. If the local part (before the @ character) contains non-ASCII char­ac­ters, it is encoded via UTF-8.

How does Punycode encoding work?

An overview of the Punycode process

Punycode is defined by the IETF in RFC 3492 as an application of the general encoding algorithm known as Bootstring. The Bootstring algorithm allows strings made up of arbitrary characters to be represented using a limited set of elements. In Punycode encoding, these elements are called base characters and consist of lowercase letters, digits, and the hyphen (-). The development of the encoding method is based on six principles:

  • Completeness: Every input string can be represented as a string of base characters using Bootstring.
  • Uniqueness: Each input string maps to exactly one Bootstring encoding, so every Unicode label has exactly one ASCII counterpart and vice versa.
  • Reversibility: A Bootstring encoding can be reversed at any time without any information loss.
  • Efficiency: The encoded string is, if at all, only minimally longer than the input string.
  • Sim­pli­city: Boot­string uses simple encoding and decoding al­gorithms.
  • Read­ab­il­ity: Only char­ac­ters that cannot be rep­res­en­ted in the target character set are encoded. All other char­ac­ters remain unchanged.

Punycode is a Bootstring profile tailored to the requirements of internationalised domain names. It allows Unicode characters to be mapped onto the previously permitted base characters.
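The reversibility and readability principles can be checked with a short sketch. It assumes Python and its built-in `punycode` codec, neither of which is mentioned in this article; the label `müller` is only an illustrative input.

```python
# Sketch: round-trip a label through Python's built-in "punycode" codec.
label = "müller"  # illustrative input, not taken from the article

# Encoding yields plain ASCII; the base characters stay readable at the front.
encoded = label.encode("punycode").decode("ascii")

# Decoding restores the original label without any loss of information.
decoded = encoded.encode("ascii").decode("punycode")

print(encoded)           # mller-kva
print(decoded == label)  # True
```

The base characters `mller` are carried over unchanged, and only the ü is moved into the encoded suffix after the last hyphen.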

Punycode example

The following example shows how the encoding works:

IDN: müller-büromöbel

The IDN müller-büromöbel contains the char­ac­ters ü and ö, which are not included in the pre­vi­ously permitted character set for domain names. As a result, they must be encoded via Punycode to ensure com­pat­ib­il­ity.

Step 1: Nor­m­al­isa­tion

In the first step, the input string is normalised. All uppercase letters are replaced by the corresponding lowercase letters.

Step 2: Removal of all non-basic characters

In the second step, all non-basic characters are removed from the string. They are then appended to the domain name in encoded form, separated from the remaining base characters by a hyphen.

If the Punycode syntax is used to encode internet addresses, each result string is provided with an ACE prefix (short for ASCII-com­pat­ible encoding):

ACE prefix: xn--

The ACE prefix ensures that ordinary domain names containing hyphens are not misinterpreted as internationalised domain names.

This results in the following encoding for the IDN müller-büromöbel:

ACE: xn--mller-brombel-rmb4fg
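The two steps and the ACE prefix can be sketched in a few lines. This is a minimal illustration that assumes Python and its built-in `punycode` codec; the helper name `to_ace` is hypothetical, and the normalisation is reduced to lowercasing, which is less than a full IDNA implementation performs.

```python
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Encode a single domain label (hypothetical helper for illustration)."""
    # Step 1: normalise. Only lowercasing is done here.
    label = label.lower()

    # Labels made up purely of ASCII characters need no encoding.
    if label.isascii():
        return label

    # Step 2: the codec keeps the base characters and appends the
    # non-basic characters in encoded form after a hyphen.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


print(to_ace("Müller-Büromöbel"))  # xn--mller-brombel-rmb4fg
print(to_ace("example"))           # example
```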

The algorithm underlying the Punycode procedure is remarkably compact. It keeps the encoded form short, which helps domain labels stay within the maximum length of 63 characters despite the conversion.

During the encoding process, Unicode characters are not converted one-to-one into ASCII characters. Instead, the algorithm produces a string that encodes which characters were removed and where they belong in the original string, expressed as distances (deltas) from one insertion to the next.

In the example above, the string rmb4fg tells the decoder to insert ü at the second and ninth positions and ö at the thirteenth position of the name, which turns mller-brombel back into müller-büromöbel.
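How the suffix carries both the characters and their positions becomes clearer when the decoding is written out. The following is a sketch of the decoding procedure using the parameters from RFC 3492, written in Python as an assumption of this example. The function names are hypothetical, and error handling, overflow checks, and uppercase digits are deliberately left out.

```python
# Punycode parameters as given in RFC 3492.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def digit_value(ch):
    """Map a base character to its value (a-z -> 0-25, 0-9 -> 26-35)."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"not a lowercase Punycode digit: {ch!r}")


def adapt(delta, numpoints, first_time):
    """Bias adaptation function from RFC 3492."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def decode_with_deltas(encoded):
    """Return the decoded string and the list of decoded deltas."""
    # The last hyphen separates the base characters from the encoded part.
    basic, _, tail = encoded.rpartition("-")
    output = list(basic)
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    deltas = []
    pos = 0
    while pos < len(tail):
        old_i, weight, k = i, 1, BASE
        while True:  # read one variable-length integer
            digit = digit_value(tail[pos])
            pos += 1
            i += digit * weight
            threshold = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < threshold:
                break
            weight *= BASE - threshold
            k += BASE
        deltas.append(i - old_i)
        bias = adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)  # how far to advance the code point
        i %= len(output) + 1         # index at which to insert it
        output.insert(i, chr(n))
        i += 1
    return "".join(output), deltas


text, deltas = decode_with_deltas("mller-brombel-rmb4fg")
print(text)    # müller-büromöbel
print(deltas)  # [1662, 80, 6]
```

The six characters rmb4fg decode to three numbers, one for each inserted character. Each number tells the decoder how far to advance through code points and string positions before the next insertion: first ö, then the two occurrences of ü.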

Image: Overview of sections of the ACE string
The ACE string consists of the ACE prefix and a puny-coded string.

Ex­cep­tions to the rule

De­vi­ations occur if the domain name doesn’t contain any non-base char­ac­ters or if it only contains non-base char­ac­ters.

A domain name that contains only non-base characters consists, once encoded, of nothing but the ACE prefix and the encoded string. A domain name such as παράδειγμα (Greek for ‘example’) corresponds to the following encoding:

IDN: παράδειγμα

ACE: xn--hxajbheg2az3al

If a domain name contains only base characters, Punycode is not used. Accordingly, no ACE prefix is added. Encoding is not necessary in this case because basic internet protocols can already understand the domain name.

If you consider the Fully Qualified Domain Name (FQDN) as a whole, each label (top-level domain, second-level domain, third-level domain, etc.) is encoded separately. A domain like пример.бг (Bulgarian for ‘example.bg’) is encoded as follows:

IDN: пример.бг

ACE: xn--e1afmkfd.xn--90ae
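Label-by-label encoding can be sketched as follows. The example assumes Python and its built-in `punycode` codec; the function name `fqdn_to_ace` is hypothetical. It also checks the 63-character limit for each encoded label.

```python
MAX_LABEL_LENGTH = 63  # maximum length of a single domain label


def fqdn_to_ace(name: str) -> str:
    """Encode every label of a domain name separately (illustrative sketch)."""
    ace_labels = []
    for label in name.lower().split("."):
        # Only labels with non-base characters are encoded and prefixed.
        if not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        if len(label) > MAX_LABEL_LENGTH:
            raise ValueError(f"label exceeds {MAX_LABEL_LENGTH} characters: {label}")
        ace_labels.append(label)
    return ".".join(ace_labels)


print(fqdn_to_ace("пример.бг"))            # xn--e1afmkfd.xn--90ae
print(fqdn_to_ace("müller-büromöbel.de"))  # xn--mller-brombel-rmb4fg.de
print(fqdn_to_ace("example.org"))          # example.org
```

Because the dot is handled outside the codec, a name can mix encoded and unencoded labels, as in the .de example.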

The following table gives an overview of the different variants of the Punycode syntax.

| Variant | IDN | Punycode | ACE |
| --- | --- | --- | --- |
| Base and non-base characters | müller-büromöbel.de | mller-brombel-rmb4fg.de | xn--mller-brombel-rmb4fg.de |
| Only non-base characters | Παράδειγμα.gr | hxajbheg2az3al.gr | xn--hxajbheg2az3al.gr |
| Only base characters | example.org | example.org | Not used |

Note

The Punycode algorithm is described in detail in RFC 3492. In addition, the document provides an im­ple­ment­a­tion of the coding procedure in the pro­gram­ming language C.

Users usually resort to freely available Punycode con­vert­ers for encoding in­ter­na­tion­al­ised domain names.

Punycode encoding with emoji domains

Not only internationalised domain names but also emoji domains can be realised via Punycode. For this to work, however, the top-level domain has to permit the use of emojis, and the desired emoji needs to be part of the Unicode standard.

Tip

At the moment, the following TLDs allow emoji domains to be re­gistered: .ws, .tk, .to, .ml, .ga, .cf, .gq, and .fm.

Emoji domains are tech­nic­ally processed as Punycode, but in theory should be presented to the user as a com­bin­a­tion of text and emoticons.

Emoji domain: https://i❤.ws/

ACE: https://xn--i-7iq.ws/
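The emoji example can be reproduced with a short sketch, again assuming Python and its built-in `punycode` codec. The heart is written as the escape `\u2764` so that no invisible variation selector sneaks into the label.

```python
import unicodedata

heart = "\u2764"  # the heart character used in the example domain

# unicodedata.name() raises ValueError for characters that have no name
# in the Unicode data shipped with Python, so this doubles as a check
# that the character is part of the standard.
print(unicodedata.name(heart))  # HEAVY BLACK HEART

label = "i" + heart
ace = "xn--" + label.encode("punycode").decode("ascii")
print(ace)  # xn--i-7iq
```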

Prac­tic­ally no standard browser im­ple­ments this at present. If you enter an emoji domain in Firefox, Chrome, Safari, Edge, or Opera, the address bar only shows the ACE string.

Are there free Punycode con­vert­ers?

Free Punycode gen­er­at­ors that transfer IDNs into an ASCII-com­pat­ible form can be found on various websites. One example is Punycoder.

Image: Punycoder, the Punycode converter
Punycoder converts Punycode to Text/Unicode and vice-versa.

Another good choice is the Punycode converter by Mathias Bynens, which is based on punycode.js.

Image: The Punycode converter made by Mathias Bynens based on punycode.js
With his Punycode domain name converter, Mathias Bynens offers an open-source tool for converting internationalised domains.

Does Punycode pose a security risk?

Punycode becomes a security risk in the case of ho­mo­graph­ic phishing – cy­ber­at­tacks where criminals use the similar ap­pear­ance of different char­ac­ters to lure un­sus­pect­ing victims to fake websites. Blogger Xudong Zheng shows what a phishing attack looks like using the following Punycode domain https://www.xn--80ak6aa92e.com/ as an example. This leads internet users to a website with the following IDN: https://www.аррӏе.com/

The URL provided is not the official website of the Cali­for­nia tech­no­logy company Apple Inc., but a phishing website created for demon­stra­tion purposes.

Instead of the ASCII character a (U+0061), the Cyrillic а (U+0430) is used. The two characters can hardly be distinguished by the naked eye but are interpreted as different characters by web browsers. Even certificates do not reliably protect internet users: for modern phishing campaigns, criminals obtain valid SSL certificates with the goal of making their websites look authentic.
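The difference that the eye misses is easy to make visible in code. The sketch below assumes Python, its built-in `punycode` codec, and the standard `unicodedata` module. The helper `scripts` is hypothetical and uses a rough heuristic (the first word of each character's Unicode name), not a proper script lookup.

```python
import unicodedata


def scripts(label: str) -> set:
    """Rough heuristic: the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label}


ace_label = "xn--80ak6aa92e"
decoded = ace_label[len("xn--"):].encode("ascii").decode("punycode")

print(decoded == "apple")  # False
print(scripts(decoded))    # {'CYRILLIC'}
print(scripts("apple"))    # {'LATIN'}

# List every code point of the decoded label with its Unicode name.
for ch in decoded:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

A check of this kind only shows which alphabets a label draws on; whether a particular combination is suspicious remains a policy decision.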

Current versions of Chrome and Opera prevent phishing attacks like these by dis­play­ing the ACE string instead of the in­ter­na­tion­al­ised domain on IDNs that mix char­ac­ters from different character sets. Internet Explorer and Microsoft Edge prevent domains like these from being accessed. Firefox, however, does not offer any pro­tec­tion against Punycode phishing.

Image: Example of a homographic attack
Example of a homographic domain: The URL looks the same as that of Apple’s official website; however, the Unicode character U+0430 is actually a Cyrillic letter that is astonishingly similar to the ASCII character a.

This is how Firefox users can protect them­selves. In order to reduce the risk that phishing websites pose, Firefox users currently only have the option to prevent Punycode from being trans­lated into IDNs in general. Only two steps are necessary for this temporary solution:

  1. Access the con­fig­ur­a­tion editor: Type about:config in the address bar of your web browser to open the Firefox con­fig­ur­a­tion editor.
  2. Force Punycode: Find the setting network.IDN_show_punycode and change its value from false to true.

After con­fig­ur­a­tion, Firefox will display in­ter­na­tion­al­ised domains in the address bar as ACE strings.
