Cyrillic script in Unicode

Cyrillic (Unicode block)

Cyrillic is a Unicode block containing the characters used to write the most widely used languages with a Cyrillic orthography. The core of the block is based on the ISO 8859-5 standard, with additions for minority languages and historic orthographies.



As of Unicode version 13.0, the Cyrillic script is encoded across several blocks, all in the Basic Multilingual Plane (BMP):

  • Cyrillic: U+0400–U+04FF, 256 characters
  • Cyrillic Supplement: U+0500–U+052F, 48 characters
  • Cyrillic Extended-C: U+1C80–U+1C8F, 9 characters
  • Phonetic Extensions: U+1D2B, U+1D78, 2 Cyrillic characters
  • Cyrillic Extended-A: U+2DE0–U+2DFF, 32 characters
  • Cyrillic Extended-B: U+A640–U+A69F, 96 characters
  • Combining Half Marks: U+FE2E–U+FE2F, 2 Cyrillic characters
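
As a rough illustration of the ranges listed above, the following Python sketch maps a character to the dedicated Cyrillic block that contains it. The names `CYRILLIC_BLOCKS`, `cyrillic_block_of` and `block_size` are hypothetical helpers introduced here, not part of any library. The table covers only the five dedicated blocks; the individual Cyrillic characters in Phonetic Extensions and Combining Half Marks are left out.

```python
# Inclusive code point ranges of the dedicated Cyrillic blocks,
# copied from the list above.
CYRILLIC_BLOCKS = {
    "Cyrillic": (0x0400, 0x04FF),
    "Cyrillic Supplement": (0x0500, 0x052F),
    "Cyrillic Extended-C": (0x1C80, 0x1C8F),
    "Cyrillic Extended-A": (0x2DE0, 0x2DFF),
    "Cyrillic Extended-B": (0xA640, 0xA69F),
}


def cyrillic_block_of(ch):
    """Return the name of the Cyrillic block containing ch, or None."""
    cp = ord(ch)
    for name, (start, end) in CYRILLIC_BLOCKS.items():
        if start <= cp <= end:
            return name
    return None


def block_size(name):
    """Number of code points (assigned or not) in the named block."""
    start, end = CYRILLIC_BLOCKS[name]
    return end - start + 1


if __name__ == "__main__":
    for ch in ("Я", "\u0500", "\uA640", "a"):
        print(f"U+{ord(ch):04X}", cyrillic_block_of(ch))
```

Note that `block_size` counts code points in the range, not assigned characters: Cyrillic Extended-C spans 16 code points, of which the list above reports 9 characters.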

The characters in the range U+0400–U+045F are essentially the characters of ISO 8859-5 moved upward by 864 (0x360) positions. The next characters in the Cyrillic block, in the range U+0460–U+0489, are historical letters, some of which are still used for Church Slavonic. The characters in the range U+048A–U+04FF and the entire Cyrillic Supplement block (U+0500–U+052F) are additional letters for various languages written in Cyrillic script. Two characters in the Phonetic Extensions block complete the Uralic Phonetic Alphabet: U+1D2B ᴫ CYRILLIC LETTER SMALL CAPITAL EL and U+1D78 ᵸ MODIFIER LETTER CYRILLIC EN.
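
The 864-position shift can be checked with a short Python sketch that uses the standard `iso8859_5` codec. The constant `OFFSET` and the function `offset_mismatches` are names made up for this illustration. The function decodes each byte in the upper half of ISO 8859-5 and reports those whose Unicode code point is not the byte value plus 864.

```python
OFFSET = 864  # 0x0360, the shift between ISO 8859-5 bytes and Unicode


def offset_mismatches():
    """Return the ISO 8859-5 bytes in 0xA0..0xFF that do NOT map to byte + OFFSET."""
    mismatches = []
    for byte in range(0xA0, 0x100):
        ch = bytes([byte]).decode("iso8859_5")
        if ord(ch) != byte + OFFSET:
            mismatches.append(byte)
    return mismatches


if __name__ == "__main__":
    # Example of the shift: byte 0xB0 decodes to U+0410 (А).
    print(hex(ord(bytes([0xB0]).decode("iso8859_5"))))
    print([hex(b) for b in offset_mismatches()])
```

The only exceptions are the four non-letter positions of ISO 8859-5 (no-break space, soft hyphen, numero sign and section sign), which is why the correspondence is described as approximate rather than exact.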

Unicode includes few precomposed accented Cyrillic letters; the others can be composed by adding U+0301 COMBINING ACUTE ACCENT after the accented vowel, e.g. ы́ э́ ю́ я́ (see below).
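
The difference between precomposed and combined letters can be seen with Python's standard `unicodedata` module. This is a minimal sketch; `add_stress` is a hypothetical helper introduced here. It appends U+0301 to a letter and then applies NFC normalization, which produces a single code point only where Unicode defines a precomposed form.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def add_stress(letter):
    """Append the combining acute accent to a single letter."""
    return letter + ACUTE


def nfc_length(text):
    """Number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    # No precomposed stressed ы exists, so the sequence stays two code points.
    print(nfc_length(add_stress("ы")))
    # г + acute has a precomposed form, U+0453 (ѓ), so NFC yields one code point.
    print(nfc_length(add_stress("г")))
```

In practice this means stressed vowels in Cyrillic text usually stay as two code points even after normalization, so string length and character counts may differ from what a reader sees.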

The following two diacritical marks, which are not specific to Cyrillic, can be used with Cyrillic text:

  • U+0301 ◌́ COMBINING ACUTE ACCENT = Cyrillic stress mark, in Combining Diacritical Marks block U+0300–U+036F
  • U+20DD ◌⃝ COMBINING ENCLOSING CIRCLE = Cyrillic ten thousands sign, in Combining Diacritical Marks for Symbols block U+20D0–U+20FF
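
The two marks listed above can be inspected with the standard `unicodedata` module. In this small sketch, `describe` is a hypothetical helper that returns a mark's code point, its Unicode name and its general category.

```python
import unicodedata


def describe(mark):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.category(mark))


if __name__ == "__main__":
    for mark in ("\u0301", "\u20DD"):
        print(describe(mark))
    # A combining mark follows its base letter, e.g. а + U+20DD.
    print("\u0430\u20DD")
```

The acute accent is a nonspacing mark (category Mn) and the enclosing circle is an enclosing mark (category Me). Both are placed after the base letter they modify.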

In the table below, small letters are ordered according to their Unicode numbers; capital letters are placed immediately before the corresponding small letters. Standard Unicode names and canonical decompositions are included.


1. Blocks

The Cyrillic block (U+0400–U+04FF) was added to the Unicode Standard in October 1991 with the release of version 1.0.

The Cyrillic Supplement block (U+0500–U+052F) was added to the Unicode Standard in March 2002 with the release of version 3.2.

The Cyrillic Extended-A (U+2DE0–U+2DFF) and Cyrillic Extended-B (U+A640–U+A69F) blocks were added to the Unicode Standard in April 2008 with the release of version 5.1.

The Cyrillic Extended-C block (U+1C80–U+1C8F) was added to the Unicode Standard in June 2016 with the release of version 9.0.