Text - Double Byte Character Set

1 - About

A Double Byte Character Set (DBCS) is a character set in which a character can be represented by a pair of code points, that is, by two bytes.

With a DBCS, you need to write code that treats each such pair of code points as one character.

A DBCS supports national languages that contain a large number of unique characters or symbols, more than the 256 characters that can be represented with 1 byte.

For programming purposes, a set of code points is set aside to represent the first byte of a pair. These code points are not valid unless they are immediately followed by a defined second byte.
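The first-byte rule can be sketched in a few lines. This is a minimal illustration, not a complete decoder: it assumes Shift_JIS as the double-byte code page, assumes the first-byte ranges 0x81-0x9F and 0xE0-0xFC used by Shift_JIS and its Windows variant, and the helper names `is_lead_byte` and `split_characters` are hypothetical. It does not check that the second byte is a defined one.

```python
def is_lead_byte(b):
    """Return True if b is reserved as the first byte of a pair.

    Assumed ranges for Shift_JIS and its Windows variant.
    """
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_characters(data):
    """Split a byte string into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            # A first byte is only meaningful with a second byte after it.
            if i + 1 >= len(data):
                raise ValueError("first byte without a second byte")
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Single-byte character (for example ASCII).
            chars.append(data[i:i + 1])
            i += 1
    return chars


sample = "A日本".encode("shift_jis")
pieces = split_characters(sample)
print(len(sample), "bytes,", len(pieces), "characters")
```

The byte count and the character count differ because the two Japanese characters each occupy a pair of bytes while the Latin letter occupies one.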

3 - Example

Examples of such languages include:

  • Japanese,
  • Korean,
  • and Chinese.

Each character in these languages can be represented by a pair of code points (thus double-byte). Programs written for single-byte code pages assume one byte per character, so they won't work correctly for these languages. A set of code points used for Japanese is called a double-byte code page, and a Japanese font character set is called a double-byte character set (DBCS).
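To make the single-byte assumption concrete, here is a small sketch, again assuming Shift_JIS as the double-byte code page. It shows three ways byte-oriented logic goes wrong: the byte length is not the character count, a byte-level search can match the second byte of a pair, and cutting the data between the two bytes of a pair leaves something that cannot be decoded. The variable names are illustrative only.

```python
text = "表"                            # one Japanese character
encoded = text.encode("shift_jis")     # stored as a pair of bytes

# 1. Length in bytes is not length in characters.
byte_count = len(encoded)
char_count = len(text)

# 2. A byte-level search for a backslash matches the second byte of the
#    pair, although the text contains no backslash character.
naive_hit = b"\\" in encoded
real_hit = "\\" in text

# 3. Truncating between the two bytes of a pair leaves an invalid sequence.
try:
    encoded[:1].decode("shift_jis")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

print(byte_count, char_count, naive_hit, real_hit, truncated_ok)
```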

4 - History

Windows codepage 1253 provides the character codes required by the Greek writing system, and codepage 1252 provides the characters for Latin writing systems including English, German and French.

In both codepages the lower 128 code points are the same (ASCII). It is the upper 128 code points that contain either:

  • the accented characters
  • or the Greek characters.

Thus you cannot store Greek and German in the same byte stream unless you add some kind of identifier to indicate which codepage each part refers to.
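The conflict in the upper 128 code points can be seen with a single byte. The sketch below uses Python's built-in `cp1252` and `cp1253` codecs; the byte value 0xE4 and the helper name `can_encode` are chosen purely for illustration.

```python
raw = bytes([0xE4])                  # one byte from the upper 128 code points

as_western = raw.decode("cp1252")    # read as a Western European letter
as_greek = raw.decode("cp1253")      # read as a Greek letter


def can_encode(text, codepage):
    """Return True if every character of text exists in the codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


mixed = "äδ"                         # German and Greek in one string
print(as_western != as_greek)
print(can_encode(mixed, "cp1252"), can_encode(mixed, "cp1253"))
```

The same byte yields two different characters depending on the codepage, and neither codepage can hold the mixed string on its own.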

Asian languages far exceed the 256-character limit imposed by a single byte. Japanese, for example, uses about 2000 kanji for everyday purposes, more kanji for special vocabularies, two phonetic syllabaries, Latin alphabetic characters, Arabic numerals, and both Japanese and Western punctuation marks.

A different scheme needed to be developed, but it had to be based on the concept of 256-character codepages. Thus Double Byte Character Sets (DBCS) were born.
