A character is:
- an atomic unit of text (ISO/IEC 10646:2000 character specification)
- categorized as a primitive data type
A character is the smallest component of written language that has semantic value; it refers to the abstract meaning and/or shape …
Characters are the basic units of organization of encoded text.
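In Java, for example, a character is stored in the primitive type char. A char holds one 16-bit UTF-16 code unit rather than an abstract character, so a character outside the Basic Multilingual Plane takes two of them. Below is a minimal sketch. The class CharPrimitive and the helper unitsFor are hypothetical names chosen for illustration.

```java
public class CharPrimitive {
    // Hypothetical helper: number of UTF-16 code units (Java chars) needed to store a code point.
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint); // 1 for U+0000..U+FFFF, 2 above
    }

    public static void main(String[] args) {
        char c = 'A';                          // primitive 16-bit value
        int value = c;                         // widening conversion exposes the code unit value
        System.out.println(value);             // 65
        System.out.println(Character.SIZE);    // 16 (bits in a char)
        System.out.println(unitsFor(0x41));    // 1
        System.out.println(unitsFor(0x1D11E)); // 2: needs a surrogate pair
    }
}
```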
Besides letters, characters also include:
- symbols (such as mathematical symbols)
- logograms (from non-phonetic writing systems such as kanji)
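Unicode records the kind of each character as a general category. Java exposes it through Character.getType(int). The sketch below maps a few categories to coarse labels. The class CharCategory, the method kind, and the label strings are hypothetical. Note that OTHER_LETTER (Lo) covers every letter without case, not only logograms.

```java
public class CharCategory {
    // Hypothetical mapping from a Unicode general category to a coarse label.
    static String kind(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.MATH_SYMBOL:         // category Sm, e.g. U+2211 N-ARY SUMMATION
                return "math symbol";
            case Character.OTHER_LETTER:        // category Lo, e.g. the kanji U+6F22
                return "letter without case (e.g. a logogram)";
            case Character.UPPERCASE_LETTER:    // category Lu
            case Character.LOWERCASE_LETTER:    // category Ll
                return "cased letter";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(kind(0x2211)); // math symbol
        System.out.println(kind(0x6F22)); // letter without case (e.g. a logogram)
        System.out.println(kind('A'));    // cased letter
    }
}
```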
For example, the following character set appears in several code pages:
- 26 non-accented uppercase letters A through Z (A, B, C, …, X, Y, Z)
- 26 non-accented lowercase letters a through z (a, b, c, …, x, y, z)
- digits 0 through 9
- special characters:
  - punctuation: . , : ; ? !
  - ( ) ' " / - _ & + % * = < >
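All the characters in this set are ASCII characters. Each one therefore has a code point below 128 and encodes to a single byte in UTF-8. The sketch below builds the set and checks that property. It assumes the quote mark in the list is the ASCII double quote (U+0022). The class CodePageSet and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

public class CodePageSet {
    // Builds the set listed above; the double quote is assumed to be ASCII U+0022.
    static Set<Character> buildSet() {
        Set<Character> set = new LinkedHashSet<>();
        for (char c = 'A'; c <= 'Z'; c++) set.add(c);
        for (char c = 'a'; c <= 'z'; c++) set.add(c);
        for (char c = '0'; c <= '9'; c++) set.add(c);
        for (char c : ".,:;?!()'\"/-_&+%*=<>".toCharArray()) set.add(c); // 20 special characters
        return set;
    }

    // True when every character encodes to exactly one byte in UTF-8 (i.e. is ASCII).
    static boolean allSingleByteUtf8(Set<Character> set) {
        for (char c : set) {
            if (String.valueOf(c).getBytes(StandardCharsets.UTF_8).length != 1) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Character> set = buildSet();
        System.out.println(set.size());             // 82
        System.out.println(allSingleByteUtf8(set)); // true
    }
}
```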
Encoding, File Storage
Problem: which character is –?
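In UTF-8, this character is encoded as the three hexadecimal bytes e2 80 93, which correspond to the Unicode code point U+2013 EN DASH. See Translation of a UTF-8 Multibyte sequence to Unicode - Example 2. The trailing byte 0a in the dump below is not part of the character: it is the line feed (newline) that echo appends.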
echo – | hexdump -C
00000000  e2 80 93 0a  |....|
00000004
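The same bytes can be obtained from Java without the trailing newline. The three-byte sequence can also be decoded by hand. A UTF-8 sequence of the form 1110xxxx 10yyyyyy 10zzzzzz carries the code point xxxx yyyyyy zzzzzz. Below is a sketch in which the class EnDashBytes and its methods are hypothetical helpers.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class EnDashBytes {
    // Space-separated lowercase hex of the UTF-8 bytes of a string (no newline added).
    static String utf8Hex(String s) {
        StringJoiner hex = new StringJoiner(" ");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.add(String.format("%02x", b & 0xFF)); // mask: Java bytes are signed
        }
        return hex.toString();
    }

    // Decodes one 3-byte UTF-8 sequence by hand: keep 4 + 6 + 6 payload bits.
    static int decodeThreeBytes(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    public static void main(String[] args) {
        String dash = "\u2013"; // EN DASH, written as an escape to avoid source-encoding issues
        System.out.println(utf8Hex(dash));                                  // e2 80 93
        System.out.printf("U+%04X%n", decodeThreeBytes(0xE2, 0x80, 0x93)); // U+2013
        System.out.println(Character.getName(dash.codePointAt(0)));        // EN DASH
    }
}
```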
In JavaScript, the String charCodeAt() method returns the UTF-16 code unit (an integer between 0 and 65535) at the given index.
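Java has no charCodeAt(). Its String.charAt() returns a char, which is likewise a UTF-16 code unit between 0 and 65535, while codePointAt() returns the full code point. The sketch below uses U+1D11E MUSICAL SYMBOL G CLEF as an example of a character stored as a surrogate pair. The class CodeUnits and the method codeUnits are hypothetical.

```java
public class CodeUnits {
    // Hypothetical helper: the UTF-16 code units of a string as ints, one per index.
    static int[] codeUnits(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i); // char widened to int: 0..65535
        }
        return units;
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E as a surrogate pair
        for (int unit : codeUnits(clef)) {
            System.out.printf("%04X%n", unit);                     // D834, then DD1E
        }
        System.out.printf("%X%n", clef.codePointAt(0));            // 1D11E
        System.out.println(clef.length());                         // 2 code units
        System.out.println(clef.codePointCount(0, clef.length())); // 1 code point
    }
}
```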
Example: looking up a character with the cldr/utility/character.jsp utility.
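A code point reporter based on this method can show the code point of each character of a string. For characters outside the Basic Multilingual Plane, codePointAt() is needed instead, because charCodeAt() returns only one half of the surrogate pair.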
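A comparable reporter can be written in Java by iterating over code points rather than code units. It prints each code point with its Unicode name. This is a sketch: the class CodePointReporter and the output format "U+XXXX NAME" are assumptions, not the reporter described above.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointReporter {
    // One "U+XXXX NAME" line per code point; Character.getName returns null
    // for unassigned code points, which would print as "null".
    static List<String> report(String s) {
        List<String> lines = new ArrayList<>();
        s.codePoints().forEach(cp ->
            lines.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report("a-\u2013")) {
            System.out.println(line);
        }
        // U+0061 LATIN SMALL LETTER A
        // U+002D HYPHEN-MINUS
        // U+2013 EN DASH
    }
}
```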
The Windows Character Map application, where you can search for a character.
In Java, for example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
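The call above relies on the int overload of Character.isLetter. The value 0x2F81A does not fit in a 16-bit char, so a String stores it as two code units. The sketch below contrasts testing the whole code point with testing one code unit. The class SupplementaryLetter and the helper isLetterAt are hypothetical.

```java
public class SupplementaryLetter {
    // Hypothetical helper: tests the whole code point starting at index, not a single char.
    static boolean isLetterAt(String s, int index) {
        return Character.isLetter(s.codePointAt(index));
    }

    public static void main(String[] args) {
        int cp = 0x2F81A; // CJK compatibility ideograph outside the BMP
        System.out.println(Character.isLetter(cp));                 // true
        System.out.println(Character.isSupplementaryCodePoint(cp)); // true
        String s = new String(Character.toChars(cp));
        System.out.println(s.length());                             // 2 chars
        System.out.println(isLetterAt(s, 0));                       // true
        System.out.println(Character.isLetter(s.charAt(0)));        // false: a lone high surrogate
    }
}
```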
Characters such as a hyphen (-) and a dash (–) are hard to tell apart visually.
In this case, convert them to their code points to see the difference. See the dedicated page: How to see the difference between two characters (hyphen and dash)?
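The same idea can be automated. The sketch below reports every character in Unicode's dash-punctuation category (Pd) other than the ASCII HYPHEN-MINUS U+002D. The class DashFinder and the method nonAsciiDashes are hypothetical. Treating every non-ASCII Pd character as suspicious is an assumption that suits identifiers and command lines better than typeset prose.

```java
import java.util.ArrayList;
import java.util.List;

public class DashFinder {
    // Indices (in UTF-16 code units) of Pd characters that are not U+002D.
    static List<Integer> nonAsciiDashes(String s) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp != '-' && Character.getType(cp) == Character.DASH_PUNCTUATION) {
                found.add(i);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "pre-fix 10\u201320"; // a hyphen-minus, then an en dash
        for (int i : nonAsciiDashes(text)) {
            int cp = text.codePointAt(i);
            System.out.printf("index %d: U+%04X %s%n", i, cp, Character.getName(cp));
        }
        // index 10: U+2013 EN DASH
    }
}
```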
Each character requires: