A character is an atomic unit of text, as specified by ISO/IEC 10646:2000 [ISO/IEC 10646].

Every unit of text (character) is assigned a unique integer known as a code point.
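As a small illustration of the mapping between characters and code points, the sketch below uses Python's built-in `ord` and `chr`. The helper name `code_points` and the sample string are assumptions made for this example only.

```python
def code_points(text: str) -> list[int]:
    """Return the code point of every character in text."""
    return [ord(ch) for ch in text]


# "A", "é" and "€", written with escapes so the source file stays ASCII.
sample = "A\u00e9\u20ac"
points = code_points(sample)

# Conventional notation is U+ followed by at least four hex digits.
print([f"U+{p:04X}" for p in points])  # ['U+0041', 'U+00E9', 'U+20AC']

# chr() is the inverse of ord(): code points map back to characters.
restored = "".join(chr(p) for p in points)
print(restored == sample)  # True
```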

All the characters within a string share a common coded representation (i.e. a character set), which maps each character to a code point. A font then renders the code point as a glyph (the visual representation of the character).
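To see why a string needs one common coded representation, the sketch below encodes the same character with two different character sets and then decodes it with the wrong one. The choice of UTF-8 and Latin-1 is an assumption made for illustration.

```python
text = "\u00e9"  # the character "é", code point U+00E9

# The same code point becomes different bytes under different encodings.
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # one byte: 0xE9

# Decoding with the wrong character set yields different characters.
garbled = utf8_bytes.decode("latin-1")

print(utf8_bytes, latin1_bytes)
print(garbled == text)  # False
```

Decoding with the same character set that was used for encoding gives back the original text.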

In a computer, text is represented as a string.

Without an associated data schema (such as JavaScript, XML, …), text is generally said to be unstructured.

Text is the basis of any language.

Text editors also often use a text tree (a rope data structure) to speed up text transformations.
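The idea behind a rope can be sketched as a binary tree whose leaves hold short pieces of text, so that an insertion only rebuilds the nodes along one path instead of copying the whole string. The names `Leaf`, `Node`, `split`, `insert`, and `to_str` are assumptions made for this example; this is a minimal, unbalanced sketch, not how any particular editor implements it.

```python
class Leaf:
    """A leaf holds a short piece of the text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __len__(self) -> int:
        return len(self.text)


class Node:
    """An inner node joins two sub-ropes and caches their lengths."""

    def __init__(self, left, right) -> None:
        self.left = left
        self.right = right
        self.weight = len(left)  # number of characters in the left subtree
        self.length = self.weight + len(right)

    def __len__(self) -> int:
        return self.length


def split(rope, i: int):
    """Split a rope into (first i characters, remaining characters)."""
    if isinstance(rope, Leaf):
        return Leaf(rope.text[:i]), Leaf(rope.text[i:])
    if i < rope.weight:
        head, tail = split(rope.left, i)
        return head, Node(tail, rope.right)
    head, tail = split(rope.right, i - rope.weight)
    return Node(rope.left, head), tail


def insert(rope, i: int, text: str):
    """Return a new rope with text inserted at position i."""
    head, tail = split(rope, i)
    return Node(Node(head, Leaf(text)), tail)


def to_str(rope) -> str:
    """Flatten the rope back into an ordinary string."""
    if isinstance(rope, Leaf):
        return rope.text
    return to_str(rope.left) + to_str(rope.right)


doc = Node(Leaf("Hello "), Leaf("world"))
edited = insert(doc, 6, "big ")
print(to_str(edited))  # Hello big world
```

A production rope would also rebalance the tree and merge small leaves; both are left out here to keep the sketch short.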

Many different characters look alike, and this resemblance can be exploited in attacks. See Characters - Homograph.
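As an illustration of look-alike characters, the sketch below compares a Latin word with a spoofed copy that swaps in a Cyrillic letter. The `scripts` helper is a hypothetical heuristic written for this example: it takes the first word of each character's Unicode name as its script, which is an assumption and not a full confusable-detection algorithm.

```python
import unicodedata

genuine = "paypal"
# Same appearance in many fonts, but "a" is CYRILLIC SMALL LETTER A (U+0430).
spoofed = "p\u0430yp\u0430l"


def scripts(text: str) -> set[str]:
    """Heuristic: use the first word of each Unicode name as the script."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text}


print(genuine == spoofed)  # False: the code points differ
print(scripts(genuine))    # {'LATIN'}
print(len(scripts(spoofed)) > 1)  # True: mixed scripts are a warning sign
```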

Regular expressions define the structure of text.
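For example, a regular expression can state that a piece of text must have the shape of a date. The pattern and the name `DATE` below are assumptions chosen for illustration; the sketch checks only the shape of the text, not whether the date exists.

```python
import re

# Four digits, a hyphen, two digits, a hyphen, two digits.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.fullmatch("2024-03-15")
print(match.groupdict())  # {'year': '2024', 'month': '03', 'day': '15'}

# Text with a different structure does not match at all.
print(DATE.fullmatch("15/03/2024"))  # None
```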
