A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646].

Every unit of text (character) is assigned a unique integer known as a code point.
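The mapping between characters and code points can be inspected directly. The following is a minimal sketch using Python's built-in `ord` and `chr`; the helper names `code_points` and `format_code_point` are illustrative and not part of any standard.

```python
def code_points(text: str) -> list[int]:
    """Return the Unicode code point of each character in text."""
    return [ord(ch) for ch in text]


def format_code_point(cp: int) -> str:
    """Format a code point in the conventional U+XXXX notation."""
    return f"U+{cp:04X}"


# "A", "e with acute accent" and the euro sign, written with escapes
# so the example does not depend on the encoding of this page.
sample = "A\u00e9\u20ac"
points = code_points(sample)
labels = [format_code_point(cp) for cp in points]

print(points)  # [65, 233, 8364]
print(labels)  # ['U+0041', 'U+00E9', 'U+20AC']

# chr() is the inverse of ord(): code point back to character.
assert chr(0x20AC) == "\u20ac"
```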

All the characters within a string share a common coded representation (i.e. a character set and encoding) that maps each code point to bytes; a font then renders each character as a glyph (its visual representation).
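To make the role of the coded representation concrete, the sketch below encodes the same single character under three encodings and shows what happens when bytes are decoded with the wrong one. It only uses `str.encode` and `bytes.decode`; the variable names are illustrative.

```python
# One character, one code point (U+00E9), three byte representations.
text = "\u00e9"

utf8 = text.encode("utf-8")        # two bytes
latin1 = text.encode("latin-1")    # one byte
utf16 = text.encode("utf-16-be")   # two bytes, big-endian, no BOM

print(utf8, latin1, utf16)  # b'\xc3\xa9' b'\xe9' b'\x00\xe9'

# Decoding with the encoding that produced the bytes restores the text.
assert utf8.decode("utf-8") == text

# Decoding with a different encoding silently yields other characters
# (often called mojibake): each UTF-8 byte is read as one Latin-1 character.
garbled = utf8.decode("latin-1")
print(garbled)  # Ã©
```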

In a computer, text is represented as a string.

Without an associated data schema or format (such as JavaScript, XML, …), text is generally said to be unstructured.

Text is the basis of any language.

Text editors also often use a text tree (a rope, see wiki/Rope_(data_structure)) to speed up text transformations.
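As a rough illustration of why a tree helps, here is a minimal, unbalanced rope sketch. It is not taken from any particular editor: the names `Leaf`, `Concat`, `concat`, `char_at`, `split`, `insert` and `to_str` are hypothetical, and a production rope would also rebalance the tree and cap leaf sizes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str

    def __len__(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Concat:
    left: "Rope"
    right: "Rope"
    length: int  # cached total length of both subtrees

    def __len__(self) -> int:
        return self.length


Rope = Union[Leaf, Concat]


def concat(left: Rope, right: Rope) -> Concat:
    """Join two ropes without copying their text."""
    return Concat(left, right, len(left) + len(right))


def char_at(rope: Rope, index: int) -> str:
    """Walk down the tree, using subtree lengths to pick a side."""
    while isinstance(rope, Concat):
        left_len = len(rope.left)
        if index < left_len:
            rope = rope.left
        else:
            index -= left_len
            rope = rope.right
    return rope.text[index]


def split(rope: Rope, index: int) -> tuple[Rope, Rope]:
    """Split into (first index characters, the rest); only one leaf is sliced."""
    if isinstance(rope, Leaf):
        return Leaf(rope.text[:index]), Leaf(rope.text[index:])
    left_len = len(rope.left)
    if index <= left_len:
        head, tail = split(rope.left, index)
        return head, concat(tail, rope.right)
    head, tail = split(rope.right, index - left_len)
    return concat(rope.left, head), tail


def insert(rope: Rope, index: int, text: str) -> Rope:
    """Insert text at index by splitting and re-joining subtrees."""
    head, tail = split(rope, index)
    return concat(concat(head, Leaf(text)), tail)


def to_str(rope: Rope) -> str:
    if isinstance(rope, Leaf):
        return rope.text
    return to_str(rope.left) + to_str(rope.right)


doc = concat(Leaf("Hello, "), Leaf("world"))
edited = insert(doc, 7, "big ")
print(to_str(edited))  # Hello, big world
```

The point of the design is that an insertion touches only the nodes on one path through the tree, whereas inserting into a flat string copies every character after the insertion point.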

Many different characters look alike, which can be exploited in attacks. See Characters - Homograph.
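The sketch below shows two strings that look the same on screen but differ in their code points, together with a crude mixed-script check. The helper `scripts_in` is hypothetical and relies on the first word of each Unicode character name, which is only a heuristic; it is an assumption for illustration and not a complete homograph defence.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Approximate the scripts used in text from Unicode character names.

    Heuristic: the first word of a letter's name (e.g. 'LATIN', 'CYRILLIC')
    is taken as its script. Characters without a name are ignored.
    """
    scripts = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name and ch.isalpha():
            scripts.add(name.split(" ")[0])
    return scripts


genuine = "paypal"
# Same appearance, but both "a" are CYRILLIC SMALL LETTER A (U+0430).
spoof = "p\u0430yp\u0430l"

print(genuine == spoof)     # False
print(scripts_in(genuine))  # {'LATIN'}
print(scripts_in(spoof))    # contains both 'LATIN' and 'CYRILLIC'

# A label mixing several scripts is a common warning sign.
is_suspicious = len(scripts_in(spoof)) > 1
```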

Regular expressions define the structure of text.
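For instance, a regular expression can describe the structure of a date written as year-month-day and extract its parts. This is a small sketch with Python's standard `re` module; the pattern and the helper name `parse_date` are illustrative, and the pattern checks only the shape of the text, not whether the date exists.

```python
import re
from typing import Optional

# Four digits, a hyphen, two digits, a hyphen, two digits.
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def parse_date(text: str) -> Optional[dict[str, str]]:
    """Return the named parts if the whole text has the date structure."""
    match = DATE_PATTERN.fullmatch(text)
    return match.groupdict() if match else None


print(parse_date("2024-03-15"))  # {'year': '2024', 'month': '03', 'day': '15'}
print(parse_date("15/03/2024"))  # None
```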


Text seems easy at first glance, but it is not.

Below you can find a couple of text operations:

  • Code Page Conversion: Convert text data to or from a code page.
  • Collation: Compare strings according to the conventions and standards of a particular language, region or country.
  • Formatting: Format numbers, dates, times and currency amounts according to the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc.
  • Bidi: Support for handling text containing a mixture of left-to-right (English) and right-to-left (Arabic or Hebrew) data.
  • Text Boundaries: Locate the positions of words, sentences, paragraphs within a range of text, or identify locations that would be suitable for line wrapping when displaying the text.
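
Two of the operations listed above can be sketched with Python's standard library alone. The example converts bytes from one code page to another and looks up the bidirectional class of individual characters. The helper names `convert_code_page` and `bidi_classes` are hypothetical. Locale-aware collation, formatting and boundary analysis are left out because they usually depend on locale data or a dedicated internationalization library.

```python
import unicodedata


def convert_code_page(data: bytes, source: str, target: str) -> bytes:
    """Convert encoded text by decoding to a string, then re-encoding."""
    return data.decode(source).encode(target)


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


# In Windows-1252 the euro sign is the single byte 0x80.
cp1252_bytes = b"\x80 5"
utf8_bytes = convert_code_page(cp1252_bytes, "cp1252", "utf-8")
print(utf8_bytes)  # b'\xe2\x82\xac 5'

# Latin "a", Hebrew alef (U+05D0), Arabic alef (U+0627).
mixed = "a\u05d0\u0627"
print(bidi_classes(mixed))  # ['L', 'R', 'AL']
```

The classes `L`, `R` and `AL` stand for left-to-right, right-to-left and Arabic letter. A display layer uses them to decide the visual order of a mixed-direction line.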