A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646].
Every unit of text (character) is assigned a unique integer known as a code point.
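As a small illustration, Python exposes the character to code point mapping through the built-ins `ord` and `chr`. The helper name `code_point` is hypothetical and only formats the integer in the usual `U+XXXX` notation.

```python
def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

for ch in ["A", "é", "€", "😀"]:
    # ord: character -> integer code point, chr: code point -> character
    print(ch, ord(ch), code_point(ch))
    assert chr(ord(ch)) == ch  # the mapping is reversible
```

The printed code points are U+0041, U+00E9, U+20AC and U+1F600.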
All the characters within a string share a common encoding (character set), which defines how each code point is represented as bytes. Rendering a code point as a glyph (its visual representation) is then the job of a font.
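The sketch below shows that the same code points produce different bytes depending on the encoding, and that some encodings cannot represent every code point at all. The helper `encoded_bytes` is a hypothetical name; only `str.encode` and `bytes.hex` are standard Python.

```python
from typing import Optional

def encoded_bytes(text: str, encoding: str) -> Optional[bytes]:
    """Encode text with the given encoding, or return None if a code point is unsupported."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

text = "é€"  # code points U+00E9 and U+20AC
for enc in ["utf-8", "utf-16-le", "latin-1"]:
    data = encoded_bytes(text, enc)
    # latin-1 only covers U+0000..U+00FF, so "€" cannot be encoded
    print(enc, data.hex(" ") if data is not None else "not representable")
```

UTF-8 uses 2 bytes for "é" and 3 for "€", UTF-16 uses 2 bytes for each, and Latin-1 has no byte for "€".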
In a computer, text is represented as a string.
Without an associated data format or schema (such as JSON (JavaScript Object Notation), XML, …), text is said to be unstructured.
Text is the basis of any language:
- natural languages
Text editors also often use a tree structure such as a rope (wiki/Rope_(data_structure)) to speed up text edits such as insertions and deletions.
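A minimal rope sketch, assuming the simplest variant: leaves hold string fragments, inner nodes cache the total length so indexing can skip whole subtrees, and insert/delete are built from split and concatenate. The class `Rope` and the functions `concat`, `split`, `insert` and `delete` are illustrative names, not a library API, and this sketch never rebalances the tree, which a real editor would need to do.

```python
from __future__ import annotations
from typing import Optional, Tuple

class Rope:
    """Leaf: holds a text fragment. Inner node: holds a left and a right child."""

    def __init__(self, text: str = "", left: Optional[Rope] = None, right: Optional[Rope] = None):
        self.left, self.right = left, right
        self.text = text if left is None else ""
        # Cached length lets char_at descend without visiting every leaf.
        self.length = len(text) if left is None else left.length + right.length

    def is_leaf(self) -> bool:
        return self.left is None

    def char_at(self, i: int) -> str:
        node = self
        while not node.is_leaf():
            if i < node.left.length:
                node = node.left
            else:
                i -= node.left.length
                node = node.right
        return node.text[i]

    def __str__(self) -> str:
        return self.text if self.is_leaf() else str(self.left) + str(self.right)

def concat(a: Rope, b: Rope) -> Rope:
    return Rope(left=a, right=b)  # O(1): no characters are copied

def split(rope: Rope, i: int) -> Tuple[Rope, Rope]:
    """Split into ropes holding rope[:i] and rope[i:]."""
    if rope.is_leaf():
        return Rope(rope.text[:i]), Rope(rope.text[i:])
    if i < rope.left.length:
        l, r = split(rope.left, i)
        return l, concat(r, rope.right)
    l, r = split(rope.right, i - rope.left.length)
    return concat(rope.left, l), r

def insert(rope: Rope, i: int, s: str) -> Rope:
    left, right = split(rope, i)
    return concat(concat(left, Rope(s)), right)

def delete(rope: Rope, start: int, end: int) -> Rope:
    left, rest = split(rope, start)
    _, right = split(rest, end - start)
    return concat(left, right)

doc = concat(Rope("Hello, "), Rope("world"))
doc = insert(doc, 7, "big ")
print(doc, doc.char_at(7))  # Hello, big world b
```

Because only the nodes along one root-to-leaf path are rebuilt, an edit in a large, balanced rope touches a logarithmic number of nodes instead of copying the whole string.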
Many different characters look alike, and attackers can exploit this (for example to spoof a domain name). See Characters - Homograph.
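The sketch below builds a spoofed string by replacing the Latin "a" with the visually similar CYRILLIC SMALL LETTER A (U+0430). The two strings can render identically yet compare unequal. `non_ascii_chars` is a hypothetical helper that merely flags non-ASCII characters; real confusable detection (for example the Unicode Security Mechanisms, UTS #39) is considerably more involved.

```python
import unicodedata

latin = "paypal"
spoofed = "p\u0430yp\u0430l"  # U+0430 CYRILLIC SMALL LETTER A instead of Latin "a"

print(latin == spoofed)  # False, although both may look the same on screen

def non_ascii_chars(s: str):
    """List (char, code point, Unicode name) for every character outside ASCII."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s if ord(ch) > 127]

print(non_ascii_chars(spoofed))
```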
Regular expressions define the structure (pattern) of a text.
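For example, a regular expression can describe the structure of an ISO 8601 calendar date (`YYYY-MM-DD`). This is a sketch using Python's standard `re` module; the names `ISO_DATE` and `parse_iso_date` are illustrative. The pattern checks structure only, so a value such as month `13` is not rejected.

```python
import re

# [0-9] instead of \d: in Python, \d also matches non-ASCII digits (e.g. Arabic-Indic).
ISO_DATE = re.compile(r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})")

def parse_iso_date(s: str):
    """Return (year, month, day) if s has the YYYY-MM-DD structure, else None."""
    m = ISO_DATE.fullmatch(s)
    if m is None:
        return None
    return int(m["year"]), int(m["month"]), int(m["day"])

print(parse_iso_date("2024-03-15"))
print(parse_iso_date("15/03/2024"))
```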
At first glance text seems easy, but it is not.
Common text operations include:
- Code Page Conversion: Convert text data to or from a code page.
- Collation: Compare strings according to the conventions and standards of a particular language, region or country.
- Formatting: Format numbers, dates, times and currency amounts according to the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc.
- Bidi (bidirectional text): Support for handling text that mixes left-to-right (e.g. English) and right-to-left (e.g. Arabic or Hebrew) data.
- Text Boundaries: Locate the positions of words, sentences, paragraphs within a range of text, or identify locations that would be suitable for line wrapping when displaying the text.
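Several of the operations above can be sketched with the Python standard library alone. Formatting is left out because it depends on locale data installed on the system. The names `transcode` and `rough_key` are hypothetical. `rough_key` (case folding plus accent stripping) is only a crude approximation of collation, not the Unicode Collation Algorithm, and the regex-based boundaries are naive: they break on abbreviations such as "Dr.", which is why libraries such as ICU implement the Unicode text segmentation rules (UAX #29) instead.

```python
import re
import unicodedata

# Code page conversion: Windows-1252 bytes -> str -> UTF-8 bytes.
def transcode(data: bytes, src: str, dst: str) -> bytes:
    return data.decode(src).encode(dst)

print(transcode(b"caf\xe9 \x80", "cp1252", "utf-8"))  # "café €" re-encoded as UTF-8

# Collation: plain code point order puts uppercase first and "é" after "z".
items = ["zebra", "Apple", "éclair", "apple"]
print(sorted(items))

def rough_key(s: str):
    """Approximate key: strip accents (NFD + drop combining marks), then case-fold."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold(), s  # original string breaks ties deterministically

print(sorted(items, key=rough_key))

# Bidi: each character has a bidirectional class (L = left-to-right,
# R = right-to-left, AL = Arabic letter, EN = European number).
for ch in ["A", "\u05d0", "\u0627", "1"]:
    print(repr(ch), unicodedata.bidirectional(ch))

# Text boundaries: naive sentence and word segmentation.
text = "Text seems easy. It is not! Really?"
sentences = re.split(r"(?<=[.!?])\s+", text)
words = re.findall(r"\w+", text)
print(sentences)
print(words)
```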