About

Collation is the general term for the process of determining the sort order of character strings (in other words, how a string comparison is performed).

Collation implementations must deal with the complex linguistic conventions for ordering text in specific languages, and provide for common customizations based on user preferences.

Collation can be based on:

  • alphabetical order, which varies according to language and culture (Germans, French and Swedes sort the same characters differently)
  • the application (dictionaries may sort differently than phonebooks or book indices)
  • the pronunciation or appearance of the character (for non-alphabetic scripts such as East Asian ideographs)
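
The language dependence can be illustrated with a minimal sketch. The alphabets and the names `SWEDISH`, `GERMAN_BASE`, `swedish_key` and `german_key` below are hand-written assumptions for illustration, not a real collation table: they only encode that Swedish treats å, ä and ö as separate letters after z, while German dictionary order sorts an umlaut with its base letter.

```python
# Hypothetical, simplified alphabets for illustration only.
# Real collation data is far more detailed than this.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"  # å, ä, ö are separate letters after z
GERMAN_BASE = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}  # umlauts sort with the base letter


def swedish_key(word):
    # Map each letter to its position in the Swedish alphabet.
    # Raises ValueError for characters outside this toy alphabet.
    return [SWEDISH.index(ch) for ch in word.lower()]


def german_key(word):
    # Primary key: the word with umlauts replaced by base letters.
    # Secondary key: the lowercased word itself, used only to break ties.
    lowered = word.lower()
    base = "".join(GERMAN_BASE.get(ch, ch) for ch in lowered)
    return (base, lowered)


words = ["zebra", "äpple", "apple"]
by_swedish = sorted(words, key=swedish_key)  # "äpple" ends up after "zebra"
by_german = sorted(words, key=german_key)    # "äpple" ends up next to "apple"
```

The same three strings come out in two different orders, which is why a collation has to be chosen per language rather than derived from the character codes.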

Usage

  • sorting a list of strings
  • sorting records in a database (i.e. ORDER BY)
  • selecting sets of records with fields within given bounds
  • searching. For instance, since “v” and “w” traditionally sort as if they were the same base letter in Swedish, a loose search should pick up words with either one of them.
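
The loose search described in the last item could be sketched as follows. The folding table `FOLD` and the helpers `loose_key` and `loose_search` are hypothetical names introduced for this example; the table only covers the v/w case mentioned above and ignores every other rule a real collation would apply.

```python
# Hypothetical folding table: treat "w" as "v", following the Swedish example.
FOLD = str.maketrans({"w": "v"})


def loose_key(text):
    # Ignore case first, then fold "w" into "v".
    return text.casefold().translate(FOLD)


def loose_search(needle, haystack):
    # Compare the folded forms, but return the original strings.
    key = loose_key(needle)
    return [item for item in haystack if key in loose_key(item)]


names = ["Wallin", "Vallin", "Berg"]
matches = loose_search("vall", names)
```

Here `matches` contains both "Wallin" and "Vallin", because the two names have the same folded form.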

Parameters

A collation will compare strings according to:

  • the locale of your application (country/language)
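  • case sensitivity:
    • whether 'a' equals 'A'
    • whether uppercase sorts before lowercase (or vice versa)
  • accent sensitivity: when accent sensitive, 'é' does not equal 'e'
  • whether punctuation is ignored
  • width sensitivity (WS): whether the full-width and half-width forms of a character are distinct
  • kana-type sensitivity (KS): whether Hiragana and Katakana are distinct
  • any other application-specific property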
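
Some of these parameters can be approximated with Python's standard `unicodedata` module. This is a sketch under stated assumptions, not a real collation implementation: `collation_key` and `equal` are hypothetical helpers, NFKC normalization is used as a rough stand-in for width insensitivity (it also folds other compatibility characters), and locale, punctuation and kana type are not handled.

```python
import unicodedata


def collation_key(text, case_sensitive=False, accent_sensitive=False,
                  width_sensitive=False):
    # Hypothetical helper: builds a comparison key from the chosen options.
    if not width_sensitive:
        # NFKC maps full-width forms such as "Ａ" to "A"
        # (it also folds other compatibility characters).
        text = unicodedata.normalize("NFKC", text)
    if not accent_sensitive:
        # Decompose, then drop combining marks such as the acute accent.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not case_sensitive:
        text = text.casefold()
    # Recompose so equivalent inputs yield identical keys.
    return unicodedata.normalize("NFC", text)


def equal(a, b, **options):
    # Two strings are equal under a collation when their keys are equal.
    return collation_key(a, **options) == collation_key(b, **options)


case_insensitive = equal("a", "A")                         # True
case_sensitive = equal("a", "A", case_sensitive=True)      # False
accent_insensitive = equal("é", "e")                       # True
accent_sensitive = equal("é", "e", accent_sensitive=True)  # False
```

Each option only removes a distinction before the comparison, so the same pair of strings can be equal under one set of parameters and different under another.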

Algorithm