Collation - String comparison

About

Collation is the general term for the process and function of determining the sorting order of strings of characters; in other words, it defines how two strings are compared.

Collation implementations must deal with the complex linguistic conventions for ordering text in specific languages, and provide for common customizations based on user preferences.

Collation can be based on:

the position of characters in the alphabet, which varies by language and culture (German, French and Swedish sort the same characters differently)
the application (dictionaries may sort differently than phonebooks or book indices)
the pronunciation or the appearance of the characters (for non-alphabetic scripts such as East Asian ideographs)
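To make the language dependence concrete, here is a minimal sketch, assuming a collation can be approximated by a per-language table that maps each letter to a weight. The tables and the `make_sort_key` helper are hypothetical and heavily simplified, not real CLDR data: in the German-style table the umlauts sort with their base letter, while in the Swedish-style table å, ä and ö are separate letters after z.

```python
BASE = "abcdefghijklmnopqrstuvwxyz"

def make_sort_key(extra_weights):
    """Build a sort key from a hypothetical letter -> weight table."""
    weights = {ch: i for i, ch in enumerate(BASE)}
    weights.update(extra_weights)

    def key(word):
        # Letters missing from the table sort after all known letters, by code point.
        return [weights.get(ch, 1000 + ord(ch)) for ch in word.lower()]

    return key

# German-style (simplified): umlauts share the weight of their base letter.
german_key = make_sort_key({"ä": BASE.index("a"), "ö": BASE.index("o"), "ü": BASE.index("u")})
# Swedish-style (simplified): å, ä, ö are separate letters placed after z.
swedish_key = make_sort_key({"å": 26, "ä": 27, "ö": 28})

words = ["zebra", "äpple", "apa", "ödla"]
print(sorted(words, key=german_key))   # ['apa', 'äpple', 'ödla', 'zebra']
print(sorted(words, key=swedish_key))  # ['apa', 'zebra', 'äpple', 'ödla']
```

The same characters end up in different positions depending only on the table. A real German collation would also break the tie between "a" and "ä" at a secondary (accent) level, which this single-level sketch does not do.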

Usage

sorting a list of strings
sorting records in a database (i.e. ORDER BY)
selecting sets of records whose fields fall within given bounds (range queries)
searching: for instance, “v” and “w” sort as if they were the same base letter in Swedish, so a loose search should match words containing either one of them.
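The three database-style uses can be sketched with a single collation key. This is a minimal illustration, assuming a hypothetical, simplified Swedish-style rule taken from the example above (ignore case, treat "w" as "v"); `swedish_loose_key`, `select_range` and `loose_search` are illustrative names, not a library API.

```python
import bisect

def swedish_loose_key(s: str) -> str:
    # Hypothetical, simplified rule: ignore case and treat "w" as the same letter as "v".
    return s.casefold().replace("w", "v")

names = ["Wallin", "berg", "Vik", "Andersson", "Valdemar", "Öberg"]

# Sorting (the ORDER BY use case): sort on the collation key, not on raw code points.
ordered = sorted(names, key=swedish_loose_key)
# Raw code-point order would put every capitalized name before "berg":
# sorted(names) == ['Andersson', 'Valdemar', 'Vik', 'Wallin', 'berg', 'Öberg']

# Range selection: keys are in the same order as `ordered`, so bounds can be binary-searched.
keys = [swedish_loose_key(n) for n in ordered]

def select_range(low: str, high: str) -> list:
    """Return the names whose key k satisfies key(low) <= k < key(high)."""
    i = bisect.bisect_left(keys, swedish_loose_key(low))
    j = bisect.bisect_left(keys, swedish_loose_key(high))
    return ordered[i:j]

# Loose search: compare keys, so a query with "w" also finds names spelled with "v".
def loose_search(query: str) -> list:
    q = swedish_loose_key(query)
    return [n for n in ordered if q in swedish_loose_key(n)]

print(ordered)                  # ['Andersson', 'berg', 'Valdemar', 'Wallin', 'Vik', 'Öberg']
print(select_range("B", "Vi"))  # ['berg', 'Valdemar', 'Wallin']
print(loose_search("wal"))      # ['Valdemar', 'Wallin']
```

Note that "Öberg" lands last only because "ö" has a higher code point than "z"; the key does not model the Swedish å/ä/ö ordering. The point of the design is that sorting, range bounds and search all use the same key, so they agree with each other.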

Parameters

A collation will compare strings according to:

the locale of your application (country/language)
a code page
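case sensitivity:
- whether a equals A
- whether uppercase sorts before lowercase (or vice versa)
accent sensitivity: when accent sensitive, 'é' does not equal 'e'
punctuation: ignored or not
width sensitivity (WS): whether the half-width and full-width forms of the same character (for example 'A' and 'Ａ') are treated as different
kana type sensitivity (KS): whether the hiragana and katakana forms of the same Japanese sound (for example 'あ' and 'ア') are treated as different
any other application-specific property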
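The effect of these parameters can be sketched by rewriting strings so that the differences a setting ignores disappear before comparison. This is a minimal sketch, assuming that Unicode normalization is an acceptable stand-in for real collation settings; `CollationOptions`, `fold`, `equal` and `case_first_key` are hypothetical names, not the API of any database or library. Real implementations use multi-level weights rather than string rewriting.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class CollationOptions:
    """Hypothetical option set mirroring the parameters listed above."""
    case_sensitive: bool = True
    accent_sensitive: bool = True
    ignore_punctuation: bool = False
    width_sensitive: bool = True
    kanatype_sensitive: bool = True

def fold(s: str, opt: CollationOptions) -> str:
    """Rewrite s so that every difference the options ignore disappears."""
    # NFC makes canonically equivalent strings ('é' vs 'e' + U+0301) identical;
    # NFKC additionally maps full-width forms such as 'Ａ' to 'A'.
    s = unicodedata.normalize("NFC" if opt.width_sensitive else "NFKC", s)
    if not opt.kanatype_sensitive:
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above the matching hiragana.
        s = "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in s)
    if not opt.accent_sensitive:
        # Decompose, then drop combining marks. Limitation of this sketch:
        # this also strips the Japanese voicing marks (dakuten).
        s = "".join(c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c))
    if not opt.case_sensitive:
        s = s.casefold()
    if opt.ignore_punctuation:
        # Unicode general categories starting with "P" are punctuation.
        s = "".join(c for c in s if not unicodedata.category(c).startswith("P"))
    return s

def equal(a: str, b: str, opt: CollationOptions) -> bool:
    return fold(a, opt) == fold(b, opt)

def case_first_key(s: str, upper_first: bool = True):
    """Sort key: compare case-insensitively first, then break ties by case."""
    ties = [c.islower() if upper_first else c.isupper() for c in s]
    return (s.casefold(), ties)

strict = CollationOptions()
print(equal("a", "A", strict))                                        # False
print(equal("a", "A", CollationOptions(case_sensitive=False)))        # True
print(equal("Café", "cafe",
            CollationOptions(case_sensitive=False, accent_sensitive=False)))  # True
print(equal("ＡＢＣ", "ABC", CollationOptions(width_sensitive=False)))  # True
print(equal("カナ", "かな", CollationOptions(kanatype_sensitive=False)))  # True
print(sorted(["b", "a", "B", "A"], key=case_first_key))               # ['A', 'a', 'B', 'b']
```

Each flag only removes one kind of difference, so the settings combine freely, much like the WS/KS-style suffixes combine in a collation name.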

Algorithm

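Locale-aware collation is typically implemented with the Unicode Collation Algorithm (UCA, Unicode Technical Standard #10) plus locale-specific tailoring rules from the Common Locale Data Repository (CLDR).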
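A central idea of the UCA is comparison by levels: base letters first (primary), then accents (secondary), then case (tertiary); a lower level only matters when all higher levels are equal. The sketch below is a toy illustration of that idea built on Python's `unicodedata`; `multilevel_key` is a hypothetical helper, and it does not use the real DUCET weights, CLDR tailorings, or French backward accent ordering.

```python
import unicodedata

def multilevel_key(s: str):
    """Toy three-level sort key illustrating the UCA level idea.

    Assumption: case-folded base letters stand in for primary weights,
    combining accents for secondary weights, and an "is uppercase" flag
    for tertiary weights.
    """
    primary, secondary, tertiary = [], [], []
    # NFD splits precomposed letters such as 'ô' into 'o' + U+0302.
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            # Attach the accent to the preceding base letter (secondary level);
            # a mark with no preceding base letter is ignored in this sketch.
            if secondary:
                secondary[-1] += ch
            continue
        primary.append(ch.casefold())
        secondary.append("")
        # False < True, so lowercase sorts before uppercase at this level.
        tertiary.append(ch.isupper())
    return (primary, secondary, tertiary)

words = ["côté", "Cote", "cotes", "coté", "cote", "côte"]
print(sorted(words, key=multilevel_key))
# ['cote', 'Cote', 'coté', 'côte', 'côté', 'cotes']
print(sorted(words))  # raw code points, for contrast
# ['Cote', 'cote', 'cotes', 'coté', 'côte', 'côté']
```

With levels, "cotes" sorts after every accented form of "cote" because it differs at the primary level, and "Cote" stays next to "cote" because case only matters at the tertiary level. Raw code-point order interleaves them instead.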