Collation implementations must deal with the complex linguistic conventions for ordering text in specific languages, and provide for common customizations based on user preferences.
Collation can be based on:
- character position (which varies according to language and culture: Germans, French and Swedes sort the same characters differently)
- application (dictionaries may sort differently than phonebooks or book indices)
- phonetics or appearance of the character (for non-alphabetic scripts such as East Asian ideographs)
Collation is used for:
- sorting a list of strings
- sorting records in a database (i.e. ORDER BY)
- selecting sets of records with fields within given bounds
- search. For instance, because “v” and “w” sort as if they were the same base letter in Swedish, a loose search should pick up words with either one of them.
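The sorting and loose-search uses above can be sketched in Python with the standard `unicodedata` module. This is a simplified illustration, not an implementation of any real collation standard: the helper names (`strip_accents`, `sort_key`, `loose_key`, `loose_search`) and the sample words are hypothetical, and the rule that folds “w” into “v” is an assumption made only to mirror the Swedish example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(text: str) -> tuple:
    """Hypothetical three-level key: base letters, then accents, then case."""
    primary = strip_accents(text).casefold()
    secondary = unicodedata.normalize("NFD", text).casefold()
    return (primary, secondary, text)


def loose_key(text: str) -> str:
    """Base-letter key that also treats 'w' as 'v' (assumed rule)."""
    return strip_accents(text).casefold().replace("w", "v")


def loose_search(needle: str, words: list[str]) -> list[str]:
    """Return the words whose loose key contains the needle's loose key."""
    wanted = loose_key(needle)
    return [w for w in words if wanted in loose_key(w)]


# Sorting a list of strings: accents and case only break ties.
words = ["zebra", "Éclair", "apple", "eclair", "Apple"]
sorted_words = sorted(words, key=sort_key)
print(sorted_words)  # ['Apple', 'apple', 'eclair', 'Éclair', 'zebra']

# Loose search: 'v' and 'w' match each other, case is ignored.
print(loose_search("vat", ["watt", "Vatten", "bok"]))  # ['watt', 'Vatten']
```

A production system would normally rely on a library that implements language-specific rules rather than on hand-written keys like these.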
A collation will compare strings according to:
- the locale of your application (country/language)
- case sensitivity
- putting uppercase before lowercase (or vice versa)
- accent sensitivity (when accent sensitive, 'é' does not equal 'e')
- ignoring punctuation or not
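- width sensitivity (WS): whether full-width and half-width forms of the same character compare as different
- kana-type sensitivity (KS): whether hiragana and katakana compare as different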
- any other application-specific property
- the Unicode Collation Algorithm, plus locale-specific comparison rules from the Common Locale Data Repository (CLDR)
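To make the sensitivity options above concrete, here is a minimal Python sketch of an equality check driven by flags. It assumes a hypothetical `CollationOptions` class and helper functions invented for this illustration; it approximates each option with Unicode normalization from the standard `unicodedata` module and does not implement the Unicode Collation Algorithm or any CLDR rules. Ordering preferences such as uppercase-first are not covered.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class CollationOptions:
    """Hypothetical flags; False means the difference is ignored."""
    case_sensitive: bool = False
    accent_sensitive: bool = False
    width_sensitive: bool = False
    kana_sensitive: bool = False
    ignore_punctuation: bool = False


def fold_kana(text: str) -> str:
    """Map katakana U+30A1..U+30F6 to hiragana via the fixed 0x60 offset."""
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text
    )


def comparison_key(text: str, opts: CollationOptions) -> str:
    if not opts.width_sensitive:
        # NFKC folds full-width/half-width forms (and other compatibility forms).
        text = unicodedata.normalize("NFKC", text)
    if not opts.kana_sensitive:
        text = fold_kana(text)
    if not opts.accent_sensitive:
        # Drop combining marks; this also drops Japanese voicing marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not opts.case_sensitive:
        text = text.casefold()
    if opts.ignore_punctuation:
        text = "".join(
            c for c in text if not unicodedata.category(c).startswith("P")
        )
    # Recompose so canonically equivalent inputs yield identical keys.
    return unicodedata.normalize("NFC", text)


def equal(a: str, b: str, opts: CollationOptions) -> bool:
    return comparison_key(a, opts) == comparison_key(b, opts)


print(equal("é", "e", CollationOptions()))                       # True
print(equal("é", "e", CollationOptions(accent_sensitive=True)))  # False
print(equal("co-op", "coop", CollationOptions(ignore_punctuation=True)))  # True
```

Treating each option as a step that either keeps or erases one kind of difference is a design choice made here for readability; real collators typically encode these distinctions as separate comparison levels instead.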