
2.3 Collating Elements vs. Characters

POSIX generalizes the notion of a character to that of a collating element. It defines a collating element to be "a sequence of one or more bytes defined in the current collating sequence as a unit of collation."

This generalizes the notion of a character in two ways. First, a single character can map to two or more collating elements. For example, the German "es-zet" (ß) collates as the collating element `s' followed by another collating element `s'. Second, two or more characters can map to one collating element. For example, in traditional Spanish collation the two-character sequence `ll' is a single collating element that sorts after `l' and before `m'.
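
The two mappings can be illustrated with a minimal Python sketch. It is not how a POSIX implementation works: real collating elements come from the locale's collation definition, whereas the tables `MULTI_CHAR_ELEMENTS`, `EXPANSIONS`, and `ORDER` and the functions `collating_elements` and `sort_key` below are hypothetical names invented for this example, covering only lowercase ASCII letters plus the two cases mentioned above.

```python
import string

# Toy collation data, assumed for illustration only. A real system reads
# this from the locale's collation definition, not from application code.
MULTI_CHAR_ELEMENTS = {"ll"}      # two characters -> one collating element
EXPANSIONS = {"ß": ["s", "s"]}    # one character -> two collating elements

# Toy collating sequence: a..z with "ll" placed after "l" and before "m".
ORDER = list(string.ascii_lowercase)
ORDER.insert(ORDER.index("l") + 1, "ll")
RANK = {element: position for position, element in enumerate(ORDER)}


def collating_elements(text):
    """Split text into collating elements using the toy tables above."""
    elements = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in MULTI_CHAR_ELEMENTS:
            # Two characters collapse into a single collating element.
            elements.append(pair)
            i += 2
        elif text[i] in EXPANSIONS:
            # One character expands into several collating elements.
            elements.extend(EXPANSIONS[text[i]])
            i += 1
        else:
            elements.append(text[i])
            i += 1
    return elements


def sort_key(text):
    """Sort key that compares words element by element, not byte by byte."""
    return [RANK[element] for element in collating_elements(text)]


print(collating_elements("straße"))   # ['s', 't', 'r', 'a', 's', 's', 'e']
print(collating_elements("llama"))    # ['ll', 'a', 'm', 'a']
print(sorted(["mano", "llama", "luz"], key=sort_key))  # ['luz', 'llama', 'mano']
```

Under this toy sequence `llama' sorts after `luz', because its first collating element is `ll' rather than `l'; a plain character-by-character comparison would put `llama' first.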

Since POSIX's "collating element" preserves the essential idea of a "character," we use the latter, more familiar, term in this document.

  Copyright 2003   by The Free Software Foundation     Updated Jun 2003