X-Authentication-Warning: acp3bf.physik.rwth-aachen.de: broeker owned process doing -bs
Date: Fri, 14 Sep 2001 11:37:05 +0200 (MET DST)
From: Hans-Bernhard Broeker
X-Sender: broeker AT acp3bf
To: Eli Zaretskii
cc: djgpp-workers AT delorie DOT com
Subject: Re: NLS and djgpp.env
In-Reply-To: <8011-Fri14Sep2001101346+0300-eliz@is.elta.co.il>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Fri, 14 Sep 2001, Eli Zaretskii wrote:

> > Date: Thu, 13 Sep 2001 18:39:35 +0200 (MET DST)
> > From: Hans-Bernhard Broeker
> >
> > If at all, it would be a good idea to set LC_COLLATE to "C",
> > lest users be badly surprised by 'grep' and other regex tools if they "set
> > LANG=de" and then use pattern like [a-z].
>
> If we decide to set LC_COLLATE, I would suggest doing so in a special
> section for Grep programs, not a general setting.

Fine with me, too. The problem might be to nail down all the programs
that need it. In a nutshell, every program using regular expressions is
supposed to behave in that new, and IMHO seriously braindamaged, way. At
the minimum, that would mean 'sed', 'awk', 'grep', possibly 'lex', and
the POSIX standard regex library for C. Every one of them would need an
entry added to DJGPP.ENV.

BTW, for those of you who don't know what we're talking about: the
latest versions of 'grep' and similar tools, following some upcoming new
POSIX revision, have changed the meaning of character ranges such as
[a-z]: they now match every character that falls between the given
endpoints in collation order, rather than in character-code order. I.e.
if your collation order sorts without respect for letter case (many,
including LANG=us, do that),

	echo "BOOM!" | grep "[a-e]"

will suddenly echo "BOOM!", which it never did before, because [a-e]
will now be equivalent to [aBbCcDdEe] (it may match 'A', too, I'm not
quite sure) instead of the traditionally expected [abcde].

This is guaranteed to break a big fraction of the existing base of shell
scripts using grep, sed or awk. Some of those scripts are older than
most of the DJGPP team and have been working fine ever since, but now,
all of a sudden, they'll produce nonsense.

Even German-based Linux distributor S.u.S.E. decided that this was so
inconvenient that they set LC_COLLATE=C in all login scripts they
create, even if you configured the account for a German user
(--> LANG=de_DE@iso8859_15).

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.
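
For anyone who wants to check how their own setup treats such a range, a minimal sketch along the following lines may help. It assumes a POSIX shell and a 'grep' that honours LC_ALL; the helper name range_matches is made up for illustration and is not part of any tool discussed above. A locale name that is not installed typically makes 'grep' fall back to "C" behaviour, so a "no match" result for a non-C locale proves little unless that locale really exists on the system.

```shell
#!/bin/sh
# range_matches (hypothetical helper): succeed if TEXT matches PATTERN
# when grep runs under the locale named in the first argument.
range_matches() {
    locale_name=$1
    pattern=$2
    text=$3
    # LC_ALL overrides LANG and LC_COLLATE for this one grep invocation.
    printf '%s\n' "$text" | LC_ALL=$locale_name grep -q -e "$pattern"
}

# Compare the "C" locale with whatever LANG is currently set to.
for loc in C "${LANG:-C}"; do
    if range_matches "$loc" '[a-e]' 'BOOM!'; then
        echo "$loc: [a-e] matches BOOM!"
    else
        echo "$loc: [a-e] does not match BOOM!"
    fi
done
```

Under "C" the range is compared by character code, so "BOOM!" is never matched there; what the second iteration reports depends on the collation rules of the locale in use and on the grep version.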