Message-ID: <4C977AB8.90702@redhat.com>
Date: Mon, 20 Sep 2010 09:16:08 -0600
From: Eric Blake <eblake AT redhat DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: awk gsub problem

On 09/19/2010 02:33 PM, Lee wrote:
>> If LANG is "en_US" or "en_US.utf8", then the regular expression "[a-z]"
>> does *not* correspond anymore to the ASCII codes.  Rather it corresponds
>> to something like "[aAbBcCdD...zZ]", independent of the actual character
>> encoding ISO-8859-1 or UTF-8.

In glibc, [a-z] gets translated according to locale collation order.  If
A collates before a, then it maps to [aBbCc..Zz]; if A collates after a,
then it maps to [aAbB...yYz].  Notice that in either case, one of the two
capital letters (A or Z) is omitted, so it is NOT the same as all 26
letters in both cases.

This has been a MUCH complained-about feature of glibc, which has in turn
been copied by bash, awk, grep, etc.  Note that POSIX explicitly states
that [a-z] has unspecified results in any locale except C.  So the glibc
behavior is permitted, but so is the traditional behavior of just the 26
lowercase letters.

If you can convince the glibc folks that [a-z] should have the
traditional behavior, more power to you.

http://lists.gnu.org/archive/html/bug-grep/2010-09/msg00030.html

-- 
Eric Blake   eblake AT redhat DOT com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
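
To make the range expansion above concrete, here is a minimal Python
sketch.  It does not call glibc or any real locale; the two collating
sequences and the helper name expand_range are made up for illustration,
and a real locale's collation table is more involved than a flat string.
It only shows why a range taken in collation order picks up 51 of the 52
letters.

```python
import string

def expand_range(lo, hi, collation):
    """Return every character from lo to hi, inclusive, in collation order.

    'collation' is a hypothetical flat collating sequence given as a string;
    this models a bracket range like [a-z] as a slice of that sequence.
    """
    start = collation.index(lo)
    end = collation.index(hi)
    return collation[start:end + 1]

upper = string.ascii_uppercase
lower = string.ascii_lowercase

# Hypothetical order where each capital sorts just before its lowercase: AaBb...Zz
upper_first = "".join(u + l for u, l in zip(upper, lower))
# Hypothetical order where each capital sorts just after its lowercase: aAbB...zZ
lower_first = "".join(l + u for l, u in zip(lower, upper))
# C locale order (ASCII code points): A...Za...z
c_order = upper + lower

upper_first_range = expand_range("a", "z", upper_first)  # aBbCc...Zz, no 'A'
lower_first_range = expand_range("a", "z", lower_first)  # aAbB...yYz, no 'Z'
c_locale_range = expand_range("a", "z", c_order)         # the 26 lowercase letters

print(upper_first_range)
print(lower_first_range)
print(c_locale_range)
```

In practice, the usual way to sidestep the question is to run the tool in
the C locale (for example with LC_ALL=C in the environment) or to use a
character class such as [[:lower:]] instead of a range.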