X-Recipient: | archive-cygwin AT delorie DOT com |
Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Id: | <cygwin.cygwin.com> |
List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
List-Archive: | <http://sourceware.org/ml/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs> |
Sender: | cygwin-owner AT cygwin DOT com |
Mail-Followup-To: | cygwin AT cygwin DOT com |
Delivered-To: | mailing list cygwin AT cygwin DOT com |
Authentication-Results: | sourceware.org; auth=none |
X-Virus-Found: | No |
X-Spam-SWARE-Status: | No, score=1.6 required=5.0 tests=AWL,BAYES_50,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 |
X-HELO: | gproxy8-pub.mail.unifiedlayer.com |
Mime-Version: | 1.0 (Mac OS X Mail 8.2 \(2098\)) |
Subject: | Re: Grepping Unicode files? |
From: | Vince Rice <vrice AT solidrocksystems DOT com> |
In-Reply-To: | <5554D09B.3030209@redhat.com> |
Date: | Thu, 14 May 2015 12:14:20 -0500 |
Message-Id: | <47AFF066-46C5-41FA-A99B-F53C680DF09A@solidrocksystems.com> |
References: | <3C280897-291A-4A8C-8C3F-46D1D9BEFCFE AT solidrocksystems DOT com> <746170827 DOT 20150514185648 AT yandex DOT ru> <313678DD-A000-4F82-A015-836B882C09FC AT solidrocksystems DOT com> <5554D09B DOT 3030209 AT redhat DOT com> |
To: | cygwin AT cygwin DOT com |
X-Identified-User: | {3986:box867.bluehost.com:solidrr2:solidrocksystems.com} {sentby:smtp auth 65.118.57.199 authed with vrice AT solidrocksystems DOT com} |
X-IsSubscribed: | yes |
X-MIME-Autoconverted: | from quoted-printable to 8bit by delorie.com id t4EHEkH1029580 |
> On May 14, 2015, at 11:43 AM, Eric Blake <eblake AT redhat DOT com> wrote:
>
> On 05/14/2015 10:32 AM, Vince Rice wrote:
>
> …
>>
>> Now, pardon my continued ignorance, but which of those variables needs to be set to UTF16 in order for grep to work? And I assume it (they?) should be set to en_US.UTF-16?
>
> None. UTF16 is not a valid locale. It is a valid encoding (wide
> character), but locales must operate on multi-byte sequences, not wide
> characters. So you HAVE to convert from wide character to multi-byte
> before you can do anything that requires a locale to work correctly.

Oh my, the rabbit-hole gets deeper. I don’t know the difference between wide character and multi-byte. A little searching appears to indicate that Unicode is a type of wide-character, while multi-byte is … well, I still don’t know what multi-byte is. :)

But, we’re definitely out in the weeds of non-cygwinness here, and my file is UTF16, so I can learn what multi-byte is and the difference later. Bottom-line…

>>
>> Thanks to everyone for your help. I think you’ve all confirmed this isn’t cygwin-specific, but I couldn’t find anything even searching generically (“grep unicode” and now “grep utf16”). I did finally find an external reference to iconv, but if grep is supposed to be handle this natively, I haven’t been able to find much on how to do it.
>
> grep cannot handle UTF16 natively. iconv exists to do encoding
> transformations, so that the rest of the system can live in multi-byte
> world instead of worrying about wide-character encodings.

… grep can’t handle UTF-16 files natively. Good to know. iconv it is.

Thanks again!

-- 
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
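A minimal sketch of the convert-then-grep approach described in the thread, assuming the iconv and grep shipped with Cygwin (or any GNU system). The file name `sample-utf16.txt` and the pattern `hello` are hypothetical, chosen only for the demo. `-f UTF-16` relies on the byte-order mark to pick the endianness; Windows "Unicode" files are usually UTF-16LE with a BOM, and for a file without a BOM you would name the endianness explicitly (e.g. `-f UTF-16LE`).

```shell
#!/bin/sh
# Hypothetical demo file: write "hello world" in UTF-16 (iconv adds a BOM).
printf 'hello world\n' | iconv -f UTF-8 -t UTF-16 > sample-utf16.txt

# Raw grep: each ASCII letter is paired with a NUL byte in UTF-16,
# so the plain pattern "hello" finds no matching line (count 0).
grep -c hello sample-utf16.txt || true

# Convert the wide-character encoding to a multi-byte one (UTF-8) first,
# then let grep work on the converted stream.
iconv -f UTF-16 -t UTF-8 sample-utf16.txt | grep hello
```

For patterns containing non-ASCII characters, the locale should also be a UTF-8 one (for example `LANG=en_US.UTF-8`) so grep interprets the converted multi-byte stream correctly.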
Copyright 2019 by DJ Delorie | Updated Jul 2019 |