Mail Archives: cygwin/2009/03/19/14:20:53

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Thu, 19 Mar 2009 20:20:31 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?
Message-ID: <20090319192031.GB9322@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <20090319130909 DOT GZ9322 AT calimero DOT vinschen DOT de> <49C281F7 DOT 6080602 AT acm DOT org> <20090319181323 DOT GB1868 AT calimero DOT vinschen DOT de> <49C29366 DOT 8080708 AT acm DOT org>
MIME-Version: 1.0
In-Reply-To: <49C29366.8080708@acm.org>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Mar 19 11:48, David Rothenberger wrote:
> On 3/19/2009 11:13 AM, Corinna Vinschen wrote:
>> On Mar 19 10:33, David Rothenberger wrote:
>>> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
>>>> If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
>>>> charset *iff* the application switched the codepage by calling something
>>>> along the lines of `setlocale(LC_ALL, "");'.
>>>> An application which does not call setlocale (which means, it's not
>>>> native language aware anyway) would still use the default ANSI codepage.
>>>
>>> I ran into an issue yesterday where I was trying to "du -sh" a directory
>>> that contained files whose names included UTF characters, I think.
>>> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
>>> CYGWIN=codepage:utf8.
>>
>> Yes, sure.  As described in the User's Guide.  That's exactly what bugs
>> me right now.  To get UTF-8 support you have to set LANG or LC_ALL or
>> whatever, *and* CYGWIN=codepage:utf8.
>
> In my specific case, I didn't need to set LANG or LC_ALL, just  
> CYGWIN=codepage:utf8.

Yes, sure.  LANG and friends are used in the locale-specific functions
in newlib; codepage:xxx is used in Cygwin itself.  Your case is only a
case of converting filenames from UTF-16 to some multibyte charset.  That
conversion is using the codepage:xxx right now.  All other multibyte/
wide character handling in the application is controlled by setlocale,
though.
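
To illustrate the setlocale side of this split, here is a minimal C sketch.
It only uses standard/POSIX calls (setlocale, mbstowcs); the function names
are made up for this example and are not part of newlib or Cygwin.  It
assumes the POSIX behaviour that mbstowcs() with a NULL destination just
returns the number of characters.  It does not show the filename conversion
inside the Cygwin DLL, which is a separate mechanism.

```c
#include <locale.h>
#include <stdlib.h>

/* Name of the locale currently in effect for character handling.
   Passing NULL queries the setting without changing it.  A program
   that never calls setlocale() starts out in the "C" locale. */
const char *current_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Switch to whatever locale the environment (LC_ALL, LC_CTYPE, LANG)
   names.  Returns the new locale name, or NULL if the requested
   locale is not available on this system. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Count the characters of a multibyte string under the current locale.
   Returns (size_t)-1 if the bytes are not valid in that locale's
   charset. */
size_t count_characters(const char *s)
{
    return mbstowcs(NULL, s, 0);
}
```

Calling count_characters() on a UTF-8 encoded name before and after
adopt_environment_locale() shows the difference: only after the call, and
only if the environment names a UTF-8 locale, is a multi-byte sequence
treated as a single character.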

>>> So my question is, will this work if codepage is dropped and I set LANG
>>> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
>>> codepage that might be valuable to enable even for applications that
>>> aren't native language aware and don't call setlocale()?
>>
>> Not exactly.  However, assuming you have a file using characters which
>> are not in your current ANSI codeset, then you could only manipulate
>> that file when setting LANG="xx_YY.UTF-8", and only in applications
>> which call setlocale().
>
> I have no idea whether du calls setlocale() or not. I think you're  
> saying that today, with codepage:utf8, it is able to get sizes for files  
> using non-ANSI characters, but if codepage is removed, it would not be  
> able to do so unless it called setlocale(). Is that right?

Right.
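
As for "LANG or LC_ALL or whatever": when an application does call
setlocale(LC_ALL, ""), POSIX defines which variable wins for the character
set: LC_ALL first, then LC_CTYPE, then LANG.  The following C sketch mirrors
that lookup order so you can check what such an application would pick up.
The function name is hypothetical, and the sketch deliberately ignores the
CYGWIN=codepage:xxx setting, which is evaluated elsewhere.

```c
#include <stdlib.h>

/* Return the value of the first of LC_ALL, LC_CTYPE, LANG that is set
   and non-empty, or NULL if none of them is.  This follows the POSIX
   precedence used when setlocale() is called with an empty string. */
const char *effective_ctype_setting(void)
{
    static const char *const names[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;  /* nothing set: the implementation default applies */
}
```

So with LANG=en_US.UTF-8 and an LC_ALL=C left over in the environment, the
function returns "C", and an application calling setlocale() would not get
UTF-8.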


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
