Mail Archives: cygwin/2009/03/19/14:20:53
On Mar 19 11:48, David Rothenberger wrote:
> On 3/19/2009 11:13 AM, Corinna Vinschen wrote:
>> On Mar 19 10:33, David Rothenberger wrote:
>>> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
>>>> If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
>>>> charset *iff* the application switched the codepage by calling something
>>>> along the lines of `setlocale(LC_ALL, "");'.
>>>> An application which does not call setlocale (which means, it's not
>>>> native language aware anyway) would still use the default ANSI codepage.
>>>
>>> I ran into an issue yesterday where I was trying to "du -sh" a directory
>>> that contained files whose names included UTF characters, I think.
>>> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
>>> CYGWIN=codepage:utf8.
>>
>> Yes, sure. As described in the User's Guide. That's exactly what bugs
>> me right now. To get UTF-8 support you have to set LANG or LC_ALL or
>> whatever, *and* CYGWIN=codepage:utf8.
>
> In my specific case, I didn't need to set LANG or LC_ALL, just
> CYGWIN=codepage:utf8.
Yes, sure. LANG and friends are used in the locale-specific functions
in newlib, codepage:xxx is used in Cygwin. Your case is only a case
of converting filenames from UTF-16 to some multibyte charset. That
conversion is using the codepage:xxx right now. Every other multibyte/
wide character stuff in the application is controlled by setlocale,
though.
>>> So my question is, will this work if codepage is dropped and I set LANG
>>> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
>>> codepage that might be valuable to enable even for applications that
>>> aren't native language aware and don't call setlocale()?
>>
>> Not exactly. However, assuming you have a file using characters which
>> are not in your current ANSI codeset, then you could only manipulate
>> that file when setting LANG="xx_YY.UTF-8", and only in applications
>> which call setlocale().
>
> I have no idea whether du calls setlocale() or not. I think you're
> saying that today, with codepage:utf8, it is able to get sizes for files
> using non-ANSI characters, but if codepage is removed, it would not be
> able to do so unless it called setlocale(). Is that right?
Right.
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/