Date: Thu, 19 Mar 2009 20:20:31 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?
Message-ID: <20090319192031.GB9322@calimero.vinschen.de>
In-Reply-To: <49C29366.8080708@acm.org>

On Mar 19 11:48, David Rothenberger wrote:
> On 3/19/2009 11:13 AM, Corinna Vinschen wrote:
>> On Mar 19 10:33, David Rothenberger wrote:
>>> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
>>>> If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
>>>> charset *iff* the application switched the codepage by calling something
>>>> along the lines of `setlocale(LC_ALL, "");'.
>>>> An application which does not call setlocale (which means, it's not
>>>> native language aware anyway) would still use the default ANSI codepage.
>>>
>>> I ran into an issue yesterday where I was trying to "du -sh" a directory
>>> that contained files whose names included UTF characters, I think.
>>> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
>>> CYGWIN=codepage:utf8.
>>
>> Yes, sure. As described in the User's Guide. That's exactly what bugs
>> me right now.
>> To get UTF-8 support you have to set LANG or LC_ALL or
>> whatever, *and* CYGWIN=codepage:utf8.
>
> In my specific case, I didn't need to set LANG or LC_ALL, just
> CYGWIN=codepage:utf8.

Yes, sure.  LANG and friends are used by the locale-specific functions
in newlib, while codepage:xxx is used in Cygwin itself.  Your case is
only a matter of converting filenames from UTF-16 to some multibyte
charset.  That conversion uses codepage:xxx right now.  All other
multibyte/wide character handling in the application is controlled by
setlocale, though.

>>> So my question is, will this work if codepage is dropped and I set LANG
>>> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
>>> codepage that might be valuable to enable even for applications that
>>> aren't native language aware and don't call setlocale()?
>>
>> Not exactly. However, assuming you have a file using characters which
>> are not in your current ANSI codeset, then you could only manipulate
>> that file when setting LANG="xx_YY.UTF-8", and only in applications
>> which call setlocale().
>
> I have no idea whether du calls setlocale() or not. I think you're
> saying that today, with codepage:utf8, it is able to get sizes for files
> using non-ANSI characters, but if codepage is removed, it would not be
> able to do so unless it called setlocale(). Is that right?

Right.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
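
To illustrate the setlocale() dependency discussed in this thread, here
is a minimal sketch in plain C.  It is not Cygwin's or newlib's actual
code, only the general standard-library behaviour: the multibyte
conversion functions follow the locale selected by setlocale(), and a
program that never calls it stays in the "C" locale.  The helper names
name_to_multibyte and demo_locale_switch are hypothetical, and the demo
assumes LANG names an installed UTF-8 locale such as en_US.UTF-8.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide-character name to the multibyte
 * charset of the current LC_CTYPE locale.  Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if some character has
 * no representation in that charset. */
long name_to_multibyte(const wchar_t *name, char *dst, size_t dstlen)
{
    assert(name != NULL && dst != NULL && dstlen > 0);

    /* Leave room for the terminator; wcstombs does not add one if the
     * output buffer fills up first. */
    size_t n = wcstombs(dst, name, dstlen - 1);
    if (n == (size_t)-1)
        return -1;
    dst[n] = '\0';
    return (long)n;
}

/* Hypothetical demo: try the same name before and after the program
 * adopts the locale from the environment (LANG, LC_ALL, ...). */
void demo_locale_switch(void)
{
    const wchar_t *name = L"\u20ac.txt";  /* euro sign: not ASCII */
    char buf[64];

    /* If nothing has called setlocale() yet, this runs in the "C" locale. */
    printf("before setlocale: %ld\n",
           name_to_multibyte(name, buf, sizeof buf));

    /* Switch to whatever locale the environment asks for. */
    if (setlocale(LC_ALL, "") == NULL)
        printf("setlocale failed; staying in the previous locale\n");

    printf("after setlocale:  %ld\n",
           name_to_multibyte(name, buf, sizeof buf));
}
```

Calling demo_locale_switch() once at program start with
LANG=en_US.UTF-8 in the environment would show the difference between
an application that calls setlocale() and one that does not; the exact
byte counts depend on the locale actually installed.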