Date: Fri, 20 Mar 2009 13:40:31 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?
Message-ID: <20090320124030.GM9322@calimero.vinschen.de>

On Mar 19 21:30, Corinna Vinschen wrote:
> Here's another idea:
>
> If the codeset is not UTF-8, and if a filename contains wide chars not
> representable in the current ANSI codeset, use the good old ASCII "SO/SI"
> method.
>
> Example: Assuming the ANSI codepage is CP1252. Assuming the filename
> is in UTF-16
>
>   /dir/to/foo\x1234bar
>
> All chars except for \x1234 are convertible to the current ANSI code
> page. The convertible chars are converted as usual. The
> non-convertible characters are converted to an ASCII SO/SI sequence:
>
>   /dir/to/foo\x0e\x12\x34\x0fbar

Of course, this requires converting the wchar to a UTF-8 sequence.

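To make the idea concrete, here is a minimal sketch of the SO/SI scheme in Python. It is not Cygwin's code. It assumes that the bytes between SO (0x0e) and SI (0x0f) are the UTF-8 encoding of the non-convertible character, as the remark above suggests, and the names encode_filename and decode_filename are made up for illustration. The decoder simply reverses the steps, as the quoted text goes on to describe.

```python
SO = b"\x0e"  # ASCII "shift out": opens an escaped character
SI = b"\x0f"  # ASCII "shift in": closes it


def encode_filename(name: str, codepage: str = "cp1252") -> bytes:
    """Convert a name to codepage bytes, escaping what does not fit."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # Assumption: the payload is the UTF-8 form of the character.
            out += SO + ch.encode("utf-8") + SI
    return bytes(out)


def decode_filename(data: bytes, codepage: str = "cp1252") -> str:
    """Reverse encode_filename: SO...SI runs are UTF-8, the rest codepage."""
    parts = []
    pos = 0
    while pos < len(data):
        start = data.find(SO, pos)
        if start == -1:
            parts.append(data[pos:].decode(codepage))
            break
        end = data.find(SI, start + 1)
        if end == -1:
            raise ValueError("SO without matching SI")
        parts.append(data[pos:start].decode(codepage))
        parts.append(data[start + 1:end].decode("utf-8"))
        pos = end + 1
    return "".join(parts)


encoded = encode_filename("/dir/to/foo\u1234bar")
print(encoded)
print(decode_filename(encoded) == "/dir/to/foo\u1234bar")
```

The UTF-8 payload never contains the bytes 0x0e or 0x0f, so the closing SI can be found unambiguously. A filename that already contains a literal SO or SI character is not handled by this sketch.
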
> On the way back, Cygwin converts SO/SI sequences back to their
> UTF-16 counterpart and converts everything else using the current
> codepage to UTF-16 conversion.
>
> This would allow to manipulate all files on the disk regardless of
> using characters invalid in the current CP.
>
> Does that solution make sense?

Apart from that, I have now proposed a change to newlib so that setlocale
on Cygwin always chooses the charset equivalent to the current ANSI
codepage if the charset is not given explicitly. The list of codepages
supported so far is the one I posted in
http://cygwin.com/ml/cygwin/2009-03/msg00693.html

For instance, if you set $LANG to "de_DE", the charset will become
CP1252, as is the default on German Windows systems. If you set $LANG
to "de_DE.ISO-8859-15", you will get iso-8859-15 instead. Setting it to
"de_DE.UTF-8" ... you get the idea.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
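
The $LANG rule described above can be sketched as follows. This is only an illustration in Python, not the proposed newlib change: the function name choose_charset is made up, the ANSI codepage is passed in as a plain string rather than queried from Windows, and stripping an "@modifier" suffix is an assumption based on the usual language_territory.codeset@modifier form of locale names.

```python
def choose_charset(lang: str, ansi_codepage: str = "CP1252") -> str:
    """Pick the charset for a $LANG value.

    ansi_codepage stands in for the system's current ANSI codepage
    (hypothetical parameter; CP1252 matches the German Windows example).
    """
    # Assumption: drop an optional "@modifier" such as "@euro".
    base = lang.split("@", 1)[0]
    if "." in base:
        # Charset given explicitly after the dot: use it as written.
        return base.split(".", 1)[1]
    # No explicit charset: fall back to the ANSI codepage equivalent.
    return ansi_codepage


for value in ("de_DE", "de_DE.ISO-8859-15", "de_DE.UTF-8"):
    print(value, "->", choose_charset(value))
```
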