Date: Thu, 24 Sep 2009 09:34:41 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: The C locale
Message-ID: <20090924073441.GA30267@calimero.vinschen.de>
In-Reply-To: <3f0ad08d0909240003j435818e7h6f7cde2e26188f7e@mail.gmail.com>

On Sep 24 16:03, IWAMURO Motonori wrote:
> 2009/9/22 Andy Koppe:
> > Let's use the Windows "ANSI" codepage as the character set for the C
> > locale, for both the conversion functions and filenames. This means
> > CP1252 on Western systems, CP1251 on Cyrillic ones, CP932 on Japanese
> > ones, and so on.
>
> I oppose the approach (the ANSI codepage is used at C locale) because
> CP932 (the codepage for Japanese) is hostile to the UNIX-like tools.
>
> The reason is that the CP932 format contains a lot of meta characters
> as follows.
>
> single character of CP932:
> /[\x00-\x7F\xA0-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/

I don't understand. Are you saying that a single character in CP932
consists of 12 bytes? As far as I can see, CP932 is S-JIS, which is
just a simple double-byte character set. What am I missing?

> This has a ruined influence to the tools that don't see locale.

Can you please try to explain the problem in a bit more detail for those
of us not fluent in East Asian languages? What do you mean by
"hostile" and "ruined influence"?


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
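
The quoted pattern can be read as "one byte, or a lead byte followed by a trail byte", and its trail-byte range 0x40-0x7E overlaps ASCII. The following is a minimal Python sketch of that reading, not code from the thread. It assumes Python's built-in cp932 codec, and the sample file name (using the character 表, which encodes as 0x95 0x5C) and the helper name split_chars are illustrative choices.

```python
import re

# The pattern quoted in the message, applied to raw bytes: either one
# single-byte character, or a lead byte followed by a trail byte.
CP932_CHAR = re.compile(
    rb"[\x00-\x7F\xA0-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]"
)


def split_chars(data):
    """Split CP932-encoded bytes into one bytes object per character."""
    chars, pos = [], 0
    while pos < len(data):
        m = CP932_CHAR.match(data, pos)
        if m is None:
            raise ValueError(f"byte at offset {pos} does not fit the pattern")
        chars.append(m.group())
        pos = m.end()
    return chars


# Hypothetical file name; the second byte of the first character is 0x5C,
# the same value as an ASCII backslash.
encoded = "表.txt".encode("cp932")
chars = split_chars(encoded)

# A byte-oriented scan that ignores the encoding sees a backslash.
naive_hits = encoded.count(b"\\")
# A character-by-character scan sees none.
aware_hits = chars.count(b"\\")

print(encoded, chars, naive_hits, aware_hits)
```

Under these assumptions, the byte-oriented scan reports a backslash that the character-by-character scan does not, which is one way a trail byte can collide with an ASCII meta character.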