Date: Fri, 12 Jun 2009 18:42:30 +0200
From: Corinna Vinschen
To: newlib AT sourceware DOT org, cygwin AT cygwin DOT com
Subject: Re: [Fwd: [1.7] wcwidth failing configure tests]
Message-ID: <20090612164230.GG5039@calimero.vinschen.de>

On Jun 12 17:38, Thomas Wolff wrote:
> IWAMURO Motonori wrote to me by private mail:
> > I oppose your proposal because I think that it is useless for us.
> >
> > 2009/6/6 Thomas Wolff:
> >> the intention is that the "codepage" information should be the same
> >> for all locales having the "UTF-8" (or any other) charmap. So you
> >> cannot freely change width information among locales with the same
> >> charmap.
> >
> > I don't think that there is such a restriction.
> > The standard of the character doesn't provide for the width of the
> > character as a standard.
> I'm not sure which "standard" you are referring to.

The problem appears to be that there is no standard for the handling of
ambiguous characters.

> I have checked source data files in /usr/share/i18n/charmaps on my
> Linux system, e.g. "UTF-8.gz".
> These files are used when creating a new locale with the "localedef"
> command.
> They contain not only the mapping but also (by the end of the file) a
> list of combining and double-width characters. So obviously, even
> stronger than I had argued, this would imply a scheme of predefined
> character widths defined by each such "charmap", thus assuming that
> character widths are the same for all locales with the same "charmap".

I'm not sure the Linux solution is overly flexible. AFAICS, when using
the UTF-8 charset, the ambiguous characters always have width 1. Only
when switching to GB18030, the width of these chars is two. That seems
to be a bit unsatisfying for CJK users.

> >> Also, if ja_JP.UTF-8 would mean "CJK width", how would you specify a
> >> working locale setting for a terminal that does not run a CJK width
> >> font but should yet use other Japanese settings? E.g. with rxvt
> >> which does not support CJK width.

Wouldn't that be covered by using your own proposal just backwards?
Define the default for ja, ko, and zh to use width = 2, with a
@cjknarrow (or whatever) modifier to use width = 1.

> The approach I've taken in mined is quite successful. The other
> approach, via locale names, will also have limited success provided it
> is taken "up-stream".

Whatever "upstream" means.


Corinna

--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat
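
The locale-based rule suggested in the reply above (width 2 for ambiguous
characters by default in ja, ko, and zh locales, width 1 when a narrowing
modifier is given) might look roughly like the sketch below. It is an
illustration only, not code from newlib or Cygwin. The modifier name
"cjknarrow" is the placeholder from the message, and the helper names
parse_locale and ambiguous_width are made up for this example. It only
decides which width to assume; it does not classify characters.

```python
# Sketch only: "cjknarrow" is the placeholder modifier named in the message,
# not an existing locale modifier. Function names are hypothetical.
CJK_LANGUAGES = {"ja", "ko", "zh"}


def parse_locale(name):
    """Split 'lang[_TERRITORY][.codeset][@modifier]' into its four parts."""
    base, _, modifier = name.partition("@")
    base, _, codeset = base.partition(".")
    language, _, territory = base.partition("_")
    return language, territory, codeset, modifier


def ambiguous_width(locale_name):
    """Return the column width to assume for ambiguous-width characters."""
    language, _territory, _codeset, modifier = parse_locale(locale_name)
    # Default to wide in CJK locales unless the user asked for narrow cells.
    if language in CJK_LANGUAGES and modifier != "cjknarrow":
        return 2
    return 1


if __name__ == "__main__":
    for name in ("ja_JP.UTF-8", "ja_JP.UTF-8@cjknarrow", "en_US.UTF-8"):
        print(name, ambiguous_width(name))
```

Under this scheme a terminal such as rxvt without CJK-width support would
be run with the modifier set, while the other Japanese locale settings
stay untouched.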