Date: Sat, 6 Jun 2009 21:21:51 +0900
Subject: Re: [Fwd: [1.7] wcwidth failing configure tests]
From: IWAMURO Motonori
To: newlib AT sourceware DOT org, cygwin AT cygwin DOT com

# Continuation of discussion.
#
# I hope that all applications will work correctly just by setting "LANG=ja_JP.UTF-8".
# I do not want to give up using the binary packages or to keep applying many local patches.

> I don't think that it is the good idea because:
>
> - It is "a cygwin-specific solution (or workaround)".
> - In NetBSD, the change to which wcwidth of East Asian Ambiguous Characters returns 2 by CJK locale is planned.

- And I don't think that special cases should be given priority over general cases.

>> - I heard that there is an existing implementation that behave like my
>> proposal.
(Sorry, I didn't hear the system name.)
> Even if so, I think the way I described is more compatible with the locale
> mechanism as used elsewhere.

I think that ALL locale implementations should treat the width of East Asian Ambiguous characters as 2 in CJK locales.

>> It is no problem because we -- most Japanese language users -- need
>> not change the settings of mintty and locale after first setup.
>> We set LANG=ja_JP.UTF-8 and select a Japanese font for mintty.
> In any case, mined running in mintty will detect CJK width itself,
> regardless of locale setting, with coming versions of both programs
> even when it gets changed on-the-fly :)

Sorry, I can't understand the above because I am not good at English.

> This sounds complicated.

I don't think so.

I think that we should consider the following issues if a new mechanism is introduced. The existing locale / terminal APIs don't support:

- Unicode BiDi.
- Unicode control characters.
- Unicode combining characters.
- Multilingualization. (*)
- Detecting the font/fontset selected in the terminal emulator (including the case where there is no tty).

(*) At present, we can't use Japanese, Chinese, and Korean at the same time even with Unicode, because many glyphs differ considerably between these languages even when the code point is the same.

> With my proposal, an application that wishes to auto-adjust on width
> properties (maybe even when changing) and which (unlike mined) uses
> the system wcwidth functions could proceed as follows:
> * Detect CJK width by using a simple test string width detection.
> * (Optional) When receiving a SIGWINCH signal (future version of MinTTY),
>   repeat this detection.
> * If e.g. LC_CTYPE starts with "ja_JP.UTF-8", call setlocale with
>   either "ja_JP.UTF-8@cjkwidth" or "ja_JP.UTF-8".

How would the application detect it? An application that uses wcwidth is not necessarily run in a terminal emulator (e.g. a text formatter).

>> > I'm not happy with the idea of a cygwin-specific solution (or workaround).
>> I think that it is not cygwin-specific solution.
> As I tried to suggest above, using "UTF-8" for different width data on one
> system would be quite specific, using the "@" modifier syntax would not.

"UTF-8" is only an encoding scheme. It does not specify the character width.

-- 
IWAMURO Motonori
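
For reference, the width rule argued for in this message (East Asian Ambiguous characters count as 2 columns in a CJK locale and as 1 elsewhere) can be sketched in a few lines. This is only an illustration in Python, not the newlib or Cygwin wcwidth implementation: the names `is_cjk_locale` and `char_width` are hypothetical, the locale check is a simple language-prefix test assumed for the example, and control and combining characters are ignored.

```python
import unicodedata


def is_cjk_locale(locale_name: str) -> bool:
    """Return True if the locale's language code is Japanese, Chinese, or Korean.

    Assumption for this sketch: only the language prefix (the text before
    the first "_" or ".") is inspected.
    """
    language = locale_name.split("_", 1)[0].split(".", 1)[0].lower()
    return language in {"ja", "zh", "ko"}


def char_width(ch: str, locale_name: str) -> int:
    """Column width of one printable character under the rule sketched here.

    Simplification: control and combining characters are not handled.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and Fullwidth characters take two columns in every locale.
        return 2
    if eaw == "A":
        # Ambiguous characters take two columns only in a CJK locale.
        return 2 if is_cjk_locale(locale_name) else 1
    # Narrow, Halfwidth, and Neutral characters take one column.
    return 1


if __name__ == "__main__":
    # Latin "a", Greek alpha (Ambiguous), white circle (Ambiguous), Hiragana "a" (Wide)
    for ch in ("a", "\u03b1", "\u25cb", "\u3042"):
        print(f"U+{ord(ch):04X}",
              char_width(ch, "en_US.UTF-8"),
              char_width(ch, "ja_JP.UTF-8"))
```

Under this rule the encoding part of the locale name ("UTF-8") plays no role in the result; only the language part decides how Ambiguous characters are measured.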