Date: Thu, 24 Sep 2009 17:13:58 +0200 (CEST)
From: Thomas Wolff
To: cygwin AT cygwin DOT com
Subject: Re: non-BMP character width

Corinna Vinschen wrote:
> Can you please create a simple self-contained testcase?  I'm not exactly
> sure how this is supposed to work and if a solution exists.  Is that a
> problem for the non-UTF-8 case, too, or for UTF-8 only?

Sorry for the late response; I see you have reproduced the case in the
meantime. Anyway, here is a test case, to be used with gcc or just with cat:

/* print U+20000 𠀀 */
#include <stdio.h>

int main ()
{
  printf (" is <𠀀>\n");
  return 0;
}

where you could enter the character in mined with Control-V #20000 Enter :)

About non-UTF-8: I tried to test in Big5, using character 0x8750, which is
U+242BF, and the test suggests it's OK (in the cygwin console, mintty, and
rxvt-unicode). However, that may not be significant: although its Unicode
code point is non-BMP, the Big5 character is only 16 bits, and Windows,
having supported CJK before Unicode, probably doesn't handle this via
Unicode. I also tried to test eucJP, but that doesn't seem to work at all,
and mintty crashes... See my other comment below, please.

On Sep 22 06:57, Lapo Luchini wrote:
> ...
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box)...

The problem used to be in mintty as well, until I pointed it out and Andy
was ambitious enough to find a workaround. Maybe he could supply a code
snippet which would fix this in the cygwin console too, despite the bug
originating in the Windows API...

> and on rxvt it just shows as four ISO-8859-1 chars:
> (as expected, as native rxvt doesn't support Unicode)

You would have to test this with rxvt-unicode (urxvt in cygwin), where the
test case passes (one box). (Maybe not very relevant, if reports are true
that rxvt is not maintained anymore.)

Corinna wrote:
> > ...
> Uh, I see.  That occurs in the normal Windows console.  This is not
> Cygwin's fault.  Cygwin's console code converts the multibyte string to
> the WCHAR representation and prints it to the console using the
> WriteConsoleW function.  That function prints two blocks/question marks
> for a surrogate pair.  Look at the file in a cmd shell, it will also
> print two blocks/question marks for the surrogate pair.

I was assuming that, as with mintty, the fault was not in the cygwin domain.
However, since there is a workaround, I thought it would be nice to have it
in the cygwin console as well.

Kind regards,
Thomas
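
To illustrate the surrogate pair discussed in this thread: a code point above U+FFFF does not fit in one 16-bit WCHAR, so UTF-16 splits it into two code units, and a console that renders each unit on its own shows two boxes. The following is only a minimal sketch of that split. The helper name to_surrogates is hypothetical, and this is not Cygwin's actual conversion code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from Cygwin: split a code point in the
   range U+10000..U+10FFFF into its UTF-16 high and low surrogates. */
void
to_surrogates (uint32_t cp, uint16_t *high, uint16_t *low)
{
  /* Only non-BMP code points are encoded as surrogate pairs. */
  assert (cp >= 0x10000 && cp <= 0x10FFFF);

  uint32_t v = cp - 0x10000;                  /* 20-bit offset */
  *high = (uint16_t) (0xD800 + (v >> 10));    /* upper 10 bits */
  *low  = (uint16_t) (0xDC00 + (v & 0x3FF));  /* lower 10 bits */
}
```

For the test character U+20000 this gives 0xD840 0xDC00, that is, two WCHARs that a terminal has to draw as one glyph.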