Mail Archives: cygwin/2009/09/24/11:14:34

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Date: Thu, 24 Sep 2009 17:13:58 +0200 (CEST)
Message-Id: <200909241513.n8OFDw8i010703@mail.bln1.bf.nsn-intra.net>
From: Thomas Wolff <towo AT towo DOT net>
To: cygwin AT cygwin DOT com
Subject: Re: non-BMP character width
References: <200909161148 DOT n8GBm4ha001469 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <20090921163348 DOT GL20981 AT calimero DOT vinschen DOT de> <h98b17$jbj$1 AT ger DOT gmane DOT org> <20090921175759 DOT GM20981 AT calimero DOT vinschen DOT de> <4AB8592F DOT 9060803 AT lapo DOT it>
MIME-Version: 1.0
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Corinna Vinschen wrote:
> Can you please create a simple self-contained testcase?  I'm not exactly
> sure how this is supposed to work and if a solution exists.  Is that a
> problem for the non-UTF-8 case, too, or for UTF-8 only?

Sorry for the late response; I see you have reproduced the case in the meantime.
Anyway, here is a test case, to be used with gcc or just with cat:

/* print U+20000 𠀀 */
#include <stdio.h>

int main (void) {
  printf ("<U+20000> is <𠀀>\n");
  return 0;
}

where you could enter the character in mined with Control-V #20000 Enter :)
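
For readers whose editor cannot enter the character directly, the following is a minimal sketch of how the UTF-8 bytes for U+20000 could be produced by hand. The helper name utf8_encode is hypothetical and not part of the original test case; the bit layout is the standard UTF-8 one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original mail): encode one Unicode
   code point as UTF-8.  Returns the number of bytes written to out
   (1 to 4), or 0 if cp is a surrogate or lies beyond U+10FFFF.
   out must have room for 4 bytes. */
size_t utf8_encode (uint32_t cp, unsigned char *out)
{
  assert (out != NULL);
  if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
    out[0] = (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
    return 0;
  if (cp < 0x10000) {                   /* 3 bytes: rest of the BMP */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {                 /* 4 bytes: non-BMP code points */
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```

Calling utf8_encode (0x20000, buf) fills buf with the bytes F0 A0 80 80, which can be written to the terminal with fwrite in place of the literal character.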

About non-UTF-8: I tested Big5, using character 0x8750, which is U+242BF,
and the test suggests it's OK (in the cygwin console, mintty, and rxvt-unicode).
However, that may not be significant: although its Unicode code point is
non-BMP, the Big5 character is only 16 bits wide, and Windows, having
supported CJK before Unicode, probably doesn't handle it via Unicode.
I also tried to test eucJP, but that doesn't seem to work at all and mintty crashes...

See my other comment below, please.


On Sep 22 06:57, Lapo Luchini wrote:
> ...
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box)...
The problem used to be in mintty as well, until I pointed it out and
Andy was ambitious enough to find a workaround. Maybe he could supply a
code snippet that would fix this in the cygwin console too, despite
the bug originating in the Windows API...

> and on rxvt it just shows as four ISO-8859-1 chars:
> (as expected, as native rxvt doesn't support Unicode)
You would have to test this with rxvt-unicode (urxvt in cygwin),
where the test case passes (one box). (Maybe not very relevant,
if reports are true that rxvt is no longer maintained.)

Corinna wrote:
> > ...
> Uh, I see.  That occurs in the normal Windows console.  This is not
> Cygwin's fault.  Cygwin's console code converts the multibyte string to
> the WCHAR representation and prints it to the console using the
> WriteConsoleW function.  That function prints two blocks/question marks
> for a surrogate pair.  Look at the file in a cmd shell, it will also
> print two blocks/question marks for the surrogate pair.
I had assumed that, as with mintty, the fault was not in the cygwin domain.
However, since there is a workaround, I thought it would be nice to have
it in the cygwin console as well.
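
To illustrate the surrogate pair discussed above, here is a minimal sketch of how a non-BMP code point is split into two UTF-16 code units. The helper name utf16_surrogates is hypothetical; this is not the conversion code used by Cygwin or the workaround in mintty, only the standard UTF-16 arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from Cygwin or mintty): split a non-BMP
   code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair. */
void utf16_surrogates (uint32_t cp, uint16_t *high, uint16_t *low)
{
  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  cp -= 0x10000;                               /* 20 bits remain */
  *high = (uint16_t) (0xD800 + (cp >> 10));    /* top 10 bits */
  *low  = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
}
```

For U+20000 this gives 0xD840 0xDC00. These are the two code units that, according to the description quoted above, end up in the WCHAR string and are each shown as a separate block by the Windows console.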

Kind regards,
Thomas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
