Mail Archives: cygwin/2009/04/06/07:05:22

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Mon, 6 Apr 2009 13:04:57 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: [1.7] Support for CJK Character Sets
Message-ID: <20090406110457.GA4134@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <20090403173212 DOT 51916 DOT qmail AT web4102 DOT mail DOT ogk DOT yahoo DOT co DOT jp>
MIME-Version: 1.0
In-Reply-To: <20090403173212.51916.qmail@web4102.mail.ogk.yahoo.co.jp>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Apr  4 02:32, neomjp wrote:
> LANG=en_US.ISO-2022-JP
> 0000000  71  71  0e  e4  b8  80  0a
>           q   q  so   d   8 nul  nl
> 0000007
> This must be identical to:
> 0000000  71  71  1b  24  42  30  6c  1b  28  42  0a
>           q   q esc   $   B   0   l esc   (   B  nl
> 0000013

After mulling over this problem for a long time, I gave up on supporting
JIS.  It's an incredibly dumb character set, using escape sequences so
that you have to know the current shift state at every stage of the
conversion.  This isn't easily doable using the underlying Win32
functions, at least not in a way which is compatible with the equivalent
POSIX functions.
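
To illustrate the state problem, here is a minimal sketch in Python.  It
is not Cygwin code; Python's built-in "iso2022_jp" codec merely stands in
for a stateful converter, and the variable names are made up for this
example.  It shows that the same two bytes 0x30 0x6c mean different
things depending on which escape sequence preceded them.

```python
import codecs

# 'q', 'q', U+4E00, newline: the test string from the report.
text = "qq\u4e00\n"

# Encoding has to emit escape sequences to switch character sets.
# The report expects: 71 71 1b 24 42 30 6c 1b 28 42 0a
encoded = text.encode("iso2022_jp")
print(encoded.hex(" "))

# A fresh decoder starts in ASCII state, so 0x30 0x6c is just "0l".
plain = b"0l".decode("iso2022_jp")
print(repr(plain))

# An incremental decoder carries the shift state between calls.
# After ESC $ B the very same two bytes are one JIS X 0208 character.
decoder = codecs.getincrementaldecoder("iso2022_jp")()
first = decoder.decode(b"\x1b$B")   # escape sequence only, no text yet
second = decoder.decode(b"0l")      # decoded in JIS X 0208 state
print(repr(first), repr(second))
```

A converter that is handed only the bytes "0l" cannot decide which
reading is right without remembering what it saw earlier.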

So, I removed JIS support from Cygwin again.  Given that SJIS and eucJP
are both available, this shouldn't pose a big problem for Japanese users.

> LANG=en_US.eucJP
> 0000000  71  71  0e  e4  b8  80  0a
>           q   q  so   d   8 nul  nl
> 0000007
> This must be identical to:
> 0000000  71  71  b0  ec  0a
>           q   q   0   l  nl
> 0000005

This is fixed now in CVS.  eucJP has three-byte sequences, but the
Windows eucJP codepage 20932 does not; it converts these wide characters
to incompatible double-byte sequences instead.  That was not really
helpful, but as far as I can test it, it appears to work now.
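
For reference, a small sketch of the two-byte versus three-byte
distinction, again using Python's "euc_jp" codec as a stand-in rather
than Cygwin or Windows code.  U+4E02 is my own pick of a character that
is assumed to live in JIS X 0212 and not in JIS X 0208; it does not come
from the report.

```python
# U+4E00 is the JIS X 0208 character from the report.
two_byte = "\u4e00".encode("euc_jp")

# U+4E02 is assumed here to be a JIS X 0212 character, which EUC-JP
# encodes as the single-shift byte 0x8f followed by two more bytes.
three_byte = "\u4e02".encode("euc_jp")

for label, data in (("JIS X 0208", two_byte), ("JIS X 0212", three_byte)):
    print(label, len(data), "bytes:", data.hex(" "))
```

The first line should match the b0 ec pair from the report.  The second
is the kind of sequence that codepage 20932 has no counterpart for.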

Please note that eucJP does not work by default on Windows XP and
earlier OSes!  At least not on the so-called "western languages"
installations, US, French, Italian, whatever.  The reason is that the
codepage 20932 is not installed by default.  You can easily install it,
though, in the "Regional and Language Options" control panel -> Advanced
-> Code page conversion tables.  Just click on codepage "20932 (JIS X
0208-1990 & 0212-1990)" and have your XP installation disk ready.

So, if you're running XP or earlier, unless you installed CP 20932,
eucJP support in Cygwin is as broken as in the underlying Windows.
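
If you want to check whether the codepage is present before relying on
eucJP, something along these lines might help.  It is only a sketch
under assumptions: it calls the Win32 function IsValidCodePage through
ctypes from a native Windows Python, and the helper name
codepage_installed is hypothetical.  On any other platform, including
Cygwin's own Python, it just reports that it cannot tell.

```python
import ctypes
import sys

EUC_JP_CODEPAGE = 20932  # the Windows eucJP codepage named above


def codepage_installed(codepage):
    """Return True or False on native Windows, None where we cannot ask."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")
    # IsValidCodePage returns nonzero if the codepage is usable.
    return bool(kernel32.IsValidCodePage(ctypes.c_uint(codepage)))


status = codepage_installed(EUC_JP_CODEPAGE)
if status is None:
    print("not a native Windows Python, cannot check")
elif status:
    print("codepage 20932 is available")
else:
    print("codepage 20932 is missing, install the conversion table")
```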


Thanks for this report,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

