Mail Archives: cygwin/2011/02/02/18:04:04

From: Bruno Haible <bruno AT clisp DOT org>
To: bug-gnulib AT gnu DOT org
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Thu, 3 Feb 2011 00:03:45 +0100
User-Agent: KMail/1.9.9
Cc: Eric Blake <eblake AT redhat DOT com>, cygwin AT cygwin DOT com
References: <201101310304 DOT 42975 DOT bruno AT clisp DOT org> <201102021229 DOT 04623 DOT bruno AT clisp DOT org> <4D49CB7C DOT 5040000 AT redhat DOT com>
In-Reply-To: <4D49CB7C.5040000@redhat.com>
MIME-Version: 1.0
Message-Id: <201102030003.46763.bruno@clisp.org>

Hello Eric,

> > Here's a new proposal:
> >   - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
> >     on Windows platforms and to 'wchar_t' otherwise.
> >   - Define functions 'mbrtowwc', 'iswwalpha', 'wwcwidth', and similar.
> >     Their definition will be a trivial redirection to 'mbrtowc', 'iswalpha',
> >     'wcwidth' on most platforms, and a use of libunistring modules on
> >     Windows platforms.
> ...
> Are you thinking of making a sane wrapping around either 4-byte wchar_t
> or which maps to 2-byte wchar_t but sanely handles UTF-16 (which makes
> it a thin wrapper on both Linux and Cygwin, but needing more work on
> mingw), or are you thinking that it is always a 4-byte type (needing
> lots more memory manipulation on cygwin to convert between 2- and 4-byte
> representations when using cygwin's functions, or else reimplementing
> everything from scratch by completely bypassing cygwin)?

I'm not sure I understand your question. The plan is that

  - On platforms with a 32-bit wchar_t, like glibc, *BSD, and many others,
    'wwchar_t' is identical to 'wchar_t', and the function wrappers are
    simple redirections.

  - On Cygwin and mingw, wwchar_t is 'uint32_t' (so as to accommodate
    all Unicode characters and WEOF and so that it plays well with 'wint_t').
    mbrtowwc is implemented by 1 or 2 calls to mbrtowc. mbsrtowwcs may be
    implemented by a call to mbsrtowcs and an additional conversion loop,
    or it might be implemented on top of mbrtowwc; that's merely a speed
    vs. memory trade-off.
    The plan is not to "completely bypass" Cygwin, but to use as many
    of Cygwin's built-ins as make sense.

  - On platforms with a 16-bit wchar_t but where the wchar_t[] encoding
    in Unicode locales is merely UCS-2, like AIX, use the no-op thin
    wrappers as well. If the platform does not support more than the BMP,
    it does not make much sense for GNU programs to try to work around that.
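
As a rough illustration of the Cygwin/mingw case, here is a minimal sketch of
how mbrtowwc could sit on top of 1 or 2 calls to mbrtowc. It is not code from
gnulib. The names mbrtowwc_sketch, is_high_surrogate, is_low_surrogate, and
combine_surrogates are made up for this example, and 'wwchar_t' is only the
name proposed in this thread. The sketch assumes that, on a platform with a
16-bit wchar_t, mbrtowc returns the two UTF-16 surrogates through two
successive calls and that the second call reports the bytes it consumed;
implementations differ on this detail. The WCHAR_MAX test is a simplification:
a UCS-2 platform such as AIX would need a separate check to get the thin
wrapper instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* 'wwchar_t' is the name proposed in this thread, not an existing type. */
#if WCHAR_MAX <= 0xFFFF
typedef uint32_t wwchar_t;   /* 16-bit wchar_t, e.g. Cygwin and mingw */
#else
typedef wchar_t wwchar_t;    /* 32-bit wchar_t, e.g. glibc and *BSD */
#endif

/* Hypothetical helpers: classify UTF-16 code units. */
static inline int is_high_surrogate(uint32_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

static inline int is_low_surrogate(uint32_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a high/low surrogate pair into one Unicode code point. */
static inline uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* Sketch only: s must be non-NULL.  If the input ends between the two
   surrogates, the high surrogate is dropped here; a real implementation
   would have to remember it across calls. */
static size_t mbrtowwc_sketch(wwchar_t *pwwc, const char *s, size_t n,
                              mbstate_t *ps)
{
    wchar_t wc;
    size_t ret = mbrtowc(&wc, s, n, ps);
    if (ret == (size_t)-1 || ret == (size_t)-2)
        return ret;                      /* invalid or incomplete input */

    uint32_t cp = (uint32_t)wc;
#if WCHAR_MAX <= 0xFFFF
    if (is_high_surrogate(cp)) {
        /* Second call: fetch the low surrogate of the pair. */
        wchar_t wc2;
        size_t ret2 = mbrtowc(&wc2, s + ret, n - ret, ps);
        if (ret2 == (size_t)-1 || ret2 == (size_t)-2)
            return ret2;
        if (!is_low_surrogate((uint32_t)wc2))
            return (size_t)-1;           /* unpaired high surrogate */
        cp = combine_surrogates(cp, (uint32_t)wc2);
        ret += ret2;
    }
#endif
    if (pwwc != NULL)
        *pwwc = (wwchar_t)cp;
    return ret;                          /* bytes consumed, 0 for NUL */
}
```

On a platform with a 32-bit wchar_t the #if block compiles away and the
function reduces to a plain call to mbrtowc, which is the "simple redirection"
case from the first item.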

> As to the name: I agree with the opinion of others that xchar_t is easier to
> type and easier to avoid typos of a missing 'w' than wwchar_t.

If a developer makes a typo here, he's likely to get a gcc warning or
a link error. But yes, it's possible to pass a 'wwchar_t' to
iswalpha(), which will yield wrong results. I don't think this risk
can be much reduced through a different name.

> gnulib already has xprintf as a counterpart to xmalloc (which calls
> exit() if the printf fails for memory allocation or other non-I/O
> related reasons), so we can't blindly use 'x'

Good point. The 'x' prefix already has several meanings in gnulib:
  - checking against memory allocation failure,
  - checking against errors,
  - no size limitation,
  - a more convenient interface,
  - a wrapper that prints an error message.
It doesn't seem wise to add another meaning to it.

Thanks for the feedback.

-- 
In memoriam Carl Friedrich Goerdeler <http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdeler>
