Message-ID: <4D49E68C.2030509@redhat.com>
Date: Wed, 02 Feb 2011 16:19:40 -0700
From: Eric Blake
To: Bruno Haible
CC: bug-gnulib AT gnu DOT org, cygwin AT cygwin DOT com
Subject: Re: 16-bit wchar_t on Windows and Cygwin

On 02/02/2011 04:03 PM, Bruno Haible wrote:
>> Are you thinking of making a sane wrapping around either 4-byte wchar_t
>> or which maps to 2-byte wchar_t but sanely handles UTF-16 (which makes
>> it a thin wrapper on both Linux and Cygwin, but needing more work on
>> mingw), or are you thinking that it is always a 4-byte type (needing
>> lots more memory manipulation on cygwin to convert between 2- and 4-byte
>> representations when using cygwin's functions, or else reimplementing
>> everything from scratch by completely
>> bypassing cygwin)?
>
> I'm not sure I understand your question. The plan is that
>
> - On platforms with a 32-bit wchar_t, like glibc, *BSD, and many others,
>   'wwchar_t' is identical to 'wchar_t', and the function wrappers are
>   simple redirections.
>
> - On Cygwin and mingw, wwchar_t is 'uint32_t' (so as to accommodate
>   all Unicode characters and WEOF and so that it plays well with 'wint_t').
>   mbrtowwc is implemented by 1 or 2 calls to mbrtowc. mbsrtowwcs may be
>   implemented by a call to mbsrtowcs and an additional conversion loop,
>   or it might be implemented on top of mbrtowwc; that's merely a speed
>   vs. memory trade-off.
>   The plan is not to "completely bypassing cygwin", but to use as much
>   of Cygwin's built-ins as makes sense.

You answered my question in spite of myself.  I was asking:

  should wwchar_t (or xwchar_t, but not xchar_t) be 2 bytes on cygwin,
  where, unlike the POSIX definition of wchar_t as always 1 character
  per unit, the new type is explicitly documented as being multi-unit
  on some platforms but with sane semantics,

or

  should it always be 4 bytes, where conversion from wchar_t to
  wwchar_t requires some effort, and where the new type must be used
  everywhere (which means wrapping a lot of APIs), but where you can
  once again assume POSIX semantics of 1 character per unit,
  simplifying life for callers at the expense of converting to the
  new type?

And on asking the question in those more detailed words, I agree with
your conclusion - on cygwin, wwchar_t should be 4 bytes.

> - On platforms with a 16-bit wchar_t but where the wchar_t[] encoding
>   in Unicode locales is merely UCS-2, like AIX, use the no-op thin
>   wrappers as well. If the platform does not support more than the BMP,
>   it makes not much sense for GNU programs to try to work around that.

Agreed.

Next question/thought.  Gnulib should definitely tackle this first.  But
if it works out, should we also add wwchar_t natively into cygwin?
It would certainly be easier to add new interfaces incrementally, in
preparation for a possible future ABI conversion to make wchar_t become
4 bytes.

--
Eric Blake   eblake AT redhat DOT com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
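
For illustration only: the thread says that on Cygwin and mingw mbrtowwc would be built from 1 or 2 calls to mbrtowc. Whatever the final wrapper looks like, it would presumably need a step that turns one or two 16-bit UTF-16 units into a single 32-bit value. The sketch below shows only that combining step. The typedef follows the uint32_t proposal quoted above; the helper names (is_high_surrogate, is_low_surrogate, utf16_to_wwchar) are hypothetical and are not part of gnulib or Cygwin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: wwchar_t is a 32-bit unsigned type, as proposed in the
   thread for Cygwin and mingw.  */
typedef uint32_t wwchar_t;

/* Hypothetical helpers; the names are invented for this sketch.  */
static int
is_high_surrogate (uint16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

static int
is_low_surrogate (uint16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

/* Decode one character from the N UTF-16 units at SRC.
   On success, store the code point in *PWC and return the number of
   units consumed (1 or 2).  Return 0 if the input is empty or is not
   well-formed UTF-16 (a lone or truncated surrogate).  */
static size_t
utf16_to_wwchar (wwchar_t *pwc, const uint16_t *src, size_t n)
{
  assert (pwc != NULL && src != NULL);
  if (n == 0)
    return 0;
  if (is_high_surrogate (src[0]))
    {
      /* A character outside the BMP needs a second, low unit.  */
      if (n < 2 || !is_low_surrogate (src[1]))
        return 0;
      *pwc = 0x10000
             + (((wwchar_t) (src[0] - 0xD800) << 10)
                | (wwchar_t) (src[1] - 0xDC00));
      return 2;
    }
  if (is_low_surrogate (src[0]))
    return 0;
  /* A BMP character occupies a single unit.  */
  *pwc = src[0];
  return 1;
}
```

A real mbrtowwc would additionally have to carry the pending high surrogate in the conversion state between calls; that part is left out here.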