| Subject: | Re: Thoughts on the wcwidth confusion |
| To: | cygwin AT cygwin DOT com |
| References: | <19c6f9b4-5f09-6929-891c-d25ebe48af82 AT wisemo DOT com> |
| <b5b175eb-1225-43cf-bda1-eef0cc9cff78 AT towo DOT net> | |
| <4100a583-7419-4bb6-bd19-aee154dbec3b AT towo DOT net> | |
| Organization: | WiseMo A/S |
| Message-ID: | <69aef838-7c9d-5c39-a8df-3fc8c49ff37d@wisemo.com> |
| Date: | Mon, 1 Jun 2026 20:00:14 +0200 |
| X-Mailer: | Epyrus/2.2.0 |
| MIME-Version: | 1.0 |
| In-Reply-To: | <4100a583-7419-4bb6-bd19-aee154dbec3b@towo.net> |
| List-Id: | General Cygwin discussions and problem reports <cygwin.cygwin.com> |
| From: | Jakob Bohm via Cygwin <cygwin AT cygwin DOT com> |
| Reply-To: | Jakob Bohm <jb-cygwin AT wisemo DOT com> |
On 01/06/2026 18:12, Thomas Wolff via Cygwin wrote:
>
> On 01.06.2026 at 17:59, Thomas Wolff via Cygwin wrote:
>> On 01.06.2026 at 17:34, Jakob Bohm via Cygwin wrote:
>>> Dear list,
>>>
>>> Having read through the recent debate around the wcwidth() POSIX API,
>>> wchar_t definitions, gcc-16 and cygwin, I have an idea not
>>> mentioned in the list so far:
>>>
>>> Using C17 types char32_t and char16_t, the situation can be
>>> summarized as follows:
>>>
>>> - Many, but not all POSIX systems define wchar_t as char32_t and thus
>>> wint_t as uint_least32_t
>>>
>>> - Win32 and thus Cygwin defines wchar_t as char16_t and thus wint_t as
>>> uint_least16_t
>>>
>>> - All systems considered treat wchar_t as Unicode, with Win32
>>>   supporting UTF-16 since NT 5.0 (Windows 2000).
>>>
>>> - For char16_t/UTF-16, wcwidth() should use the high surrogate to
>>>   determine the range of Unicode symbols and return a width common to
>>>   that range, then return 0 for the low surrogates, thereby allowing
>>>   computation of string width without having to first assemble
>>>   surrogates into full char32_t values. Deciding if char32_t
>>>   implementations should still lump groups of 4 Unicode rows for
>>>   UTF-16 compatibility is up to each implementation.
>> It's a neat idea to split the width calculation over the surrogates.
>> Unfortunately it does not work this way, because width is not uniform
>> across the 1024-code-point block that each high surrogate selects.
>> For example, U+1F4FC is Wide (W), U+1F4FD and U+1F4FE are
>> narrow/Neutral (N), and U+1F4FF is W again.
>> As a variant of your idea, wcwidth could return width 1 for every
>> high surrogate, remember it, and if the subsequent invocation is a
>> low surrogate, determine the combined width and return either 1 or 0.
>> Not quite standard behaviour, I suspect, so maybe not a good idea for
>> the purists, but maybe worth some discussion.
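To make the counterexample concrete, here is a minimal sketch of the standard UTF-16 surrogate split. The helper names are illustrative only and are not part of newlib or any existing API. All four code points U+1F4FC..U+1F4FF map to the same high surrogate 0xD83D, so a width derived from the high surrogate alone cannot tell them apart.

```c
#include <stdint.h>

/* Illustrative helpers (hypothetical names): split a supplementary-plane
 * code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair. */
static uint16_t high_surrogate(uint32_t cp)
{
    /* Top 10 bits of the 20-bit offset select one of 1024 high surrogates. */
    return (uint16_t)(0xD800u + ((cp - 0x10000u) >> 10));
}

static uint16_t low_surrogate(uint32_t cp)
{
    /* Bottom 10 bits select the position inside that 1024-code-point block. */
    return (uint16_t)(0xDC00u + ((cp - 0x10000u) & 0x3FFu));
}
```

Only the low surrogate differs between U+1F4FC (0xDCFC) and U+1F4FF (0xDCFF), which is why the width decision has to wait for it.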
> On the other hand, there are also combining characters in the non-BMP,
> so the only way this could work is width 0 for high surrogates, then
> sum up to the actual width on the low surrogate. That leaves the
> question of how to handle an (erroneously) lone high surrogate...
>
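A rough sketch of that "width 0 on the high surrogate, real width on the low surrogate" variant follows. Everything here is an assumption for illustration: sketch_c16width() and toy_c32width() are made-up names, the width table knows only the code points from the example above, and the error policy (a stale high surrogate is silently dropped, an unpaired low surrogate yields -1) is just one arbitrary choice among several.

```c
#include <stdint.h>
#include <uchar.h>

/* Toy stand-in for a real width table (hypothetical): only U+1F4FC and
 * U+1F4FF are Wide; everything else valid is treated as width 1. */
static int toy_c32width(char32_t c)
{
    if (c > 0x10FFFF) return -1;
    if (c == 0x1F4FC || c == 0x1F4FF) return 2;
    return 1;
}

/* Hidden state: the pending high surrogate, or 0 if none.
 * A plain static like this is not thread-safe. */
static char16_t pending_high;

static int sketch_c16width(char16_t u)
{
    if (u >= 0xD800 && u <= 0xDBFF) {   /* high surrogate: remember it */
        pending_high = u;               /* a stale one is overwritten */
        return 0;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) {   /* low surrogate */
        if (pending_high == 0)
            return -1;                  /* lone or out-of-sync low surrogate */
        char32_t cp = 0x10000u + (((char32_t)pending_high - 0xD800u) << 10)
                               + ((char32_t)u - 0xDC00u);
        pending_high = 0;
        return toy_c32width(cp);
    }
    pending_high = 0;                   /* BMP unit: drop any stale state */
    return toy_c32width(u);
}
```

Summing the return values over 0xD83D 0xDCFC gives 0 + 2, so callers that add up per-unit widths still get the right total, at the cost of a function whose result depends on the previous call.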
If using this "hidden state" concept, the big question is how to handle
a single or out-of-sync low surrogate in wcwidth(). For wcswidth(),
the full context is always available, and lone surrogates are no
different from other invalid chars such as U+1FFFFE.
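For the string case, a minimal sketch of what a c16swidth() could look like, assuming it follows the wcswidth() convention of returning -1 on an invalid character. The names sketch_c16swidth() and toy_c32width() are hypothetical, and the width table is a toy covering only the code points discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* Toy stand-in for a real width table (hypothetical): only U+1F4FC and
 * U+1F4FF are Wide; everything else valid is treated as width 1. */
static int toy_c32width(char32_t c)
{
    if (c > 0x10FFFF) return -1;
    if (c == 0x1F4FC || c == 0x1F4FF) return 2;
    return 1;
}

static int is_high(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Width of at most n UTF-16 units of s, stopping at a NUL unit.
 * Returns -1 for a lone or out-of-order surrogate, like any invalid char. */
static int sketch_c16swidth(const char16_t *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n && s[i] != 0; i++) {
        char32_t cp = s[i];
        if (is_low(s[i]))
            return -1;                          /* low without a high */
        if (is_high(s[i])) {
            if (i + 1 >= n || !is_low(s[i + 1]))
                return -1;                      /* high without a low */
            cp = 0x10000u + (((char32_t)s[i] - 0xD800u) << 10)
                          + ((char32_t)s[i + 1] - 0xDC00u);
            i++;                                /* consume the low unit */
        }
        int w = toy_c32width(cp);
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}
```

No hidden state is needed here because the neighbouring unit is always in reach.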
>>>
>>> A practical solution would be for Cygwin/newlib to provide new
>>> functions c16width(), c32width(), c16swidth() and c32swidth(), each
>>> being the explicit-size equivalent of the similarly named wc or wcs
>>> function.
>>>
>>> Then wcwidth() can be a trivial inline alias of the explicit-size
>>> equivalent for the compile target, with the newlib header checking a
>>> compiler or standard define that indicates the chosen size of wchar_t.
>>>
>>> // possible wchar.h snippet
>>> //
>>> // C17+ required
>>> // For C2Y+ this should go in uchar.h
>>> //
>>> int c16width(char16_t c);
>>> int c32width(char32_t c);
>>> int c16swidth(const char16_t *s, size_t n);
>>> int c32swidth(const char32_t *s, size_t n);
>>>
>>> // ...
>>>
>>> // This belongs in wchar.h for C1x- compat
>>> //
>>> #if SOMETHING_MEANING_16bit_WCHAR_T
>>> inline int wcwidth(wchar_t c) {
>>>   return c16width(c);
>>> }
>>> inline int wcswidth(const wchar_t *s, size_t n)
>>> {
>>>   return c16swidth(s, n);
>>> }
>>> #else
>>> inline int wcwidth(wchar_t c) {
>>>   return c32width(c);
>>> }
>>> inline int wcswidth(const wchar_t *s, size_t n)
>>> {
>>>   return c32swidth(s, n);
>>> }
>>> #endif
>>>
>>>
>>> Enjoy
>>>
>>> Jakob
>>
>>
>
>
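One candidate for the "standard define" in the header snippet is WCHAR_MAX, which the C standard requires to be usable in #if. The following is only a sketch under that assumption: the sketch_ names and SKETCH_WCHAR_BITS are made up, and the width functions are placeholders that always report 1, standing in for the proposed c16width()/c32width().

```c
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>    /* wchar_t */

#if WCHAR_MAX <= 0xFFFF
/* 16-bit wchar_t (Win32, Cygwin): forward to the UTF-16 variant. */
#define SKETCH_WCHAR_BITS 16
static int sketch_c16width(uint_least16_t c) { (void)c; return 1; } /* placeholder */
static inline int sketch_wcwidth(wchar_t c)
{
    return sketch_c16width((uint_least16_t)c);
}
#else
/* Wider wchar_t (most other POSIX systems): forward to the UTF-32 variant. */
#define SKETCH_WCHAR_BITS 32
static int sketch_c32width(uint_least32_t c) { (void)c; return 1; } /* placeholder */
static inline int sketch_wcwidth(wchar_t c)
{
    return sketch_c32width((uint_least32_t)c);
}
#endif
```

The wrappers are declared static inline here so the sketch links on its own; a real libc header would need to decide how the external definition of wcwidth() is provided.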
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded