Mail Archives: cygwin/2026/06/01/14:01:06

Subject: Re: Thoughts on the wcwidth confusion
To: cygwin AT cygwin DOT com
References: <19c6f9b4-5f09-6929-891c-d25ebe48af82 AT wisemo DOT com>
<b5b175eb-1225-43cf-bda1-eef0cc9cff78 AT towo DOT net>
<4100a583-7419-4bb6-bd19-aee154dbec3b AT towo DOT net>
Organization: WiseMo A/S
Message-ID: <69aef838-7c9d-5c39-a8df-3fc8c49ff37d@wisemo.com>
Date: Mon, 1 Jun 2026 20:00:14 +0200
From: Jakob Bohm via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Jakob Bohm <jb-cygwin AT wisemo DOT com>

On 01/06/2026 18:12, Thomas Wolff via Cygwin wrote:
>
> On 01.06.2026 at 17:59, Thomas Wolff via Cygwin wrote:
>> On 01.06.2026 at 17:34, Jakob Bohm via Cygwin wrote:
>>> Dear list,
>>>
>>> Having read through the recent debate around the wcwidth() POSIX API,
>>> wchar_t definitions, gcc-16 and cygwin, I have an idea not
>>> mentioned in the list so far:
>>>
>>> Using the C11 (<uchar.h>) types char32_t and char16_t, the situation
>>> can be summarized as follows:
>>>
>>> - Many, but not all POSIX systems define wchar_t as char32_t and thus
>>> wint_t as uint_least32_t
>>>
>>> - Win32 and thus Cygwin defines wchar_t as char16_t and thus wint_t as
>>> uint_least16_t
>>>
>>> - All systems considered treat wchar_t as Unicode, with Win32
>>>  supporting UTF-16 since NT 5.0 (Windows 2000).
>>>
>>> - For char16_t/UTF-16, wcwidth() should use the high surrogate to
>>>  determine the range of Unicode symbols and return a width common to
>>>  that range, then return 0 for the low surrogates, thereby allowing
>>>  computation of string width without having to first assemble
>>>  surrogates into full char32_t values.  Deciding if char32_t
>>>  implementations should still lump groups of 4 Unicode rows for
>>>  UTF-16 compatibility is up to each implementation.
>> It's a neat idea to split the width calculation over the surrogates.
>> Unfortunately it does not work this way because width does not
>> change only at the boundaries of full 1024-code-point blocks (the
>> range covered by one high surrogate). For example, U+1F4FC is Wide
>> (W), U+1F4FD and U+1F4FE are narrow/Neutral (N), and U+1F4FF is W
>> again.
>> As a variant of your idea, wcwidth could return width 1 for every 
>> high surrogate, remember it, and if the subsequent invocation is a 
>> low surrogate, determine the combined width and return either 1 or 0.
>> Not quite standard behaviour, I suspect, so maybe not a good idea for 
>> the purists, but maybe worth some discussion.
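
To make the counterexample concrete, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The function names are made up for illustration and are not part of newlib or any proposed API. It shows that U+1F4FC through U+1F4FF all share the high surrogate 0xD83D and differ only in the low surrogate, so a width derived from the high surrogate alone cannot tell the Wide code points from the Neutral ones.

```c
#include <stdint.h>

/* Split a supplementary-plane code point (U+10000..U+10FFFF) into its
   UTF-16 surrogate pair. The caller must pass a value in that range. */
static void sketch_utf16_split(uint32_t cp, uint16_t *high, uint16_t *low)
{
    uint32_t v = cp - 0x10000u;                  /* 20-bit offset */
    *high = (uint16_t)(0xD800u + (v >> 10));     /* upper 10 bits */
    *low  = (uint16_t)(0xDC00u + (v & 0x3FFu));  /* lower 10 bits */
}

/* Convenience wrappers (hypothetical) returning one half of the pair. */
static uint16_t sketch_utf16_high(uint32_t cp)
{
    uint16_t h, l;
    sketch_utf16_split(cp, &h, &l);
    return h;
}

static uint16_t sketch_utf16_low(uint32_t cp)
{
    uint16_t h, l;
    sketch_utf16_split(cp, &h, &l);
    return l;
}
```

For the four code points named above, sketch_utf16_high() yields 0xD83D every time, while sketch_utf16_low() yields 0xDCFC, 0xDCFD, 0xDCFE and 0xDCFF.
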
> On the other hand, there are also combining characters outside the BMP,
> so the only way this could work is width 0 for high surrogates, then
> sum up to the actual width on the low surrogate. That leaves the question
> of how to handle an (erroneously) single high surrogate...
>
If using this "hidden state" concept, the big question is how to handle
a single or out-of-sync low surrogate in wcwidth().  For wcswidth(),
the full context is always available and lone surrogates will be no
different from other invalid chars such as U+1FFFFE.
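
As a rough sketch of both variants (all names here are hypothetical, and the width table is a toy stand-in that only knows the code points mentioned in this thread, not a real Unicode table and with no handling of control characters), the string function can reject lone surrogates outright, while the single-unit function with hidden state has to pick some policy. The policy shown, -1 for an unpaired low surrogate and silence for an unpaired high one, is only one assumption among several possible.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a real code-point width table (hypothetical). */
static int sketch_cp_width(uint32_t cp)
{
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return -1;                      /* not a Unicode scalar value */
    if (cp >= 0x0300u && cp <= 0x036Fu)
        return 0;                       /* combining diacritical marks */
    if (cp == 0x1F4FCu || cp == 0x1F4FFu)
        return 2;                       /* Wide, per the example in this thread */
    return 1;                           /* everything else: toy default */
}

static int sketch_is_high(uint16_t u) { return u >= 0xD800u && u <= 0xDBFFu; }
static int sketch_is_low(uint16_t u)  { return u >= 0xDC00u && u <= 0xDFFFu; }

static uint32_t sketch_combine(uint16_t high, uint16_t low)
{
    return 0x10000u + (((uint32_t)high - 0xD800u) << 10)
                    + ((uint32_t)low - 0xDC00u);
}

/* String variant: full context is available, so a lone surrogate is
   treated like any other invalid unit and the result is -1. */
static int sketch_c16swidth(const uint16_t *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n && s[i] != 0; i++) {
        uint32_t cp = s[i];
        if (sketch_is_high(s[i])) {
            if (i + 1 >= n || !sketch_is_low(s[i + 1]))
                return -1;              /* lone (or truncated) high surrogate */
            cp = sketch_combine(s[i], s[i + 1]);
            i++;                        /* consume the low surrogate too */
        } else if (sketch_is_low(s[i])) {
            return -1;                  /* lone or out-of-sync low surrogate */
        }
        int w = sketch_cp_width(cp);
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}

/* Single-unit variant with hidden state: 0 for a high surrogate, the
   combined width on the following low surrogate. Not thread-safe. */
static int sketch_c16width(uint16_t c)
{
    static uint16_t pending_high = 0;   /* remembered high surrogate, if any */
    uint16_t high = pending_high;
    pending_high = 0;

    if (sketch_is_high(c)) {
        pending_high = c;               /* width is reported on the low half */
        return 0;
    }
    if (sketch_is_low(c))
        return high ? sketch_cp_width(sketch_combine(high, c)) : -1;

    /* Ordinary BMP unit. If a high surrogate was pending, it already
       returned 0 and goes unreported: the open question above. */
    return sketch_cp_width(c);
}
```

Feeding the units 0xD83D, 0xDCFC to sketch_c16width() one at a time gives 0 and then 2, the same total that sketch_c16swidth() reports for the pair. A second 0xDCFC with no preceding high surrogate gives -1.
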
>>>
>>> A practical solution would be for Cygwin/newlib to provide new
>>> functions c16width(), c32width(), c16swidth() and c32swidth(), each
>>> being the explicit-size equivalent of the similarly named wc and wcs
>>> functions.
>>>
>>> Then wcwidth() can be a trivial inline alias of the explicit-size
>>> equivalent for the compile target, by having the newlib header check a
>>> compiler or standard define indicating the chosen size of wchar_t.
>>>
>>> // possible wchar.h snippet
>>> //
>>> // C11+ required (char16_t and char32_t come from <uchar.h>)
>>> // For C2Y+ this should go in uchar.h
>>> //
>>> int c16width(char16_t c);
>>> int c32width(char32_t c);
>>> int c16swidth(const char16_t *s, size_t n);
>>> int c32swidth(const char32_t *s, size_t n);
>>>
>>> // ...
>>>
>>> // This belongs in wchar.h for C1x- compat
>>> //
>>> // static inline, so the header needs no separate external definition;
>>> // the casts avoid signedness warnings where wchar_t is a signed type
>>> //
>>> #if SOMETHING_MEANING_16bit_WCHAR_T
>>> static inline int wcwidth(wchar_t c) {
>>>   return c16width((char16_t)c);
>>> }
>>> static inline int wcswidth(const wchar_t *s, size_t n)
>>> {
>>>   return c16swidth((const char16_t *)s, n);
>>> }
>>> #else
>>> static inline int wcwidth(wchar_t c) {
>>>   return c32width((char32_t)c);
>>> }
>>> static inline int wcswidth(const wchar_t *s, size_t n)
>>> {
>>>   return c32swidth((const char32_t *)s, n);
>>> }
>>> #endif
>>>
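
One way the placeholder condition in the snippet above might be spelled, offered only as an assumption and not as what newlib does or should do, is a preprocessor test on the standard WCHAR_MAX macro. The macro and function names below are hypothetical and exist only so the result of the test can be inspected.

```c
#include <stddef.h>   /* wchar_t */
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>    /* WCHAR_MAX is also available here */

/* Assumption: a wchar_t whose maximum fits in 16 bits is a UTF-16 code
   unit, and anything larger is treated as a 32-bit code point. */
#if WCHAR_MAX <= 0xFFFF
#define SKETCH_WCHAR_BITS 16
#else
#define SKETCH_WCHAR_BITS 32
#endif

/* Hypothetical helper exposing the compile-time choice at run time. */
static int sketch_wchar_bits(void)
{
    return SKETCH_WCHAR_BITS;
}
```

On Cygwin and Win32 targets this should select the 16-bit branch, and on typical Linux targets the 32-bit branch. A header could equally test a compiler-specific define, which is the alternative the quoted text leaves open.
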
>>>
>>> Enjoy
>>>
>>> Jakob
>>
>>
>
>
Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

