From: Jakob Bohm via Cygwin
To: cygwin AT cygwin DOT com
Subject: Re: Thoughts on the wcwidth confusion
Date: Mon, 1 Jun 2026 20:00:14 +0200
Organization: WiseMo A/S

On 01/06/2026 18:12, Thomas Wolff via Cygwin wrote:
>
> On 01.06.2026 at 17:59, Thomas Wolff via Cygwin wrote:
>> On 01.06.2026 at 17:34, Jakob Bohm via Cygwin wrote:
>>> Dear list,
>>>
>>> Having read through the recent debate around the wcwidth() POSIX API,
>>> wchar_t definitions, gcc-16 and Cygwin, I have an idea not
>>> mentioned on the list so far:
>>>
>>> Using the C17 types char32_t and char16_t, the situation can be
>>> summarized as follows:
>>>
>>> - Many, but not all, POSIX systems define wchar_t as char32_t and
>>>   thus wint_t as uint_least32_t
>>>
>>> - Win32, and thus Cygwin, defines wchar_t as char16_t and thus
>>>   wint_t as uint_least16_t
>>>
>>> - All systems considered treat wchar_t as Unicode, with Win32
>>>   supporting UTF-16 since NT 5.0 (Windows 2000).
>>>
>>> - For char16_t/UTF-16, wcwidth() should use the high surrogate to
>>>   determine the range of Unicode symbols and return a width common to
>>>   that range, then return 0 for the low surrogates, thereby allowing
>>>   computation of string width without having to first assemble
>>>   surrogates into full char32_t values.  Deciding if char32_t
>>>   implementations should still lump groups of 4 Unicode rows for
>>>   UTF-16 compatibility is up to each implementation.
>> It's a neat idea to split the width calculation over the surrogates.
>> Unfortunately it does not work this way because width does not
>> change in full 1024-code-point blocks (the range covered by one high
>> surrogate). For example, U+1F4FC is Wide (W), U+1F4FD and U+1F4FE
>> are narrow/Neutral (N), and U+1F4FF is W again.
>> As a variant of your idea, wcwidth could return width 1 for every
>> high surrogate, remember it, and if the subsequent invocation is a
>> low surrogate, determine the combined width and return either 1 or 0.
>> Not quite standard behaviour, I suspect, so maybe not a good idea for
>> the purists, but maybe worth some discussion.
> On the other hand, there are also combining characters outside the
> BMP, so the only way this could work is width 0 for high surrogates,
> then return the actual full width on the low surrogate. That leaves
> the question of how to handle an (erroneously) lone high surrogate...
>
If using this "hidden state" concept, the big question is how to handle
a single or out-of-sync low surrogate in wcwidth().  For wcswidth(), the
full context is always available and lone surrogates will be no
different from other invalid chars such as U+1FFFFE.

>>>
>>> A practical solution would be for Cygwin/newlib to provide new
>>> functions c16width(), c32width(), c16swidth() and c32swidth(), each
>>> being the explicit-size equivalent of the similarly named wc and
>>> wcs function.
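
To illustrate the point about wcswidth() having the full context, here
is a minimal sketch of how a c16swidth() along the proposed lines might
pair surrogates itself. The names c16swidth_sketch and c32width_stub
are hypothetical, and the stub hard-codes only the code points cited in
this thread where a real implementation would consult Unicode width
tables:

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical stand-in for the proposed c32width(). Assumption: only
   the code points cited in this thread are special-cased; everything
   else valid is treated as narrow. */
static int c32width_stub(char32_t c)
{
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;                 /* out of range, or a lone surrogate */
    if (c == 0x1F4FC || c == 0x1F4FF)
        return 2;                  /* Wide, per the example in the thread */
    return 1;
}

/* Sketch of c16swidth(): looks ahead in the string to pair surrogates,
   so no hidden state is needed between calls. */
static int c16swidth_sketch(const char16_t *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n && s[i] != 0; i++) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            /* Valid pair: combine into a single code point. */
            c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            i++;
        }
        int w = c32width_stub(c);
        if (w < 0)
            return -1;             /* same convention as wcswidth() */
        total += w;
    }
    return total;
}
```

With this, the pair D83D DCFC (U+1F4FC) and the pair D83D DCFD
(U+1F4FD) share a high surrogate but get different widths, and a lone
or out-of-sync surrogate falls through to the stub and is rejected like
any other invalid value.
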
>>>
>>> Then wcwidth() can be a trivial inline alias of the explicit-size
>>> equivalent for the compile target, by having the newlib header
>>> check a compiler or standard define indicating the chosen size of
>>> wchar_t.
>>>
>>> // possible wchar.h snippet
>>> //
>>> // C17+ required
>>> // For C2Y+ this should go in uchar.h
>>> //
>>> int c16width(char16_t c);
>>> int c32width(char32_t c);
>>> int c16swidth(const char16_t *s, size_t n);
>>> int c32swidth(const char32_t *s, size_t n);
>>>
>>> // ...
>>>
>>> // This belongs in wchar.h for C1x- compat
>>> //
>>> #if SOMETHING_MEANING_16bit_WCHAR_T
>>> inline int wcwidth(wchar_t c) {
>>>   return c16width(c);
>>> }
>>> inline int wcswidth(const wchar_t *s, size_t n)
>>> {
>>>   return c16swidth(s, n);
>>> }
>>> #else
>>> inline int wcwidth(wchar_t c) {
>>>   return c32width(c);
>>> }
>>> inline int wcswidth(const wchar_t *s, size_t n)
>>> {
>>>   return c32swidth(s, n);
>>> }
>>> #endif
>>>
>>>
>>> Enjoy
>>>
>>> Jakob
>>
>>
>
>

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
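
P.S. On the SOMETHING_MEANING_16bit_WCHAR_T placeholder in the quoted
snippet: one possible test, sketched here as an assumption and not as
what newlib does, is to compare the standard WCHAR_MAX macro against
0xFFFF. WCHAR_IS_16BIT is a hypothetical name used only for
illustration:

```c
#include <stdint.h>   /* WCHAR_MAX, usable in #if directives */
#include <wchar.h>

/* WCHAR_IS_16BIT is a hypothetical macro standing in for
   SOMETHING_MEANING_16bit_WCHAR_T. */
#if WCHAR_MAX <= 0xFFFF
#  define WCHAR_IS_16BIT 1   /* e.g. Win32 and Cygwin */
#else
#  define WCHAR_IS_16BIT 0   /* e.g. systems with a 32-bit wchar_t */
#endif
```

This covers both signed and unsigned 16-bit wchar_t, since WCHAR_MAX is
then 0x7FFF or 0xFFFF respectively.
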