Date: Sat, 23 Nov 2024 12:21:56 +0100
Subject: Re: /bin/ls -l cannot handle printable Unicode characters outside the BMP ...
To: cygwin AT cygwin DOT com
List-Id: General Cygwin discussions and problem reports
From: Cedric Blancher via Cygwin
Reply-To: Cedric Blancher

On Sat, 23 Nov 2024 at 11:44, Cedric Blancher wrote:
>
> Good morning!
>
> /bin/ls -l cannot handle printable Unicode characters outside the BMP
>
> Example using '𝒯'
> bash -c 'printf "\U0001D4AF\n"' # MATHEMATICAL SCRIPT CAPITAL T
> (yes, our mathematicians want to use THAT as file name)
>
> On Linux:
> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
> ls -la
> total 8
> -rw-r--r-- 1 ced staden 0 Nov 23 11:29 ΓΆΓΆΓΆΓΆΓΆΓΆΓΆ
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯𝒯
>
> On Cygwin:
> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
> $ ls -la
> -rw-r--r-- 1 ced staden 0 Nov 23 11:29 ΓΆΓΆΓΆΓΆΓΆΓΆΓΆ
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257'
> -rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257\360\235\222\257'
>
> Looks like the Cygwin locale has a problem with non-BMP chars.

find(1) is even worse:
$ find .
.
./ΓΆΓΆΓΆΓΆΓΆΓΆΓΆ
./????
./x??x

The Microsoft Explorer GUI shows the file names correctly, so IMO this
is not a Windows or Win32 API problem.

Ced
-- 
Cedric Blancher [https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
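
For anyone who wants to check that the octal escapes printed by ls on Cygwin are the expected bytes: the sketch below decodes \360\235\222\257 by hand, using plain shell arithmetic so that it does not depend on the locale. It also computes the UTF-16 surrogate pair for the same code point, since Windows stores file names as UTF-16. Whether surrogate handling has anything to do with this bug is only a guess on my side; nothing in this thread has confirmed the cause. The variable names are mine and purely illustrative.

```shell
#!/bin/sh
# Bytes copied from the Cygwin output ''$'\360\235\222\257', written as
# octal constants (a leading 0 marks octal in shell arithmetic).
b0=$((0360)); b1=$((0235)); b2=$((0222)); b3=$((0257))

# A 4-byte UTF-8 sequence carries 3 + 6 + 6 + 6 payload bits.
cp=$(( ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F) ))
cp_hex=$(printf 'U+%04X' "$cp")

# UTF-16 surrogate pair for the same code point (needed only above U+FFFF).
v=$(( cp - 0x10000 ))
utf16=$(printf '%04X %04X' "$(( 0xD800 + (v >> 10) ))" "$(( 0xDC00 + (v & 0x3FF) ))")

echo "$cp_hex"            # U+1D4AF
echo "UTF-16: $utf16"     # UTF-16: D835 DCAF
```

So the bytes on disk are a valid UTF-8 encoding of U+1D4AF, the same character that the printf "\U0001D4AF" command produces. The file name itself looks intact; what differs between Linux and Cygwin is only how ls and find decide to display it.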