Date: | Wed, 25 Jun 2025 21:43:56 +0200 |
To: | cygwin AT cygwin DOT com |
Subject: | Re: readdir() returns inaccessible name if file was created with invalid UTF-8 |
Message-ID: | <aFxRfI4NdZ8y5IlK@calimero.vinschen.de> |
Mail-Followup-To: | cygwin AT cygwin DOT com |
References: | <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de> |
<03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de> | |
MIME-Version: | 1.0 |
In-Reply-To: | <03c4fae7-7322-572c-ae72-52e300f0b438@t-online.de> |
From: | Corinna Vinschen via Cygwin <cygwin AT cygwin DOT com> |
On Jun 25 16:59, Christian Franke via Cygwin wrote:
> On Sun, 15 Sep 2024 19:47:11 +0200, Christian Franke wrote:
> > If a file name contains an invalid (truncated) UTF-8 sequence, open()
> > does not refuse to create the file. Later readdir() returns a different
> > name which could not be used to access the file.
> >
> > Testcase with U+1F321 (Thermometer):
> >
> > $ uname -r
> > 3.5.4-1.x86_64
> >
> > $ printf $'\U0001F321' | od -A none -t x1
> >  f0 9f 8c a1
> >
> > $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
> >
> > $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
> >
> > $ touch 'file3-'$'\xf0\x9f\x8c'
> >
> > $ ls -1
> > ls: cannot access 'file2-.?ext': No such file or directory
> > ls: cannot access 'file3-': No such file or directory
> > 'file1-'$'\360\237\214\241''.ext'
> > file2-.?ext
> > file3-
> >
> >
> > Name mapping according to "fhandler_disk_file::readdir" strace lines:
> >
> > "file1-\xF0\x9F\x8C\xA1.ext" -(open)-> L"file1-\xD83C\xDF21.ext"
> > -(readdir)->
> > "file1-\xF0\x9F\x8C\xA1.ext"
> >
> > "file2-\xF0\x9f\x8C.ext" -(open)-> L"file2-\xD83C\xF02Eext" -(readdir)->
> > "file2-.\xE1\x9E\xB3ext"
> >
> > "file3-\xF0\x9F\x8C" -(open)-> L"file3-\xD83C\xF000" -(readdir)->
> > "file3-"

I don't know exactly where this happens, but the input of the conversion
is invalid UTF-8 because it's missing the 4th byte. There's no way to
represent these filenames on Windows filesystems storing filenames as
UTF-16 values.

So the problem here is that the conversion somehow misses that the 4th
byte is invalid and just plods forward and converts the leading three
bytes into the matching high surrogate value and then stumbles over the
conversion for the low surrogate.

It would be really helpful to have an STC for this problem.

Thanks,
Corinna
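
For readers following the byte arithmetic in the thread above, here is a minimal Python sketch of why the first three bytes of the truncated sequence already determine the high surrogate 0xD83C, while the low surrogate cannot be formed without the fourth byte. It illustrates only the UTF-8/UTF-16 encoding rules; it is not Cygwin's conversion code, and the helper names (`surrogate_pair`, `is_valid_utf8`, `high_surrogate_from_prefix`) are made up for this example.

```python
# Illustration of the encoding arithmetic only; NOT Cygwin's implementation.
# All function names here are hypothetical helpers for this sketch.

def surrogate_pair(cp):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

def is_valid_utf8(name):
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        name.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False

def high_surrogate_from_prefix(b0, b1, b2):
    """High surrogate implied by the first three bytes of a 4-byte sequence.

    The high surrogate carries the top 10 bits of (code point - 0x10000),
    and none of those bits come from the fourth byte.
    """
    partial = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6)
    return 0xD800 | ((partial - 0x10000) >> 10)

if __name__ == "__main__":
    hi, lo = surrogate_pair(0x1F321)
    print(hex(hi), hex(lo))                                   # 0xd83c 0xdf21
    print(hex(high_surrogate_from_prefix(0xF0, 0x9F, 0x8C)))  # 0xd83c
    for name in (b"file1-\xf0\x9f\x8c\xa1.ext",
                 b"file2-\xf0\x9f\x8c.ext",
                 b"file3-\xf0\x9f\x8c"):
        print(name, is_valid_utf8(name))
```

A strict decoder rejects the file2 and file3 names outright, which is the behaviour one might expect from the conversion before the file is created. Whether Cygwin should reject such names or map the stray bytes some other way is a design question this sketch does not answer.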