Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
From: Christian Franke via Cygwin
To: cygwin AT cygwin DOT com
Cc: Christian Franke
Reply-To: cygwin AT cygwin DOT com
Date: Thu, 26 Jun 2025 19:07:05 +0200
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de> <03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de>

Corinna Vinschen via Cygwin wrote:
> On Jun 25 16:59, Christian Franke via Cygwin wrote:
>> On Sun, 15 Sep 2024 19:47:11 +0200, Christian Franke wrote:
>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
>>> does not refuse to create the file. Later readdir() returns a different
>>> name which could not be used to access the file.
>>>
>>> Testcase with U+1F321 (Thermometer):
>>>
>>> $ uname -r
>>> 3.5.4-1.x86_64
>>>
>>> $ printf $'\U0001F321' | od -A none -t x1
>>>  f0 9f 8c a1
>>>
>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>>>
>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>>>
>>> $ touch 'file3-'$'\xf0\x9f\x8c'
>>>
>>> $ ls -1
>>> ls: cannot access 'file2-.?ext': No such file or directory
>>> ls: cannot access 'file3-': No such file or directory
>>> 'file1-'$'\360\237\214\241''.ext'
>>> file2-.?ext
>>> file3-
>>>
>>>
>>> Name mapping according to "fhandler_disk_file::readdir" strace lines:
>>>
>>> "file1-\xF0\x9F\x8C\xA1.ext" -(open)-> L"file1-\xD83C\xDF21.ext"
>>> -(readdir)->
>>> "file1-\xF0\x9F\x8C\xA1.ext"
>>>
>>> "file2-\xF0\x9f\x8C.ext" -(open)-> L"file2-\xD83C\xF02Eext" -(readdir)->
>>> "file2-.\xE1\x9E\xB3ext"
>>>
>>> "file3-\xF0\x9F\x8C" -(open)-> L"file3-\xD83C\xF000" -(readdir)->
>>> "file3-"
> I don't know exactly where this happens, but the input of the
> conversion is invalid UTF-8 because it's missing the 4th byte.
> There's no way to represent these filenames on Windows
> filesystems storing filenames as UTF-16 values.
>
> So the problem here is that the conversion somehow misses that
> the 4th byte is invalid and just plods forward and converts the
> leading three bytes into the matching high surrogate value and
> then stumbles over the conversion for the low surrogate.
>
> It would be really helpful to have an STC for this problem.

With some trial and error I found a testcase for this more serious
problem reported yesterday but not quoted above:

>
>> In cases like file3-... above, the converted Windows path ends with
>> 0xF000. This suggests that this is an accidental conversion of the
>> terminating null to the 0xF0xx range.
>>
>> In some cases, the created Windows file name has random garbage
>> behind the 0xF000. Then even Cygwin is not able to access or unlink
>> the file after creation.
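
The surrogate values in the strace lines can be reproduced by hand. The following is a toy model in C, not Cygwin's conversion code: it handles only ASCII and 4-byte lead bytes, and the mapping of the stray byte into the 0xF0xx range is an assumption inferred from the paths shown above (0x2E becomes 0xF02E, the terminating null becomes 0xF000). All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
int is_cont(unsigned char b)
{
  return (b & 0xc0) == 0x80;
}

/* High surrogate implied by the first three bytes of a 4-byte sequence.
   Only the low 6 bits of the code point come from the 4th byte, so the
   high surrogate can be computed without it. */
uint16_t high_surrogate(const unsigned char *b)
{
  uint32_t cp = ((uint32_t)(b[0] & 0x07) << 18)
              | ((uint32_t)(b[1] & 0x3f) << 12)
              | ((uint32_t)(b[2] & 0x3f) << 6);
  return (uint16_t)(0xd800 | (((cp - 0x10000) >> 10) & 0x3ff));
}

/* Low surrogate: low 4 bits of byte 3 plus the 6 payload bits of byte 4. */
uint16_t low_surrogate(const unsigned char *b)
{
  uint32_t low10 = ((uint32_t)(b[2] & 0x0f) << 6) | (uint32_t)(b[3] & 0x3f);
  return (uint16_t)(0xdc00 | low10);
}

/* Toy conversion of n bytes (pass the terminating null as part of n).
   out must have room for n units. Returns the number of units written.
   ASSUMPTION: after a truncated 4-byte sequence, the byte in the 4th
   position is emitted as 0xF000 | byte, which matches the observed paths.
   2- and 3-byte sequences are not modelled. */
size_t toy_to_utf16(const unsigned char *s, size_t n, uint16_t *out)
{
  size_t o = 0;
  for (size_t i = 0; i < n; ) {
    if (s[i] < 0x80) {                      /* ASCII, including the null */
      out[o++] = s[i++];
      continue;
    }
    if ((s[i] & 0xf8) == 0xf0 && i + 2 < n
        && is_cont(s[i + 1]) && is_cont(s[i + 2])) {
      out[o++] = high_surrogate(s + i);
      if (i + 3 < n && is_cont(s[i + 3]))
        out[o++] = low_surrogate(s + i);    /* complete sequence */
      else if (i + 3 < n)
        out[o++] = (uint16_t)(0xf000 | s[i + 3]); /* stray byte swallowed */
      i += 4;
      continue;
    }
    out[o++] = (uint16_t)(0xf000 | s[i++]); /* not modelled */
  }
  return o;
}
```

Under these assumptions, "file2-\xF0\x9F\x8C.ext" gives 0xD83C followed by 0xF02E, and "file3-\xF0\x9F\x8C" ends in 0xD83C 0xF000 with no terminating null written, which is the shape seen in the strace output.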
Testcase (attached):

$ uname -r
3.7.0-0.160.g922719ba36e0.x86_64

$ gcc -o badname badname.c

$ ./badname
unlink() failed, errno=2, Win path: L"t-\xda01\xf000a"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000b"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000c"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000d"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000e"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000f"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000g"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000h"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000i"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000j"

Conclusion: The terminating null char is accidentally converted to 0xF000
and no new null is appended. A trailing fragment of a previously used
path appears.

>> In fortunately very rare cases, the created Windows file is not
>> accessible from Win32 layer itself because it looks like
>>   L"file3-\xD83C\xF000garbage."
>> or
>>   L"file3-\xD83C\xF000garbage "
>> which is invalid on Win32 layer due to trailing '.' or space. Then a
>> tool which removes the file via Nt*() layer is required.
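
The condition described in the quoted text is easy to test for on a wide-character name. A minimal sketch, assuming the only criterion is the trailing '.' or space named above; `ends_with_dot_or_space` is a hypothetical helper and not part of badname.c or Cygwin.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: returns true if the last character of name is
   '.' or ' ', i.e. the name has the form the quoted text calls invalid
   on the Win32 layer. An empty name returns false. */
bool ends_with_dot_or_space(const wchar_t *name)
{
  size_t len = wcslen(name);
  if (len == 0)
    return false;
  wchar_t last = name[len - 1];
  return last == L'.' || last == L' ';
}
```

A caller could use such a check to decide whether plain DeleteFileW() is worth trying or whether a different removal path is needed; that decision logic is not shown here.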
Testcase: enable one of the "DON'T DO THIS" lines and make sure that a
suitable file removal tool is available :-)

-- 
Regards,
Christian

Attachment: badname.c

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <wchar.h>
#include <windows.h>

static void print_w(FILE * f, const wchar_t * s)
{
  fputs("L\"", f);
  wchar_t c;
  for (int i = 0; (c = s[i]); i++) {
    if (c == L'"' || c == L'\\')
      fprintf(f, "\\%c", c);
    else if (L' ' <= c && c <= L'~')
      fputc(c, f);
    else
      fprintf(f, "\\x%04x", c & 0xffff);
  }
  fputc('"', f);
}

static void get_winname(wchar_t * name)
{
  WIN32_FIND_DATAW e;
  HANDLE h = FindFirstFileW(L"*", &e);
  if (h == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "FindFirstFileW(): Error=%u\n", GetLastError());
    exit(1);
  }
  int i = 0;
  do {
    if (!wcscmp(e.cFileName, L".") || !wcscmp(e.cFileName, L".."))
      continue;
    if (++i > 1) {
      fprintf(stderr, "Error: more than one Win32 file found\n");
      exit(1);
    }
    wcscpy(name, e.cFileName);
  } while (FindNextFileW(h, &e));
  FindClose(h);
}

static void testname(const char * name)
{
  int fd = open(name, O_WRONLY|O_CREAT, 0666);
  if (fd < 0) {
    printf("open() failed, errno=%d\n", errno);
    return;
  }
  close(fd);

  wchar_t winname[MAX_PATH];
  get_winname(winname);

  if (!unlink(name))
    return;

  printf("unlink() failed, errno=%d, Win path: ", errno);
  print_w(stdout, winname); printf("\n");

  if (!DeleteFileW(winname)) {
    printf("FATAL: DeleteFileW() failed, error=%u\n", GetLastError());
    exit(1);
  }
}

int main()
{
  const char * dir = "test.tmp";
  rmdir(dir);
  if (mkdir(dir, 0666)) {
    perror(dir); return 1;
  }
  if (chdir(dir)) {
    perror(dir); return 1;
  }

  for (int i = 0; i < 10; i++) {
    const char name[] = "t-\xf2\x90\x90";
    char prev[sizeof(name)+2];
    memset(prev, 'X', sizeof(prev)-2); prev[sizeof(prev)-1] = 0;
    prev[sizeof(name)] = 'a' + (i % 26);
  //prev[sizeof(name)] = '.'; // DON'T DO THIS!
  //prev[sizeof(name)] = ' '; // DON'T DO THIS!

    access(prev, 0);
    testname(name);
  }
  return 1;
}

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple