Mail Archives: cygwin/2025/06/26/13:08:17

Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de>
<aFxRfI4NdZ8y5IlK AT calimero DOT vinschen DOT de>
Message-ID: <f78c615c-aefe-b3d0-aada-5f9d0cf73a0a@t-online.de>
Date: Thu, 26 Jun 2025 19:07:05 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101
SeaMonkey/2.53.20
MIME-Version: 1.0
In-Reply-To: <aFxRfI4NdZ8y5IlK@calimero.vinschen.de>
From: Christian Franke via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Christian Franke <Christian DOT Franke AT t-online DOT de>
Errors-To: cygwin-bounces~archive-cygwin=delorie DOT com AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces~archive-cygwin=delorie DOT com AT cygwin DOT com>

This is a multi-part message in MIME format.
--------------A85F4BCC90A14E526CD0E834
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Corinna Vinschen via Cygwin wrote:
> On Jun 25 16:59, Christian Franke via Cygwin wrote:
>> On Sun, 15 Sep 2024 19:47:11 +0200, Christian Franke wrote:
>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
>>> does not refuse to create the file. Later readdir() returns a different
>>> name which could not be used to access the file.
>>>
>>> Testcase with U+1F321 (Thermometer):
>>>
>>> $ uname -r
>>> 3.5.4-1.x86_64
>>>
>>> $ printf $'\U0001F321' | od -A none -t x1
>>>   f0 9f 8c a1
>>>
>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>>>
>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>>>
>>> $ touch 'file3-'$'\xf0\x9f\x8c'
>>>
>>> $ ls -1
>>> ls: cannot access 'file2-.?ext': No such file or directory
>>> ls: cannot access 'file3-': No such file or directory
>>> 'file1-'$'\360\237\214\241''.ext'
>>> file2-.?ext
>>> file3-
>>>
>>>
>>> Name mapping according to "fhandler_disk_file::readdir" strace lines:
>>>
>>> "file1-\xF0\x9F\x8C\xA1.ext" -(open)-> L"file1-\xD83C\xDF21.ext"
>>> -(readdir)->
>>> "file1-\xF0\x9F\x8C\xA1.ext"
>>>
>>> "file2-\xF0\x9f\x8C.ext" -(open)-> L"file2-\xD83C\xF02Eext" -(readdir)->
>>> "file2-.\xE1\x9E\xB3ext"
>>>
>>> "file3-\xF0\x9F\x8C" -(open)-> L"file3-\xD83C\xF000" -(readdir)->
>>> "file3-"
> I don't know exactly where this happens, but the input of the
> conversion is invalid UTF-8 because it's missing the 4th byte.
> There's no way to represent these filenames on Windows
> filesystems storing filenames as UTF-16 values.
>
> So the problem here is that the conversion somehow misses that
> the 4th byte is invalid and just plods forward and converts the
> leading three bytes into the matching high surrogate value and
> then stumbles over the conversion for the low surrogate.
>
> It would be really helpful to have an STC for this problem.
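
For illustration, a check that catches the missing 4th byte before any conversion could look roughly like the following. This is only a sketch under my own assumptions, not Cygwin's conversion code: the names utf8_seq_len() and utf8_sequences_complete() are hypothetical, and the check is purely structural (it does not reject overlong forms or encoded surrogates).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes announced by a UTF-8 lead byte,
   or 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  if ((lead & 0xF8) == 0xF0)
    return 4;
  return 0; /* stray continuation byte or invalid lead byte */
}

/* Hypothetical helper: returns 1 if every sequence in the null-terminated
   string has all of its continuation bytes, 0 otherwise. */
int utf8_sequences_complete(const char *s)
{
  assert(s != NULL);
  const unsigned char *p = (const unsigned char *)s;
  while (*p) {
    size_t len = utf8_seq_len(*p);
    if (len == 0)
      return 0;
    for (size_t i = 1; i < len; i++) {
      /* A terminating null is not a continuation byte, so a sequence
         truncated at the end of the string is rejected here, too. */
      if ((p[i] & 0xC0) != 0x80)
        return 0;
    }
    p += len;
  }
  return 1;
}
```

With the names from the first testcase, file1-... passes this check, while file2-... and file3-... are rejected because the sequence started by 0xf0 is one byte short.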


With some trial and error I found a testcase for the more serious 
problem that I reported yesterday but that is not quoted above:

>
>> In cases like file3-... above, the converted Windows path ends with 
>> 0xF000. This suggests that this is an accidental conversion of the 
>> terminating null to the 0xF0xx range.
>>
>> In some cases, the created Windows file name has random garbage 
>> behind the 0xF000. Then even Cygwin is not able to access or unlink 
>> the file after creation.

Testcase (attached):

$ uname -r
3.7.0-0.160.g922719ba36e0.x86_64

$ gcc -o badname badname.c

$ ./badname
unlink() failed, errno=2, Win path: L"t-\xda01\xf000a"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000b"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000c"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000d"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000e"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000f"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000g"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000h"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000i"
unlink() failed, errno=2, Win path: L"t-\xda01\xf000j"

Conclusion: The terminating null character is accidentally converted to 
0xF000 and no new terminator is appended, so a trailing fragment of a 
previously used path ends up in the Windows file name.
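
The \xda01 in the output is the high surrogate one would expect from the three leading bytes f2 90 90 alone. The following sketch shows the arithmetic. It is not taken from the Cygwin sources: the helper name high_surrogate_from_prefix() is hypothetical, and it assumes the missing 4th byte contributes zero bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-16 high surrogate implied by the first three
   bytes of a 4-byte UTF-8 sequence. Assumes p[0] is a 4-byte lead byte,
   p[1] and p[2] are continuation bytes, and the resulting code point is
   at least 0x10000 (i.e. not an overlong form). */
uint16_t high_surrogate_from_prefix(const unsigned char *p)
{
  assert(p != NULL);
  /* The lead byte 11110xxx carries 3 payload bits, each continuation
     byte 10xxxxxx carries 6. The low 6 bits would come from the missing
     4th byte and only affect the low surrogate. */
  uint32_t cp = ((uint32_t)(p[0] & 0x07) << 18)
              | ((uint32_t)(p[1] & 0x3F) << 12)
              | ((uint32_t)(p[2] & 0x3F) << 6);
  return (uint16_t)(0xD800 + ((cp - 0x10000) >> 10));
}
```

For f2 90 90 this yields 0xDA01 as in the output above, and for f0 9f 8c it yields 0xD83C as in the strace lines of the original report.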

>> In fortunately very rare cases, the created Windows file is not 
>> accessible from Win32 layer itself because it looks like
>>   L"file3-\xD83C\xF000garbage."
>> or
>>   L"file3-\xD83C\xF000garbage "
>> which is invalid on Win32 layer due to trailing '.' or space. Then a 
>> tool which removes the file via Nt*() layer is required. 

Testcase: enable one of the "DON'T DO THIS" lines and make sure that a 
suitable file removal tool is available :-)
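
To see in advance whether a resulting name falls into this category, a simple check on the last character is enough. A minimal sketch, assuming the only Win32 restriction of interest is the trailing '.' or space mentioned above; the function name win32_name_has_bad_tail() is hypothetical and the special names "." and ".." are not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the name ends with '.' or ' ',
   which the Win32 layer cannot address as written, 0 otherwise. */
int win32_name_has_bad_tail(const wchar_t *name)
{
  assert(name != NULL);
  size_t len = wcslen(name);
  if (len == 0)
    return 0;
  wchar_t last = name[len - 1];
  return last == L'.' || last == L' ';
}
```

Names like the ones printed by the testcase (ending in 'a' to 'j') return 0 here, the two "garbage" variants quoted above return 1.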

-- 
Regards,
Christian


--------------A85F4BCC90A14E526CD0E834
Content-Type: text/plain; charset=UTF-8;
 name="badname.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="badname.c"

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <wchar.h>
#include <windows.h>

static void print_w(FILE * f, const wchar_t * s)
{
  fputs("L\"", f);
  wchar_t c;
  for (int i = 0; (c = s[i]); i++) {
    if (c == L'"' || c == L'\\')
      fprintf(f, "\\%c", c);
    else if (L' ' <= c && c <= L'~')
      fputc(c, f);
    else
      fprintf(f, "\\x%04x", c & 0xffff);
  }
  fputc('"', f);
}

static void get_winname(wchar_t * name)
{
  WIN32_FIND_DATAW e;
  HANDLE h = FindFirstFileW(L"*", &e);
  if (h == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "FindFirstFileW(): Error=%u\n", GetLastError());
    exit(1);
  }
  int i = 0;
  do {
    if (!wcscmp(e.cFileName, L".") || !wcscmp(e.cFileName, L".."))
      continue;
    if (++i > 1) {
      fprintf(stderr, "Error: more than one Win32 file found\n");
      exit(1);
    }
    wcscpy(name, e.cFileName);
  } while (FindNextFileW(h, &e));
  FindClose(h);
}

static void testname(const char * name)
{
  int fd = open(name, O_WRONLY|O_CREAT, 0666);
  if (fd < 0) {
    printf("open() failed, errno=%d\n", errno);
    return;
  }
  close(fd);

  wchar_t winname[MAX_PATH];
  get_winname(winname);

  if (!unlink(name))
    return;

  printf("unlink() failed, errno=%d, Win path: ", errno);
  print_w(stdout, winname); printf("\n");

  if (!DeleteFileW(winname)) {
    printf("FATAL: DeleteFileW() failed, error=%u\n", GetLastError());
    exit(1);
  }
}

int main()
{
  const char * dir = "test.tmp";
  rmdir(dir);
  if (mkdir(dir, 0666)) {
    perror(dir); return 1;
  }
  if (chdir(dir)) {
    perror(dir); return 1;
  }

  for (int i = 0; i < 10; i++) {
    const char name[] = "t-\xf2\x90\x90";
    char prev[sizeof(name)+2];
    memset(prev, 'X', sizeof(prev)-2); prev[sizeof(prev)-1] = 0;
    prev[sizeof(name)] = 'a' + (i % 26);
  //prev[sizeof(name)] = '.'; // DON'T DO THIS!
  //prev[sizeof(name)] = ' '; // DON'T DO THIS!

    access(prev, 0);
    testname(name);
  }
  return 1;
}
--------------A85F4BCC90A14E526CD0E834--
