Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
From: Christian Franke via Cygwin
To: cygwin AT cygwin DOT com
Cc: Christian Franke
Reply-To: cygwin AT cygwin DOT com
Date: Sat, 28 Jun 2025 12:18:57 +0200
Message-ID: <5fae4fcc-6847-ab19-b487-3a28c76d96e4@t-online.de>
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de> <03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de> <3295c8bd-2c09-76c7-8b5f-0106dc39dd96 AT t-online DOT de>
List-Id: General Cygwin discussions and problem reports
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------B19B623E207A25A37D693D40"

This is a multi-part message in MIME format.
--------------B19B623E207A25A37D693D40
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Corinna Vinschen via Cygwin wrote:
> On Jun 27 15:32, Christian Franke via Cygwin wrote:
>> $ touch $'t-\xef\x80\x80'
>> The name mapping is:
>> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
> Did you copy/paste this from the old mail, by any chance?

Sorry, I accidentally mixed up two cases with the same readdir() result:

"t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000" -(readdir)-> "t-"
"t-\xED\xAD\x99" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"

$ touch $'t-\xed\xad\x99'
$ touch $'t-\xef\x80\x80'
$ ls | uniq -c
      2 t-

This no longer occurs in 3.7.0-0.165.g1b60f4861b70, but see below.

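The two Windows names in the mappings above follow from a permissive three-byte UTF-8 decode that does not reject surrogates. The following is a minimal sketch of that arithmetic only; decode3(), is_high_surrogate() and is_cygwin_pua() are hypothetical helpers made up for illustration and are not Cygwin's actual conversion code.

```c
#include <stdint.h>

/* Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a
   16-bit value.  ASSUMPTION: deliberately permissive, i.e. it does not
   reject surrogates or overlong forms; hypothetical helper, not Cygwin
   code.  Returns 0 on success, -1 if the bytes do not have that shape. */
static int decode3(const unsigned char *s, uint16_t *out)
{
  if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
    return -1;
  *out = (uint16_t)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
  return 0;
}

/* UTF-16 high (leading) surrogates occupy 0xD800..0xDBFF. */
static int is_high_surrogate(uint16_t w)
{
  return 0xD800 <= w && w <= 0xDBFF;
}

/* The 0xF000..0xF0FF range that the thread describes as Cygwin's
   private use area for transposed characters. */
static int is_cygwin_pua(uint16_t w)
{
  return 0xF000 <= w && w <= 0xF0FF;
}
```

Under these assumptions, decode3() on the bytes EF 80 80 yields 0xF000, the first value of the private use range, and on ED AD 99 it yields 0xDB59, which is_high_surrogate() flags as a lone high surrogate.
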
> Using the latest test DLL the mapping is
>
> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000"
>
> And that's basically correct, albeit it leads to problems.
>
> You know that we defined the area from 0xf000 to 0xf0ff as our private
> use area to create filenames with characters invalid in DOS filenames
> by transposing these chars into the private use area. When converting
> the filenames back, the 0xf0XX chars are transposed back to 0xXX.

Yes.

> But yeah, I found the bug here. The problem is that the transpose table
> incorrectly contains NUL as transposable character. So if you create
> L"t-\xF000", that's fine. However, when converting this name back to
> UTF-8, the filename becomes L"t-\0". Oops.
>
> I dropped the ASCII NUL from the list of transposable characters and
> now what you get is this:
>
> $ touch $'t-\xef\x80\x80'
> $ touch $'t-\xef\x80\x81'
> $ ls -l
> total 0
> -rw-r--r-- 1 corinna vinschen 0 Jun 27 16:49 't-'$'\001'
> -rw-r--r-- 1 corinna vinschen 0 Jun 27 16:49 't-'$'\357\200\200'
>
> Apart from the incorrect transposition of ASCII NUL, the transposition
> works transparently:
>
> $ echo foo > $'t-\xef\x80\x81'
> $ cat $'t-\xef\x80\x81'
> foo
> $ cat $'t-\x01'
> foo
>
> I'll apply the patch shortly.

$ touch $'t-\xed\xad\x90'
$ touch $'t-\xed\xad\x91'
$ touch $'t-\xed\xad\x92'
$ touch $'t-\xed\xad\x93'
$ touch $'t-\xed\xad\x94'
$ ls | uniq -c
      5 t-

$ ls -s
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
total 0
? t-  ? t-  ? t-  ? t-  ? t-

All results found by several runs of the attached test program with
different seeds have in common that the Windows path name contains an
invalid word in the UTF-16 high surrogate range:

$ ./randnames 42
$'t-\xEC\x9E\xB3\xEF\x82\x80\xEF\x83\xA0': access() failed, errno=2:
$'t-\xED\xA4\xA8\x80\xE0': original path
L"t-\xD928\xF080\xF0E0": Windows path

$'t-\xEE\x9E\xB3\xEF\x83\xA1': access() failed, errno=2:
$'t-\xED\xA6\xB0\xE1': original path
L"t-\xD9B0\xF0E1": Windows path

...

$'t-\xE7\xBE\xB3\xEF\x82\xB3': access() failed, errno=2:
$'t-\xED\xA2\x96\xB3': original path
L"t-\xD896\xF0B3": Windows path

-- 
Thanks,
Christian

--------------B19B623E207A25A37D693D40
Content-Type: text/plain; charset=UTF-8; name="randnames.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="randnames.c"

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <wchar.h>
#include <windows.h>

/* Print a byte string in bash $'...' quoting, non-printables as \xXX. */
static void print_c(FILE * f, const char * s)
{
  fputs("$'", f);
  char c;
  for (int i = 0; (c = s[i]); i++) {
    if (c == '\'')
      fputs("'\\'$'", f);
    else if (' ' <= c && c <= '~')
      fputc(c, f);
    else
      fprintf(f, "\\x%02X", c & 0xff);
  }
  fputc('\'', f);
}

/* Print a wide string as L"...", non-ASCII units as \xXXXX. */
static void print_w(FILE * f, const wchar_t * s)
{
  fputs("L\"", f);
  wchar_t c;
  for (int i = 0; (c = s[i]); i++) {
    if (c == L'"' || c == L'\\')
      fprintf(f, "\\%c", c);
    else if (L' ' <= c && c <= L'~')
      fputc(c, f);
    else
      fprintf(f, "\\x%04X", c & 0xffff);
  }
  fputc('"', f);
}

/* Get the name of the single file in the current directory via Win32. */
static void get_winname(wchar_t * name)
{
  WIN32_FIND_DATAW e;
  HANDLE h = FindFirstFileW(L"*", &e);
  if (h == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "FindFirstFileW(): Error=%u\n", GetLastError());
    exit(1);
  }
  int i = 0;
  do {
    if (!wcscmp(e.cFileName, L".") || !wcscmp(e.cFileName, L".."))
      continue;
    wcscpy(name, e.cFileName);
    i++;
  } while (FindNextFileW(h, &e));
  FindClose(h);
  if (i != 1) {
    fprintf(stderr, "Error: %d Win32 files found\n", i);
    exit(1);
  }
}

/* Get the name of the single file in the current directory via readdir(). */
static void get_cygname(char * name)
{
  DIR * d = opendir(".");
  if (!d) {
    perror("opendir");
    exit(1);
  }
  int i = 0;
  const struct dirent * e;
  while ((e = readdir(d))) {
    if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
      continue;
    strcpy(name, e->d_name);
    i++;
  }
  closedir(d);
  if (i != 1) {
    fprintf(stderr, "Error: %d Cygwin files found\n", i);
    exit(1);
  }
}

/* Fill name with 1..maxlen random non-NUL bytes, skipping '/' and '\\'. */
static void randname(char * name, int maxlen)
{
  int len = 1 + rand() % (maxlen + 1 - 1);
  for (int i = 0; i < len; i++) {
    char c = 1 + rand() % (256 - 2 - 1);
    if (c >= '/')
      c++;
    if (c >= '\\')
      c++;
    name[i] = c;
  }
  name[len] = 0;
}

/* Create the file, then check that the name returned by readdir() is
   accessible.  Returns 1 on success, 0 if access() failed. */
static int testname(const char * name)
{
  int fd = open(name, O_WRONLY|O_CREAT, 0644);
  if (fd < 0) {
    print_c(stdout, name); printf(": open() failed, errno=%d\n", errno);
    exit(1);
  }
  close(fd);

  char cygname[MAX_PATH];
  get_cygname(cygname);
  wchar_t winname[MAX_PATH];
  get_winname(winname);

  int rc = 1;
  if (access(cygname, 0)) {
    print_c(stdout, cygname); printf(": access() failed, errno=%d:\n", errno);
    print_c(stdout, name); printf(": original path\n");
    print_w(stdout, winname); printf(": Windows path\n\n");
    rc = 0;
  }

  if (unlink(name)) {
    print_c(stdout, name); printf(": unlink() failed, errno=%d\n", errno);
    print_w(stdout, winname); printf(": Windows path\n");
    exit(1);
  }
  return rc;
}

int main(int argc, char **argv)
{
  if (argc > 1)
    srand(atoi(argv[1]));

  const char * dir = "test.tmp";
  rmdir(dir);
  if (mkdir(dir, 0755)) {
    perror(dir); return 1;
  }
  if (chdir(dir)) {
    perror(dir); return 1;
  }

  /* Stop after 10 failing names or 100000 attempts. */
  int errs = 0;
  for (int i = 0; i < 100000; i++) {
    char name[8] = "t-";
    randname(name + 2, sizeof(name) - 1 - 2);
    if (!testname(name) && ++errs >= 10)
      break;
  }
  return 0;
}

--------------B19B623E207A25A37D693D40--
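
The private use area transposition described in the quoted reply, and the effect of NUL being in the transpose table, can be modelled roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the set of transposable characters (control characters plus a few characters that are invalid in DOS filenames) is an illustrative guess rather than Cygwin's real table, and the nul_in_table flag exists solely to compare the behaviour before and after the fix.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: illustrative transpose table, not Cygwin's actual one.
   nul_in_table != 0 models the bug (NUL listed as transposable),
   nul_in_table == 0 models the behaviour after the fix. */
static int transposable(unsigned c, int nul_in_table)
{
  if (c == 0)
    return nul_in_table;
  if (c < 0x20)
    return 1;                       /* control characters */
  return c < 0x80 && strchr("\"*:<>?|", (int)c) != NULL;
}

/* POSIX byte -> UTF-16 unit: transposable chars move to 0xF0XX. */
static uint16_t to_windows(unsigned char c, int nul_in_table)
{
  return transposable(c, nul_in_table) ? (uint16_t)(0xF000 | c) : c;
}

/* UTF-16 unit -> POSIX side: 0xF0XX goes back to 0xXX, but only if
   0xXX is in the table; everything else is left untouched. */
static uint16_t to_posix(uint16_t w, int nul_in_table)
{
  if ((w & 0xFF00) == 0xF000 && transposable(w & 0xFF, nul_in_table))
    return (uint16_t)(w & 0xFF);
  return w;
}
```

In this model, to_posix(0xF000, 1) returns 0, which terminates the converted name after "t-" as in the reported bug, whereas to_posix(0xF000, 0) leaves 0xF000 alone and to_posix(0xF001, 0) still returns 0x01, matching the `ls -l` output quoted above.
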