Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
From: Christian Franke via Cygwin
To: cygwin AT cygwin DOT com
Date: Wed, 25 Jun 2025 16:59:04 +0200

On Sun, 15 Sep 2024 19:47:11 +0200, Christian Franke wrote:
> If a file name contains an invalid (truncated) UTF-8 sequence, open()
> does not refuse to create the file. Later readdir() returns a
> different name which could not be used to access the file.
>
> Testcase with U+1F321 (Thermometer):
>
> $ uname -r
> 3.5.4-1.x86_64
>
> $ printf $'\U0001F321' | od -A none -t x1
>  f0 9f 8c a1
>
> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>
> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>
> $ touch 'file3-'$'\xf0\x9f\x8c'
>
> $ ls -1
> ls: cannot access 'file2-.?ext': No such file or directory
> ls: cannot access 'file3-': No such file or directory
> 'file1-'$'\360\237\214\241''.ext'
> file2-.?ext
> file3-
>
>
> Name mapping according to "fhandler_disk_file::readdir" strace lines:
>
> "file1-\xF0\x9F\x8C\xA1.ext" -(open)-> L"file1-\xD83C\xDF21.ext"
> -(readdir)->
> "file1-\xF0\x9F\x8C\xA1.ext"
>
> "file2-\xF0\x9f\x8C.ext" -(open)-> L"file2-\xD83C\xF02Eext" -(readdir)->
> "file2-.\xE1\x9E\xB3ext"
>
> "file3-\xF0\x9F\x8C" -(open)-> L"file3-\xD83C\xF000" -(readdir)->
> "file3-"
>
> Issue found because 'stress-ng --filename ...' could not cleanup its
> temp directory.
>

A closer look many months later with Cygwin 3.7.0-0.137.g756669312c97
and current upstream stress-ng reveals a related problem which is
possibly more serious:

In cases like file3-... above, the converted Windows path ends with
0xF000. This suggests an accidental conversion of the terminating null
to the 0xF0xx range. In some cases, the created Windows file name has
random garbage behind the 0xF000. Then even Cygwin is not able to access
or unlink the file after creation.

In fortunately very rare cases, the created Windows file is not
accessible from the Win32 layer itself because it looks like
  L"file3-\xD83C\xF000garbage."
or
  L"file3-\xD83C\xF000garbage "
which is invalid on the Win32 layer due to the trailing '.' or space.
Then a tool which removes the file via the Nt*() layer is required.

Could not provide a reproducible testcase, sorry. 'stress-ng --filename
1' succeeds, but may silently leave temp files behind. The next
stress-ng release will report an error if unlink() of such a file fails.
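
The name mapping above can be cross-checked with plain UTF-8/UTF-16 arithmetic. The following is a minimal Python sketch, not Cygwin's conversion code: the helper names are made up for illustration, and the "0xF000 + byte" rule is only the hypothesis stated above, inferred from the strace lines.

```python
# Sketch only: standard UTF-8/UTF-16 arithmetic plus one labeled hypothesis.
# None of these helpers exist in Cygwin; the names are illustrative.

def utf16_surrogates(cp):
    """Return the (high, low) UTF-16 surrogate pair for a code point >= U+10000."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def high_surrogate_from_prefix(b0, b1, b2):
    """Compute the high surrogate from only the first three bytes of a
    4-byte UTF-8 sequence.

    The three bytes carry the top 15 of the 21 code point bits, and the
    high surrogate depends only on code point bits 10..20, so the fourth
    byte is not needed.
    """
    top15 = ((b0 & 0x07) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
    # top15 == cp >> 6, hence cp >> 10 == top15 >> 4; 0x40 == 0x10000 >> 10.
    return 0xD800 + ((top15 >> 4) - 0x40)


def assumed_f0xx_mapping(byte):
    """ASSUMPTION: a byte that cannot be converted lands at 0xF000 + byte."""
    return 0xF000 + byte


if __name__ == "__main__":
    high, low = utf16_surrogates(0x1F321)
    print("U+1F321 -> %04X %04X" % (high, low))
    print("F0 9F 8C (truncated) -> %04X" % high_surrogate_from_prefix(0xF0, 0x9F, 0x8C))
    print("'.' -> %04X" % assumed_f0xx_mapping(ord(".")))
    print("NUL -> %04X" % assumed_f0xx_mapping(0))
```

For U+1F321 this gives D83C DF21, matching the file1 line. The truncated prefix F0 9F 8C alone already yields D83C, and under the assumed rule '.' becomes F02E and the terminating null becomes F000, which is consistent with the file2 and file3 lines.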

Caution: Files created that way may not be removable with "onboard"
tools, see above.

--
Regards,
Christian