From: "Kaz Kylheku (Cygwin) via Cygwin"
Reply-To: "Kaz Kylheku (Cygwin)" <743-406-3965 AT kylheku DOT com>
To: Jérôme Froissart
Cc: cygwin AT cygwin DOT com
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
Date: Tue, 13 Oct 2020 09:30:37 -0700
References: <634821436 DOT 20201004141809 AT yandex DOT ru>
List-Id: General Cygwin discussions and problem reports

On 2020-10-06 14:36, Jérôme Froissart wrote:
> Here is an example C file
> $ cat example.c
> #include <stdio.h>
>
> const char *GetCommandLineA(void);
>
> int main(int argc, char *argv[])
> {
>     const char *s = GetCommandLineA();
>     printf("C=%s\n", s);
>
>     for (int i = 0; argc > i; i++)
>         printf("%d=%s\n", i, argv[i]);
>
>     return 0;
> }

Your program's comparison seems to be based on the hypothesis that
Cygwin parses the GetCommandLineA() command line. But this hypothesis
is almost certainly wrong.

> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
> C:\Users\Public>binary.exe "foo bar" "Jérôme"
> C=binary.exe "foo bar" "J□r□me"
> 0=binary
> 1=foo bar
> 2="Jérôme"

The "A" command line from GetCommandLineA has "tofu" characters:
é and ô were not decoded properly.

The é and ô characters we see in the Cygwin-parsed arguments coming
into main could not have been recovered from these "tofu" replacement
characters.

What is actually being parsed must be the WCHAR command line
corresponding to what comes from GetCommandLineW().

It's necessary to show that one to get a more complete understanding.
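To illustrate why the accented characters cannot have come back out of
the "A" string, here is a small portable sketch. It does not call any
Windows API. The functions narrow_lossy and same_after_narrowing are
hypothetical names invented for this illustration, and the rule
"anything outside 7-bit ASCII becomes '?'" is an assumed stand-in for
whatever conversion actually produced the tofu characters above; the
real conversion depends on the active code page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a lossy wide-to-narrow conversion: every
 * character outside 7-bit ASCII is replaced by '?'. This is an
 * assumption made for illustration, not the real Windows behavior.
 * The output is truncated to fit dst_size and always NUL-terminated.
 */
void narrow_lossy(const wchar_t *src, char *dst, size_t dst_size)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL && dst_size > 0);
    while (src[i] != L'\0' && i + 1 < dst_size) {
        dst[i] = ((unsigned long)src[i] < 0x80UL) ? (char)src[i] : '?';
        i++;
    }
    dst[i] = '\0';
}

/*
 * Returns 1 if two wide strings become identical once narrowed,
 * i.e. the narrow form no longer says which one we started from.
 */
int same_after_narrowing(const wchar_t *a, const wchar_t *b)
{
    char na[64], nb[64];

    narrow_lossy(a, na, sizeof na);
    narrow_lossy(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

Under this model, L"J\u00e9r\u00f4me" (Jérôme) narrows to "J?r?me",
and so does a different name such as L"J\u00e8r\u00f5me". Since two
distinct inputs collapse to the same narrow string, nothing that reads
only the narrow string can reproduce the é and ô seen in argv; they
must have been taken from the wide command line.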