Mail Archives: cygwin/2009/05/29/17:22:02

Message-ID: <4A2051E5.6060600@sidefx.com>
Date: Fri, 29 May 2009 17:21:41 -0400
From: Edward Lam <edward AT sidefx DOT com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <200905281541 DOT 33404 DOT michael DOT renner AT gmx DOT de> <4A1EAD91 DOT 1060701 AT sidefx DOT com> <e2480c70905281131u37651a2eoba946637bd414516 AT mail DOT gmail DOT com> <4A1EF2CE DOT 2060509 AT sidefx DOT com> <3f0ad08d0905290813m39999f81q918e94e3c960eb3f AT mail DOT gmail DOT com> <4A200287 DOT 8030403 AT sidefx DOT com> <3f0ad08d0905290852xe41338alfda89c622f92f677 AT mail DOT gmail DOT com> <4A200BC0 DOT 9010704 AT sidefx DOT com> <e2480c70905291142o2bcc65ccw2287d175dbd09dd5 AT mail DOT gmail DOT com> <4A204149 DOT 2050009 AT sidefx DOT com> <e2480c70905291337g6c8bcca7xd0baba79c84629db AT mail DOT gmail DOT com>
In-Reply-To: <e2480c70905291337g6c8bcca7xd0baba79c84629db@mail.gmail.com>

Alexey Borzenkov wrote:
 > No, the bug is not that it gets the wrong number of arguments. In fact,
 > Windows has no concept of arguments; only the C runtime does, which parses
 > the command line. If the command line is truncated, then the C runtime will
 > have missing arguments when it tries to parse it.
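To illustrate the quoted point, here is a simplified sketch of how a C runtime might split a command-line string into argv. It assumes only whitespace separation and double-quote grouping (the real MSVC rules also treat backslashes before quotes specially), and `split_command_line` is a hypothetical name. It also assumes Cygwin quoted the argument containing spaces, so a command line cut off at the non-ASCII character would end in `"before `.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified splitter (not the real MSVC CRT parser):
 * - spaces/tabs separate arguments,
 * - double quotes group text and are removed,
 * - an unterminated quote runs to the end of the string,
 * - backslash rules are ignored.
 * Splits buf in place; argv entries point into buf. Returns argc. */
int split_command_line(char *buf, char **argv, int max_args)
{
    int argc = 0;
    char *src = buf;
    assert(max_args > 0);
    while (*src) {
        while (*src == ' ' || *src == '\t')   /* skip leading blanks */
            src++;
        if (!*src || argc == max_args)
            break;
        char *dst = src;                      /* write position */
        int quoted = 0;
        argv[argc++] = dst;
        while (*src && (quoted || (*src != ' ' && *src != '\t'))) {
            if (*src == '"')
                quoted = !quoted;             /* drop the quote itself */
            else
                *dst++ = *src;
            src++;
        }
        if (*src)
            src++;                            /* step past the separator */
        *dst = '\0';                          /* terminate this argument */
    }
    return argc;
}
```

Given `bug.exe arg1 "before ` it returns 3 arguments, the last one being `before `, which has the same shape as the output in the transcript below: nothing about the arguments is "wrong" on its own; the string handed to the parser was already short.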

Sorry, I had meant to comment on this previously but hit send too soon.

I think the problem I'm running into is:
- I give cygwin 1.7's bash a string that is in my system default code page.
- cygwin 1.7 thinks the string is actually UTF-8 and tries to convert it 
as UTF-8 into UTF-16, resulting in a truncated command line that is 
passed to the child process.
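The suspected failure mode can be sketched as a converter that decodes UTF-8 into UTF-16 and silently stops at the first byte that is not valid UTF-8. The function `utf8_to_utf16_prefix` is hypothetical and is not Cygwin's actual conversion code. It also assumes copyright.txt contains a © saved in code page 1252, where © is the single byte 0xA9; in UTF-8 that byte can only be a continuation byte, so it cannot start a character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter (not Cygwin's code): decodes UTF-8 into UTF-16
 * code units and silently stops at the first invalid sequence.
 * Minimal sketch: overlong forms and encoded surrogates are not rejected.
 * Returns the number of UTF-16 units written; out is NUL-terminated. */
size_t utf8_to_utf16_prefix(const unsigned char *in, uint16_t *out, size_t cap)
{
    size_t n = 0;
    assert(cap >= 3);                    /* room for a surrogate pair + NUL */
    while (*in && n + 2 < cap) {
        uint32_t cp;
        int len, i;
        if (in[0] < 0x80)                { cp = in[0];        len = 1; }
        else if ((in[0] & 0xE0) == 0xC0) { cp = in[0] & 0x1F; len = 2; }
        else if ((in[0] & 0xF0) == 0xE0) { cp = in[0] & 0x0F; len = 3; }
        else if ((in[0] & 0xF8) == 0xF0) { cp = in[0] & 0x07; len = 4; }
        else break;                      /* e.g. 0xA9, the (c) sign in CP1252 */
        for (i = 1; i < len; i++) {
            if ((in[i] & 0xC0) != 0x80)
                break;                   /* also stops at the final NUL */
            cp = (cp << 6) | (in[i] & 0x3F);
        }
        if (i < len || cp > 0x10FFFF)
            break;                       /* incomplete or out of range */
        if (cp >= 0x10000) {             /* needs a surrogate pair */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
        in += len;
    }
    out[n] = 0;
    return n;
}
```

Fed the CP1252 bytes `"before \xA9 after"`, this returns 7: only `before ` survives, and no error reaches the caller. Valid UTF-8 such as `"caf\xC3\xA9"` converts completely, which is why the problem only shows up with text in the system code page.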

Here's some more investigation:

$ cat bug.c
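#include <stdio.h>
#include <wchar.h>

/* wmain is the MSVC entry point that receives argv as UTF-16 strings. */
int wmain(int argc, wchar_t *argv[], wchar_t *envp[])
{
     int i;
     (void)envp;  /* unused */
     for (i = 0; i < argc; i++)
         /* %ls is the portable conversion for a wide string argument */
         wprintf(L"%d: %ls\n", i, argv[i]);
     return 0;
}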

... and compiled using MSVC ....

$ ./bug arg1 "before `cat copyright.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before

So note that even with what seems to be a Unicode-aware child process,
I'm still getting a truncated command line. In fact, calling
GetCommandLineW() directly also seems to return a truncated command
line.

Regards,
-Edward

Alexey Borzenkov wrote:
> On Sat, May 30, 2009 at 12:10 AM, Edward Lam <edward AT sidefx DOT com> wrote:
>> Thanks for explaining the UTF8 changes in cygwin 1.7. However, the decision
>> to use UTF-8 for the C locale is questionable.
> 
> Not at all, because UTF-8, as far as I understand, is used for
> communication with the system in this context, and does not force
> anything on the application. Most modern Unixes use UTF-8 nowadays. That
> means that even if you have a C locale, your terminal outputs text in
> UTF-8, your input is UTF-8, and your filenames are UTF-8 (well, not
> really, but the rest of the system sees them that way). The same applies
> here, except that launching non-Cygwin processes is communication with
> the system as well, and it needs conversion. And wherever there is
> conversion, there is always a possible loss of data, one way or the other.
> 
>> It seems to me that it would be much safer to use the SYSTEM DEFAULT code
>> page (i.e. the return value of the system GetACP() function) for CYGWIN
>> instead, ensuring compatibility for the large class of native Windows
>> applications that are non-Unicode, non-CodePage aware.
> 
> It might be safe for you, but not for other people. If you have a
> Russian default codepage and ever need to work with Chinese/Japanese
> filenames, and Cygwin uses the default codepage for filesystem operations
> (as 1.5 does right now), then you are really screwed. In my opinion
> UTF-8 is a silver bullet here, and I'm very glad it went that way.
> 
>> I think it's very bad that changing LANG can result in a truncated *command
>> line*, which has nothing to do with printf. The printf in the code was just
>> for testing. The HUGE bug is that the application gets the WRONG NUMBER OF
>> ARGUMENTS.
> 
> No, the bug is not that it gets the wrong number of arguments. In fact,
> Windows has no concept of arguments; only the C runtime does, which parses
> the command line. If the command line is truncated, then the C runtime will
> have missing arguments when it tries to parse it.
> 
> I mentioned wprintf because recently I was wondering why
> mkpasswd/mkgroup had a strange truncating behavior with Russian
> usernames, and it turned out that wprintf, when it can't encode some
> characters, stops right there and returns an error code. But, honestly,
> who ever checks return codes from printf?
> 
> Something similar might be happening here. When the command line is
> constructed, some function is called that can't encode some character and
> returns an error status, but that status is never checked, and you get a
> truncated command line.
> 
> And btw, I'm not a Cygwin developer, just a speculating user right
> now, because I haven't looked for this problem in the code.
> 
> --
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
> Problem reports:       http://cygwin.com/problems.html
> Documentation:         http://cygwin.com/docs.html
> FAQ:                   http://cygwin.com/faq/
> 
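Alexey's guess above about an unchecked error status can be sketched with a hypothetical encoder, `to_latin1`, that writes characters until it meets one it cannot represent and then returns -1. ISO-8859-1 stands in for "some narrow code page" only because its mapping (code points 0 to 255 map to themselves) fits in one line; this is an illustration of the pattern, not Cygwin's or the CRT's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical narrow-string encoder: converts NUL-terminated UTF-16
 * code units to ISO-8859-1 and stops at the first unit it cannot
 * represent (or when out is full).
 * Returns the number of bytes written, or -1 on failure. In both cases
 * out is NUL-terminated, so a caller that ignores the return value
 * still gets a plausible-looking, but truncated, string. */
int to_latin1(const uint16_t *in, unsigned char *out, size_t cap)
{
    size_t n = 0;
    assert(cap > 0);
    for (; *in; in++) {
        if (*in > 0xFF || n + 1 >= cap) {
            out[n] = '\0';            /* partial result left behind */
            return -1;
        }
        out[n++] = (unsigned char)*in;
    }
    out[n] = '\0';
    return (int)n;
}
```

With an input like `before 日 x`, the function returns -1 and leaves `before ` in `out`. A caller that builds a command line from `out` without checking for -1 reproduces exactly the silent truncation described in this thread; a careful caller treats -1 as an error instead of using the partial string.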



  Copyright © 2019   by DJ Delorie     Updated Jul 2019