Message-ID: <4A2051E5.6060600@sidefx.com>
Date: Fri, 29 May 2009 17:21:41 -0400
From: Edward Lam
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <200905281541 DOT 33404 DOT michael DOT renner AT gmx DOT de> <4A1EAD91 DOT 1060701 AT sidefx DOT com> <4A1EF2CE DOT 2060509 AT sidefx DOT com> <3f0ad08d0905290813m39999f81q918e94e3c960eb3f AT mail DOT gmail DOT com> <4A200287 DOT 8030403 AT sidefx DOT com> <3f0ad08d0905290852xe41338alfda89c622f92f677 AT mail DOT gmail DOT com> <4A200BC0 DOT 9010704 AT sidefx DOT com> <4A204149 DOT 2050009 AT sidefx DOT com>

Alexey Borzenkov wrote:
> No, the bug is not that it gets wrong number of arguments. In fact,
> Windows has no concept of arguments, only C runtime does, which parses
> the command line. If command line is truncated, then C runtime will
> have missing arguments when it tries to parse it.

Sorry, I had meant to comment on this previously but hit send too soon. I
think the problem I'm running into is:

- I give cygwin 1.7's bash a string that is in my system default code page.
- cygwin 1.7 thinks the string is actually UTF-8 and tries to convert it
  as UTF-8 into UTF-16, resulting in a truncated command line that is
  passed to the child process.
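To make the hypothesis above concrete, here is a minimal sketch of a UTF-8 to UTF-16 converter that silently stops at the first byte sequence that is not valid UTF-8. It is not Cygwin's code: the names utf8_decode_one and utf8_to_utf16_stop_on_error are hypothetical, and the stop-on-error behavior is an assumption chosen to match the symptom, not something confirmed from the Cygwin sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from Cygwin): decode one UTF-8 sequence at s,
 * with n bytes available, into *cp. Returns the sequence length in bytes,
 * or 0 if the bytes are not valid UTF-8. */
static size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    if (n < len) return 0;  /* sequence cut off by end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, out-of-range values and surrogates. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return len;
}

/* Hypothetical converter: UTF-8 to NUL-terminated UTF-16 that stops at the
 * first invalid sequence without reporting an error (the assumed failure
 * mode). Returns the number of UTF-16 code units written, excluding the
 * terminator; *consumed receives the number of input bytes converted. */
size_t utf8_to_utf16_stop_on_error(const char *in, size_t in_len,
                                   uint16_t *out, size_t out_cap,
                                   size_t *consumed)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;

    while (i < in_len) {
        uint32_t cp;
        size_t len = utf8_decode_one(s + i, in_len - i, &cp);
        if (len == 0) break;                 /* invalid input: stop here */
        if (cp >= 0x10000) {
            if (o + 2 >= out_cap) break;     /* room for pair + terminator */
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (o + 1 >= out_cap) break;     /* room for unit + terminator */
            out[o++] = (uint16_t)cp;
        }
        i += len;
    }
    if (out_cap > 0) out[o] = 0;
    *consumed = i;
    return o;
}
```

Assuming, purely for illustration, that the input contains a copyright sign encoded in Windows-1252 (the single byte 0xA9), the 14-byte string "before \xA9 after" converts to just the 7 code units of "before ", because 0xA9 on its own is a continuation byte and not a valid UTF-8 lead byte. A caller that ignores the consumed count would pass the shortened string on without noticing.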
Here's some more investigation:

$ cat bug.c
#include <stdio.h>
int wmain(int argc, wchar_t *argv[], wchar_t *envp[])
{
    int i;
    for (i = 0; i < argc; i++)
        wprintf(L"%d: %s\n", i, argv[i]);
    return 0;
}

... and compiled using MSVC ....

$ ./bug arg1 "before `cat copyright.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before

So note that even when the child is a Unicode-aware process, I'm still
getting a truncated command line. In fact, calling GetCommandLineW()
directly seems to give a truncated command line as well.

Regards,
-Edward

Alexey Borzenkov wrote:
> On Sat, May 30, 2009 at 12:10 AM, Edward Lam wrote:
>> Thanks for explaining the UTF8 changes in cygwin 1.7. However, the decision
>> to use UTF-8 for the C locale is questionable.
>
> Not at all, because utf-8, as far as I understand, is used for
> communication with the system in this context, and does not force
> anything on the application. Most modern unixes use utf-8 nowadays, which
> means that even if you have a C locale your terminal outputs text in
> utf-8, your input is utf-8, your filenames are utf-8 (well, not
> really, but the rest of the system sees them that way). Same stuff
> here, except that launching non-cygwin processes is communication with
> the system as well, and it needs conversion. And where there is
> conversion, there is always possible loss of data. One way or the other.
>
>> It seems to me that it would be much safer to use the SYSTEM DEFAULT code
>> page (ie. the return value of the system GetACP() function) for CYGWIN
>> instead, ensuring compatibility for the large class of native Windows
>> applications that are non-Unicode, non-CodePage aware.
>
> It might be safe for you, but not for other people. If you have a
> Russian default codepage and ever need to work with Chinese/Japanese
> filenames and cygwin uses default codepage for filesystem operations
> (as in 1.5 right now), then you are really screwed.
> In my opinion utf-8 is a silver bullet here, and I'm very glad it went
> that way.
>
>> I think it's very bad that changing LANG can result in a truncated *command
>> line*, that has nothing to do with printf. The printf in the code was just
>> for testing. The HUGE bug is that the application gets the WRONG NUMBER OF
>> ARGUMENTS.
>
> No, the bug is not that it gets wrong number of arguments. In fact,
> Windows has no concept of arguments, only C runtime does, which parses
> the command line. If command line is truncated, then C runtime will
> have missing arguments when it tries to parse it.
>
> I mentioned wprintf because recently I was wondering why
> mkpasswd/mkgroup had a strange truncating behavior with Russian
> usernames and it turned out that wprintf, when it can't encode some
> characters, stops right there and returns an error code. But, honestly,
> who ever checks return codes from printf?
>
> Something similar might be happening here. When constructing the command
> line, some function is called and can't encode some character, returns an
> error status, but it's never checked, and you get a truncated command line.
>
> And btw, I'm not a cygwin developer, I'm just a speculating user
> right now, because I haven't looked for this problem in the code.
>
> --
> Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
> Problem reports: http://cygwin.com/problems.html
> Documentation: http://cygwin.com/docs.html
> FAQ: http://cygwin.com/faq/