Message-ID: <4A204149.2050009@sidefx.com>
Date: Fri, 29 May 2009 16:10:49 -0400
From: Edward Lam
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <200905281541 DOT 33404 DOT michael DOT renner AT gmx DOT de>
 <4A1EAAED DOT 1060702 AT cygwin DOT com>
 <4A1EAD61 DOT 5010308 AT sidefx DOT com>
 <4A1EAD91 DOT 1060701 AT sidefx DOT com>
 <4A1EF2CE DOT 2060509 AT sidefx DOT com>
 <3f0ad08d0905290813m39999f81q918e94e3c960eb3f AT mail DOT gmail DOT com>
 <4A200287 DOT 8030403 AT sidefx DOT com>
 <3f0ad08d0905290852xe41338alfda89c622f92f677 AT mail DOT gmail DOT com>
 <4A200BC0 DOT 9010704 AT sidefx DOT com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

Hi Alexey,

Thanks for explaining the UTF-8 changes in cygwin 1.7. However, the decision to use UTF-8 for the C locale is questionable. It seems to me that it would be much safer to use the SYSTEM DEFAULT code page (i.e. the return value of the system GetACP() function) for cygwin instead, ensuring compatibility with the large class of native Windows applications that are non-Unicode and not code-page aware.

Reading the original mailing list threads now, it seems that Corinna Vinschen also suggested using the system code page [1]. I dug through the various mails in that thread but didn't find any good objection to it.

> The only bug here is that the arguments are truncated instead of using
> some kind of a replacement character, is it related to some posix
> compliance, like with wprintf?

I think it's very bad that changing LANG can result in a truncated *command line*; that has nothing to do with printf. The printf in the code was just for testing. The HUGE bug is that the application gets the WRONG NUMBER OF ARGUMENTS.

1. http://www.mail-archive.com/cygwin AT cygwin DOT com/msg96843.html

Regards,
-Edward

Alexey Borzenkov wrote:
> On Fri, May 29, 2009 at 8:22 PM, Edward Lam wrote:
>> I think there is still a bug here? I set LANG=C, then shouldn't be just NOT
>> doing any encoding, thus work? If I do this on Linux, it works. If I use a
>> cygwin compiled app, it also works.
>
> On Linux, internally, system uses multibyte strings (it is encoding
> agnostic even), but on Windows, system uses unicode strings, so cygwin
> has to decode your byte sequences somehow to pass them to non-cygwin
> processes as unicode (the fact that cygwin now understands unicode is
> a huge plus to me). In earlier discussions it was decided that cygwin
> C locale should use utf-8 encoding, because file system internally
> uses unicode it's the safest default to represent all possible
> filenames, etc. In previous cygwin versions, your byte sequences were
> just silently converted using your system's codepage (by the system
> itself, even), so if you want the old behavior you should set
> LANG=en_US.CP1252.
>
> The only bug here is that the arguments are truncated instead of using
> some kind of a replacement character, is it related to some posix
> compliance, like with wprintf?
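
For anyone following along, the decoding step discussed above can be illustrated with a small Python sketch. This is only an analogy, not cygwin's actual conversion code: the argument bytes and the helper name `try_decode` are made up for the example. It shows why a byte above 128 that is valid in CP1252 cannot be decoded as UTF-8, and what a replacement-character fallback would look like instead of a hard failure.

```python
# Illustrative sketch only; this is NOT how cygwin implements the conversion.
# ARG is a hypothetical argument: "café" as encoded in CP1252 (0xE9 is > 128).
ARG = b"caf\xe9"


def try_decode(raw, encoding):
    """Strictly decode raw bytes; return None if they are invalid in that encoding."""
    try:
        return raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return None


# 0xE9 opens a multi-byte UTF-8 sequence that never completes, so this fails.
utf8_result = try_decode(ARG, "utf-8")

# Under CP1252 the same byte maps directly to U+00E9.
cp1252_result = try_decode(ARG, "cp1252")

# A lenient decoder substitutes U+FFFD rather than giving up on the argument.
replaced = ARG.decode("utf-8", errors="replace")

print(utf8_result, cp1252_result, replaced)
```

The strict UTF-8 case is the analogue of what happens under LANG=C in 1.7, and the CP1252 case corresponds to setting LANG=en_US.CP1252 as suggested in the quoted message.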

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/