Message-ID: <4A200BC0.9010704@sidefx.com>
Date: Fri, 29 May 2009 12:22:24 -0400
From: Edward Lam
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line

IWAMURO Motonori wrote:
> I think that you should set "export LANG=en_US.ISO-8859-1" instead of
> "export LANG=LANG=en_US.ISO-8859-1".

Ah, sorry, copy/paste error. Yes, that finally works. Thank you!

I think there is still a bug here, though. If I set LANG=C, shouldn't it
just NOT be doing any encoding, and thus work? If I do this on Linux, it
works. If I use a cygwin-compiled app, it also works.

-Edward

> 2009/5/30 Edward Lam :
>> IWAMURO Motonori wrote:
>>> The encoding of C locale is ASCII, and not ISO-8859-1.
>>> I don't think ASCII is the same as ISO-8859-1.
>>> Does it work on LANG=en_US.ISO-8859-1?
>> No, it doesn't. Mind you though, I haven't managed to get piconv to
>> recognize any of my LANG settings other than C in cygwin 1.7.
>>
>> $ export LANG=LANG=en_US.ISO-8859-1
>>
>> $ piconv
>> perl: warning: Setting locale failed.
>> perl: warning: Please check that your locale settings:
>>         LC_ALL = (unset),
>>         LANG = "LANG=en_US.ISO-8859-1"
>>     are supported and installed on your system.
>>
>> (... usage omitted...)
>>
>> $ ./bug arg1 "before `cat copyright.txt` after" arg3
>> 0: E:\cygwin1.7\tmp\bug.exe
>> 1: arg1
>> 2: before
>>
>> Regards,
>> -Edward
>>
>>> 2009/5/29 Edward Lam :
>>>> Alexey Borzenkov wrote:
>>>>> On Thu, May 28, 2009 at 7:28 PM, Edward Lam wrote:
>>>>>> PS. In case you haven't noticed, copyright.txt is not a long file. It
>>>>>> consists of a single byte, 0xA9.
>>>>> Did you try utf-8 encoding copyright.txt? Perhaps your locale is utf-8
>>>>> and the encoder fails.
>>>> How is one supposed to determine one's locale in cygwin? I do NOT have
>>>> LANG, or any of the LC environment variables set. I even tried explicitly
>>>> setting LANG=C and it still fails.
>>>>
>>>> The problem does seem to stem from the new UTF-8 support in cygwin 1.7.
>>>> However, I think something is going on here that is unexpected because
>>>> trying something similar on Linux has no problems. To confirm that it was
>>>> an UTF-8 related problem, let me repeat the steps slightly differently
>>>> again. Here we assume that I've already got bug.exe compiled which simply
>>>> prints out its arguments.
>>>>
>>>> $ export LANG=C
>>>>
>>>> $ ./bug arg1 "before `cat copyright.txt` after" arg3
>>>> 0: E:\cygwin1.7\tmp\bug.exe
>>>> 1: arg1
>>>> 2: before
>>>>
>>>> *Notice that argc is 3 when it should be 4!*
>>>>
>>>> $ piconv -f iso-8859-1 -t utf8 < copyright.txt > fubar.txt
>>>>
>>>> $ ./bug arg1 "before `cat fubar.txt` after" arg3
>>>> 0: E:\cygwin1.7\tmp\bug.exe
>>>> 1: arg1
>>>> 2: before © after
>>>> 3: arg3
>>>>
>>>> *So now everything works because I converted the character into UTF-8.*
>>>>
>>>> I think what this points to is some form of invalid source encoding of
>>>> the command line argument when spawning NATIVE applications.
>>>>
>>>> Here's what happens when I try to compile bug.c using cygwin's gcc:
>>>>
>>>> $ gcc bug.c -o bug-gcc.exe
>>>>
>>>> $ ./bug-gcc arg1 "before `cat copyright.txt` after" arg3
>>>> 0: ./bug-gcc
>>>> 1: arg1
>>>> 2: before © after
>>>> 3: arg3
>>>>
>>>> So there seems to be some sort of special marshaling of the command line
>>>> arguments that only works when spawning cygwin apps, but breaks when
>>>> running under native apps.
>>>>
>>>> Regards,
>>>> -Edward
>>>>
>>>> --
>>>> Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
>>>> Problem reports: http://cygwin.com/problems.html
>>>> Documentation: http://cygwin.com/docs.html
>>>> FAQ: http://cygwin.com/faq/
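
For readers following the thread: the source of bug.c is not included in this message, so the sketch below is only a guess at its shape, not the original program. The names dump_args and utf8_is_valid are hypothetical. The first function prints arguments in the "index: value" format seen in the transcripts. The second checks whether a byte string is well-formed UTF-8, which illustrates one fact behind the discussion: a lone 0xA9 byte (the ISO-8859-1 copyright sign) is not valid UTF-8, while the two bytes 0xC2 0xA9 produced by piconv are. This is consistent with the behaviour reported above, but it makes no claim about what cygwin 1.7 does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for what bug.c does: print each argument on its
 * own line as "index: value". Returns the number of lines printed. */
int dump_args(FILE *out, int argc, char **argv)
{
    int i;

    assert(out != NULL && argv != NULL);
    for (i = 0; i < argc; i++)
        fprintf(out, "%d: %s\n", i, argv[i]);
    return argc;
}

/* Return 1 if the n bytes at s are well-formed UTF-8, otherwise 0.
 * Illustrative helper only; not taken from cygwin or from bug.c. */
int utf8_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t extra, k;

        if (c < 0x80) {              /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3; cp = c & 0x07;
        } else {
            return 0;                /* stray continuation byte such as 0xA9,
                                        or a lead byte that is never valid */
        }

        if (n - i <= extra)
            return 0;                /* sequence is cut off by end of input */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;            /* expected a 10xxxxxx continuation */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (extra == 2 && cp < 0x800)
            return 0;
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

Calling dump_args(stdout, argc, argv) from main would give a listing in the same format as the transcripts. Running utf8_is_valid over the contents of copyright.txt and fubar.txt is one way to confirm which of the two files holds well-formed UTF-8 before passing it on a command line.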