Date: Thu, 4 Jun 2009 00:03:29 +0900
Message-ID: <3f0ad08d0906030803o2686f633v2a2e5d1345a6381e@mail.gmail.com>
In-Reply-To: <20090603142755.GM23519@calimero.vinschen.de>
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
From: IWAMURO Motonori
To: cygwin AT cygwin DOT com

Hi.

How about adding a setting for the locale environment variable (such as
LANG) to the Cygwin installer?

2009/6/3 Corinna Vinschen :
> On Jun  3 09:18, Edward Lam wrote:
>> Corinna Vinschen wrote:
>>> The question is, what do you expect?  [...]
>> [...]
>> Wikipedia has several suggestions on how to handle invalid UTF-8 byte
>> sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the
>> rule that uses the replacement character.
>
> Chris implemented using the invalid code point solution.
> The discussion in
> http://www.mail-archive.com/linux-utf8 AT nl DOT linux DOT org/msg00080.html
> supports this solution.  What's missing so far is the way back, from
> an invalid single second half of a surrogate pair in the 0xDCxx range
> back to the correct byte value.  I'm just looking into that.
>
>> > How is anybody supposed to know that the file which consists
>> > of the single byte 0xa9 has *any* meaning at all?  Why should it be
>> > the copyright sign, of all things?
>>
>> What I was attempting to do was to have NO conversion. In the
>> real case that I ran into this, the "bug.exe" was the one to properly
>> interpret what the byte 0xA9 meant from the command line. Yes, I know
>> there are several workarounds.
>
> The command line is always converted to UTF-16 when calling a native
> Win32 application.  If we don't do it (because we call CreateProcessA),
> Windows would do it.  As matters stand, we have to convert ourselves,
> because we must call CreateProcessW.  Either way, the problem persists.
> We just don't know what the correct conversion is for the given input.
> We have to rely on a correct setting of $LC_ALL/$LANG/$LC_CTYPE.
>
>>> If we default to the ANSI codepage, you will have the same problem,
>>> just upside down.  In both cases you will have even more problems if
>>> you start using characters not available in your default codepage.
>>
>> This is where I disagreed with Alexey. What we're really arguing here is
>> which default will run into the fewest problems for the most
>> common usage. This is subjective of course.
>
> Definitely.  The "right" solution is always only right for a given value
> of right.  What if the user has set LANG to, say, ja_JP.eucJP?  That
> user of course expects that the stuff on the command line is converted
> to UTF-16 using the eucJP encoding.  Everything else would just be very
> surprising.
>
> What's left as questionable is the LANG=C default case.
> Due to the discussion from the last month we now use UTF-8 as default
> encoding, because it's the only encoding which covers all (valid)
> characters.  Sure, we could also convert the command line using the
> current ANSI codepage as Windows does it when calling CreateProcessA in
> this case.
>
> Maybe we should do that for testing?  Anybody having a strong opinion
> here?
>
>
> Corinna
>
> --
> Corinna Vinschen                  Please, send mails regarding Cygwin to
> Cygwin Project Co-Leader          cygwin AT cygwin DOT com
> Red Hat
>
> --
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
> Problem reports:       http://cygwin.com/problems.html
> Documentation:         http://cygwin.com/docs.html
> FAQ:                   http://cygwin.com/faq/
>
>

-- 
IWAMURO Motonori
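
P.S. For readers who want to see the two strategies from the quoted
discussion side by side, here is a minimal sketch in Python. It is only an
illustration, not Cygwin's code: Python's built-in "surrogateescape" error
handler is used as a stand-in for the invalid code point approach (an
undecodable byte 0xNN becomes the lone surrogate U+DCNN and can be mapped
back), and errors="replace" stands in for the replacement character
approach. The helper names (to_text, to_bytes, decode_or_none) are
hypothetical, and the Python codec names cp1252 and euc_jp are assumed to
correspond to a Western ANSI codepage and to eucJP.

```python
# Sketch only: Python's "surrogateescape" handler stands in for the
# invalid-code-point scheme discussed in the thread; this is not Cygwin code.

RAW = b"\xa9"  # the single byte used as the example in the thread


def to_text(raw: bytes) -> str:
    """Decode as UTF-8; each undecodable byte 0xNN becomes lone U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """The way back: each lone U+DCNN becomes the original byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


def decode_or_none(raw: bytes, encoding: str):
    """Strictly decode raw with encoding; return None if it is invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Invalid-code-point approach: lossless round trip.
escaped = to_text(RAW)
restored = to_bytes(escaped)

# Replacement-character approach: the original byte value is lost.
replaced = RAW.decode("utf-8", errors="replace")

# The same byte read under different locale encodings.
as_cp1252 = decode_or_none(RAW, "cp1252")  # copyright sign
as_utf8 = decode_or_none(RAW, "utf-8")     # invalid on its own
as_eucjp = decode_or_none(RAW, "euc_jp")   # incomplete two-byte sequence

print(hex(ord(escaped)), restored == RAW, hex(ord(replaced)))
print(as_cp1252, as_utf8, as_eucjp)
```

The round trip only works because the escaped code point keeps the
original byte value in its low eight bits; the replacement character does
not, so the byte cannot be recovered afterwards. The last three lines show
why the result depends on the locale setting: the same byte is a valid
character in one encoding and meaningless in the others.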