Date: Sat, 29 May 2010 06:16:04 +0100
Subject: Re: LANG=ja_JP.Shift_JIS
From: Andy Koppe
To: cygwin AT cygwin DOT com
Cc: rushojp

On 22 May 2010 14:27, rushojp wrote:
>> So why do you need to set it to ja_JP.Shift_JIS if ja_JP.CP932 and
>> ja_JP.SJIS do the same thing?
>
> There is no serious reason.
> I think IANA name is more famous.

Fair enough, but I think it would be misleading to use the official
IANA name for what's a (slightly) different charset.

> @centos5.5
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Shift_JIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f SJIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f CP932 -t UTF-16LE|hexdump
> 0000000 005c 0020 007e 0020 ff5e
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Windows-31J -t UTF-16LE|hexdump
> 0000000 005c 0020 007e 0020 ff5e
> 000000a
>
> @cygwin-1.7
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Shift_JIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f SJIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f CP932 -t UTF-16LE|hexdump
> 0000000 005c 0020 007e 0020 301c
> 000000a

Looks as expected to me.
Iconv's charset names are independent of the locale charset names,
but it is unfortunate that "SJIS" means "Shift_JIS" to iconv whereas
it means "CP932" to the locale system. That's why I called the
SJIS->CP932 mapping "dodgy", but we need to keep it for compatibility
(and convenience). Importantly, nl_langinfo(CODESET) returns "CP932"
for both ja_JP.CP932 and ja_JP.SJIS, so programs that pass the CODESET
string to iconv end up with the correct encoding.

> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Windows-31J -t UTF-16LE|hexdump
> iconv: conversion from Windows-31J unsupported
> iconv: try 'iconv -l' to get the list of supported encodings

I had to look that one up: "Windows-31J" is the official IANA name for
CP932. I guess it should be added to Cygwin's iconv. (But how did they
come up with that name?)

Andy
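The point about passing nl_langinfo(CODESET) to iconv can be sketched
in C as follows. This is a minimal illustration, not code from Cygwin:
the helper name locale_to_utf16le is hypothetical, and it assumes a
POSIX system with nl_langinfo() and iconv(3), with setlocale() already
called by the program. Because it asks the locale system for the
charset name instead of deriving one from $LANG, a program running
under ja_JP.SJIS would hand "CP932" to iconv_open(), not the "SJIS"
that iconv itself would read as Shift_JIS.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes of text in the current
 * locale's charset to UTF-16LE.  Returns the number of bytes written to
 * `out`, or -1 if the charset is unsupported or the input is invalid.
 * Assumes the caller has already called setlocale(). */
long locale_to_utf16le(const char *in, size_t inlen, char *out, size_t outsize)
{
    /* The locale system's own name for its charset; per the message
     * above, Cygwin 1.7 reports "CP932" for both ja_JP.CP932 and
     * ja_JP.SJIS. */
    const char *codeset = nl_langinfo(CODESET);

    iconv_t cd = iconv_open("UTF-16LE", codeset);
    if (cd == (iconv_t)-1)
        return -1;

    /* glibc's iconv() takes char ** for the input even though it does
     * not write through it, so the const is cast away here. */
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    return (long)(outsize - outleft);
}
```

A program would call setlocale(LC_ALL, "") at startup and then use the
helper on locale-encoded input; which code point a given byte sequence
such as 0x81 0x60 ends up as still depends on that iconv's own CP932
table, as the CentOS and Cygwin transcripts show.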