Date: Fri, 21 Aug 2009 09:49:27 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: KOI8

On Aug 20 21:43, Andy Koppe wrote:
> One fairly important character encoding not yet supported by Cygwin
> 1.7 is KOI8. Well, two actually, because there are slightly different
> versions for Russian and Ukrainian: KOI8-R and KOI8-U, aka Windows
> codepages 20866 and 21866. Apparently they're de-facto standards for
> Unix machines and the in the former Soviet Union. (Windows uses
> CP1251, whereas ISO-8859-5 (Cyrillic) never caught on.)
>
> Cygwin's Midnight Commander actually uses KOI8 if the locale is set to
> "ru" or "uk", even if a charset is specified explicitly, e.g.
> "ru.CP1251". Hence you get gibberish where a helpful hint in the
> user's language should be. (Of course that's primarily a shortcoming
> in mc.)
>
> Anyway, to help support them, the attached patch adds the KOI8
> charsets to newlib's Unicode conversion and ctype tables. I took the
> conversion tables from iconv and adapted the ctype tables from the
> CP1251 version. Since KOI8 has printable characters in the C1 range
> from 0x80 to 0x9F, it seems easiest to treat them as Windows
> codepages.
>
> To complete support, "KOI8-R" and "KOI8-U" would need to be recognised
> in _setlocale_r and mapped to codepages 20866 and 21866.

I'd suggest adding the missing code to loadlocale() (the internally
used charset should be set to "CP20866"/"CP21866", but it seems you
know this already) and sending the entire patch, together with a
ChangeLog entry, to the newlib list. If you could base it on my pending
proposal to make the charset case insensitive
http://sourceware.org/ml/newlib/2009/msg00840.html, that would be great.

This patch also requires a minor patch to Cygwin, which can be applied
as obvious after the newlib change has gone in:

Index: strfuncs.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/strfuncs.cc,v
retrieving revision 1.33
diff -u -p -r1.33 strfuncs.cc
--- strfuncs.cc	30 Jun 2009 21:18:43 -0000	1.33
+++ strfuncs.cc	21 Aug 2009 07:48:19 -0000
@@ -339,6 +339,8 @@ __set_charset_from_codepage (UINT cp, ch
     case 1256:
     case 1257:
     case 1258:
+    case 20866:
+    case 21866:
       __small_sprintf (charset, "CP%u", cp);
       return __cp_mbtowc;
     case 28591:


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
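
For illustration only, the name mapping discussed above (recognise
"KOI8-R"/"KOI8-U" case-insensitively and use "CP20866"/"CP21866"
internally) could look roughly like the sketch below. This is not the
actual loadlocale() code from newlib; the helper names
charset_name_equal() and koi8_internal_charset() are hypothetical and
exist only in this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of newlib): compare two ASCII charset
   names ignoring case.  Returns 1 if they match, 0 otherwise. */
int
charset_name_equal (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (toupper ((unsigned char) *a) != toupper ((unsigned char) *b))
        return 0;
      ++a;
      ++b;
    }
  /* Both strings must end at the same position. */
  return *a == *b;
}

/* Hypothetical mapping from the user-visible charset name to the
   internally used "CPxxxxx" name.  Returns NULL for any name this
   sketch does not handle. */
const char *
koi8_internal_charset (const char *name)
{
  if (charset_name_equal (name, "KOI8-R"))
    return "CP20866";
  if (charset_name_equal (name, "KOI8-U"))
    return "CP21866";
  return NULL;
}
```

With something along these lines, a locale such as "ru_RU.koi8-r" would
end up with the internal charset "CP20866" once the part after the dot
has been split off, which is the name the strfuncs.cc change above
produces from the codepage number.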