From: "Juan Manuel Guerrero"
Organization: Darmstadt University of Technology
To: Eli Zaretskii
Date: Sat, 24 Feb 2001 19:07:40 +0200
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: gettext pretest available
CC: Bruno Haible, djgpp-workers AT delorie DOT com
X-mailer: Pegasus Mail for Windows (v2.54DE)
Message-ID: <2E9C0C3501E@HRZ1.hrz.tu-darmstadt.de>
Reply-To: djgpp-workers AT delorie DOT com

On Fri, 23 Feb 2001 09:44:52 +0200, Eli Zaretskii:

> > Of course, you are right. All the pertinent DJGPP libc functions
> > work as you have described (they recognize CRLF *and* LF as '\n' if
> > the file has been fopen()'ed in text mode), making the code I have
> > added redundant and superfluous.
>
> One caveat: can the files that are read as text have unprintable
> characters, such as lone CRs or ^Z? If they can, text mode is not
> reliable enough to be used with such files.

This was the reason why I had opened all files in binary mode. This was
the way it was done in the DJGPP port of gettext 0.10.35. But IMHO we can
drop this. The only text files we will deal with are the .po files. These
files are usually created with two types of charsets: non-Asian
single-byte charsets like iso-8859-xx, cpxxx and koi8-r/u, and the Asian
double-byte charsets like big5, euc-{cn,jp,kr}, JIS-X-0208, shift-JIS and
CP9XX (maybe I have forgotten some).

There should be no difficulty with .po files written in iso/cpxxx/koi8.
IMHO, if .po files written in iso/cpxxx/koi8 contain *lone* CRs or ^Z,
then they are broken.

The Asian charsets are usually double-byte charsets. The question is
whether the byte values 0x00 to 0x20 are used in the encoding or not.
Usually Asian characters are coded using two or more bytes. These byte
pairs are usually organized into a 94 x 94 matrix. Sometimes this matrix
is placed starting at byte value 0xA1. Sometimes it is placed starting at
0x21; this is the 7-bit ISO-2022 form (shift-JIS rearranges the same
matrix into other byte ranges).
Some of the charsets use ESC, others use tilde (~), and others use
shift in (SI) and shift out (SO) as control characters to select
different "character planes". *No* character set uses CR, LF or Ctrl-Z in
any combination, neither for character encoding nor as a control
sequence.

In conclusion: as long as *only* the single-byte and double-byte charsets
described above are used, we can safely open .po files in text mode.

BTW, if someone is really interested in CJK encoding, look at: and
download CJK.INF. Read: PART 3: CJK ENCODING SYSTEMS

Regards,
Guerrero, Juan Manuel
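
The assumption discussed above (a sane .po file contains no lone CR and
no ^Z) can be checked mechanically. What follows is only a minimal sketch
in C. The helper names count_text_mode_hazards and scan_po_file are
hypothetical and are not part of gettext or DJGPP. The code merely counts
the two byte patterns named in the caveat; it makes no claim about how
any particular libc treats them in text mode.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of gettext or DJGPP.
   Counts the bytes in buf[0..len) that match the caveat:
   a CR that is not immediately followed by LF, or a Ctrl-Z (0x1A). */
size_t count_text_mode_hazards(const unsigned char *buf, size_t len)
{
    size_t hazards = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x1A) {
            hazards++;                      /* Ctrl-Z anywhere */
        } else if (buf[i] == '\r' &&
                   (i + 1 == len || buf[i + 1] != '\n')) {
            hazards++;                      /* CR without a following LF */
        }
    }
    return hazards;
}

/* Hypothetical wrapper: scans a whole file, opened in BINARY mode so that
   no byte is translated before it is inspected.
   Returns the hazard count, or -1 if the file cannot be opened. */
long scan_po_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long hazards = 0;
    int c;
    int pending_cr = 0;                     /* previous byte was a CR */

    if (fp == NULL)
        return -1;

    while ((c = fgetc(fp)) != EOF) {
        if (pending_cr && c != '\n')
            hazards++;                      /* the previous CR was lone */
        pending_cr = (c == '\r');
        if (c == 0x1A)
            hazards++;
    }
    if (pending_cr)
        hazards++;                          /* CR as the very last byte */

    fclose(fp);
    return hazards;
}
```

A result of 0 from scan_po_file() for a given .po file would mean that
neither pattern from the caveat occurs in it. A nonzero result would
point to a file of the kind called broken above.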