Mail Archives: cygwin-apps/2002/05/01/04:53:27

Mailing-List: contact cygwin-apps-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-apps-owner AT cygwin DOT com
List-Subscribe: <mailto:cygwin-apps-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin-apps/>
List-Post: <mailto:cygwin-apps AT cygwin DOT com>
List-Help: <mailto:cygwin-apps-help AT cygwin DOT com>, <http://sources.redhat.com/lists.html#faqs>
Mail-Followup-To: cygwin-apps AT cygwin DOT com
Delivered-To: mailing list cygwin-apps AT cygwin DOT com
Date: Wed, 1 May 2002 10:53:05 +0200
From: Pavel Tsekov <ptsekov AT syntrex DOT com>
Reply-To: Pavel Tsekov <ptsekov AT syntrex DOT com>
Organization: Syntrex, Inc.
X-Priority: 3 (Normal)
Message-ID: <10340394103.20020501105305@syntrex.com>
To: "Robert Collins" <robert DOT collins AT itdomain DOT com DOT au>
CC: "Gary R. Van Sickle" <g DOT r DOT vansickle AT worldnet DOT att DOT net>,
"Cygwin-Apps" <cygwin-apps AT cygwin DOT com>
Subject: Re[2]: libgetopt++ and setup and libstdc++
In-Reply-To: <FC169E059D1A0442A04C40F86D9BA7600C5F64@itdomain003.itdomain.net.au>
References: <FC169E059D1A0442A04C40F86D9BA7600C5F64 AT itdomain003 DOT itdomain DOT net DOT au>
MIME-Version: 1.0

Hello Robert,

Wednesday, May 01, 2002, 10:22:03 AM, you wrote:

>> -----Original Message-----
>> From: Gary R. Van Sickle [mailto:g DOT r DOT vansickle AT worldnet DOT att DOT net] 
>> Sent: Monday, April 29, 2002 5:39 AM

>> > Except that widechar != unicode. WCHAR is still an 0 terminated 
>> > string, but Unicode strings are not 0 terminated.
>> 
>> Sure they are.  A Unicode '\0' == 0x0000 (regardless of your 
>> byte order ;-)).
>>

Zero-terminated strings (C-style strings) have nothing to do with the
basic_string template class. A basic_string can contain any character,
including \0. It's much the same as the STL vector. The WCHAR here
specifies the storage size of a single character...

I.e. you can have
typedef basic_string<struct SomeStrangeChar> SomeStrangeCharString;
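
To illustrate the point about embedded \0 characters, here is a minimal
sketch using std::string (that is, basic_string<char>). The helper names
make_with_embedded_nul and c_style_length are made up for this example;
only the standard library calls are real.

```cpp
#include <cstring>
#include <string>

// Builds a std::string holding the five characters 'a', 'b', '\0', 'c', 'd'.
// The (pointer, length) constructor is used so the embedded NUL is kept.
std::string make_with_embedded_nul() {
    const char raw[] = {'a', 'b', '\0', 'c', 'd'};
    return std::string(raw, sizeof raw);
}

// Length as a C-style API sees it: counting stops at the first NUL.
std::size_t c_style_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

The string itself reports size() == 5, while c_style_length() on the same
string gives 2, because strlen stops at the embedded \0.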

RC> Read http://www.unicode.org/unicode/uni2book/ch05.pdf section 5.2.
RC> Also read http://www.unicode.org/unicode/uni2book/ch02.pdf which does
RC> note that nul(U+0000) can be used as a string terminator.

RC> Then http://www.unicode.org/unicode/reports/tr17/
RC> "C and C++ char* APIs use serialized bytes, which could represent a
RC> variety of different character maps, including ISO Latin 1, UTF-8,
RC> Windows 1252, as well as compound character maps such as Shift-JIS or
RC> 2022-JP. A byte API could also handle UTF-16BE or UTF-16LE, which are
RC> serialized forms of Unicode. However, these APIs must allow for the
RC> existence of any byte value, and typically use memcpy plus length
RC> instead of strcpy for manipulating strings." (which is possibly
RC> referring to a non-wchar_t aware strcpy, not sure here).
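
The "memcpy plus length instead of strcpy" remark in the quoted passage can
be sketched as follows. This is only an illustration: the buffer kHiUtf16le
and the helpers length_seen_by_strlen and copy_with_length are hypothetical
names, and the bytes are simply "Hi" serialized as UTF-16LE.

```cpp
#include <cstring>
#include <vector>

// "Hi" serialized as UTF-16LE: each 16-bit code unit is stored low byte first,
// so every second byte is zero for these two characters.
static const char kHiUtf16le[] = {'H', 0x00, 'i', 0x00};

// What a NUL-terminated byte API would report: it stops at the first zero byte.
std::size_t length_seen_by_strlen() {
    return std::strlen(kHiUtf16le);
}

// Copies the buffer with an explicit length, so zero bytes are preserved.
std::vector<char> copy_with_length(const char* data, std::size_t len) {
    std::vector<char> out(len);
    if (len != 0) {
        std::memcpy(out.data(), data, len);
    }
    return out;
}
```

Here strlen sees only one byte of the four, whereas the length-based copy
keeps all of them, including the 'i' at index 2.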

RC> Anyway, things like UTF-8 can confuse the heck out of c-libraries
RC> because of their multi-byte nature, where
RC> a) a NULL may be part way through a character, not terminating, and
RC> b) a NULL may be illegal at a given point, and the previous partial
RC> character is invalid.

RC> Finally, note that Unicode requires 21 bits of storage, so a 16 bit WCHAR
RC> will still involve multi-byte sequences.
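
As a rough sketch of what that means for a 16-bit character type, assuming
the UTF-16 encoding form: code points above U+FFFF are split into two 16-bit
units (a surrogate pair). The function name encode_utf16 is hypothetical,
and the sketch does not validate its input.

```cpp
#include <cstdint>
#include <vector>

// Encodes one Unicode code point as UTF-16 code units.
// Assumes cp is in 0..0x10FFFF and is not itself a surrogate; no validation.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    std::vector<std::uint16_t> units;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000;  // leaves a 20-bit value
        // High surrogate carries the top 10 bits, low surrogate the bottom 10.
        units.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return units;
}
```

For example, U+0041 comes out as one unit, while U+1D11E comes out as the
two units 0xD834 0xDD1E.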

Quote from "The C++ Programming Language":

  "A wide character - that is, an object of type wchar_t (§4.3) - is
  like a char, except that it takes up two or more bytes."

RC> Does the newlib && lib-gcc and libstdc++ string <WCHAR> correctly
RC> understand unicode (and what representation does it use?). Does it use
RC> the same as Win32 WCHAR does? 

>> > (See the NT kernel defines for
>> > UNICODE_STRING to see how unicode strings are represented.).

Btw, I read somewhere else that Windows does not support the full
Japanese character set, only the most commonly used characters.
