Mail Archives: djgpp-workers/2005/05/15/06:54:06

X-Authentication-Warning: delorie.com: mail set sender to djgpp-workers-bounces using -f

From: <ams AT ludd DOT ltu DOT se>

Message-Id: <200505140300.j4E30drm024968@speedy.ludd.ltu.se>

Subject: wchar_t implementation and multibyte encoding

To: DJGPP-WORKERS <djgpp-workers AT delorie DOT com>

Date: Sat, 14 May 2005 05:00:39 +0200 (CEST)

X-Mailer: ELM [version 2.4ME+ PL78 (25)]

MIME-Version: 1.0

X-ltu-MailScanner-Information: Please contact the ISP for more information

X-ltu-MailScanner: Found to be clean

X-MailScanner-From: ams AT ludd DOT ltu DOT se

Reply-To: djgpp-workers AT delorie DOT com

Hello.

I've been thinking about this a little. Let's say we decide to encode
Unicode in wchar_t, which is the only sane choice today.

Then the functions iswalnum(), iswalpha(), etc. are either going to be
implemented as:

1. switch() and many, many case:'s,

2. if( 0 <= wc && wc <= 31 ) { return 0; }
   if( 32 <= wc && wc <= 126 ) { return 1; }
   if( ... )
   ..., or

3. tables as isalnum(), isalpha(), etc. are today.


1 and 2: A lot of code. If anything, I think gcc's extended
"case x ... y:" range syntax can come in useful, so I prefer 1 over 2.
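
For illustration, option 1 with gcc's case ranges might look roughly
like the sketch below. The name sketch_iswalnum() is hypothetical and
the ranges cover only ASCII and a few Latin-1 letters; a real
implementation would need many more ranges taken from the Unicode
data files. Note that gcc requires spaces around the "...".

```c
#include <wchar.h>   /* wint_t */

/* Hypothetical helper, not an existing DJGPP function.
   Relies on the gcc extension "case low ... high:".
   Only a small, illustrative subset of Unicode is covered. */
int sketch_iswalnum(wint_t wc)
{
    switch (wc) {
    case 0x0030 ... 0x0039:   /* ASCII digits 0-9 */
    case 0x0041 ... 0x005A:   /* ASCII upper case A-Z */
    case 0x0061 ... 0x007A:   /* ASCII lower case a-z */
    case 0x00C0 ... 0x00D6:   /* Latin-1 letters, stops before U+00D7 (multiplication sign) */
    case 0x00D8 ... 0x00F6:   /* Latin-1 letters, stops before U+00F7 (division sign) */
    case 0x00F8 ... 0x00FF:   /* remaining Latin-1 letters */
        return 1;
    default:
        return 0;
    }
}
```

The compiler is free to turn such a switch into comparisons or a jump
table, so the source stays readable without committing to a layout.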

3: An enormous table. As Unicode has the range 0 - 0x10ffff (0x110000
code points), we are talking about more than 1MB even at one byte per
entry!
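
The arithmetic behind that estimate, as a small sketch. The helper
name flat_table_bytes() and the entry sizes are assumptions for
illustration only; nothing here says how wide a real table entry
would have to be.

```c
#include <stddef.h>

/* Number of code points in the range 0 .. 0x10FFFF inclusive. */
#define UNICODE_CODE_POINTS 0x110000UL

/* Hypothetical helper: bytes needed for a flat lookup table that
   stores entry_bytes of classification flags per code point. */
size_t flat_table_bytes(size_t entry_bytes)
{
    return (size_t)UNICODE_CODE_POINTS * entry_bytes;
}
```

With one byte per entry this gives 1114112 bytes, a bit over 1MB, and
two bytes per entry doubles it to 2228224 bytes.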


Now if those functions (isw*()) should return different results
depending on locale, the sizes explode. So I hope not.


With regard to which multibyte encoding we should use, I strongly
prefer UTF-8.
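
To make the UTF-8 option concrete, here is a minimal sketch of turning
one Unicode code point (as it would sit in a wchar_t) into its UTF-8
bytes. The name sketch_utf8_encode() and its interface are
hypothetical; a real library would do this inside wctomb() or
wcrtomb() with the state handling the standard requires.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns how many were written.
   Returns 0 for surrogates (0xD800-0xDFFF) and values above 0x10FFFF,
   which have no valid UTF-8 encoding. */
size_t sketch_utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

ASCII characters stay single bytes with their usual values, so existing
char-based code keeps working on plain ASCII text.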


Opinions?


Right,

						MartinS
