Message-ID:
From: Shawn Hargreaves
To: djgpp AT delorie DOT com
Subject: Re: Allegro, Ansi, TTF2PCX and Umlauts
Date: Mon, 17 Jan 2000 16:46:22 -0000
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Reply-To: djgpp AT delorie DOT com

Manni Heumann writes:
> I guess the problem is, that Allegro uses Ascii codepages (I called
> set_uformat (U_ASCII)), while the windows fonts are based on an Ansi
> representation.

I'll assume that you are using a 3.9.x work-in-progress version of
Allegro, which includes Unicode support. If you aren't, it would be a
good idea to upgrade, as 3.12 doesn't have nearly such good
internationalisation support.

(cue short lecture about text encoding formats)

Internally, Allegro uses Unicode format text. This can support all the
characters needed for any of the major languages (including Chinese,
Japanese, etc), and uses character values ranging from 0-65535. See
www.unicode.org for tables of what character goes where.

Obviously, there are too many different Unicode characters for you to
store them all in normal char variables. So you have a choice of many
different ways to encode the letters into a string, and can call
set_uformat() to choose which method you would prefer to use. You could
use U_UNICODE, where each character is a 16 bit value, or U_ASCII, where
each character is only 8 bits (so you can only store letters from
0-255), or the default, U_UTF8, where characters from 0-127 are stored
directly as 8 bit values, and values from 128 to 65535 are encoded as
two or more bytes. This method is cool because it's fairly backward
compatible with normal ASCII code, but easily allows you to support all
sorts of different character sets needed for other parts of the world.

As long as you use only Allegro functions, that's all you need to know.
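
To make the UTF-8 layout described above a bit more concrete, here is a minimal sketch in plain C of how a single character value in the 0-65535 range could be turned into one, two, or three bytes. The helper name utf8_encode() is made up for this illustration and is not part of Allegro; inside Allegro you would normally let the library's own string routines handle this.

```c
/* Hypothetical helper, not an Allegro function: encode one character
   value (0..0xFFFF) as UTF-8 into out[], which must have room for at
   least 3 bytes. Returns the number of bytes written. */
static int utf8_encode(unsigned int c, unsigned char *out)
{
   if (c < 0x80) {
      /* 0-127: stored directly as a single byte, same as ASCII */
      out[0] = (unsigned char)c;
      return 1;
   }

   if (c < 0x800) {
      /* 128-2047: two bytes, 110xxxxx 10xxxxxx */
      out[0] = (unsigned char)(0xC0 | (c >> 6));
      out[1] = (unsigned char)(0x80 | (c & 0x3F));
      return 2;
   }

   /* 2048-65535: three bytes, 1110xxxx 10xxxxxx 10xxxxxx */
   out[0] = (unsigned char)(0xE0 | (c >> 12));
   out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
   out[2] = (unsigned char)(0x80 | (c & 0x3F));
   return 3;
}
```

With this scheme 'A' (value 65) stays the single byte 0x41, while a u-umlaut (value 252) becomes the two bytes 0xC3 0xBC, which is why plain ASCII text passes through unchanged but umlauts do not.
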
The text printing functions draw strings from whatever encoding format
you have selected, and the input functions return characters in the same
style. You do need to be careful when manipulating strings in U_UNICODE
or U_UTF8 format, though, as you can't just read individual bytes out of
a char array when the characters might be more than one byte wide: you
have to use the Allegro functions like ugetc(), ugetat(), etc, instead.

The problem comes when you want to talk to the outside world, such as
using strings that you typed into your text editor. Here, it really all
depends on what text format your editor is using.

At least for most European countries, Windows and Unix systems will tend
to be using the Latin-1 codepage, which is the same thing as the first
256 characters of Unicode. You could use this text directly with Allegro
in U_ASCII mode, or run it through the textconv program if you want to
convert it into U_UTF8 format.

If you are using a DOS editor, though, you are in trouble: DOS can use
many different character layouts depending on what country you are in,
and Allegro doesn't know anything about these. You could find a table to
convert whatever format you are using into Unicode, and then use the
Allegro U_ASCII_CP mode to convert all your text using that table, but I
really don't recommend this because it's very inefficient, and also
won't work correctly for other countries that use different DOS
codepages.

If you need to edit strings that use character values above 127, the
best method, IMHO, is to get a Unicode-aware editor so you can create
this data directly in UTF-8 format: there are some links on the Allegro
utilities page. Failing that, use a program that edits Latin-1 format
text files, and then use textconv to convert the results into UTF-8
format before using them with Allegro.
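
Because Latin-1 is just the first 256 Unicode values, converting it to UTF-8 is mechanical. The following is a rough sketch in plain C of that kind of conversion, plus a small counter showing why byte counts and character counts differ in UTF-8. Both function names are invented for this example; this is not how textconv is actually implemented, and in a real Allegro program you would use textconv or the library's own string functions instead.

```c
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated Latin-1 string to UTF-8.
   Returns the number of bytes written (excluding the NUL), or
   (size_t)-1 if dst is too small to hold the result. */
static size_t latin1_to_utf8(const unsigned char *src,
                             unsigned char *dst, size_t dst_size)
{
   size_t n = 0;

   for (; *src; src++) {
      if (*src < 0x80) {
         /* plain ASCII: copied unchanged */
         if (n + 1 >= dst_size)
            return (size_t)-1;
         dst[n++] = *src;
      }
      else {
         /* 128-255: always exactly two bytes in UTF-8 */
         if (n + 2 >= dst_size)
            return (size_t)-1;
         dst[n++] = (unsigned char)(0xC0 | (*src >> 6));
         dst[n++] = (unsigned char)(0x80 | (*src & 0x3F));
      }
   }

   if (n >= dst_size)
      return (size_t)-1;

   dst[n] = '\0';
   return n;
}

/* Hypothetical helper: count characters (not bytes) in a UTF-8 string
   by skipping continuation bytes of the form 10xxxxxx. */
static size_t utf8_char_count(const unsigned char *s)
{
   size_t count = 0;

   for (; *s; s++) {
      if ((*s & 0xC0) != 0x80)
         count++;
   }

   return count;
}
```

For a six-letter word containing one umlaut, the converted string occupies seven bytes but still counts as six characters, which is exactly the situation where indexing the char array byte by byte goes wrong.
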

If you absolutely insist, you could downgrade Allegro by specifying
U_ASCII mode, but then your program will be unable to deal with texts
that use a non-Latin alphabet, so I don't recommend it.

Shawn Hargreaves.