On Wed, Jun 03, 2009 at 04:27:55PM +0200, Corinna Vinschen wrote:
>On Jun 3 09:18, Edward Lam wrote:
>> Corinna Vinschen wrote:
>>> The question is, what do you expect? [...]
>> [...]
>> Wikipedia has several suggestions on how to handle invalid UTF-8 byte
>> sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the
>> rule that uses the replacement character.
>
>Chris implemented using the invalid code point solution. The discussion
>in http://www.mail-archive.com/linux-utf8 AT nl DOT linux DOT org/msg00080.html
>supports this solution. What's missing so far is the way back, from
>an invalid single second half of a surrogate pair in the 0xDCxx range
>back to the correct byte value. I'm just looking into that.

The way back was not, AFAIK, needed for Cygwin programs. I don't think
there is a valid way back for Windows programs.

cgf

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
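The thread describes a scheme for invalid UTF-8. Each undecodable byte becomes a lone second half of a surrogate pair in the 0xDCxx range, and the "way back" turns such a code point back into the original byte. The Python sketch below illustrates that idea. It is not Cygwin's code, which the thread does not show. The exact mapping (U+DC00 + byte value, giving U+DC80..U+DCFF because ASCII bytes are always valid) is an assumption borrowed from Python's PEP 383 "surrogateescape" error handler. The helper names decode_escaping and encode_escaped are hypothetical.

```python
# Sketch only: assumes the PEP 383-style mapping byte -> U+DC00 + byte.
# This is not the Cygwin implementation discussed in the thread.

def decode_escaping(data: bytes) -> str:
    """Decode UTF-8, turning each invalid byte into a lone low surrogate."""
    out = []
    pos = 0
    while pos < len(data):
        try:
            out.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice; bytes before it decoded fine.
            bad = pos + err.start
            out.append(data[pos:bad].decode("utf-8"))
            # An invalid byte is never ASCII, so this lands in U+DC80..U+DCFF.
            out.append(chr(0xDC00 + data[bad]))
            pos = bad + 1  # resume right after the escaped byte
    return "".join(out)


def encode_escaped(text: str) -> bytes:
    """The 'way back': map U+DC80..U+DCFF to raw bytes, encode the rest."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(cp - 0xDC00)  # recover the original byte value
        else:
            # Any other lone surrogate raises UnicodeEncodeError here.
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    raw = b"caf\xc3\xa9 \xff\xfe ok"
    text = decode_escaping(raw)
    print(ascii(text))
    print(encode_escaped(text) == raw)
```

This round trip is only unambiguous if the UTF-16 side never contains a genuine lone surrogate in U+DC80..U+DCFF. Windows does not require file names to be well-formed UTF-16, so such a name from a native program would be indistinguishable from an escaped byte. That ambiguity is relevant to the doubt about a valid way back for Windows programs.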
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |