Mail Archives: djgpp-workers/2002/06/17/01:31:18

Date: Mon, 17 Jun 2002 08:25:51 +0300 (IDT)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
X-Sender: eliz AT is
To: sandmann AT clio DOT rice DOT edu
cc: djgpp-workers AT delorie DOT com
Subject: Re: Interesting bug (extended characters in file names)
In-Reply-To: <10206170031.AA15814@clio.rice.edu>
Message-ID: <Pine.SUN.3.91.1020617081955.4796G-100000@is>
MIME-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

On Sun, 16 Jun 2002 sandmann AT clio DOT rice DOT edu wrote:

> In my favorites directory, I have several .url files which have been saved
> with names containing extended characters.  (Registered symbol (R), trademark
> TM).  From a command line these characters don't display properly.  I
> cannot copy these files from the command line on Win95 (OK from Win2K) -
> this tells me that maybe Windows is just broken ...
> 
> The djgpp images see them but can't get information on them.  For example:
> 
> ls (shows them, but characters substituted/truncated in display?)
> ls * (gives the name, then ENOENT)

One possible reason is that these characters don't belong to the 
character set supported by the OEM font Windows uses for the DOS box and 
DOS programs.  For example, if you copy a string with these characters 
into the clipboard and then paste it into a DOS app, either via the 
Edit->Paste command of the DOS box or by invoking the appropriate Int 2Fh 
function from the application (Emacs does that), such characters are 
converted into underscores `_'.

I believe Windows tries to convert between the character sets, and 
replaces those characters it cannot convert with underscores.

> Has this been noticed before?  I'm not sure if it's something we can work
> around, but I thought I would pass it on.  Unicode characters?

Unicode characters should never be exposed to DOS apps.  But it could be 
that the W2K emulation of DOS interrupts feeds us multibyte characters 
which our library cannot grok.

It might be interesting to see what `findfirst'/`findnext' return in 
those cases.
