Mail Archives: djgpp-workers/2001/10/14/02:53:22

From: sandmann AT clio DOT rice DOT edu (Charles Sandmann)
Message-Id: <10110140648.AA14621@clio.rice.edu>
Subject: Re: W2K/XP fncase
To: eliz AT is DOT elta DOT co DOT il
Date: Sun, 14 Oct 2001 01:48:28 -0500 (CDT)
Cc: djgpp-workers AT delorie DOT com
In-Reply-To: <2957-Sat13Oct2001194151+0200-eliz@is.elta.co.il> from "Eli Zaretskii" at Oct 13, 2001 07:41:52 PM
X-Mailer: ELM [version 2.5 PL2]
Mime-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

Okay, maybe I need to restart this discussion.

Item 1.  Interrupt 0x21, 71A8 is hopelessly broken on NT and XP.  Even
         using DH=1 produces completely bogus results if any non-alpha
         characters are used.  If only alpha characters are used
         the result is pretty trivial to generate.  Any attempt to
         mess with the function will likely not be 100% compatible with
         Win 9x implementations.  We could document this and forget it.
         We could check for bogosity and then set a flag to jump to
         emulation code.  We could simply truncate to 8.3 as upper case
         if bogus.  

Item 2.  fncase=n handling is default with DJGPP.  This behavior is very
         poorly defined on Windows 9x systems since it relies on an
         interrupt that no one can completely explain.
         Predicting beforehand if a file you see will be converted to
         lower case is hit and miss.  The interrupt it relies on does
         not function on the family of Windows which has replaced the 9x
         family (this is similar to refusing to deal with FCBs being
         removed ...).
         
         We could make fncase=y the default on lfn enabled systems, so
         legacy Win9x and future XP+ systems behave the same (unless you
         make a special effort to use a "deprecated" feature fncase=n).
         Provide a utility to downcase SFN files in the filesystem.
         
         We could decouple fncase=n behavior on lfn enabled systems from
         the poorly defined interrupt and implement a much simpler rule
         (such as if all chars are upper case ... or whatever ...) so it
         is consistent and can be easily defined.
         
         We could leave it the way it is, which generates different 
         default behavior on newer Windows vs. old ones.  If we decide
         this it would be *BY DESIGN*.  At this point it would not be
         a W2K or XP bug but a design decision to keep an inconsistency
         which wasn't needed.  
         
         I believe the reason we lower case names on lfn systems is 
         convenience using SFN files (dual boot, non-lfn programs) and
         for esthetics - in which case converting G++.EXE to lower 
         case makes more sense than leaving it alone merely because
         + is not a valid short file name character (these rules are all
         documented somewhere, right?)  What about a string like .CVS?

OK?  I think these are two different issues.  We can solve #2 without
solving #1.  I would love to come up with acceptable solutions to both,
but I'm more concerned with #2.  
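
For Item 1, the "simply truncate to 8.3 as upper case" fallback could be
sketched roughly as below.  This is only an illustration under stated
assumptions: truncate_to_upper_83 is a hypothetical name, not an existing
libc function, and the test that decides the interrupt's result is bogus is
not shown because nothing above defines it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback, not DJGPP library code: copy at most 8 characters
   of the base name and at most 3 of the extension into `out`, upper-casing
   them with toupper().  `out` must hold at least 13 bytes
   (8 + '.' + 3 + NUL).  Assumption: no numeric tails and no filtering of
   characters that are illegal in short names; this is plain truncation. */
void truncate_to_upper_83(const char *name, char *out)
{
    const char *dot = strrchr(name, '.');   /* last dot starts the extension */
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t n = 0;

    for (size_t i = 0; i < base_len && i < 8; i++)
        out[n++] = (char)toupper((unsigned char)name[i]);

    if (dot && dot[1] != '\0') {
        out[n++] = '.';
        for (size_t i = 1; dot[i] != '\0' && i <= 3; i++)
            out[n++] = (char)toupper((unsigned char)dot[i]);
    }
    out[n] = '\0';
}
```

With this sketch "longfilename.text" becomes "LONGFILE.TEX".  It does not try
to match what Win 9x would have generated.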

> > It also appears that in each of the 7 places this appears in the 
> > library it is part of a strcmp with the long name - many of which are 
> > not directly fncase related.  Even more interesting is that in none of
> > those 7 places is the short name returned used at all except in
> > the string comparison.
> 
> All true, but the function is also meant to be used by applications.
> Do we really want to go out and check that none does?  Why waste our
> time?  It's well known that once you provide an external function,
> there's no way back--the genie is out of the bottle for good.

I am trying to make the case that we should remove usage of 
_lfn_gen_short_name from the 7 places in our libc and replace it with
simpler code which just determines if we should potentially lower case
letters (the additional fncase type flag will still be outside the
test loop to override this).

I never intended to replace _lfn_gen_short_name in the library with 
emulation code (except for libc testing purposes).  Even if we remove
it from libc usage we still ship it as part of libc and need to make 
some effort to cover up the bad Win2K/XP behavior.

> > So this function is not actually used
> > anywhere in the library and each of these 7 places could be replaced
> > by an even simpler copy of what I provided - which just returns a
> > true or false flag if any characters would be changed.
> 
> Whether we do or don't replace the code which calls
> _lfn_gen_short_name in the library is a separate matter.  What I was
> arguing in this part was that the new code cannot be called
> _lfn_gen_short_name because it isn't equivalent to what
> _lfn_gen_short_name does now.

I agree with that (decouple _lfn_gen_short_name fix issue from fncase issue).

> > For example, if lfn=n we should always lower case
> > the names (a very simple test) instead of needing to generate a 
> > string we strcmp with, throw away and then duplicate this behavior.
> 
> That would preclude a possibility to see file names on DOS in their
> original UPPER case; for example, try "djecho [A-Z]*" on plain DOS.
> IIRC, some package (Groff?) depends on that for its build procedure.

You would still have fncase and friends in the library to control
it, but if lfn=n you don't need to create a string and compare it just
to decide that yes you need to lower case the characters.  
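
A minimal sketch of that decision order, assuming two boolean flags that
stand in for the lfn= and fncase= settings.  The function names and the
per-name test are hypothetical placeholders, not the library's actual code:

```c
#include <stdbool.h>

/* Placeholder per-name rule (assumption): true if the name has at least one
   A-Z letter and no a-z letters.  Other characters are ignored. */
bool has_only_upper_letters(const char *name)
{
    bool saw_upper = false;

    for (const char *p = name; *p; p++) {
        if (*p >= 'a' && *p <= 'z')
            return false;
        if (*p >= 'A' && *p <= 'Z')
            saw_upper = true;
    }
    return saw_upper;
}

/* Hypothetical decision helper.  `preserve_case` models fncase=y and
   `lfn_enabled` models lfn=y. */
bool should_downcase(bool lfn_enabled, bool preserve_case, const char *name)
{
    if (preserve_case)
        return false;   /* fncase=y overrides everything */
    if (!lfn_enabled)
        return true;    /* lfn=n: always downcase, no string to build or compare */
    return has_only_upper_letters(name);
}
```

The flag checks come first, so the per-name test only runs when lfn is
enabled and fncase=n.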

> I agree with the goal; the argument is about the way to achieve that
> goal.
> 
> This issue is full of hidden gotchas and unintended consequences,
> because Microsoft's implementation of case-preservation is
> semi-broken, haphazard, and sometimes downright nonsensical.  

So why should we base fncase=n behavior on it?  Since it's mostly
esthetics on a lfn enabled system, let's do something clear and
well defined.  

> I have
> scars from fine-tuning these issues all over my heart, and I'm too old
> to see it (my heart) broken again.  We don't even have a test suite
> that is extensive enough to test the effect of such changes, so most
> probably we won't know until it's too late.
> 
> All I want is that we don't break what took so long to get right.

I'm looking for a low risk way to make Win2K and XP behave the same
as Win9x family products.  If there is no way, then maybe fncase=y
should be default if lfn is enabled for all platforms.

> So maybe the code I wrote is wasteful.  I understand that it might
> bug you to see a function which issues an RM interrupt, and whose
> output is used inefficiently, or even not used at all.  But it works;
> it was proven by two years of intensive use; and it certainly isn't a
> bottleneck in any real-life application.

You're taking this part too personally, stop it! :-(  This should not
be any personal criticism of the code, or the motives, just a
retrospective to say that we are in a mess now and need to get
out of it.  (the code in discussion is a lot better than the crt0.s and
exceptn.s mess I left in place ...)

> Therefore, my suggestion is: let's make a local change in
> _lfn_gen_short_name so that it calls 71A8h with DH=1 on W2K and XP.
> (We should see that this doesn't break NT with the LFN TSR.)  The file
> names which come bogus as the result are very rare, and when they do
> happen all that we'll see is that the file name is not downcased when
> it should have been--not a big deal IMHO.

DH=1 breaks for essentially any non-alpha character, so it is a very
poor effort at fixing anything.  If any non-alpha character is rare,
then we can move to a much simpler implementation for Win9x, right? :-)

My emulation code does a better job than the interrupt does, and won't
break in some future Windows release.  But for that matter, I'd probably
replace the lookup table with tests for the 6 or so special chars in 
the 7 bit table and leave it at that.  While this might be acceptable
for _lfn_gen_short_name, I don't think this is acceptable for fncase
behavior since it won't act the same.  How many packages will configure
wrong on one platform or the other because of case?  Let's not design
such insanity into a release on purpose.  I can't think of any sane
reason why the presence of some 8-bit characters in a name would 
sometimes be OK and send the name to lower case, but others would
prevent it from going to lower case.  But that's the way it works today.
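
The idea of replacing the lookup table with tests for a handful of special
characters might look something like the sketch below.  The exact set is an
assumption: it uses the six 7-bit characters that are legal in long file
names but not in short ones (+ , ; = [ ]), which may or may not be the set
meant above.  Both function names are hypothetical.

```c
#include <stdbool.h>

/* Assumption: the special characters are the 7-bit ones allowed in long
   file names but not in short (8.3) names.  Space and 8-bit characters are
   deliberately not handled here. */
bool is_lfn_only_char(char c)
{
    switch (c) {
    case '+': case ',': case ';': case '=': case '[': case ']':
        return true;
    default:
        return false;
    }
}

/* True if any character of `name` is in the set above. */
bool has_lfn_only_char(const char *name)
{
    for (const char *p = name; *p; p++) {
        if (is_lfn_only_char(*p))
            return true;
    }
    return false;
}
```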

> However, you are doing the work, so eventually it's your call.  If you
> want to introduce a new function with the body you sent a while ago,
> and rewrite the other library functions to call it instead of
> _lfn_gen_short_name, feel free to go ahead and do it.

We have these discussions so that I may learn - more about the pain and 
scars in dealing with these issues in the past.  If I had all the answers
I'd have already cvs commit'ed it.  If you are not comfortable with what
I do here, I'm sure I will be missing something that will come back and
haunt everyone in the future.

At this point I'm probably most interested in how fncase=n should behave
on lfn systems.  Should leading periods prevent downcase?  Should special
characters (such as +, space) prevent downcase?  Should any 8-bit chars
prevent downcase?  Should it only be for 8.3 type filenames?  Is this
designed to only handle pkzipped/dual boot files and make 100% sure no
others squeeze through with a downcase?

If ls -R is run with different algorithms, what percentage of files on a
disk differing in case would be acceptable?

I would probably limit it to 8.3 format files, non-leading . and all letters
being upper case A-Z (if any a-z present, it's not SFN).  Simplicity would 
say ignore 8-bit chars and non-legal short chars (such as +) - and just base 
the test on A-Z.  
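
A minimal sketch of that rule follows.  The function name is hypothetical,
and returning false when there is no A-Z letter at all (nothing to
downcase) is an added assumption:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libc: returns true if `name` has 8.3
   shape, does not start with a period, contains no a-z letters, and has at
   least one A-Z letter.  8-bit characters and characters that are not legal
   in short names (such as +) are ignored. */
bool looks_like_upper_sfn(const char *name)
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t ext_len = dot ? strlen(dot + 1) : 0;
    bool saw_upper = false;

    /* Empty base (leading period, e.g. ".CVS") or too long for 8.3. */
    if (base_len == 0 || base_len > 8 || ext_len > 3)
        return false;
    /* A second period means the name is not in 8.3 format. */
    if (dot && strchr(dot + 1, '.'))
        return false;

    for (const char *p = name; *p; p++) {
        if (*p >= 'a' && *p <= 'z')
            return false;           /* any a-z present: not an SFN */
        if (*p >= 'A' && *p <= 'Z')
            saw_upper = true;
    }
    return saw_upper;
}
```

Under this sketch README and G++.EXE qualify for downcasing, while .CVS,
Makefile and LONGFILENAME.TXT do not.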
