From: sandmann AT clio DOT rice DOT edu (Charles Sandmann)
Message-Id: <10110140648.AA14621@clio.rice.edu>
Subject: Re: W2K/XP fncase
To: eliz AT is DOT elta DOT co DOT il
Cc: djgpp-workers AT delorie DOT com
Date: Sun, 14 Oct 2001 01:48:28 -0500 (CDT)
In-Reply-To: <2957-Sat13Oct2001194151+0200-eliz@is.elta.co.il> from "Eli Zaretskii" at Oct 13, 2001 07:41:52 PM

Okay, maybe I need to restart this discussion.

Item 1. Interrupt 0x21, function 71A8h, is hopelessly broken on NT and XP. Even with DH=1, the results are completely bogus if any non-alphabetic characters are used. If only alphabetic characters are used, the result is pretty trivial to generate ourselves. Any attempt to mess with the function will likely not be 100% compatible with the Win 9x implementations.

We could document this and forget it.
We could check for bogosity and then set a flag to jump to emulation code.
We could simply truncate to 8.3 as upper case if the result is bogus.

Item 2. fncase=n handling is the default with DJGPP. This behavior is very poorly defined on Windows 9x systems, since it relies on an interrupt whose behavior no one can completely explain. Predicting beforehand whether a file you see will be converted to lower case is hit and miss. The interrupt it relies on does not function on the family of Windows which has replaced the 9x family (this is similar to refusing to deal with FCBs being removed ...).

We could make fncase=y the default on lfn enabled systems, so legacy Win9x and future XP+ systems behave the same (unless you make a special effort to use the "deprecated" feature fncase=n), and provide a utility to downcase SFN files in the filesystem.
We could decouple fncase=n behavior on lfn enabled systems from the poorly defined interrupt and implement a much simpler rule (such as "all characters are upper case" ... or whatever ...), so that it is consistent and easily defined.
We could leave it the way it is, which produces different default behavior on newer Windows versions than on older ones. If we decide this, it would be *BY DESIGN*. At that point it would not be a W2K or XP bug, but a design decision to keep an inconsistency which wasn't needed.

I believe the reasons we lower case names on lfn systems are convenience when using SFN files (dual boot, non-lfn programs) and esthetics. In that case, converting G++.EXE to lower case makes more sense than leaving it alone merely because + is not a valid short file name character (these rules are all documented somewhere, right?). What about a name like .CVS? OK?

I think these are two different issues. We can solve #2 without solving #1. I would love to come up with acceptable solutions to both, but I'm more concerned with #2.

> > It also appears that in each of the 7 places this appears in the
> > library it is part of a strcmp with the long name - many of which are
> > not directly fncase related. Even more interesting is that in none of
> > those 7 places is the short name returned used at all except in
> > the string comparison.
>
> All true, but the function is also meant to be used by applications.
> Do we really want to go out and check that none does? Why waste our
> time? It's well known that once you provide an external function,
> there's no way back--the genie is out of the bottle for good.

I am trying to make the case that we should remove the use of _lfn_gen_short_name from those 7 places in our libc and replace it with simpler code which just determines whether we should potentially lower case the letters (the additional fncase-type flag check would still sit outside the test loop to override this).
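A minimal sketch of that kind of replacement, for illustration only. It assumes the simple rule "no lower-case letters and at least one upper-case letter" (one possible choice, not something settled in this thread); the function name hypothetical_may_downcase is made up, and its fncase argument merely stands in for whatever flag the library actually consults. The point it shows is that the flag is tested once, before the loop, and no short-name string is built or strcmp'ed.

```c
/* Hypothetical sketch: decide only WHETHER a name may be lower-cased,
   instead of generating a short name and comparing it with the long one.
   Rule assumed here: no a-z anywhere and at least one A-Z.
   The fncase flag is checked once, outside the loop, and overrides all. */
int hypothetical_may_downcase(const char *name, int fncase)
{
  int saw_upper = 0;

  if (fncase)              /* fncase=y: never change the case of names */
    return 0;
  for (; *name; name++)
    {
      unsigned char c = (unsigned char)*name;
      if (c >= 'a' && c <= 'z')
        return 0;          /* a lower-case letter: leave the name alone */
      if (c >= 'A' && c <= 'Z')
        saw_upper = 1;     /* something that would actually change */
    }
  return saw_upper;
}
```

Under this assumed rule G++.EXE would be downcased, since only the letters are examined; whether + should block downcasing is exactly the open question in this thread.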
I never intended to replace _lfn_gen_short_name in the library with emulation code (except for libc testing purposes). Even if we remove its use from libc, we still ship it as part of libc and need to make some effort to cover up the bad Win2K/XP behavior.

> > So this function is not actually used
> > anywhere in the library and each of these 7 places could be replaced
> > by an even simpler copy of what I provided - which just returns a
> > true or false flag if any characters would be changed.
>
> Whether we do or don't replace the code which calls
> _lfn_gen_short_name in the library is a separate matter. What I was
> arguing in this part was that the new code cannot be called
> _lfn_gen_short_name because it isn't equivalent to what
> _lfn_gen_short_name does now.

I agree with that (decouple the _lfn_gen_short_name fix from the fncase issue).

> > For example, if lfn=n we should always lower case
> > the names (a very simple test) instead of needing to generate a
> > string we strcmp with, throw away and then duplicate this behavior.
>
> That would preclude a possibility to see file names on DOS in their
> original UPPER case; for example, try "djecho [A-Z]*" on plain DOS.
> IIRC, some package (Groff?) depends on that for its build procedure.

You would still have fncase and friends in the library to control it, but if lfn=n you don't need to create a string and compare it just to decide that yes, you need to lower case the characters.

> I agree with the goal; the argument is about the way to achieve that
> goal.
>
> This issue is full of hidden gotchas and unintended consequences,
> because Microsoft's implementation of case-preservation is
> semi-broken, haphazard, and sometimes downright nonsensical.

So why should we base fncase=n behavior on it? Since it's mostly esthetics on an lfn enabled system, let's do something clear and well defined.

> I have
> scars from fine-tuning these issues all over my heart, and I'm too old
> to see it (my heart) broken again.
> We don't even have a test suite
> that is extensive enough to test the effect of such changes, so most
> probably we won't know until it's too late.
>
> All I want is that we don't break what took so long to get right.

I'm looking for a low risk way to make Win2K and XP behave the same as the Win9x family. If there is no way, then maybe fncase=y should be the default on all platforms when lfn is enabled.

> So maybe the code I wrote is wasteful. I understand that it might
> bug you to see a function which issues an RM interrupt, and whose
> output is used inefficiently, or even not used at all. But it works;
> it was proven by two years of intensive use; and it certainly isn't a
> bottleneck in any real-life application.

You're taking this part too personally, stop it! :-( This is not meant as personal criticism of the code or the motives, just a retrospective to say that we are in a mess now and need to get out of it. (The code under discussion is a lot better than the crt0.s and exceptn.s mess I left in place ...)

> Therefore, my suggestion is: let's make a local change in
> _lfn_gen_short_name so that it calls 71A8h with DH=1 on W2K and XP.
> (We should see that this doesn't break NT with the LFN TSR.) The file
> names which come bogus as the result are very rare, and when they do
> happen all that we'll see is that the file name is not downcased when
> it should have been--not a big deal IMHO.

DH=1 breaks for essentially any non-alphabetic character, so it is a very poor effort at fixing anything. If non-alphabetic characters are rare, then we can move to a much simpler implementation for Win9x too, right? :-) My emulation code does a better job than the interrupt does, and won't break in some future Windows release. But for that matter, I'd probably replace the lookup table with tests for the 6 or so special characters in the 7-bit range and leave it at that.
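The "tests for the 6 or so special characters" idea can be sketched roughly as follows. This is a hypothetical illustration, not the emulation code mentioned above: the character set + , ; = [ ] (plus space) is an assumption based on the usual FAT rules for characters allowed in long names but not in short ones, and both function names are invented. It only answers whether short-name generation would alter some 7-bit character; it does not check 8.3 length or dot placement.

```c
#include <string.h>

/* Hypothetical: characters assumed legal in long names but illegal in
   short names (the usual FAT set + , ; = [ ], plus space). */
static int hypothetical_sfn_invalid(int c)
{
  return c != '\0' && strchr("+,;=[] ", c) != NULL;
}

/* Returns 1 if generating a short name would change any 7-bit character
   of NAME (a-z would be upper-cased, invalid characters replaced).
   8-bit characters are deliberately skipped; 8.3 length and dot
   placement are NOT checked in this sketch. */
int hypothetical_7bit_changes(const char *name)
{
  for (; *name; name++)
    {
      unsigned char c = (unsigned char)*name;
      if (c >= 0x80)
        continue;
      if ((c >= 'a' && c <= 'z') || hypothetical_sfn_invalid(c))
        return 1;
    }
  return 0;
}
```

An explicit string of special characters replaces a 256-entry lookup table with something that can be read and reviewed at a glance, at the cost of deciding up front that 8-bit characters never count.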
While this might be acceptable for _lfn_gen_short_name, I don't think it is acceptable for fncase behavior, since it won't act the same. How many packages will configure wrong on one platform or the other because of case? Let's not design such insanity into a release on purpose. I can't think of any sane reason why some 8-bit characters in a name should still let it go to lower case, while others prevent it from going to lower case. But that's the way it works today.

> However, you are doing the work, so eventually it's your call. If you
> want to introduce a new function with the body you sent a while ago,
> and rewrite the other library functions to call it instead of
> _lfn_gen_short_name, feel free to go ahead and do it.

We have these discussions so that I may learn more about the pain and scars from dealing with these issues in the past. If I had all the answers, I'd have already cvs commit'ed it. If you are not comfortable with what I do here, I'm sure I will be missing something that will come back and haunt everyone in the future.

At this point I'm probably most interested in how fncase=n should behave on lfn systems. Should leading periods prevent downcasing? Should special characters (such as +, space) prevent downcasing? Should any 8-bit characters prevent downcasing? Should it only apply to 8.3-style filenames? Is this designed only to handle pkzipped/dual boot files and make 100% sure no others squeeze through with a downcase? What percentage of files on a disk showing up in a different case under ls -R with different algorithms would be acceptable?

I would probably limit it to 8.3 format files with no leading "." and all letters being upper case A-Z (if any a-z is present, it's not an SFN). Simplicity would say: ignore 8-bit characters and characters that are illegal in short names (such as +), and base the test on A-Z only.
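For concreteness, here is a rough sketch of the rule proposed in the last paragraph, written as a standalone predicate. The function name is hypothetical, and the exact reading of "8.3 format" used here (1 to 8 characters, optionally one dot followed by 1 to 3 characters) is an assumption; as suggested, 8-bit characters and characters such as + are ignored, so only the letters decide the outcome.

```c
#include <string.h>

/* Hypothetical sketch of the proposed fncase=n rule on LFN systems.
   Downcase NAME only if:
     - it has 8.3 shape: 1-8 chars, optionally one '.' and 1-3 more chars,
     - it does not start with '.',
     - it contains no a-z (if it does, it cannot be a short name), and
     - it contains at least one A-Z letter to downcase.
   8-bit characters and SFN-illegal ones such as '+' are ignored. */
int hypothetical_fncase_n_downcase(const char *name)
{
  const char *dot = strchr(name, '.');
  const char *p;
  size_t base_len, ext_len;
  int saw_upper = 0;

  if (name[0] == '\0' || name[0] == '.')
    return 0;                       /* empty or leading period */
  if (dot)
    {
      if (strchr(dot + 1, '.'))     /* more than one dot: not 8.3 */
        return 0;
      base_len = (size_t)(dot - name);
      ext_len = strlen(dot + 1);
      if (ext_len == 0 || ext_len > 3)
        return 0;
    }
  else
    base_len = strlen(name);
  if (base_len > 8)
    return 0;

  for (p = name; *p; p++)
    {
      if (*p >= 'a' && *p <= 'z')
        return 0;                   /* any lower case: not an SFN */
      if (*p >= 'A' && *p <= 'Z')
        saw_upper = 1;
    }
  return saw_upper;
}
```

With this reading, G++.EXE and MAKEFILE are downcased, while .CVS, ReadMe.txt, FILE.HTML and ARCHIVE.TAR.GZ are left alone; the behavior depends only on the name, not on any interrupt, so it would be the same on Win9x and W2K/XP.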