Message-ID: <4B0B21E0.3050909@tlinx.org>
Date: Mon, 23 Nov 2009 15:59:28 -0800
From: Linda Walsh
To: "cygwin AT cygwin DOT com"
Subject: cyg1.7 - DOS character remapping: change request.

I was thinking about one or two modifications for the new characters that are being remapped to the 'private area', but there is also a compatibility bug. Maybe I'll get the bug out of the way first.

Filenames created on a Samba share are not visible on the server as anything resembling the names I used in Cygwin. I see that as pretty bad (on a scale of 1-10, maybe a 7).

One possible way to ameliorate that problem is the first suggestion I thought of. Instead of using arbitrary characters from the private use area, which could display as anything outside Cygwin depending on what charset you have loaded, why not use dedicated Unicode characters that represent those same signs? They aren't exactly equivalent, as they include some built-in display spacing, BUT they would display a colon as a colon, "*" as an asterisk, etc. There are dedicated 'fullwidth' forms of each of the characters that need remapping. The fullwidth forms add some visual space around the characters, though each is still only one Unicode character.
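To make the contrast concrete, here is a minimal Python sketch (not Cygwin code) of the two encodings side by side. It assumes, as I understand Cygwin 1.7's documented behavior, that the current scheme maps each special character c to U+F000 + ord(c) in the private use area. The fullwidth scheme instead adds 0xFEE0, because the Halfwidth and Fullwidth Forms block mirrors printable ASCII (U+0021 to U+007E) at U+FF01 to U+FF5E. The names remap, unmap, PRIVATE_USE_OFFSET and FULLWIDTH_OFFSET are hypothetical.

```python
# Characters that cannot appear in Win32 filenames (the set discussed here).
DOS_SPECIAL = '"*:<>?|'

# Assumption: Cygwin 1.7 shifts each special character into the Private Use
# Area by adding 0xF000 to its code point (":" -> U+F03A).
PRIVATE_USE_OFFSET = 0xF000

# The Halfwidth and Fullwidth Forms block places U+FF01..U+FF5E at
# ASCII U+0021..U+007E plus 0xFEE0 (":" -> U+FF1A).
FULLWIDTH_OFFSET = 0xFEE0


def remap(name: str, offset: int) -> str:
    """Shift every DOS-special character in name by offset; leave the rest alone."""
    return "".join(chr(ord(c) + offset) if c in DOS_SPECIAL else c for c in name)


def unmap(name: str, offset: int) -> str:
    """Inverse of remap: shift the substitutes back to their ASCII originals."""
    shifted = {chr(ord(c) + offset): c for c in DOS_SPECIAL}
    return "".join(shifted.get(c, c) for c in name)


if __name__ == "__main__":
    title = 'Who? What: "Why"'
    for label, offset in (("private use", PRIVATE_USE_OFFSET),
                          ("fullwidth", FULLWIDTH_OFFSET)):
        stored = remap(title, offset)
        # List the non-ASCII code points each scheme would store on disk.
        print(label, [f"U+{ord(c):04X}" for c in stored if ord(c) > 0x7F])
        assert unmap(stored, offset) == title
```

Both schemes round-trip equally well inside Cygwin; the difference is only in what a non-Cygwin program (Samba server, Explorer, a Linux shell) sees when it displays the stored name.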
The mappings I'd suggest as defaults are as follows:

dosch  U-char  Unicode-val  ;Unicode comment
"      ＂      U+FF02       ;FULLWIDTH QUOTATION MARK
*      ＊      U+FF0A       ;FULLWIDTH ASTERISK
:      ：      U+FF1A       ;FULLWIDTH COLON
<      ＜      U+FF1C       ;FULLWIDTH LESS-THAN SIGN
>      ＞      U+FF1E       ;FULLWIDTH GREATER-THAN SIGN
?      ？      U+FF1F       ;FULLWIDTH QUESTION MARK
|      ｜      U+FF5C       ;FULLWIDTH VERTICAL LINE
---

All of the above are the FULLWIDTH versions of those characters. They aren't a perfect substitute: they don't have the same ASCII values, and, when correctly implemented, they display with a bit of extra white space around the character. But on the positive side, and important for Unicode parsing, __each of the fullwidth forms has the same character class as its ASCII equivalent__. For example, the fullwidth quote has the property 'quotation mark'. So if the name were read by a Unicode-capable program (like Perl), it could process the special characters in whatever way it would have processed the real ASCII characters.

The other benefit is that a great many Unicode charsets have mappings for the fullwidth forms. Whatever charset I used, they all displayed as what they are (with slight stylistic variations). They also displayed as their ASCII equivalents in my shell client on Linux, and displayed correctly in the 3 Windows command-line clients I tried (Console2, mintty, the standard Cygwin console). I'd consider that a strong plus for compatibility: outside of Cygwin, characters from the private use area will often show up as question marks, little square boxes, or nothing at all, but this way they will at least look visually close to how they are intended to look.

FWIW, one can also use the fullwidth forms of / and \ in pathnames, where the OS treats them as ordinary characters rather than separators.

That's the most important suggestion I have.
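The "same character class" claim can be checked with Python's standard unicodedata module. One caveat: unicodedata exposes the general category (Po for other punctuation, Sm for math symbols) rather than the Quotation_Mark property mentioned above, so this sketch compares general categories. It also shows that NFKC normalization folds each fullwidth form back to its ASCII original, which is one way a Unicode-aware program could recover the intended character. The helper names category_report and fold_fullwidth are hypothetical.

```python
import unicodedata

# (ASCII character, proposed fullwidth substitute) pairs from the table above.
PAIRS = [
    ('"', "\uFF02"),
    ("*", "\uFF0A"),
    (":", "\uFF1A"),
    ("<", "\uFF1C"),
    (">", "\uFF1E"),
    ("?", "\uFF1F"),
    ("|", "\uFF5C"),
]


def category_report():
    """Return (ascii, code point, ascii category, fullwidth category, name) rows."""
    return [
        (a, f"U+{ord(f):04X}", unicodedata.category(a),
         unicodedata.category(f), unicodedata.name(f))
        for a, f in PAIRS
    ]


def fold_fullwidth(name: str) -> str:
    """Apply NFKC, which maps each fullwidth form to its <wide> ASCII equivalent.

    Note that NFKC also folds other compatibility characters (ligatures,
    superscripts, ...), not only the fullwidth forms.
    """
    return unicodedata.normalize("NFKC", name)


if __name__ == "__main__":
    for ascii_ch, cp, cat_a, cat_f, uname in category_report():
        same = "same" if cat_a == cat_f else "DIFFERENT"
        print(f"{ascii_ch}  {cp}  {cat_a}/{cat_f} ({same})  {uname}")
```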
As a /possible/ further suggestion (one that would take more work, and that I don't believe is as important if the above change to fullwidth characters is made), Cygwin could allow the user to assign which Unicode character is substituted for each special character. The characters could be specified by their HTML entity name, or a pseudo-name if none exists, so the syntax would look like:

    CYGWIN="[=U+;]"

That would allow someone to assign any value they wanted. One advantage is that those who still want to access ADS (alternate data streams) could put "col=U+003A" in their CYGWIN variable and still have that ability.

I strongly hope and urge you to use the FULLWIDTH equivalents of the characters you are remapping. They are shell-safe and display properly (I've been using a few of them, like the colon and the occasional 'slash', to display song titles properly in my music library).

Linda
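The exact syntax in the proposal above is only partly legible, so this Python sketch assumes, purely for illustration, entries of the form name=U+XXXX separated by semicolons, as the "col=U+003A" example suggests. Every short name other than "col", and the functions parse_overrides and to_win32, are hypothetical and not part of any Cygwin interface. Characters without an override fall back to their fullwidth forms, matching the default proposed earlier.

```python
# Hypothetical short names for the DOS-special characters. "col" is taken
# from the example in the message; the others are invented for illustration.
NAMES = {
    "quot": '"', "ast": "*", "col": ":", "lt": "<",
    "gt": ">", "quest": "?", "vert": "|",
}

# Default substitute for each character: its fullwidth form (ASCII + 0xFEE0).
DEFAULTS = {c: chr(ord(c) + 0xFEE0) for c in NAMES.values()}


def parse_overrides(spec: str) -> dict:
    """Parse a hypothetical 'name=U+XXXX;...' string into {ascii_char: substitute}."""
    table = dict(DEFAULTS)
    for entry in (e.strip() for e in spec.split(";")):
        if not entry:
            continue  # tolerate empty segments such as a trailing ';'
        name, sep, value = (part.strip() for part in entry.partition("="))
        if not sep or name not in NAMES or not value.upper().startswith("U+"):
            raise ValueError(f"bad mapping entry: {entry!r}")
        # int() or chr() raise ValueError on a malformed or out-of-range value.
        table[NAMES[name]] = chr(int(value[2:], 16))
    return table


def to_win32(name: str, table: dict) -> str:
    """Replace each special character in a filename using the mapping table."""
    return "".join(table.get(c, c) for c in name)


if __name__ == "__main__":
    # Pass ':' through unchanged so NTFS alternate data streams stay reachable,
    # while '?' still becomes its fullwidth form.
    table = parse_overrides("col=U+003A")
    print([f"U+{ord(c):04X}" for c in to_win32("notes.txt:stream?", table) if ord(c) > 0x7F])
```

Mapping a character to itself, as in col=U+003A, simply disables the substitution for it, which is what keeps the ADS "file:stream" syntax working.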