Date: Thu, 25 Jul 2002 15:50:26 +0800
From: Greg Matheson
To: cygwin AT cygwin DOT com
Subject: LONG! perl-5.8.0 handling of \n (was: perl-5.6.1 handling of \n
Message-ID: <20020725155026.G37421@ms.chinmin.edu.tw>

I got interested in this because 5.8.0 broke code that worked under
5.6.1. The code compared the number of bytes in the internal
representation of an email message with the number stored in the file.
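For readers who want to picture that comparison, here is a minimal sketch. It is not the code that broke, and it is not the earlier test script whose results follow; the sub name length_and_size, the sample text and the scratch file name are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text with a default open(), then return
# the length of the string in memory and the size of the file on disk.
sub length_and_size {
    my ($text, $path) = @_;
    open(my $out, '>', $path) or die "open $path: $!";
    print {$out} $text;
    close($out) or die "close $path: $!";
    return (length($text), -s $path);
}

# Sample text and scratch file name are assumptions, not from the post.
my $scratch = "length-check.tmp";
my ($len, $size) = length_and_size("abc\ndef\n", $scratch);
print "String length: $len File size: $size\n";
print "Sizes differ, so some layer translated the newlines\n"
    if $len != $size;
unlink $scratch;
```

If the default layers translate "\n" to CRLF on output, the file comes out larger than the string by one byte per newline, which is the kind of mismatch described here.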
Here is the result of my earlier script run on 5.6.1:

For underlying /binary/ mount mode
    Discipline: default   String length: 8    File size: 8
    Discipline: binary    String length: 8    File size: 8
    Discipline: text      String length: 10   File size: 10

For underlying /text/ mount mode
    Discipline: default   String length: 10   File size: 10
    Discipline: binary    String length: 8    File size: 8
    Discipline: text      String length: 10   File size: 10

Here is the same script on 5.8.0:

For underlying /binary/ mount mode
    Discipline: default   String length: 8    File size: 10
    Discipline: binary    String length: 8    File size: 8
    Discipline: text      String length: 8    File size: 10

For underlying /text/ mount mode
    Discipline: default   String length: 8    File size: 10
    Discipline: binary    String length: 8    File size: 8
    Discipline: text      String length: 8    File size: 10

If some of the values were 'wrong' under 5.6.1, at least they were
equal :-) With 5.8.0, it is finding the 'right' string length in all
cases, but now this value is only equal to the file size when binmode()
is used (i.e. writing to a Unix style file is forced), even on an
underlying binary mode mount.

It appears the following from perldoc perlcygwin is no longer an
adequate account of what is happening.

    o Text/Binary
      When a file is opened it is in either text or binary mode.
      In text mode a file is subject to CR/LF/Ctrl-Z translations.
      With Cygwin, the default mode for an open() is determined by
      the mode of the mount that underlies the file. Perl provides
      a binmode() function to set binary mode on files that
      otherwise would be treated as text. sysopen() with the
      "O_TEXT" flag sets text mode on files that otherwise would be
      treated as binary:

It appears that it is no longer just a choice between writing to a
binary mode mount or with binmode, as opposed to a text mode mount or
with O_TEXT.

According to perldoc perldelta

    o Previous versions of perl and some readings of some sections
      of Camel III implied that ":raw" "discipline" was the inverse
      of ":crlf".
      Turning off "clrfness" is no longer enough to make a stream
      truly binary. So the PerlIO ":raw" discipline is now formally
      defined as being equivalent to binmode(FH) - which is in turn
      defined as doing whatever is necessary to pass each byte
      as-is without any translation. In particular binmode(FH) -
      and hence ":raw" - will now turn off both CRLF and UTF-8
      translation and remove other "layers" (e.g. :encoding())
      which would modify byte stream.

This seems to be a consequence of the new IO,

    o IO is now by default done via PerlIO rather than system's
      "stdio". PerlIO allows "layers" to be "pushed" onto a file
      handle to alter the handle's behaviour. Layers can be
      specified at open time via 3-arg form of open:

          open($fh,'>:crlf :utf8', $path) || ...

      or on already opened handles via extended "binmode":

          binmode($fh,':encoding(iso-8859-7)');

      The built-in layers are: unix (low level read/write), stdio
      (as in previous Perls), perlio (re-implementation of stdio
      buffering in a portable manner), crlf (does CRLF <=> "\n"
      translation as on Win32, but available on any platform). A
      mmap layer may be available if platform supports it (mostly
      UNIXes).

      Layers to be applied by default may be specified via the
      'open' pragma.

perldoc perlio says about defaults:

    If the platform is MS-DOS like and normally does CRLF to "\n"
    translation for text files then the default layers are :

        unix crlf

    (The low level "unix" layer may be replaced by a platform
    specific low level layer.)

    Otherwise if "Configure" found out how to do "fast" IO using
    system's stdio, then the default layers are :

        unix stdio

    Otherwise the default layers are

        unix perlio

    ...

    The default can be overridden by setting the environment
    variable PERLIO to a space separated list of layers (unix or
    platform low level layer is always pushed first).

    ...

        cd .../perl/t
        PERLIO=stdio ./perl harness
        PERLIO=perlio ./perl harness

So my earlier script may have been an adequate test bed for 5.6.1.
The read on the file used a default open, and the string read in seemed
to reflect what had been written to the file. With 5.8.0, however, the
read with a default open appears to translate CRLF to \n, because the
platform is 'MS-DOS like'. I need a script to test the effects of the
various layers.

In any case, comparing the results of the earlier script for 5.8.0 at
the top of this email with those for 5.6.1, it also appears that
default writes to a file, EVEN IF ON AN UNDERLYING BINARY MOUNT, will
now leave CRs in the file. This is something that people won't be too
happy about, I think.

-- 
Greg Matheson                The best jokes are
Chinmin College              those you play on yourself.
Taiwan Penpals Archive
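As a starting point for the layer test script wished for above, here is a minimal sketch. It is not the earlier script that produced the results in this message. The sub names write_with_layers and read_with_layers, the sample text and the scratch file name are assumptions made for illustration; the only layer names used are ones from the perldelta text quoted above (":raw" and ":crlf"), plus an empty string for a default open.

```perl
use strict;
use warnings;

# Write $text through the given layer string ('' means a default open)
# and return the size in bytes of the resulting file.
sub write_with_layers {
    my ($layers, $text, $path) = @_;
    open(my $fh, ">$layers", $path) or die "open $path for write: $!";
    print {$fh} $text;
    close($fh) or die "close $path: $!";
    return -s $path;
}

# Read the whole file back through the given layer string and return
# the length of the string Perl hands to the program.
sub read_with_layers {
    my ($layers, $path) = @_;
    open(my $fh, "<$layers", $path) or die "open $path for read: $!";
    local $/;                      # slurp mode
    my $data = <$fh>;
    close($fh);
    return length $data;
}

my $text = "abc\ndef\n";           # 8 characters, 2 newlines (sample text)
my $path = "layer-test.tmp";       # hypothetical scratch file

for my $w ('', ':raw', ':crlf') {
    my $size = write_with_layers($w, $text, $path);
    for my $r ('', ':raw', ':crlf') {
        my $len = read_with_layers($r, $path);
        printf "write %-8s read %-8s  String length: %2d  File size: %2d\n",
            $w || 'default', $r || 'default', $len, $size;
    }
}
unlink $path;
```

Running it once on a binary mount and once on a text mount, and with PERLIO set to different values, should show which combination of mount mode and layers adds or strips the CRs. What the default rows print depends on the platform's default layers, so no expected output is given here.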