To: cygwin AT cygwin DOT com
From: Eric Blake
Subject: Re: bash-3.1-7 BUG
Date: Wed, 13 Sep 2006 14:31:22 +0000 (UTC)
References: <091320060438 DOT 11140 DOT 45078B490008FD8600002B8422007610640A050E040D0C079D0A AT comcast DOT net> <20060913052510 DOT GB1256 AT trixie DOT casa DOT cgf DOT cx>

Christopher Faylor writes:

> Is bash assuming that it can read N characters and then subtract M
> characters from the current position to get back to the beginning of a
> line? If so, hmm. I guess this explains why it was reading a byte at a
> time before. It must be counting characters rather than calling lseek
> to figure out where it is.

Yes, indeed, and it seems like reasonable semantics to expect as well
(never mind that it means that text mode on a seekable file involves a
lot more processing, to consistently present the user with a character
count instead of a byte offset).

When a file is seekable, bash reads a buffer at a time for speed, but
then must reseek to the offset where it last processed input before
invoking any subprocesses, since POSIX requires that seekable files be
left in a consistent state when swapping between multiple handles to
the same underlying file description (even if the multiple handles
exist in separate processes).

When using stdio (such as fread and fseek), this works due to code in
newlib (see __SCLE in stdio.h). But bash uses low-level Unix I/O, and
does not benefit from newlib's approach.
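
The read-then-reseek pattern described above can be sketched roughly as
follows. This is only an illustration, not bash's actual code: the
function name read_line_and_reseek, the 4096-byte buffer, and the
truncation of long lines are all assumptions made for the example. It
also assumes a binary-mode descriptor, where the count returned by
read() matches the distance the file offset moved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not taken from bash): read one line from a
 * seekable descriptor with a single buffered read(), then seek back so
 * the descriptor's offset sits just past the line that was consumed.
 * Returns the line length in bytes, 0 at end of file, -1 on error. */
ssize_t read_line_and_reseek(int fd, char *line, size_t cap)
{
    char buf[4096];              /* buffer size is an arbitrary choice */
    assert(cap > 1);

    ssize_t got = read(fd, buf, sizeof buf);
    if (got <= 0) {
        if (got < 0)
            perror("read");
        return got;
    }

    /* The line ends at the first newline, or at the end of the data. */
    char *nl = memchr(buf, '\n', (size_t)got);
    size_t used = nl ? (size_t)(nl - buf) + 1 : (size_t)got;
    if (used > cap - 1)
        used = cap - 1;          /* truncate overly long lines */
    memcpy(line, buf, used);
    line[used] = '\0';

    /* Give back the bytes that were read but not processed, so that a
     * child process sharing this file description resumes at the right
     * place. The arithmetic counts bytes in the buffer, which is only
     * valid when that count equals the change in file offset. */
    if (lseek(fd, (off_t)used - (off_t)got, SEEK_CUR) == (off_t)-1) {
        perror("lseek");
        return -1;
    }
    return (ssize_t)used;
}
```

The relative seek is where a mismatch between the number of characters
delivered by read() and the byte offset reported by lseek would leave
the descriptor at the wrong position.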

In a binary mount, seeking backwards by the character offset from where
bash has processed to the end of the buffer it has read just works. It
is only in a text mount, where lseek reports the binary offset within
the file rather than the character offset, that problems arise.

So I will probably end up reinstating a form of the previous
#ifdef __CYGWIN__ check for is_seekable in bash 3.1-8 to check whether
a file is in text mode, in which case it is treated as non-seekable;
that is certainly a faster solution than waiting for cygwin to change
lseek on a text file to consistently use a character offset. But I
intend that on binary files, the \r of a \r\n line ending will be
treated as part of the line, so at least binary mounts won't suffer
from the speed impact of treating a file as unseekable the way bash
3.1-6 does.

-- 
Eric Blake