From: Eric Sosman
Newsgroups: comp.os.msdos.djgpp
Subject: Re: how to determine if a file is text/binary
Date: Mon, 29 Apr 2002 22:08:47 GMT

xeon wrote:
>
> Hi,
>
> I'm wondering how to determine, programmatically, whether a file is a
> text file or a binary file. I'm thinking about reading 4 bytes from the
> file and testing whether they're in the range of usual text ([a-z],
> [A-Z], etc.). The 4 bytes are read from the following locations: the
> 1st byte, the last byte, and 2 randomly selected offsets inside the
> file. Is this enough?

Not really. The fundamental problem is in formulating precise
definitions of "text file" and "binary file". Try to do so and
you'll quickly discover the kinds of trouble you'll get into.
For example, is a file containing "abc\n" a text file of one
three-letter newline-terminated line, or is it a binary file
storing the 32-bit number 0x6162630a == 1633837834 (with the
bytes read in big-endian order)? Or 'tother way round, if you
find a byte with the high bit set, are you looking at a binary
file or at a text file containing the character "ß" (0xDF in
ISO-8859-1)?

That said, you can make a guess of sorts, although you'll never
be 100% accurate. Take a look at the source of the "file"
program for some ideas.

--
Eric Sosman
esosman AT acm DOT org
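To make the "guess of sorts" concrete, here is a minimal C sketch of one such heuristic. It is NOT the algorithm that "file" uses (that program also checks magic numbers and several character encodings); it only illustrates the idea of sampling a block of bytes rather than four. Everything here is an assumption for illustration: the names `guess_buffer`, `guess_file` and `enum text_guess` are hypothetical, and so are the 4096-byte sample taken from the start of the file, the "any NUL byte means binary" rule, the 10% threshold, and the set of control characters treated as harmless.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum text_guess { GUESS_EMPTY, GUESS_TEXT, GUESS_BINARY };

/* Heuristic guess for an in-memory buffer.
 * Assumptions (not taken from file(1)):
 *  - any NUL byte means "binary";
 *  - bytes >= 0x80 are tolerated, since they may be letters such as
 *    0xDF ('ß' in ISO-8859-1);
 *  - if more than 10% of the bytes are control characters other than
 *    tab, LF, CR, FF, backspace, ESC or Ctrl-Z, the guess is "binary". */
enum text_guess guess_buffer(const unsigned char *buf, size_t len)
{
    size_t i, suspicious = 0;

    if (len == 0)
        return GUESS_EMPTY;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == 0)
            return GUESS_BINARY;
        if (c < 0x20 || c == 0x7F) {
            switch (c) {
            case '\t': case '\n': case '\r': case '\f': case '\b':
            case 0x1B: /* ESC, used by ANSI escape sequences */
            case 0x1A: /* Ctrl-Z, the traditional DOS end-of-file marker */
                break;
            default:
                suspicious++; /* other controls, including DEL (0x7F) */
            }
        }
    }
    /* Integer form of "suspicious / len > 1/10". */
    return (suspicious * 10 > len) ? GUESS_BINARY : GUESS_TEXT;
}

/* Reads up to the first 4096 bytes of a file and guesses.
 * Returns an enum text_guess value, or -1 if the file cannot be
 * opened or read. */
int guess_file(const char *path)
{
    unsigned char buf[4096];
    size_t n;
    /* Binary mode, so no CR/LF or Ctrl-Z translation hides bytes. */
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (int) guess_buffer(buf, n);
}
```

Note that this sketch still can't settle the ambiguity described above: a file holding "abc\n" comes back as `GUESS_TEXT` even if a program wrote those four bytes as a 32-bit integer. Reading a block from the start rather than four scattered bytes simply gives the guess more evidence to work with.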