From: pjfarley AT banet DOT net (Peter J. Farley III)
Newsgroups: comp.lang.awk,comp.os.msdos.djgpp
Subject: Re: Anyone have code to strip text from HP-PCL5 files?
Date: Thu, 15 Oct 1998 23:47:09 GMT
Message-ID: <3626885a.2179579@news3.banet.net>
References: <36258305 DOT 18274931 AT news3 DOT banet DOT net> <705drn$k84 AT jupiter DOT planet DOT net>
X-Newsreader: Forte Free Agent 1.1/32.230
NNTP-Posting-Host: 32.100.250.124
X-Trace: 15 Oct 1998 23:46:24 GMT, 32.100.250.124
Organization: IBM.NET
Lines: 48
X-Notice: Items posted that violate the IBM.NET Acceptable Use Policy
X-Notice: should be reported to postmaster AT ibm DOT net
X-Complaints-To: postmaster AT ibm DOT net
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

"Harlan Grove" wrote:

>I have a very basic awk utility that simply strips almost all HP-PCL escape
>sequences from files that otherwise contain plain text. It doesn't translate
>positioning sequences, so if your text contains overwriting, tabstops or
>other positioning formatting, it'll be garbled. Also, it dies when it
>encounters embedded binary data. Without further ado, here it is.
>
># Choke on binary data, embedded fonts, etc.
>/\x1B\&p[0-9]+X/ || /\x1B[()]s[0-9]+W/ || /\x1B\*b[0-9]+W/ {
>    print "Encountered binary data block. Unable to proceed."
>    exit
>}
>{
>    gsub("\x1B[9=]", "")            # simple sequences
>    gsub("\x1B[^A-Z@]*[A-Z@]", "")  # complex sequences
>    print
>}

Thanks for the code, Harlan. Unfortunately, I made an incorrect
assumption, and it looks like the files I've got are not PCL5, but
something called PCLXL.
Here are the headers in the file:

%-12345X@PJL COMMENT HP LaserJet 6P/6MP - Enhanced Driver
@PJL COMMENT 1.20.0.0
@PJL SET PAGEPROTECT=AUTO
@PJL SET ECONOMODE=OFF
@PJL SET RESOLUTION=600
@PJL SET TIMEOUT=90
@PJL DEFAULT MPTRAY=FIRST
@PJL ENTER LANGUAGE = PCLXL
) HP-PCL XL;1;1;Comment Copyright Hewlett-Packard Company 1989-1996

Have you or anyone else ever seen this printer language? I'm not
familiar with it myself. I wonder if it's an extension of HPGL, the
plotting language?

I can see the text I want to extract when I browse the file, but there
is a *LOT* of binary stuff and what looks like font information in
between chunks of text.

I guess I'll go to one of the HP forums and ask around there.

Thanks again for your code. It may well come in handy one day!

----------------------------------------------------
Peter J. Farley III
(pjfarley AT nospam DOT dorsai DOT org OR
 pjfarley AT nospam DOT banet DOT net)