From: "Harlan Grove"
Newsgroups: comp.lang.awk,comp.os.msdos.djgpp
Subject: Re: Anyone have code to strip text from HP-PCL5 files?
Date: Thu, 15 Oct 1998 11:17:50 -0700
Organization: Planet Access Network Inc.
Lines: 36
Message-ID: <705drn$k84@jupiter.planet.net>
References: <36258305 DOT 18274931 AT news3 DOT banet DOT net>
NNTP-Posting-Host: 207.3.98.50
X-Newsreader: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Peter J. Farley III wrote in message
<36258305 DOT 18274931 AT news3 DOT banet DOT net>...
>I know there are programs (e.g., pstotext) to strip text from
>Postscript files, but has anyone got any code to do the same thing for
>HP-PCL5 files?
>
>Alternatively, are there any editors or word processors that
>understand HP-PCL5, and can present an on-screen image of the text
>that the printer would produce, maybe with an option to save-as a
>simple text file (i.e., with the PCL5 codes stripped out)?
>
>TIA for any help, info or url's you can provide.
>
>----------------------------------------------------
>Peter J. Farley III (pjfarley AT nospam DOT dorsai DOT org OR
>                     pjfarley AT nospam DOT banet DOT net)

I have a very basic awk utility that simply strips almost all HP-PCL
escape sequences from files that otherwise contain plain text. It
doesn't translate positioning sequences, so if your text contains
overwriting, tab stops or other positioning formatting, it'll be
garbled. Also, it dies when it encounters embedded binary data.
Without further ado, here it is.

# Choke on binary data, embedded fonts, etc.
/\x1B\&p[0-9]+X/ || /\x1B[()]s[0-9]+W/ || /\x1B\*b[0-9]+W/ {
  print "Encountered binary data block. Unable to proceed."
  exit
}

{
  gsub("\x1B[9=]", "")             # simple sequences
  gsub("\x1B[^A-Z@]*[A-Z@]", "")   # complex sequences
  print
}
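
For anyone who wants to try the script from a Unix-style shell, here is a minimal sketch of one way to wrap and exercise it. It is a variation on the script above, not a replacement. The function name strip_pcl and the sample input are hypothetical. It also assumes three changes from the posted version: ESC is spelled as octal \033 (hex escapes such as \x1B may not be accepted by every awk), LC_ALL=C is set so that A-Z means only the ASCII uppercase letters, and the binary-data message goes to stderr with a nonzero exit status.

```shell
# strip_pcl (hypothetical name): read PCL-laden text on stdin,
# write the text with escape sequences removed to stdout.
strip_pcl() {
  LC_ALL=C awk '
    # Stop at binary payloads: transparent print data, font data, raster rows.
    /\033&p[0-9]+X/ || /\033[()]s[0-9]+W/ || /\033\*b[0-9]+W/ {
      print "Encountered binary data block. Unable to proceed." > "/dev/stderr"
      exit 1   # assumption: signal failure to the caller
    }
    {
      gsub(/\033[9=]/, "")             # two-character sequences
      gsub(/\033[^A-Z@]*[A-Z@]/, "")   # parameterized sequences
      print
    }
  '
}

# Sample input: reset, orientation, text, stroke weight, more text, reset.
printf '\033E\033&l0OHello\033(s3B, world\033E\n' | strip_pcl
# prints: Hello, world
```

As in the posted script, positioning sequences are deleted rather than interpreted, so the sketch is only useful for files whose layout does not depend on them.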