Mail Archives: djgpp/1998/10/15/14:16:25

From: "Harlan Grove" <HrlnGrv AT aol DOT com>
Newsgroups: comp.lang.awk,comp.os.msdos.djgpp
Subject: Re: Anyone have code to strip text from HP-PCL5 files?
Date: Thu, 15 Oct 1998 11:17:50 -0700
Organization: Planet Access Network Inc.
Lines: 36
Message-ID: <705drn$k84@jupiter.planet.net>
References: <36258305 DOT 18274931 AT news3 DOT banet DOT net>
NNTP-Posting-Host: 207.3.98.50
X-Newsreader: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Peter J. Farley III wrote in message <36258305 DOT 18274931 AT news3 DOT banet DOT net>...
>I know there are programs (e.g., pstotext) to strip text from
>Postscript files, but has anyone got any code to do the same thing for
>HP-PCL5 files?
>
>Alternatively, are there any editors or word processors that
>understand HP-PCL5, and can present an on-screen image of the text
>that the printer would produce, maybe with an option to save-as a
>simple text file (i.e., with the PCL5 codes stripped out)?
>
>TIA for any help, info or url's you can provide.
>
>----------------------------------------------------
>Peter J. Farley III (pjfarley AT nospam DOT dorsai DOT org OR
>                     pjfarley AT nospam DOT banet DOT net)

I have a very basic awk utility that simply strips almost all HP-PCL escape
sequences from files that otherwise contain plain text. It doesn't translate
positioning sequences, so if your text relies on overwriting, tab stops or
other positioning, the output will be garbled. Also, it gives up as soon as
it encounters embedded binary data. Without further ado, here it is.

# Choke on binary data: transparent print data (ESC&p#X), font headers and
# character data (ESC)s#W, ESC(s#W), and raster rows (ESC*b#W).
/\x1B\&p[0-9]+X/ || /\x1B[()]s[0-9]+W/ || /\x1B\*b[0-9]+W/ {
    print "Encountered binary data block. Unable to proceed."
    exit
}

{
    gsub("\x1B[9=]", "")            # simple sequences
    gsub("\x1B[^A-Z@]*[A-Z@]", "")  # complex sequences
    print
}
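
A quick way to check the stripping rule is to feed it a hand-made line. This
is only a sketch: the sample input is made up for illustration, and ESC is
written as octal \033 instead of \x1B on the assumption that some awks may
not accept the hex form. The binary-data check is left out to keep it short.

```shell
# Hypothetical sample: ESC E (reset), ESC &l0O (orientation),
# ESC (s0p12H (font pitch), the text itself, then ESC E again.
result=$(printf '\033E\033&l0O\033(s0p12HHello, PCL\033E\n' |
awk '{
    gsub("\033[9=]", "")            # simple sequences
    gsub("\033[^A-Z@]*[A-Z@]", "")  # complex sequences
    print
}')
echo "$result"
```

This should print just the text, Hello, PCL, with all four escape sequences
removed. Note that the "H" ending the pitch sequence is consumed as its
terminator, while the "H" of "Hello" is left alone.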


