Mail Archives: djgpp/1998/10/15/19:50:32

From: pjfarley AT banet DOT net (Peter J. Farley III)
Newsgroups: comp.lang.awk,comp.os.msdos.djgpp
Subject: Re: Anyone have code to strip text from HP-PCL5 files?
Date: Thu, 15 Oct 1998 23:47:09 GMT
Message-ID: <3626885a.2179579@news3.banet.net>
References: <36258305 DOT 18274931 AT news3 DOT banet DOT net> <705drn$k84 AT jupiter DOT planet DOT net>
X-Newsreader: Forte Free Agent 1.1/32.230
NNTP-Posting-Host: 32.100.250.124
X-Trace: 15 Oct 1998 23:46:24 GMT, 32.100.250.124
Organization: IBM.NET
Lines: 48
X-Notice: Items posted that violate the IBM.NET Acceptable Use Policy
X-Notice: should be reported to postmaster AT ibm DOT net
X-Complaints-To: postmaster AT ibm DOT net
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

"Harlan Grove" <HrlnGrv AT aol DOT com> wrote:
<Snipped>
>I have a very basic awk utility that simply strips almost all HP-PCL escape
>sequences from files that otherwise contain plain text. It doesn't translate
>positioning sequences, so if your text contains overwriting, tabstops or
>other positioning formatting, it'll be garbled. Also, it dies when it
>encounters embedded binary data. Without further ado, here it is.
>
># Choke on binary data, embedded fonts, etc.
>/\x1B\&p[0-9]+X/ || /\x1B[()]s[0-9]+W/ || /\x1B\*b[0-9]+W/ {
>    print "Encountered binary data block. Unable to proceed."
>    exit
>}
>{
>    gsub("\x1B[9=]", "")            # simple sequences
>    gsub("\x1B[^A-Z@]*[A-Z@]", "")  # complex sequences
>    print
>}

Thanks for the code, Harlan.  Unfortunately, I made an incorrect
assumption, and it looks like the files I've got are not PCL5, but
something called PCLXL.  Here are the headers in the file:

%-12345X@PJL COMMENT HP LaserJet 6P/6MP - Enhanced Driver
@PJL COMMENT 1.20.0.0
@PJL SET PAGEPROTECT=AUTO
@PJL SET ECONOMODE=OFF
@PJL SET RESOLUTION=600
@PJL SET TIMEOUT=90
@PJL DEFAULT MPTRAY=FIRST
@PJL ENTER LANGUAGE = PCLXL
) HP-PCL XL;1;1;Comment Copyright Hewlett-Packard Company 1989-1996
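
For anyone checking their own files, the language a print job declares can
be read from the "@PJL ENTER LANGUAGE" line shown above.  The following is
only a minimal sketch in Python: the function name pjl_language and the
4096-byte scan limit are made up for illustration, and it assumes the PJL
header sits near the start of the file, as it does here.

```python
import re

# Matches a PJL "ENTER LANGUAGE" command such as
# "@PJL ENTER LANGUAGE = PCLXL"; whitespace around "=" is optional.
ENTER_LANGUAGE = re.compile(
    rb"@PJL\s+ENTER\s+LANGUAGE\s*=\s*([A-Za-z0-9]+)", re.IGNORECASE
)

def pjl_language(data: bytes, limit: int = 4096):
    """Return the language named by the first @PJL ENTER LANGUAGE command
    in the first `limit` bytes (uppercased), or None if there is none.

    `limit` is an arbitrary assumption: the PJL header is expected to be
    short and to come before any page data.
    """
    match = ENTER_LANGUAGE.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").upper()
```

Reading the file in binary mode (open(path, "rb").read()) and passing the
bytes to pjl_language would give "PCLXL" for a header like the one above.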

Have you or anyone else ever seen this printer language?  I'm not
familiar with it myself.  I wonder if it's an extension of HPGL, the
plotting language?

I can see the text I want to extract when I browse the file, but there
is a *LOT* of binary stuff and what looks like font information in
between chunks of text.
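
Since the text is visible when browsing the file, one rough way to pull it
out is to keep only the runs of printable ASCII, much as the Unix "strings"
utility does.  This is a sketch under that assumption only, not a PCL XL
parser: the name printable_runs and the minimum run length of 4 are
arbitrary choices, short runs of binary data that happen to look printable
will slip through, and any text stored in another encoding will be missed.

```python
import re

def printable_runs(data: bytes, min_len: int = 4):
    """Return every run of at least `min_len` printable ASCII bytes
    (space through "~", plus tab) found in `data`, as a list of strings.

    Everything else, including binary and font data, is skipped.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Raising min_len cuts down on stray fragments from the binary sections at
the cost of dropping short words that stand alone.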

I guess I'll go to one of the HP forums and ask around there.

Thanks again for your code.  It may well come in handy one day!

----------------------------------------------------
Peter J. Farley III (pjfarley AT nospam DOT dorsai DOT org OR
                     pjfarley AT nospam DOT banet DOT net)
