Message-ID: | <BLU113-W29F6E906F6793615A4A75DBEA70@phx.gbl> |
From: | Mike Marchywka <marchywka AT hotmail DOT com> |
To: | <cygwin AT cygwin DOT com> |
Subject: | RE: pdftk and apropos - general questions |
Date: | Wed, 4 Mar 2009 15:33:07 -0500 |
In-Reply-To: | <20090304175648.GA5388@KCJs-Computer> |
References: | <BLU113-W74226535EC192149C5AEABEA60 AT phx DOT gbl> <49AE9494 DOT 1000804 AT veritech DOT com> <BLU113-W51FC38A48F454394262F2CBEA70 AT phx DOT gbl> <20090304175648 DOT GA5388 AT KCJs-Computer> |
Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Id: | <cygwin.cygwin.com> |
List-Archive: | <http://sourceware.org/ml/cygwin/> |
----------------------------------------
> Date: Wed, 4 Mar 2009 09:56:49 -0800
> From: garyjohn AT spocom DOT com
> To: cygwin AT cygwin DOT com
> Subject: Re: pdftk and apropos - general questions
>
> On 2009-03-04, Mike Marchywka wrote:
>
>>> Mike Marchywka wrote:
>>>> I've had a persistent problem getting apropos to work
>>>> as it never finds anything appropriate. Is there
>>>> something I need to do to make this work?
>>>>
>>> After each setup session, you need to run, /usr/sbin/makewhatis -u.
>>
>>
>> Thanks but I did get that far after earlier hints and you list
>> below is about what I ended up with too. One problem
>> I ran into was trying to extract sensical text from the
>> IRS instructions.
>
> I have that problem with the printed versions.
>
>> I used the pdftotext utility IIRC from
>>
>> http://www.foolabs.com/xpdf/download.html
>>
>> and it didn't seem to be able to separate multi-column text
>> automatically ( with sed and awk I got what I needed but what
>> a mess).
>
> Did you use the -layout option to pdftotext? It makes a huge
> difference on the documents I've converted, but they've all been
> single column.

I played with the options, but I'm not sure the column information is
in the source PDF at all. I don't imagine the authors cared much about
layout. IIRC, selecting text in a viewer gave rectangles spanning the
whole page width, whereas in scientific papers the selection normally
goes column by column. Somewhere between intelligent formatting and a
scanned PDF is probably the authoring tool that just puts out blocks
of text that can't be extracted properly ( probably even by design, to
stop people from using the information without the pictures that
someone spent a lot of time authoring ).

I did try pdftk on an f1040.pdf download, but I finally had to
install Acrobat Reader to look at the form and fill it in. pdftk let
me examine the filled-in form, but there was no immediate way to
identify the form fields; I have to look for meaningful names etc.
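
For what it's worth, hunting for meaningful field names gets easier if
the field listing is turned into records first. Below is a minimal
sketch, assuming the listing comes from something like
`pdftk f1040.pdf dump_data_fields` and arrives as "Key: value" lines
with a "---" line before each field. The field names in SAMPLE
(f1_01, c1_01) are made up for illustration and are not taken from the
real form.

```python
def parse_data_fields(dump_text):
    """Split a dump_data_fields-style listing into one dict per field.

    Assumes each field starts after a '---' line and consists of
    'Key: value' lines. A key repeated within one field keeps only
    its last value, which is good enough for browsing names.
    """
    fields = []
    current = {}
    for line in dump_text.splitlines():
        line = line.strip()
        if line == "---":
            # Separator: close off the field collected so far, if any.
            if current:
                fields.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        fields.append(current)
    return fields


# Hypothetical sample input; real field names will differ.
SAMPLE = """\
---
FieldType: Text
FieldName: f1_01
FieldFlags: 0
FieldJustification: Left
---
FieldType: Button
FieldName: c1_01
FieldFlags: 0
FieldJustification: Left
"""

if __name__ == "__main__":
    for field in parse_data_fields(SAMPLE):
        print(field.get("FieldType"), field.get("FieldName"))
```

With the fields in a list of dicts it is at least possible to grep or
sort by name and type instead of reading the raw listing by eye.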

I guess if I could enter the input data into something I could use, it
would be worthwhile writing a script to fill out the form. I'll use a
web form for a few lines of input, but if I have to type 100 numbers
into an information black hole I'm happy to kill a tree or two.
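
A rough sketch of what such a script might look like, assuming the
field names are already known and that only plain text fields are
being filled: it writes a minimal FDF file that could then be handed
to `pdftk f1040.pdf fill_form data.fdf output filled.pdf`. The field
names and values below are hypothetical, and I have not checked this
against the actual 1040 form.

```python
def fdf_escape(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def build_fdf(values):
    """Return minimal FDF text assigning string values to text fields.

    `values` maps field name -> value. Checkboxes and radio buttons
    need name objects rather than strings and are not handled here.
    """
    lines = ["%FDF-1.2", "1 0 obj", "<< /FDF << /Fields ["]
    for name, value in values.items():
        lines.append("<< /T (%s) /V (%s) >>"
                     % (fdf_escape(name), fdf_escape(str(value))))
    lines += ["] >> >>", "endobj", "trailer", "<< /Root 1 0 R >>", "%%EOF"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    data = {"f1_01": "Marchywka", "f1_02": 12345}
    with open("data.fdf", "w") as out:
        out.write(build_fdf(data))
```

The input dict could just as well come from a CSV or a spreadsheet
export, which is the point: type the 100 numbers once into something
reusable and let the script push them into the form.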