Mail Archives: geda-user/2015/09/04/07:21:53

From: karl AT aspodata DOT se
To: geda-user AT delorie DOT com
Subject: [geda-user] Re: pdf table extraction
In-reply-to: <alpine.DEB.2.00.1509041305230.6924@igor2priv>
References: <CAOP4iL3YWQ_MH3HNnyDHMGCGeYFBmazwcw7Af_GATQzAUQJ57g AT mail DOT gmail DOT com> <alpine DOT DEB DOT 2 DOT 00 DOT 1509040545240 DOT 6924 AT igor2priv> <20150904095423 DOT 31827809DB80 AT turkos DOT aspodata DOT se> <alpine DOT DEB DOT 2 DOT 00 DOT 1509041305230 DOT 6924 AT igor2priv>
Comments: In-reply-to gedau AT igor2 DOT repo DOT hu
message dated "Fri, 04 Sep 2015 13:06:09 +0200."
Mime-Version: 1.0
Message-Id: <20150904112133.85560809DB82@turkos.aspodata.se>
Date: Fri, 4 Sep 2015 13:21:33 +0200 (CEST)
Reply-To: geda-user AT delorie DOT com

Igor2:
> On Fri, 4 Sep 2015, karl AT aspodata DOT se wrote:
> > Igor2:
> > [ about tables in pdf's ]
> >
> > It's true that pdf doesn't have a table structure.
> >
> > I have some experimental code to extract tables from pdf's, it is in:
> >
> >  http://turkos.aspodata.se/git/openhw/pdftosym/Experimental/
>
> Thanx, will check it out. What you wrote suggests your script works 
> similar to mine.

Yes, but I got the impression that you used the graphical elements in the
file and possibly used pdftohtml in "html" mode, which doesn't give you
the text positions. I have been working purely on the textual part.
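To illustrate what "working on the textual part" can mean, here is a minimal sketch, not the code from the repository above. It assumes the input is the XML written by pdftohtml in -xml mode, where each <text> element carries top/left coordinates, and it only groups text fragments into rows by their vertical position. The function name rows_per_page, the sample data and the 3-unit row tolerance are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed example of `pdftohtml -xml` output.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="842" width="595">
<text top="100" left="50" width="20" height="12" font="0"><b>Pin</b></text>
<text top="101" left="120" width="40" height="12" font="0"><b>Name</b></text>
<text top="120" left="50" width="8" height="12" font="0">1</text>
<text top="120" left="120" width="30" height="12" font="0">VCC</text>
<text top="140" left="120" width="30" height="12" font="0">GND</text>
<text top="139" left="50" width="8" height="12" font="0">2</text>
</page>
</pdf2xml>"""


def rows_per_page(xml_text, tolerance=3):
    """Return, for each page, a list of rows; each row lists cell strings left to right.

    Fragments whose `top` differs from the first fragment of the current row
    by at most `tolerance` units are put in the same row (an assumed heuristic).
    """
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        items = []
        for t in page.iter("text"):
            # itertext() also collects text nested inside <b>/<i> tags.
            s = "".join(t.itertext()).strip()
            if s:
                items.append((float(t.get("top")), float(t.get("left")), s))
        items.sort()  # top to bottom, then left to right
        rows = []
        row_top = None
        for top, left, s in items:
            if row_top is None or top - row_top > tolerance:
                rows.append([])
                row_top = top
            rows[-1].append((left, s))
        # Order each row's cells by their horizontal position.
        pages.append([[s for _, s in sorted(row)] for row in rows])
    return pages


if __name__ == "__main__":
    for row in rows_per_page(SAMPLE)[0]:
        print(row)
```

Real datasheet tables also need column detection (e.g. clustering the left values) and handling of cells that wrap over several lines; the sketch leaves that out.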

And beware that the code above is a big mess. Perhaps you can have a
look at:

 http://turkos.aspodata.se/computing/pdfextr.pl

which is a little more polished; it extracts things from an invoice
(sorry, I can't provide you with the input data example).

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57



