From: | karl AT aspodata DOT se |
To: | geda-user AT delorie DOT com |
Subject: | [geda-user] Re: pdf table extraction |
In-reply-to: | <alpine.DEB.2.00.1509041305230.6924@igor2priv> |
References: | <CAOP4iL3YWQ_MH3HNnyDHMGCGeYFBmazwcw7Af_GATQzAUQJ57g AT mail DOT gmail DOT com> <alpine DOT DEB DOT 2 DOT 00 DOT 1509040545240 DOT 6924 AT igor2priv> <20150904095423 DOT 31827809DB80 AT turkos DOT aspodata DOT se> <alpine DOT DEB DOT 2 DOT 00 DOT 1509041305230 DOT 6924 AT igor2priv> |
Comments: | In-reply-to gedau AT igor2 DOT repo DOT hu message dated "Fri, 04 Sep 2015 13:06:09 +0200." |
Mime-Version: | 1.0 |
Message-Id: | <20150904112133.85560809DB82@turkos.aspodata.se> |
Date: | Fri, 4 Sep 2015 13:21:33 +0200 (CEST) |
Reply-To: | geda-user AT delorie DOT com |
X-Mailing-List: | geda-user AT delorie DOT com |
Igor2:
> On Fri, 4 Sep 2015, karl AT aspodata DOT se wrote:
> > Igor2:
> > [ about tables in pdf's ]
> >
> > It's true that pdf doesn't have a table structure.
> >
> > I have some experimental code to extract tables from pdf's, it is in:
> >
> > http://turkos.aspodata.se/git/openhw/pdftosym/Experimental/
>
> Thanx, will check it out. What you wrote suggests your script works
> similar to mine.

Yes, but I got the impression that you used the graphical elements in the
file, and that you possibly used pdftohtml in "html" mode, which doesn't
give you the text positions. I have been working purely on the textual part.

And beware that the code above is a big mess. Perhaps you can have a look at:

http://turkos.aspodata.se/computing/pdfextr.pl

which is a little more polished; it extracts things from an invoice (sorry,
I can't provide you with the input data example).

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57
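The "textual part" approach relies on text positions: pdftohtml's XML mode (`pdftohtml -xml`) reports a `top` and `left` coordinate for every `<text>` fragment, which is enough to rebuild table rows without looking at any drawn lines. The following is a minimal Python sketch of that idea. It is not Karl's pdftosym or pdfextr.pl code; the function name `rows_from_pdftohtml_xml` and the `y_tolerance` parameter are hypothetical, and it assumes the `<page number=...>` / `<text top=... left=...>` layout that `pdftohtml -xml` emits.

```python
import xml.etree.ElementTree as ET


def rows_from_pdftohtml_xml(xml_text, y_tolerance=2.0):
    """Group the <text> fragments of `pdftohtml -xml` output into table rows.

    Hypothetical helper, a sketch only: fragments on the same page whose
    'top' lies within y_tolerance of the first fragment of the current row
    are put in that row, and each row is then ordered by 'left'.
    """
    root = ET.fromstring(xml_text)
    fragments = []
    for page in root.iter("page"):
        page_no = int(page.get("number", "0"))
        for t in page.iter("text"):
            # itertext() also picks up text wrapped in <b>/<i> children
            text = "".join(t.itertext()).strip()
            if text:
                fragments.append(
                    (page_no, float(t.get("top")), float(t.get("left")), text)
                )

    # Reading order: page, then vertical position, then horizontal position
    fragments.sort()

    rows = []
    row_page = row_top = None
    for page_no, top, left, text in fragments:
        if rows and page_no == row_page and top - row_top <= y_tolerance:
            rows[-1].append((left, text))
        else:
            rows.append([(left, text)])
            row_page, row_top = page_no, top

    # Order each row left to right and keep only the strings (the cells)
    return [[text for _, text in sorted(row)] for row in rows]


if __name__ == "__main__":
    sample = """<pdf2xml>
<page number="1" top="0" left="0" height="842" width="595">
<text top="100" left="50" width="30" height="12" font="0">Qty</text>
<text top="100" left="120" width="40" height="12" font="0">Item</text>
<text top="115" left="50" width="10" height="12" font="0">2</text>
<text top="116" left="120" width="60" height="12" font="0"><b>Resistor</b></text>
</page>
</pdf2xml>"""
    for row in rows_from_pdftohtml_xml(sample):
        print(row)
    # ['Qty', 'Item']
    # ['2', 'Resistor']
```

Grouping by a small vertical tolerance rather than exact equality copes with fragments whose baselines differ by a point or so, as with the "Resistor" cell in the sample.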