From: | karl AT aspodata DOT se |
To: | geda-user AT delorie DOT com |
Subject: | Re: [geda-user] Re: pdf table extraction |
In-reply-to: | <CAOFvGD4rf8e_4DCF8fjS5i3zXebjM_PiR3ebRhdfZPZ5LmrBsw@mail.gmail.com> |
References: | <CAOP4iL3YWQ_MH3HNnyDHMGCGeYFBmazwcw7Af_GATQzAUQJ57g AT mail DOT gmail DOT com> <alpine DOT DEB DOT 2 DOT 00 DOT 1509040545240 DOT 6924 AT igor2priv> <20150904095423 DOT 31827809DB80 AT turkos DOT aspodata DOT se> <alpine DOT DEB DOT 2 DOT 00 DOT 1509041305230 DOT 6924 AT igor2priv> <20150904112133 DOT 85560809DB82 AT turkos DOT aspodata DOT se> <CAOFvGD4rf8e_4DCF8fjS5i3zXebjM_PiR3ebRhdfZPZ5LmrBsw AT mail DOT gmail DOT com> |
Comments: | In-reply-to "Jason White (whitewaterssoftwareinfo AT gmail DOT com) [via geda-user AT delorie DOT com]" <geda-user AT delorie DOT com> message dated "Fri, 04 Sep 2015 08:27:40 -0400." |
Mime-Version: | 1.0 |
Message-Id: | <20150904145747.7ADC0809DB89@turkos.aspodata.se> |
Date: | Fri, 4 Sep 2015 16:57:47 +0200 (CEST) |
Reply-To: | geda-user AT delorie DOT com |
Errors-To: | nobody AT delorie DOT com |
X-Mailing-List: | geda-user AT delorie DOT com |
X-Unsubscribes-To: | listserv AT delorie DOT com |
Jason:
> My absolute favorite for extracting data from tables in PDF datasheets is
> Tabula (http://tabula.technology/), it has a nice interface.

It seems to be some mix of Ruby, Java and JavaScript, and you run it through your browser.

Last time I checked there was no 64-bit Java; it seems there is now:
https://www.java.com/en/download/manual.jsp
but it's 68MB for Java and 53MB for the source, a little big. Does it work with any free Java implementation, or does it require the latest Sun/Oracle one?

Also, the repository contains jar files but no Java source. The PDF handling seems to be done by the Java code, which is only available in binary form, so it's hard to get ideas from their code and to contribute.

///

Looking at
https://source.opennews.org/en-US/articles/introducing-tabula/
they use the same intermediary format (except that they also include the rotation parameter). Or rather, they used it, since the Ruby script mentioned there is no longer present. But at least they point to
http://www.tamirhassan.com/index.html#Publications
which points to these possibly useful articles:

http://www.orsigiorgio.net/wp-content/papercite-data/pdf/gho*12.pdf
http://www.dbai.tuwien.ac.at/staff/hassan/files/p47-hassan.pdf
http://www.cvc.uab.es/icdar2009/papers/3725a631.pdf
http://rewerse.net/publications/download/REWERSE-RP-2006-085.pdf

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |