Mail Archives: geda-user/2015/09/04/10:58:04

X-Authentication-Warning: delorie.com: mail set sender to geda-user-bounces using -f
X-Recipient: geda-user AT delorie DOT com
X-Mailer: exmh version 2.8.0 04/21/2012 (debian 1:2.8.0~rc1-2) with nmh-1.5
X-Exmh-Isig-CompType: repl
X-Exmh-Isig-Folder: inbox
From: karl AT aspodata DOT se
To: geda-user AT delorie DOT com
Subject: Re: [geda-user] Re: pdf table extraction
In-reply-to: <CAOFvGD4rf8e_4DCF8fjS5i3zXebjM_PiR3ebRhdfZPZ5LmrBsw@mail.gmail.com>
References: <CAOP4iL3YWQ_MH3HNnyDHMGCGeYFBmazwcw7Af_GATQzAUQJ57g AT mail DOT gmail DOT com> <alpine DOT DEB DOT 2 DOT 00 DOT 1509040545240 DOT 6924 AT igor2priv> <20150904095423 DOT 31827809DB80 AT turkos DOT aspodata DOT se> <alpine DOT DEB DOT 2 DOT 00 DOT 1509041305230 DOT 6924 AT igor2priv> <20150904112133 DOT 85560809DB82 AT turkos DOT aspodata DOT se> <CAOFvGD4rf8e_4DCF8fjS5i3zXebjM_PiR3ebRhdfZPZ5LmrBsw AT mail DOT gmail DOT com>
Comments: In-reply-to "Jason White (whitewaterssoftwareinfo AT gmail DOT com) [via geda-user AT delorie DOT com]" <geda-user AT delorie DOT com>
message dated "Fri, 04 Sep 2015 08:27:40 -0400."
Mime-Version: 1.0
Message-Id: <20150904145747.7ADC0809DB89@turkos.aspodata.se>
Date: Fri, 4 Sep 2015 16:57:47 +0200 (CEST)
X-Virus-Scanned: ClamAV using ClamSMTP
Reply-To: geda-user AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: geda-user AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

Jason:
> My absolute favorite for extracting data from tables in PDF datasheets is
> Tabula (http://tabula.technology/), it has a nice interface.

It seems to be some mix of Ruby, Java and JavaScript, and you run it
through your browser.

Last time I checked there was no 64-bit Java; it seems there is now:

 https://www.java.com/en/download/manual.jsp

but it's 68 MB for Java and 53 MB for the source, which is a little big.
Does it work with any free Java implementation, or does it require
the latest Sun/Oracle one?

Also, the repository contains jar files but no Java source.
The PDF handling seems to be done by the Java code, which is only
available as binaries.

So it's hard to get ideas from their code or to contribute.

///

 Looking at
https://source.opennews.org/en-US/articles/introducing-tabula/

they use the same intermediate format (except that they also include
the rotation parameter), or rather used to, since the Ruby script
mentioned there is no longer present.

But they at least point to
 http://www.tamirhassan.com/index.html#Publications

which points to these possibly useful articles:
 http://www.orsigiorgio.net/wp-content/papercite-data/pdf/gho*12.pdf
 http://www.dbai.tuwien.ac.at/staff/hassan/files/p47-hassan.pdf
 http://www.cvc.uab.es/icdar2009/papers/3725a631.pdf
 http://rewerse.net/publications/download/REWERSE-RP-2006-085.pdf

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57

