Mail Archives: geda-user/2015/09/04/11:17:45

Date: Fri, 4 Sep 2015 11:17:27 -0400
Message-ID: <CAOFvGD7BVApTVZd8HBRHBHC_wEPas9vVnTcC25z7dUODcNOxHw@mail.gmail.com>
Subject: Re: [geda-user] Re: pdf table extraction
From: "Jason White (whitewaterssoftwareinfo AT gmail DOT com) [via geda-user AT delorie DOT com]" <geda-user AT delorie DOT com>
To: geda-user AT delorie DOT com

It works fine with open source Java; just follow the installation
directions...

Regarding your second point, thankfully I have never needed to look at the
source code of this particular tool. If the source for the core is not
available, try contacting the author; maybe they have a reason (e.g.
licensing issues or something).

On Fri, Sep 4, 2015 at 10:57 AM, <karl AT aspodata DOT se> wrote:

> Jason:
> > My absolute favorite for extracting data from tables in PDF datasheets is
> > Tabula (http://tabula.technology/), it has a nice interface.
>
> Seems to be some mix of Ruby, Java and JavaScript, and you run it through
> your browser.
>
> Last time I checked there was no 64-bit Java; it seems there is now:
>
>  https://www.java.com/en/download/manual.jsp
>
> but it's 68 MB for Java and 53 for the source, a little big.
> Does it work with any free Java implementations, or does it require
> the latest Sun/Oracle one?
>
> Also, in the repository, there are jar files and no Java source.
> The PDF handling seems to be done by the Java code, which is binary only.
>
> So it's hard to get ideas from their code and to contribute.
>
> ///
>
>  Looking at
> https://source.opennews.org/en-US/articles/introducing-tabula/
>
> they use the same intermediary format (except they also have the
> rotation parameter present) -- or used to, since the mentioned Ruby
> script is no longer present.
>
> But they at least point to
>  http://www.tamirhassan.com/index.html#Publications
>
> which points to these possibly usable articles:
>  http://www.orsigiorgio.net/wp-content/papercite-data/pdf/gho*12.pdf
>  http://www.dbai.tuwien.ac.at/staff/hassan/files/p47-hassan.pdf
>  http://www.cvc.uab.es/icdar2009/papers/3725a631.pdf
>  http://rewerse.net/publications/download/REWERSE-RP-2006-085.pdf
>
> Regards,
> /Karl Hammar
>
> -----------------------------------------------------------------------
> Aspö Data
> Lilla Aspö 148
> S-742 94 Östhammar
> Sweden
> +46 173 140 57
>
>
>


-- 
Jason White

