Date: Tue, 5 Jan 2016 19:07:34 +0000
Subject: Re: [geda-user] A fileformat library
From: "Peter Clifton (petercjclifton AT googlemail DOT com) [via geda-user AT delorie DOT com]"
To: gEDA User Mailing List
Reply-To: geda-user AT delorie DOT com


On 5 Jan 2016 18:30, "DJ Delorie" <dj AT delorie DOT com> wrote:
>
>
> > . a binary file might be smaller, but that does not matter much
>
> I wrote an app that used a tree-like data file for storage.  It
> supported both ascii and binary formats.  Not only was the binary
> format significantly smaller, but loaded 10x faster.  Parsing text
> files and adapting to the incoming data is more expensive than you
> think.

Indeed... text representations of floating point numbers take a lot of computation to turn into the correct binary machine value. This is one of the main reasons big 3D models in STEP format are slow to load. (There are lots of irrational numbers represented in text format, base 10).

It is very easy to write a fast ASCII to double conversion, but only if you make some assumptions and sacrifice accuracy. Doing a correct conversion, one which yields the closest binary floating point number to the decimal number described, is hard to get right and time consuming.
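To illustrate the gap between the two, here is a minimal sketch. It is only an illustration: `naive_atof` is a made-up name, it accepts just plain `[-]digits[.digits]` input, and Python's built-in `float()` stands in for a correctly rounded conversion.

```python
def naive_atof(text: str) -> float:
    """Fast-and-loose decimal parser (hypothetical helper, illustration only).

    Assumes the input is [-]digits[.digits] with no exponent and no
    validation. Every multiply and add rounds, so errors can accumulate.
    """
    sign = -1.0 if text.startswith("-") else 1.0
    body = text.lstrip("+-")
    whole, _, frac = body.partition(".")

    value = 0.0
    for ch in whole:
        value = value * 10.0 + (ord(ch) - ord("0"))

    scale = 0.1  # 0.1 is itself not exactly representable in binary
    for ch in frac:
        value += (ord(ch) - ord("0")) * scale
        scale *= 0.1

    return sign * value


def correct_atof(text: str) -> float:
    """Correctly rounded conversion, delegated to the built-in float()."""
    return float(text)
```

For example, `naive_atof("0.3")` and `correct_atof("0.3")` differ in the last bit, because 0.1 is not exactly representable and the multiply rounds again. That is harmless for a quick viewer, but not for a format that is supposed to round-trip.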

Hypothetically, I think the best compromise is a format which has a lossless translation between text and binary representations.
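For the numeric fields at least, one way to get such a lossless mapping is a hexadecimal float notation on the text side and raw IEEE 754 bytes on the binary side. The sketch below is only an assumption about how that could look; the helper names are made up and this is not a proposal for any existing gEDA format.

```python
import struct


def to_text(x: float) -> str:
    """Exact text form: hexadecimal mantissa and binary exponent."""
    return x.hex()


def from_text(s: str) -> float:
    """Inverse of to_text(); no decimal rounding is involved."""
    return float.fromhex(s)


def to_binary(x: float) -> bytes:
    """Eight bytes, little-endian IEEE 754 double."""
    return struct.pack("<d", x)


def from_binary(b: bytes) -> float:
    """Inverse of to_binary()."""
    return struct.unpack("<d", b)[0]
```

Both directions are exact, so text -> binary -> text reproduces the same bytes every time. The cost is that hex floats are not pleasant for humans to read or edit; a shortest round-tripping decimal string (which is what Python's `repr()` produces) is the friendlier alternative, at the price of a slower, correctly rounded parse.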

In reality, the speed issue is for the most part irrelevant to us. We simply don't have the quantity of floating point numerical data in our files to cause enough of a slowdown to warrant it.

For processing 3D STEP files - two approaches...

1. Don't perform the conversion unless the number is needed (shunt strings in and out of the system).
2. Test out the idea of hashing and caching conversions.... I've a suspicion that many coordinates and vectors get repeated a lot....

(The Autodesk DWG format special cases 0.0 and 1.0 with a very short bit pattern (3 bits I recall), which gives them enough reduction in file size to make it worth while for them.)
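The two STEP approaches above could be combined roughly as follows. This is an untested-on-real-data sketch: `LazyNumber` and `parse_cached` are made-up names, the cache size is arbitrary, and whether caching pays off depends entirely on how often tokens really repeat.

```python
from functools import lru_cache


@lru_cache(maxsize=65536)  # arbitrary size, an assumption for this sketch
def parse_cached(token: str) -> float:
    """Convert a numeric token once; repeats are served from the cache."""
    return float(token)


class LazyNumber:
    """Keeps the original text and converts only when the value is used."""

    __slots__ = ("text", "_value")

    def __init__(self, text: str):
        self.text = text
        self._value = None

    @property
    def value(self) -> float:
        if self._value is None:
            self._value = parse_cached(self.text)
        return self._value

    def __str__(self) -> str:
        # Writing the number back out never touches the float at all,
        # so an unmodified value round-trips byte for byte.
        return self.text
```

A loader would wrap each numeric token in `LazyNumber(token)` while reading, and `parse_cached.cache_info()` then reports how many conversions were actually avoided.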

(Btw... Anyone else react with a "wtf" to realise that the DWG binary format operates on a literal BIT stream? - ie. Not even byte alignment!)
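To give a feel for what a bit-level stream with special-cased constants involves, here is a small sketch. The 2-bit tag values and all names are made up for illustration and are not taken from the DWG specification.

```python
import struct

# Hypothetical tags for this sketch only.
TAG_FULL, TAG_ONE, TAG_ZERO = 0b00, 0b01, 0b10
RAW_ZERO = 0x0000000000000000  # bit pattern of +0.0
RAW_ONE = 0x3FF0000000000000   # bit pattern of 1.0


class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value: int, count: int) -> None:
        """Append the low `count` bits of value, most significant first."""
        for shift in range(count - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        """Pad with zero bits to a whole number of bytes and pack."""
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits, not bytes

    def read(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def write_double(w: BitWriter, x: float) -> None:
    raw = int.from_bytes(struct.pack(">d", x), "big")
    if raw == RAW_ZERO:
        w.write(TAG_ZERO, 2)
    elif raw == RAW_ONE:
        w.write(TAG_ONE, 2)
    else:
        w.write(TAG_FULL, 2)
        w.write(raw, 64)  # anything else, including -0.0, is stored in full


def read_double(r: BitReader) -> float:
    tag = r.read(2)
    if tag == TAG_ZERO:
        return 0.0
    if tag == TAG_ONE:
        return 1.0
    if tag != TAG_FULL:
        raise ValueError("unknown tag")
    return struct.unpack(">d", r.read(64).to_bytes(8, "big"))[0]
```

With this scheme, five values of which four are 0.0 or 1.0 take 74 bits (10 bytes after padding) instead of 40 bytes, and nothing after the first value starts on a byte boundary.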

Peter
