Mail Archives: djgpp/1997/02/10/02:27:12

From: Paul Shirley <Paul AT foobar DOT co DOT uk DOT chocolat>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: float, double & long double
Date: Mon, 10 Feb 1997 01:59:52 +0000
Organization: wot? me?
Lines: 35
Distribution: world
Message-ID: <DeB78DAYEo$yEwJo@foobar.co.uk>
References: <N DOT 020797 DOT 081335 DOT 66 AT hrv1-4 DOT worldaccess DOT nl>
Reply-To: Paul Shirley <junk AT defeating DOT email DOT address>
NNTP-Posting-Host: chocolat.foobar.co.uk
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <N DOT 020797 DOT 081335 DOT 66 AT hrv1-4 DOT worldaccess DOT nl>,
frabb AT worldaccess DOT nl writes
>It is also clear that the 'shorts' of float and long double line up nicely,
>you only have to do some truncation or inserting zero shorts to do the
>conversion. The double however has an offset of shorts + 1 bit. This will
>always make bitshifting necessary when converting. That is the reason why
>programs using double run slightly slower than programs using float or long
>double.

1: Doubles are sometimes slower for one main reason: they are twice as big
as floats, and moving twice as many bytes usually takes longer! On a 387 or
486, moving a 64-bit value across a 32-bit bus explicitly takes more clocks.
On a P5 there *may* be delays caused by cache filling.

2: (With one exception) there is *NO* cost to 'converting' any float
format during reads or writes from the FPU. None. Zero clocks. Is there
any other way I can say it? All ops end up as long double during
calculations, so only load/store actions differ anyway.
It really does come down to how many bytes get moved.
Loading and storing long doubles is particularly expensive because it
needs 3x32-bit accesses on a 486 or 2x64-bit ones on a P5. It's slower
even though *no* bit format conversion occurs.
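
A minimal C sketch of point 2, not code from the original post. The two
helpers, sum_float and sum_double, are hypothetical names. They run the same
loop over the same values and accumulate in long double, mirroring the way
the x87 works internally. The only difference between them is how many bytes
each element load pulls from memory, assuming the usual 4-byte float and
8-byte double.

```c
#include <stddef.h>

/* Hypothetical helper: sum n floats.
   Each element load moves sizeof(float) bytes from memory. */
long double sum_float(const float *v, size_t n)
{
    size_t i;
    long double acc = 0.0L;   /* accumulate at extended precision */
    for (i = 0; i < n; i++)
        acc += v[i];          /* widening float -> long double is exact */
    return acc;
}

/* Hypothetical helper: the same loop over doubles.
   Each element load moves sizeof(double) bytes, typically twice as many. */
long double sum_double(const double *v, size_t n)
{
    size_t i;
    long double acc = 0.0L;
    for (i = 0; i < n; i++)
        acc += v[i];          /* widening double -> long double is exact */
    return acc;
}
```

For values that are exactly representable as float, both functions return
the same sum. Any speed difference between them would come from memory
traffic, not from the format change on load.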

The exception: pass a float to a routine expecting a double and gcc will
have to load it through the FPU to do the conversion; with compatible
types gcc simply pushes the raw binary value. This is not as big a
problem as it seems because (a) inlined routines won't do this and (b) the
values are likely to be in the FPU anyway.
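
A rough C illustration of that exception, assuming IEEE 754 single and double
formats. All function names here (float_bits, double_bits, half_of,
call_with_float) are made up for the example. The bit patterns of the same
value differ between float and double, which is why the raw float cannot
simply be pushed when the callee expects a double.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumes IEEE 754 single). */
uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Return the raw 64-bit pattern of a double (assumes IEEE 754 double). */
uint64_t double_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical routine that expects a double argument. */
double half_of(double x)
{
    return x * 0.5;
}

/* The float argument must be widened to double before the call;
   the compiler inserts that conversion at the call site. */
double call_with_float(float f)
{
    return half_of(f);
}
```

Under the IEEE 754 assumption, float_bits(1.0f) is 0x3F800000 while
double_bits(1.0) is 0x3FF0000000000000: same value, different layout. The
widening itself is exact, so call_with_float(3.0f) returns 1.5.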



If I tell you I just spent the last 3 months optimising P5 fpu code (for
a 3D geometry pipeline) will you start believing me?
---
Paul Shirley: shuffle chocolat before foobar for my real email address


