From: frabb AT worldaccess DOT nl
Newsgroups: comp.os.msdos.djgpp
Subject: float, double & long double
Date: Fri, 07 Feb 97 07:13:35 GMT
Organization: World Access, Internet, E-mail and Videotex
Lines: 90
Message-ID: <N.020797.081335.66@hrv1-4.worldaccess.nl>
NNTP-Posting-Host: hrv1-4.worldaccess.nl
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

While thinking about floats and doubles I made the following program.
Please have a look at it:

---------------------------------------------------------------------
// investigate double, float, etc.

#include <stdio.h>
#include <conio.h>

#define SFL sizeof(float)
#define SDB sizeof(double)
#define SLD sizeof(long double)

union{
     long double l;
     double d;
     float f;
     unsigned short s[8]; // too many shorts, to be on the safe side
     }mn;
/*---------------------------------------------------------*/
void showflt(float f)
{
printf("\n      float: %.20f, ",f); mn.f = f;
for(int i = SFL/2 - 1; i>=0; i--) printf("%04X",mn.s[i]);
}
/*---------------------------------------------------------*/
void showdbl(double d)
{
printf("\n     double: %.20f, ",d); mn.d = d;
for(int i = SDB/2 - 1; i>=0; i--) printf("%04X",mn.s[i]);
}
/*---------------------------------------------------------*/
void showldb(long double ld)
{
printf("\nlong double: %.20Lf, ",ld); mn.l = ld;
for(int i = SLD/2 - 2; i>=0; i--) printf("%04X",mn.s[i]);
}
/*---------------------------------------------------------*/
int main(void)
{
float f; double d; long double l;
l = 1.2345678901234567890123456789L;
d = l;
f = d;
clrscr();
showflt(f);
showdbl(d);
showldb(l);
#define I 1.0
// To check that my method is correct: the hex result should contain
// ABCDE somewhere.
showldb(I/2+I/8+I/32+I/128+I/256+I/512+I/1024+I/8192+I/16384+I/65536+
          I/131072+I/262144+I/524288);
return 0;
}
----------------------------------------------------------------------

Under DJGPP a long double occupies 12 bytes: the 80-bit value plus 16 unused
padding bits, to make it fit in a 32 bit scheme. That explains the constant -2
in "showldb", which skips the padding short.

The calls to showflt, showdbl and showldb should all show more-or-less the
same thing. Here is the result:

      float: 1.23456788063049316406, 3F9E0652
     double: 1.23456789012345669043, 3FF3C0CA428C59FB
long double: 1.23456789012345678899, 3FFF9E06521462CFDB8D
long double: 0.67111015319824218750, 3FFEABCDE00000000000

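(This would look better with fewer digits specified for the fractional part:
a float holds only about 7 significant decimal digits, a double about 16 and
a long double about 19, so the digits beyond that say nothing about the
original number.)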
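
As a rough sketch of that remark: <float.h> gives the number of decimal
digits each type can hold reliably (FLT_DIG, DBL_DIG), and "%.*g" lets printf
take the precision from an argument. The helper names fmt_float and
fmt_double are made up for this example and are not part of the program
above.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helpers: format a value with only as many significant
   digits as its type reliably holds (FLT_DIG / DBL_DIG from <float.h>). */
int fmt_float(char *buf, size_t n, float f)
{
    /* a float is promoted to double when passed to a printf-style function */
    return snprintf(buf, n, "%.*g", FLT_DIG, (double)f);
}

int fmt_double(char *buf, size_t n, double d)
{
    return snprintf(buf, n, "%.*g", DBL_DIG, d);
}
```

Called with the test value from the program, fmt_float should give 1.23457
and fmt_double 1.23456789012346 on a machine with IEEE 754 floats and
doubles.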

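Writing the hex numbers in binary, aligned so that the mantissa bits line up,
shows that the mantissa only needs shifting when converting
float<-->double<-->long double. The exponent field in front of it changes
width as well (8, 11 and 15 bits):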

           3   F   9   E   0   6   5   2
        00111111100111100000011001010010
        3   F   F   3   C   0   C   A   4   2   8   C   5   9   F   B
     0011111111110011110000001100101001000010100011000101100111111011
   3   F   F   F   9   E   0   6   5   2   1   4   6   2   C   F   D   B   8   D
00111111111111111001111000000110010100100001010001100010110011111101110110001101
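
To read the fields off without counting bits by hand, something like the
following could be used. It is only a sketch: struct fields, split_float and
split_double are names made up here, and it assumes that float and double are
the 32-bit and 64-bit IEEE 754 formats, as in the dump above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE 754 number. */
struct fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* biased exponent: 8 bits (float), 11 bits (double) */
    uint64_t fraction;  /* stored fraction: 23 bits (float), 52 bits (double) */
};

struct fields split_float(float f)
{
    uint32_t u;
    struct fields r;
    memcpy(&u, &f, sizeof u);            /* copy out the raw 32 bits */
    r.sign     = u >> 31;
    r.exponent = (u >> 23) & 0xFFu;
    r.fraction = u & 0x7FFFFFu;
    return r;
}

struct fields split_double(double d)
{
    uint64_t u;
    struct fields r;
    memcpy(&u, &d, sizeof u);            /* copy out the raw 64 bits */
    r.sign     = (unsigned)(u >> 63);
    r.exponent = (unsigned)((u >> 52) & 0x7FFu);
    r.fraction = u & (((uint64_t)1 << 52) - 1);
    return r;
}
```

For the float 3F9E0652 this gives exponent 7F and fraction 1E0652. For the
double 3FF3C0CA428C59FB it gives exponent 3FF and fraction 3C0CA428C59FB.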

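The 'shorts' of float and long double appear to line up nicely here, as if
truncating or inserting zero shorts were enough to convert, while the double
sits at an offset of a few bits. That alignment is partly a coincidence of
this value, though. The float and the double leave out the leading 1 of the
mantissa, the long double stores it explicitly, and each format uses a
different exponent bias (127, 1023 and 16383). The second test value, for
example, would read 3F2BCDE0 as a float, not ABCDE. So every conversion has
to re-bias the exponent and shift or round the mantissa, not only the ones
involving double. The FPU does this in hardware whenever it loads or stores a
value, so the layout by itself does not explain a speed difference between
programs using float, double or long double.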
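
As a minimal sketch of what a float to double conversion involves at the bit
level: the mantissa is shifted, and the exponent has to be adjusted for the
different bias as well. The function name widen_by_hand is made up, IEEE 754
float and double formats are assumed, and only normal numbers are handled
(zero, denormals, infinities and NaNs would need extra cases).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a float to a double using bit operations only.
   Normal numbers only; special values are not handled in this sketch. */
double widen_by_hand(float f)
{
    uint32_t u;
    uint64_t sign, exponent, fraction, v;
    double d;

    memcpy(&u, &f, sizeof u);            /* raw bits of the float */
    sign     = (uint64_t)(u >> 31);
    exponent = (u >> 23) & 0xFFu;        /* 8 bits, biased by 127 */
    fraction = u & 0x7FFFFFu;            /* 23 stored mantissa bits */

    exponent = exponent - 127 + 1023;    /* re-bias for the 11-bit field */
    fraction <<= 52 - 23;                /* pad with 29 zero bits */

    v = (sign << 63) | (exponent << 52) | fraction;
    memcpy(&d, &v, sizeof d);            /* reinterpret as a double */
    return d;
}
```

For any normal float f, widen_by_hand(f) should compare equal to (double)f,
which is a quick way to check the sketch against the compiler's own
conversion.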

frank abbing