Mail Archives: djgpp-workers/1999/07/01/10:04:31

Date: Thu, 01 Jul 1999 09:01:31 -0500
From: Eric Rudd <rudd AT cyberoptics DOT com>
Subject: Re: libm sources from cyberoptics
To: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
Cc: djgpp-workers AT delorie DOT com
Message-id: <377B74BA.C691B8C3@cyberoptics.com>
Organization: CyberOptics
MIME-version: 1.0
X-Mailer: Mozilla 4.05 [en] (Win95; U)
References: <Pine DOT SUN DOT 3 DOT 91 DOT 990627172155 DOT 5761E-100000 AT is>
Reply-To: djgpp-workers AT delorie DOT com

Eli Zaretskii wrote:

> - The NaN is loaded by the functions from a long whose bit pattern
>   is 0xFFC00000. [snip] I think the bit pattern should be 0x7ff80000.
>   Do you agree?

No.  There are several types of NaN, as you have surmised, but the
particular NaN reserved by Intel for indefinite results is exactly the bit
pattern that my functions return (they call it "real indefinite").  Yes,
the sign bit is set, and I think that's weird, too, but if you check the
Intel docs, you'll find that the sign bit should indeed be set.  In fact,
this is the bit pattern that is generated when one computes 0./0. in the
coprocessor and stores this quantity to memory as a float.

Note also that I load a float, not a double.  It gets converted to the same
thing inside the coprocessor, and the float takes up less space in memory.
The sole exception to this occurs in function sincos, where doubles are
pointed to, and any double NaN must be copied to memory as such.  Then the
bit pattern there is 0xFFF8000000000000.

> - A more serious problem is with asin and acos for arguments whose
>   absolute value is between 1 and 1.5 (inclusive).  These manage to
>   pass the tests for badarg, wind up in fpatan and return -NaN,
>   which has the wrong sign, and errno is not set.

Fixed.  The problem occurred only for arguments in the range <1, 2> that
have the entire least-significant four bytes equal to zero.  Thus,
acos(1.2) behaved properly, but acos(1.25) and acos(1.375) did not.

> - atan2 returns NaN when both arguments are +/-Inf.

...as it should (C9x draft notwithstanding): the size of Inf/Inf is not
known, any more than the size of 0/0 is known.

>   (atan2 from libm only returns NaN when y is Inf).

That also complies with C90, but seems a little quirky to me.

>   It seems to me that if y is finite, x can be Inf and the
>   x87 will produce a finite result, so perhaps we need to modify the
>   checks of x?

You mean, for instance, atan2(1., Inf)?  On my platform, this particular
example returns 0. without error, as it should.  I don't see what the
problem is here.  Are we having some more trouble with the assembler?

> - atanh(+/-1.25) returns -/+10 instead of a NaN.

Oops, that is indeed a problem similar to the acos() and asin() problem.
I've fixed that one, too.

> - cos suffers a large loss of significant digits near Pi/2: for
>   example cos(-1.571) loses about 40 bits of the mantissa (they come
>   out as zero).  The same happens to sin near Pi.  Is this some flaw
>   of the x87?  If so, perhaps we should instead compute sin(Pi/2 - x)
>   and sin(Pi - x) for such arguments?

The problem is that you need a value of PI that is correct to many more
than 53 bits in order to produce accurate results under such
circumstances.  The x87 is already doing just such a range reduction, using
an internal value of PI accurate to 66 bits, but for certain arguments to
the trig functions (such as the ones you cited), the roundoff errors in the
range reduction are greatly amplified.  K.B. Williams also noted large
errors in cos(), sin(), and tan() for large arguments; the same mechanism
is responsible for those errors.

I don't know what your background is, but are you acquainted with the
notion of "backward error"?  In case you are not, let me briefly explain:
One can model the error in a finite-precision function routine by a
three-stage process.  First, one adds some roundoff error to the argument.
Then, one computes the function of that rounded argument *exactly*.
Finally, one adds some more roundoff error to obtain the result.

For all decent routines, both roundoff errors in the model are on the order
of 1 ULP (that is, 2^(-53) for doubles).  For some functions (such as
atan()), one can easily account for all the error in the function by less
than 1 ULP of roundoff error in the last step.  However, for certain other
functions (such as cos(-1.571)), the rounding that occurs in the first step
is sometimes much more significant; if one attempts to account for it by
roundoff in the last step, such roundoff error bounds are very poor.  When
one quotes the roundoff error in terms of the first roundoff error
generator, it is known as "backward error".  A typical statement of
backward error might be as follows (worded for the sine function): "the
returned value is the exact sine of an argument *near* the actual
argument."  In fact, for the trig functions cos(), sin(), and tan(), either
rounding generator may dominate the other, depending on the argument.
Thus, one might characterize the errors in my trig function routines as
follows:

The output of cos(x) is the exact cosine, rounded to machine precision, of
the quantity x*(1+delta), where delta is less than 2^(-66) in absolute
value.

Ideally, all functions would report the exact result, rounded to machine
precision, of the input arguments; this is the best they could possibly do
(the second roundoff error generator always exists, because the output must
be rounded to machine precision).  I am generally in favor of this level of
accuracy:  for instance, in the case of acos(), the most straightforward
implementation was not accurate for fabs(x) near 1, but it turned out to be
possible to preserve full accuracy at the cost of a single floating-point
addition, so I went ahead and used the more accurate form.  However, in
order to report the exact cosine of x, rounded to machine precision, one
needs to do multiple-precision range reduction.  libm does this, but that's
one of the reasons it's a lot slower for many
arguments.  Why not filter out the special cases, and only give them the
special treatment?  Well, that's what libm does, but for arguments outside
of [-pi/4, +pi/4], the libm trig functions take about three times as long
as my functions do. This is a heavy price to pay.

This slower execution would be justifiable if there were some practical
reason to demand such accuracy.  However, in practical computation, the
arguments themselves are generally the result of some prior computation, so
they, too, are contaminated by roundoff errors usually much larger than
2^(-66). Thus there is no *practical* use in going to such lengths in range
reduction as in libm.  I'd hate to slow things down by a factor of three
for 99.9% of the programmers, to please the 0.1% who have 53-bit arguments
that are somehow accurate to much more than 53 bits.  We can continue to
provide libm for such perfectionists.

> - cosh doesn't set errno to ERANGE for an argument that is +/-Inf,
>   unlike the docs says.  I think the problem is that the flow jumps
>   from abarg: to infarg:, but the latter doesn't set errno to 2.

I had mentioned earlier that I considered overflow to be a *finite* result
that is converted to infinity.  Thus, cosh(inf) doesn't overflow, any more
than tanh(0.) underflows.  The wording in the doc should be similar to that
in exp.txh.  I will send you an amended doc.

> - ldexp sets errno when its first argument is -0.0.  I guess we have
>   another case of +0 vs -0 controversy to handle.  Seems like
>   clearing the high bit when testing x for zero should do the trick.

Fixed.  BTW, I noticed that the printf routine prints -0. as +0. (in other
words, the sign bit is dropped), even when the %+E specifier is used, to
force a sign to be printed.  This doesn't seem right to me.  I'd have
expected %E to print -0. as 0., and %+E to print -0. as -0.

> - modf doesn't set the sign bit of the integral part if it is zero.
>   For example, -0.81 sets the integral part to +0, not -0.  I'm not
>   sure if we should bother.

The positive zero is produced by subtraction of x - fract(x).  That's just
the way it comes out.  I'd also lean toward ignoring this little anomaly.

> - when the first argument of modf is NaN or Inf, perhaps we should
>   set the integral portion to NaN and Inf, respectively, not to zero
>   like the code now does, and return zero.  At least for the Inf
>   argument, it makes more sense to say that its integral part is
>   infinite, not its fractional part.

I agree.  Fixed.

> - what should pow(-0.,+1) return?  your last version returns +0,
>   which might be okay, but in that case we should document it, since
>   it contradicts the rule that x^1 == x for any x.  The same goes
>   for pow(-0.,-1): should it be +Inf or -Inf?

I was all ready to send you an amended pow.s a couple of days ago, but I
did some testing, and found that the "fixes" had caused more trouble than
they had solved.  Upon further consideration, I realized that the special
cases in pow() are even more complicated than I previously thought.

I drew a two-dimensional map on a piece of paper, with vertical and
horizontal divisions to indicate -inf, -finite, -0, +0, +finite, and +inf
for each of the two arguments.  There are also divisions for finite
operands that over- and underflow, and cases for negative x, depending on
whether y is an integer.  All in all, there are no fewer than 50 different
cases to treat (and that's not even counting NaN arguments, which are
correctly handled already, to the best of my knowledge).

Now, it may seem clear that pow(-0., 1.) should be -0., but what should be
done with pow(-0., y), y an even integer?  Those should probably be +0. or
+inf, depending on the sign of y.  And then what does one do with pow(-0.,
y), y a positive non-integer?  The magnitude is certainly zero (or
infinite), but if one considers -0. to be an infinitesimal, then pow(-0.,
y) winds up being complex.  There are corresponding questions about
pow(-inf, y).  It could be argued that if the complex phase of infinity is
not known to be 0 or PI, that a NaN should be returned.

I would like to fix up these problems (sometime), but I can see that their
solution is far from trivial, and I'd probably wind up making some major
changes to the structure of the routine -- right now, it's difficult to
follow, and getting worse with each new special case.  The current state of
the routine, to the best of my knowledge, is that it computes all finite
results properly, but gets the signs of zero and infinity wrong in some
cases.  There are cases where the magnitude of the result is clearly
infinite, but of indeterminate phase, and my routine returns +inf, instead
of NaN.

I also looked at the output of the libm pow() routine.  In all the cases I
examined, it computes the signs of zero and infinite results correctly
(when they are well defined), but even libm doesn't get everything right.
For instance, pow(-inf, 2.5) returns +inf with no error (instead of NaN,
as it probably should); pow(1., +inf) returns NaN (as it should); and
pow(+inf, 0.) returns 1. (instead of NaN, as it should).  In none of these
cases does errno get set to EDOM, as it should.

I'm not sure the present problems with my pow() routine are significant
enough to hold up the upcoming release, since these behaviors are all ones
that ANSI does not even specify.  I also examined the performance of Watcom
and Borland routines, and was not impressed by what I found.  I think that
any program that *counts* on certain behavior in such cases is in deep
trouble, anyway.  What are your thoughts?

> Btw, libm functions consistently do NOT set errno for NaN arguments
> (except for pow(NaN,0)).  Our behavior is well-documented, so this
> is only FYI.  But I wonder whether some widely-accepted standard
> rules this.

I considered a NaN argument to be outside the mathematical domain of
definition of the function, and therefore, according to C90, required a
domain error to be indicated.  However, in C9x, most of that goes out the
window, at least if you can believe the current draft.

I will upload MATH0701.ZIP, which contains fixes for all but pow.s.

-Eric Rudd
