Date: Thu, 01 Jul 1999 09:01:31 -0500
From: Eric Rudd
Subject: Re: libm sources from cyberoptics
To: Eli Zaretskii
Cc: djgpp-workers AT delorie DOT com
Message-id: <377B74BA.C691B8C3@cyberoptics.com>
Organization: CyberOptics
MIME-version: 1.0
X-Mailer: Mozilla 4.05 [en] (Win95; U)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit
References:
Reply-To: djgpp-workers AT delorie DOT com

Eli Zaretskii wrote:

> - The NaN is loaded by the functions from a long whose bit pattern
>   is 0xFFC00000. [snip] I think the bit pattern should be
>   0x7ff80000. Do you agree?

No. There are several types of NaN, as you have surmised, but the
particular NaN reserved by Intel for indefinite results is exactly the
bit pattern that my functions return (they call it "real indefinite").
Yes, the sign bit is set, and I think that's weird, too, but if you
check the Intel docs, you'll find that the sign bit should indeed be
set. In fact, this is the bit pattern that is generated when one
computes 0./0. in the coprocessor and stores this quantity to memory
as a float.

Note also that I load a float, not a double. It gets converted to the
same thing inside the coprocessor, and the float takes up less space
in memory. The sole exception to this occurs in function sincos, where
doubles are pointed to, and any double NaN must be copied to memory as
such. Then the bit pattern there is 0xFFF8000000000000.
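For anyone who wants to check this on their own machine, here is a
little stand-alone test (not part of the library) that lets the
coprocessor compute 0./0. and prints the stored bit patterns. It
assumes Intel hardware, a 32-bit int, the usual little-endian layout
of doubles in memory, and that the invalid-operation exception is
masked, so that 0./0. produces a NaN instead of a trap:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* volatile keeps the compiler from folding 0./0. at compile
           time, so the division is really performed by the FPU */
        volatile float  zf = 0.0f;
        volatile double zd = 0.0;
        float  fnan = zf / zf;     /* "real indefinite" as a float  */
        double dnan = zd / zd;     /* "real indefinite" as a double */
        unsigned int fbits, dlo, dhi;

        memcpy(&fbits, &fnan, 4);
        memcpy(&dlo, (char *)&dnan, 4);      /* low  32 bits */
        memcpy(&dhi, (char *)&dnan + 4, 4);  /* high 32 bits */

        /* expected: 0xFFC00000 and 0xFFF8000000000000 */
        printf("float  0./0. -> 0x%08X\n", fbits);
        printf("double 0./0. -> 0x%08X%08X\n", dhi, dlo);
        return 0;
    }

If the Intel docs are right, both patterns should come out with the
sign bit set, as described above.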
> - A more serious problem is with asin and acos for arguments whose
>   absolute value is between 1 and 1.5 (inclusive). These manage to
>   pass the tests for badarg, wind up in fpatan and return -NaN,
>   which has the wrong sign, and errno is not set.

Fixed. The problem occurred only for arguments in the range <1, 2>
that have the entire least-significant four bytes equal to zero. Thus,
acos(1.2) behaved properly, but acos(1.25) and acos(1.375) did not.

> - atan2 returns NaN when both arguments are +/-Inf.

...as it should (C9x draft notwithstanding): the size of Inf/Inf is
not known, any more than the size of 0/0 is known.

>   (atan2 from libm only returns NaN when y is Inf).

That also complies with C90, but seems a little quirky to me.

>   It seems to me that if y is finite, x can be Inf and the x87 will
>   produce a finite result, so perhaps we need to modify the checks
>   of x?

You mean, for instance, atan2(1., Inf)? On my platform, this
particular example returns 0. without error, as it should. I don't
see what the problem is here. Are we having some more trouble with
the assembler?

> - atanh(+/-1.25) returns -/+10 instead of a NaN.

Oops, that is indeed a problem similar to the acos() and asin()
problem. I've fixed that one, too.

> - cos suffers a large loss of significant digits near Pi/2: for
>   example cos(-1.571) loses about 40 bits of the mantissa (they come
>   out as zero). The same happens to sin near Pi. Is this some flaw
>   of the x87? If so, perhaps we should instead compute sin(Pi/2 - x)
>   and sin(Pi - x) for such arguments?

The problem is that you need a value of PI that is correct to many
more than 53 bits in order to produce accurate results under such
circumstances. The x87 is already doing just such a range reduction,
using an internal value of PI accurate to 66 bits, but for certain
arguments to the trig functions (such as the ones you cited), the
roundoff errors in the range reduction are greatly amplified. K.B.
Williams also noted large errors in cos(), sin(), and tan() of large
arguments. The same mechanism is responsible for those errors, as
well.

I don't know what your background is, but are you acquainted with the
notion of "backward error"? In case you are not, let me briefly
explain: One can model the error in a finite-precision function
routine by a three-stage process. First, one adds some roundoff error
to the argument. Then, one computes the function of that rounded
argument *exactly*. Finally, one adds some more roundoff error to
obtain the result. For all decent routines, both roundoff errors in
the model are on the order of 1 ULP (that is, 2^(-53) for doubles).
For some functions (such as atan()), one can easily account for all
the error in the function by less than 1 ULP of roundoff error in the
last step. However, for certain other functions (such as cos(-1.571)),
the rounding that occurs in the first step is sometimes much more
significant; if one attempts to account for it by roundoff in the last
step, such roundoff error bounds are very poor. When one quotes the
roundoff error in terms of the first roundoff error generator, it is
known as "backward error". A typical statement of backward error might
be as follows (worded for the sine function): "the returned value is
the exact sine of an argument *near* the actual argument."

In fact, for the trig functions cos(), sin(), and tan(), either
rounding generator may dominate the other, depending on the argument.
Thus, one might characterize the errors in my trig function routines
as follows: The output of cos(x) is the exact cosine, rounded to
machine precision, of the quantity x*(1+delta), where delta is less
than 2^(-66) in absolute value.
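To see the effect in isolation, here is a small experiment (just an
illustration, not code from either library): perturb an argument near
pi/2 by roughly one unit in the last place and watch what the cosine
does. The relative change in cos(x) comes out several thousand times
larger than the relative change in x, and that amplification is what
the backward-error description charges to the argument rather than to
the function:

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        double x1 = -1.571;                    /* close to -pi/2      */
        double x2 = x1 * (1.0 + DBL_EPSILON);  /* perturbed by ~1 ULP */
        double c1 = cos(x1);
        double c2 = cos(x2);

        printf("cos(%.17g) = %.17g\n", x1, c1);
        printf("cos(%.17g) = %.17g\n", x2, c2);
        printf("relative change in x:      %g\n", (x2 - x1) / x1);
        printf("relative change in cos(x): %g\n", (c2 - c1) / c1);
        return 0;
    }

Since cos(x) is only about 2.e-4 there while its derivative is about
-1, a one-ULP wiggle in the argument moves the result by thousands of
ULPs of the result; no amount of care in the final rounding can hide
that.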
Ideally, all functions would report the exact result, rounded to
machine precision, of the input arguments; this is the best they could
possibly do (the second roundoff error generator always exists,
because the output must be rounded to machine precision). I am
generally in favor of this level of accuracy: for instance, in the
case of acos(), the most straightforward implementation was not
accurate for fabs(x) near 1, but it turned out to be possible to
preserve full accuracy at the cost of a single floating-point
addition, so I went ahead and used the more accurate form. However,
in order to report the exact cosine of x, rounded to machine
precision, one needs to do multiple-precision range reduction. libm
does this, but that's one of the reasons it's a lot slower for many
arguments.

Why not filter out the special cases, and only give them the special
treatment? Well, that's what libm does, but for arguments outside of
[-pi/4, +pi/4], the libm trig functions take about three times as long
as my functions do. This is a heavy price to pay. This slower
execution would be justifiable if there were some practical reason to
demand such accuracy. However, in practical computation, the arguments
themselves are generally the result of some prior computation, so
they, too, are contaminated by roundoff errors usually much larger
than 2^(-66). Thus there is no *practical* use in going to such
lengths in range reduction as in libm. I'd hate to slow things down by
a factor of three for 99.9% of the programmers, to please the 0.1% who
have 53-bit arguments that are somehow accurate to much more than 53
bits. We can continue to provide libm for such perfectionists.

> - cosh doesn't set errno to ERANGE for an argument that is +/-Inf,
>   unlike the docs says. I think the problem is that the flow jumps
>   from abarg: to infarg:, but the latter doesn't set errno to 2.

I had mentioned earlier that I considered overflow to be a *finite*
result that is converted to infinity. Thus, cosh(inf) doesn't
overflow, any more than tanh(0.) underflows. The wording in the doc
should be similar to that in exp.txh. I will send you an amended doc.

> - ldexp sets errno when its first argument is -0.0. I guess we have
>   another case of +0 vs -0 controversy to handle. Seems like
>   clearing the high bit when testing x for zero should do the trick.

Fixed. BTW, I noticed that the printf routine prints -0. as +0. (in
other words, the sign bit is dropped), even when the %+E specifier is
used to force a sign to be printed. This doesn't seem right to me. I'd
have expected %E to print -0. as 0., and %+E to print -0. as -0.

> - modf doesn't set the sign bit of the integral part if it is zero.
>   For example, -0.81 sets the integral part to +0, not -0. I'm not
>   sure if we should bother.

The positive zero is produced by the subtraction x - fract(x). That's
just the way it comes out. I'd also lean toward ignoring this little
anomaly.

> - when the first argument of modf is NaN or Inf, perhaps we should
>   set the integral portion to NaN and Inf, respectively, not to zero
>   like the code now does, and return zero. At least for the Inf
>   argument, it makes more sense to say that its integral part is
>   infinite, not its fractional part.

I agree. Fixed.

> - what should pow(-0.,+1) return? your last version returns +0,
>   which might be okay, but in that case we should document it, since
>   it contradicts the rule that x^1 == x for any x. The same goes
>   for pow(-0.,-1): should it be +Inf or -Inf?

I was all ready to send you an amended pow.s a couple of days ago, but
I did some testing, and found that the "fixes" had caused more trouble
than they had solved. Upon further consideration, I realized that the
special cases in pow() are even more complicated than I previously
thought. I drew a two-dimensional map on a piece of paper, with
vertical and horizontal divisions to indicate -inf, -finite, -0, +0,
+finite, and +inf for each of the two arguments. There are also
divisions for finite operands that over- and underflow, and cases for
negative x, depending on whether y is an integer. All in all, there
are no fewer than 50 different cases to treat (and that's not even
counting NaN arguments, which are correctly handled already, to the
best of my knowledge).

Now, it may seem clear that pow(-0., 1.) should be -0., but what
should be done with pow(-0., y), y an even integer? Those should
probably be +0. or +inf, depending on the sign of y. And then what
does one do with pow(-0., y), y a positive non-integer? The magnitude
is certainly zero (or infinite), but if one considers -0. to be an
infinitesimal, then pow(-0., y) winds up being complex. There are
corresponding questions about pow(-inf, y). It could be argued that if
the complex phase of infinity is not known to be 0 or PI, a NaN should
be returned.

I would like to fix up these problems (sometime), but I can see that
their solution is far from trivial, and I'd probably wind up making
some major changes to the structure of the routine -- right now, it's
difficult to follow, and getting worse with each new special case. The
current state of the routine, to the best of my knowledge, is that it
computes all finite results properly, but gets the signs of zero and
infinity wrong in some cases. There are cases where the magnitude of
the result is clearly infinite, but of indeterminate phase, and my
routine returns +inf, instead of NaN.
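In the meantime, here is a little probe (just a test scaffold, not
part of either library; it assumes IEEE arithmetic, where HUGE_VAL is
+Inf) that one can link against my routines or against libm to see how
the corner cases above come out, together with errno:

    #include <stdio.h>
    #include <errno.h>
    #include <math.h>

    static void probe(double x, double y)
    {
        double r;
        errno = 0;
        r = pow(x, y);
        printf("pow(%g, %g) = %g, errno = %d\n", x, y, r, errno);
    }

    int main(void)
    {
        double inf = HUGE_VAL;   /* +Inf on IEEE systems */

        probe(-0.0,  1.0);   /* sign of a zero result?          */
        probe(-0.0, -1.0);   /* +Inf or -Inf?                   */
        probe(-0.0,  2.0);   /* even integer exponent           */
        probe(-0.0,  2.5);   /* non-integer exponent            */
        probe(-inf,  2.5);   /* indeterminate phase of infinity */
        probe( 1.0,  inf);
        probe( inf,  0.0);
        return 0;
    }

Nothing in the probe is meant as a statement of what the right answers
are -- it just makes the differences between implementations easy to
see.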
I also looked at the output of the libm pow() routine. Though the
signs of zero and infinite results are computed correctly (when they
are well defined) for all the cases I examined, in other respects even
libm doesn't get it completely right. For instance, pow(-inf, 2.5)
returns +inf with no error (instead of NaN, as it probably should);
pow(1., +inf) returns NaN (as it should), and pow(+inf, 0.) returns 1.
(instead of NaN, as it should), but none of these cases results in
errno being set to EDOM, as it should. I'm not sure the present
problems with my pow() routine are significant enough to hold up the
upcoming release, since these behaviors are all ones that ANSI does
not even specify. I also examined the behavior of the Watcom and
Borland routines, and was not impressed by what I found. I think that
any program that *counts* on certain behavior in such cases is in deep
trouble, anyway. What are your thoughts?

> Btw, libm functions consistently do NOT set errno for NaN arguments
> (except for pow(NaN,0)). Our behavior is well-documented, so this
> is only FYI. But I wonder whether some widely-accepted standard
> rules this.

I considered a NaN argument to be outside the mathematical domain of
definition of the function, and therefore, according to C90, a domain
error is required to be indicated. However, in C9x, most of that goes
out the window, at least if you can believe the current draft.

I will upload MATH0701.ZIP, which contains fixes for all but pow.s.

-Eric Rudd