Errors in Math Functions (The GNU C Library)

19.7 Errors in Math Functions

Errors are measured in “units of the last place” (ulps), a measure of relative error. If the floating-point number d.d...d·2^e is used to represent a number z (we assume IEEE floating-point numbers with base 2), its error in ulps is

|d.d...d - (z / 2^e)| · 2^(p - 1)

where p is the number of bits in the mantissa of the floating-point number representation. Ideally the error for all functions is always less than 0.5 ulp in round-to-nearest mode. With the help of extra rounding bits this is achievable, and it is normally implemented for the basic operations. Except for certain functions such as sqrt, fma and rint, whose results are fully specified by reference to corresponding IEEE 754 floating-point operations, and for conversions between strings and floating point, the GNU C Library does not aim for correctly rounded results for functions in the math library, and does not aim for correctness in whether “inexact” exceptions are raised. In the future, the GNU C Library may provide some other correctly rounding functions under names such as crsin, as proposed for an extension to ISO C. For functions without fully specified results, the accuracy goals are as follows; some functions have bugs, meaning they do not meet these goals in all cases.

Each function with a floating-point result behaves as if it computes an infinite-precision result that is within a few ulp (in both real and complex parts, for functions with complex results) of the mathematically correct value of the function (interpreted together with ISO C or POSIX semantics for the function in question) at the exact value passed as the input. Exceptions are raised appropriately for this value and in accordance with IEEE 754 / ISO C / POSIX semantics, and it is then rounded according to the current rounding direction to the result that is returned to the user. errno may also be set (see Error Reporting by Mathematical Functions). (The “inexact” exception may be raised, or not raised, even if this is inconsistent with the infinite-precision value.)
For the IBM long double format, as used on PowerPC GNU/Linux, the accuracy goal is weaker for input values not exactly representable in 106 bits of precision; it is as if the input value is some value within 0.5ulp of the value actually passed, where “ulp” is interpreted in terms of a fixed-precision 106-bit mantissa, but not necessarily the exact value actually passed with discontiguous mantissa bits.
For the IBM long double format, functions whose results are fully specified by reference to corresponding IEEE 754 floating-point operations have the same accuracy goals as other functions, but with the error bound being the same as that for division (3ulp). Furthermore, “inexact” and “underflow” exceptions may be raised for all functions for any inputs, even where such exceptions are inconsistent with the returned value, since the underlying floating-point arithmetic has that property.
Functions behave as if the infinite-precision result computed is zero, infinity or NaN if and only if that is the mathematically correct infinite-precision result. They behave as if the infinite-precision result computed always has the same sign as the mathematically correct result.
If the mathematical result is more than a few ulp above the overflow threshold for the current rounding direction, the value returned is the appropriate overflow value for the current rounding direction, with the overflow exception raised.
If the mathematical result has magnitude well below half the least subnormal magnitude, the returned value is either zero or the least subnormal (in each case, with the correct sign), according to the current rounding direction and with the underflow exception raised.
Where the mathematical result underflows (before rounding) and is not exactly representable as a floating-point value, the function does not behave as if the computed infinite-precision result is an exact value in the subnormal range. This means that the underflow exception is raised other than possibly for cases where the mathematical result is very close to the underflow threshold and the function behaves as if it computes an infinite-precision result that does not underflow. (So there may be spurious underflow exceptions in cases where the underflowing result is exact, but not missing underflow exceptions in cases where it is inexact.)
The GNU C Library does not aim for functions to satisfy other properties of the underlying mathematical function, such as monotonicity, where not implied by the above goals.
All the above applies to both real and complex parts, for complex functions.

Therefore many of the functions in the math library have errors. The math testsuite flags only errors larger than 9 ulp (or 16 ulp for the IBM long double format) as failures, although most of the implementations show errors smaller than that limit.
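
To make the ulp figures concrete, the error of a double result can be estimated against a higher-precision reference. The sketch below is illustrative only and is not how the testsuite is implemented: the name ulp_error is hypothetical, the reference is assumed to be finite, nonzero and in the normal range of double, and long double is assumed to be wider than double (where it is not, the estimate loses its meaning).

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: error of `computed', in ulps of double, measured
   against `reference', which is assumed to be a more accurate value of
   the same mathematical result.  */
double
ulp_error (double computed, long double reference)
{
  assert (isfinite (reference) && reference != 0.0L);

  /* frexpl writes reference as m * 2^e with 0.5 <= |m| < 1, so in the
     d.d...d * 2^(e - 1) form the exponent is e - 1.  */
  int e;
  (void) frexpl (reference, &e);

  /* One ulp of a double at that exponent is 2^((e - 1) - (p - 1)),
     that is 2^(e - p), with p = DBL_MANT_DIG.  */
  long double ulp = ldexpl (1.0L, e - DBL_MANT_DIG);

  return (double) (fabsl ((long double) computed - reference) / ulp);
}
```

For example, ulp_error (sin (x), sinl (x)) gives a rough estimate for sin at a given x, and a check mirroring the limit quoted above would compare the returned value with 9. Because sinl itself may be a few ulp (of long double) away from the true value, a reference computed with an arbitrary-precision library would be more rigorous.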

A more comprehensive analysis of the precision of the GNU C Library math functions can be found in “Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision” by Brian Gladman, Vincenzo Innocente, John Mather, and Paul Zimmermann, at <https://inria.hal.science/hal-03141101>. It does not cover complex functions or jn/yn, it covers only x86_64 and rounding to nearest, and it does not cover any architecture variations (in particular, IBM long double is out of scope).

For complex functions, some analysis of the GNU C Library math functions can be found in “Accuracy of Complex Mathematical Operations and Functions in Single and Double Precision” by Paul Caprioli, Vincenzo Innocente, and Paul Zimmermann, at <https://inria.hal.science/hal-04714173>. It covers only float and double, only x86_64, and only rounding to nearest, and it does not cover any architecture variations.