Floating-point calculations are carried out internally with extra precision, and then rounded to fit into the destination type. This ensures that results are as precise as the input data. IEEE 754 defines four possible rounding modes:
This is the default mode. It should be used unless there is a specific
need for one of the others. In this mode results are rounded to the
nearest representable value. If the result is midway between two
representable values, the even representable is chosen. Even here
means the lowest-order bit is zero. This rounding mode prevents
statistical bias and guarantees numeric stability: round-off errors in a
lengthy calculation will remain smaller than half of FLT_EPSILON
.
All results are rounded to the smallest representable value which is greater than the result.
All results are rounded to the largest representable value which is less than the result.
All results are rounded to the largest representable value whose magnitude is less than that of the result. In other words, if the result is negative it is rounded up; if it is positive, it is rounded down.
fenv.h defines constants which you can use to refer to the various rounding modes. Each one will be defined if and only if the FPU supports the corresponding rounding mode.
FE_TONEAREST
¶Round to nearest.
FE_UPWARD
¶Round toward +∞.
FE_DOWNWARD
¶Round toward -∞.
FE_TOWARDZERO
¶Round toward zero.
Underflow is an unusual case. Normally, IEEE 754 floating point
numbers are always normalized (see Floating Point Representation Concepts).
Numbers smaller than 2^r (where r is the minimum exponent,
FLT_MIN_RADIX-1
for float) cannot be represented as
normalized numbers. Rounding all such numbers to zero or 2^r
would cause some algorithms to fail at 0. Therefore, they are left in
denormalized form. That produces loss of precision, since some bits of
the mantissa are stolen to indicate the decimal point.
If a result is too small to be represented as a denormalized number, it is rounded to zero. However, the sign of the result is preserved; if the calculation was negative, the result is negative zero. Negative zero can also result from some operations on infinity, such as 4/-∞.
At any time, one of the above four rounding modes is selected. You can find out which one with this function:
int
fegetround (void)
¶Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.
Returns the currently selected rounding mode, represented by one of the values of the defined rounding mode macros.
To change the rounding mode, use this function:
int
fesetround (int round)
¶Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.
Changes the currently selected rounding mode to round. If
round does not correspond to one of the supported rounding modes
nothing is changed. fesetround
returns zero if it changed the
rounding mode, or a nonzero value if the mode is not supported.
You should avoid changing the rounding mode if possible. It can be an expensive operation; also, some hardware requires you to compile your program differently for it to work. The resulting code may run slower. See your compiler documentation for details.