Half-Precision (Using the GNU Compiler Collection (GCC))

6.1.5 Half-Precision Floating Point

GCC supports half-precision (16-bit) floating point on several targets.

It is recommended that portable code use the _Float16 type defined by ISO/IEC TS 18661-3:2015. See Additional Floating Types.
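As a minimal sketch of this recommendation, the following C fragment uses _Float16 where it is available and falls back to float elsewhere. The names half_t and scale_sample are hypothetical and chosen for illustration only. The check on __FLT16_MAX__ assumes that the compiler predefines this macro whenever _Float16 is supported, as current GCC releases do.

```c
/* Use _Float16 where the compiler provides it.  The __FLT16_MAX__ test
   is an assumption about the compiler's predefined macros; half_t is a
   hypothetical name, not part of any standard.  */
#ifdef __FLT16_MAX__
typedef _Float16 half_t;
#else
typedef float half_t;           /* fallback so the sketch still compiles */
#endif

/* Multiply two values that are stored in half precision and return the
   product widened back to float.  */
float
scale_sample (float x, float gain)
{
  half_t h = (half_t) x;        /* narrowing conversion on the way in */
  half_t g = (half_t) gain;
  return (float) (h * g);       /* widening conversion on the way out */
}
```

Values that are exactly representable in 16 bits, such as 1.5 and 2.0, pass through unchanged; other inputs are rounded when they are narrowed to half_t.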

Some targets have peculiarities as follows.

On Arm and AArch64 targets, GCC supports half-precision (16-bit) floating point via the __fp16 type defined in the Arm C-Language Extensions (ACLE).

Language-level support for the __fp16 data type is independent of whether GCC generates code using hardware floating-point instructions. In cases where hardware support is not specified, GCC implements conversions between __fp16 and other types as library calls.

Arm targets support two mutually incompatible half-precision floating-point formats:

A format that implements IEEE 754-2008 16-bit floating point types, enabled with the -mfp16-format=ieee command-line option; this format can represent normalized values in the range of 2^{-14} to 65504. There are 11 bits of significand precision, approximately 3 decimal digits.
An alternative format that sacrifices NaNs and infinity values, but has a larger range of values that can be represented: 2^{-14} to 131008. This is enabled with the -mfp16-format=alternative option.
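The limits quoted for the two formats can be checked with a little arithmetic. The sketch below assumes the usual binary16 layout of 1 sign bit, 5 exponent bits and 10 stored fraction bits (11 bits of precision with the implicit leading bit), and assumes that the alternative format gains one extra exponent step by reusing the encoding that the IEEE format reserves for NaNs and infinities. The function names are hypothetical.

```c
/* Largest finite value with an 11-bit significand (binary 1.1111111111)
   and the given maximum unbiased exponent.  */
double
half_max (int max_exp)
{
  double significand = 2.0 - 1.0 / 1024.0;      /* 2 - 2^-10 */
  double scale = 1.0;
  for (int i = 0; i < max_exp; i++)
    scale *= 2.0;                               /* builds 2^max_exp */
  return significand * scale;
}

/* Smallest positive normalized value, 2^-14, common to both formats.  */
double
half_min_normal (void)
{
  return 1.0 / 16384.0;
}
```

Here half_max (15) gives 65504 for the IEEE format and half_max (16) gives 131008 for the alternative format, matching the ranges listed above.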

You must choose one of the formats and use it consistently in your program.

GCC only supports the ‘alternative’ format on implementations that support it in hardware; there is no support for conversions to and from this format using library functions. Furthermore, you cannot link together code compiled with one format and code compiled for the other. GCC also supports the -mfp16-format=none option, which disables all support for half-precision floating-point types. Code compiled with this option can be linked safely with code compiled for either format.

The Arm architecture extension FEAT_FP16 (enabled, for example, with -march=armv8.2-a+fp16, or -march=armv8.1-m.main+mve.fp) defines data processing instructions that only support the ‘ieee’ format. The compiler rejects attempts to use the ‘alternative’ format when this architecture extension is enabled.

Note that the ACLE has deprecated use of the ‘alternative’ format and recommends that only the ‘ieee’ format be used.

The default is to compile with -mfp16-format=ieee.

In C and C++ there are two related data types:

__fp16, as defined by the Arm C-Language Extensions (ACLE). This can be used to hold either format;
_Float16, which is defined by ISO/IEC TS 18661-3:2015. This is only defined when the format selected is ‘ieee’.
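Code that must build under more than one setting can test for the selected format at compile time. The following sketch relies on the feature-test macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE, which come from the ACLE rather than from the text above, so treat their use here as an assumption. The names fp16_format_name, storage_half and store_and_load are hypothetical.

```c
/* Report which __fp16 format, if any, this translation unit was
   compiled for.  The two macros are ACLE feature-test macros.  */
const char *
fp16_format_name (void)
{
#if defined (__ARM_FP16_FORMAT_IEEE)
  return "ieee";
#elif defined (__ARM_FP16_FORMAT_ALTERNATIVE)
  return "alternative";
#else
  return "none (or not an Arm target)";
#endif
}

/* Pick a storage type: __fp16 when a format is selected, float otherwise.  */
#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
typedef __fp16 storage_half;
#else
typedef float storage_half;
#endif

/* Round-trip a value through the storage type.  */
float
store_and_load (float x)
{
  storage_half h = (storage_half) x;    /* narrow for storage */
  return (float) h;                     /* widen before arithmetic */
}
```

On targets other than Arm and AArch64 neither macro is expected to be defined, so the sketch falls back to float and reports that no __fp16 format is in use.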

The GCC port for AArch64 only supports the IEEE 754-2008 format, and does not have the -mfp16-format command-line option.

On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) floating point via the _Float16 type. For C++, x86 provides a built-in type named _Float16 that uses the same data format as in C.

On x86 targets with SSE2 enabled but without -mavx512fp16, all _Float16 operations are emulated in software using single-precision float instructions. By default, FLT_EVAL_METHOD keeps the intermediate result of each operation in 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 instructions. Using -fexcess-precision=16 forces the result to be rounded back to 16-bit precision after each operation.

Using -mavx512fp16 generates AVX512-FP16 instructions instead of software emulation. The default behavior of FLT_EVAL_METHOD is then to round after each operation. The same is true with -fexcess-precision=standard and -mfpmath=sse. Without -mfpmath=sse, -fexcess-precision=standard alone behaves as it did before; this is useful for code that does not use _Float16 and runs on the x87 FPU.
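The difference between keeping intermediate results in 32-bit precision and rounding after each operation can be observed with a sum whose intermediate value is not representable in 16 bits. The following is a sketch only: half_t, sum_2048_1_1 and eval_method are hypothetical names, and the __FLT16_MAX__ test assumes that the compiler predefines this macro when _Float16 is available.

```c
#include <float.h>

#ifdef __FLT16_MAX__
typedef _Float16 half_t;
#else
typedef float half_t;           /* fallback: no _Float16 on this target */
#endif

/* Evaluate 2048 + 1 + 1 in half_t.  Between 2048 and 4096 the 16-bit
   format has a spacing of 2, so 2049 is not representable.  If each
   addition is rounded to 16 bits, both additions round back to 2048.
   If the intermediate sum is kept in 32-bit precision, the exact sum
   2050 is formed and is rounded only once at the end.  */
float
sum_2048_1_1 (void)
{
  /* volatile keeps the compiler from folding the sum at compile time.  */
  volatile half_t a = 2048, b = 1, c = 1;
  half_t r = a + b + c;
  return (float) r;
}

/* Expose the evaluation method the compiler reports for this build.  */
int
eval_method (void)
{
  return FLT_EVAL_METHOD;
}
```

Which of the two results, 2048 or 2050, is returned depends on the target and on the -mavx512fp16, -fexcess-precision and -mfpmath settings described above, so comparing builds with different options is one way to see whether excess precision is in effect.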