GCC supports half-precision (16-bit) floating point on several targets.
It is recommended that portable code use the _Float16 type defined
by ISO/IEC TS 18661-3:2015. See Additional Floating Types.
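As an illustration of the recommendation above, the following is a minimal sketch of how portable code might use _Float16 where it is available. It assumes that GCC predefines __FLT16_MANT_DIG__ on targets that provide _Float16. The float fallback and the names half_t, HAVE_FLOAT16 and half_roundtrip are hypothetical and exist only for this example.

```c
/* Assumption: __FLT16_MANT_DIG__ is predefined when _Float16 exists.
   half_t, HAVE_FLOAT16 and half_roundtrip are hypothetical names. */
#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_t;
#define HAVE_FLOAT16 1
#else
typedef float half_t;          /* fallback so the sketch still compiles */
#define HAVE_FLOAT16 0
#endif

/* Narrow a float to half_t, then widen it back to float. */
static float half_roundtrip(float x)
{
    half_t h = (half_t)x;
    return (float)h;
}
```

Values that are exactly representable in 16 bits, such as 1.0f or 65504.0f, survive the round trip unchanged. When _Float16 is in use, other values are rounded to the nearest representable half-precision value.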
Some targets have peculiarities as follows.
On Arm and AArch64 targets, GCC supports half-precision (16-bit)
floating point via the __fp16 type defined in the Arm
C-Language Extensions (ACLE).
Language-level support for the __fp16 data type is
independent of whether GCC generates code using hardware floating-point
instructions. In cases where hardware support is not specified, GCC
implements conversions between __fp16 and other types as library
calls.
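The following sketch shows one way __fp16 might be used as a storage type, with arithmetic carried out in float. It assumes the ACLE macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE indicate that __fp16 is available. The float fallback and the names fp16_storage_t and fp16_sum are hypothetical.

```c
#include <stddef.h>

/* Assumption: ACLE defines one of these macros when __fp16 is usable.
   The float fallback only keeps the sketch compilable elsewhere. */
#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
typedef __fp16 fp16_storage_t;
#else
typedef float fp16_storage_t;
#endif

/* Sum n stored values, accumulating in float.  Each element is
   converted to float before the addition; without hardware support
   that conversion may become a library call. */
static float fp16_sum(const fp16_storage_t *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += (float)v[i];
    return acc;
}
```

The source code is the same whether or not the target has half-precision hardware. Only the way the compiler implements the conversions differs.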
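Arm targets support two mutually incompatible half-precision floating-point formats:
the ‘ieee’ format, selected with -mfp16-format=ieee;
the ‘alternative’ format, selected with -mfp16-format=alternative.
You must choose one of the formats and use it consistently in your program.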
GCC supports the ‘alternative’ format only on implementations that support it in hardware; there is no support for conversions to and from this format using library functions. Furthermore, you cannot link together code compiled for one format with code compiled for the other.
GCC also supports the -mfp16-format=none option, which disables all support for half-precision floating-point types. Code compiled with this option can be linked safely with code compiled for either format.
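To illustrate why the two formats cannot be mixed, the following sketch decodes a 16-bit pattern under each interpretation. It relies on details not stated in the text above and taken here as assumptions: both formats use 1 sign bit, 5 exponent bits and 10 fraction bits, and the ‘alternative’ format treats the all-ones exponent as an ordinary exponent instead of encoding infinities and NaNs. The name decode_half is hypothetical.

```c
#include <math.h>      /* NAN and INFINITY macros only */
#include <stdint.h>

/* Decode a half-precision bit pattern to float.  A nonzero
   'alternative' selects the Arm alternative interpretation
   (assumed: no infinities or NaNs, exponent 31 is a normal exponent). */
static float decode_half(uint16_t bits, int alternative)
{
    unsigned sign = (bits >> 15) & 0x1u;
    unsigned exp  = (bits >> 10) & 0x1Fu;
    unsigned frac = bits & 0x3FFu;
    float v;

    if (exp == 0)
        v = (float)frac / 16777216.0f;            /* frac * 2^-24 */
    else if (exp == 31 && !alternative)
        v = frac ? NAN : INFINITY;                /* IEEE special values */
    else                                          /* (1024 + frac) * 2^(exp - 25) */
        v = (float)(frac + 1024u) * (float)(1u << exp) / 33554432.0f;

    return sign ? -v : v;
}
```

Under these assumptions the pattern 0x7C00 decodes to infinity in the ‘ieee’ format but to 65536 in the ‘alternative’ format, so an object file that passes such a value across a call boundary is only meaningful to code built for the same format.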
The Arm architecture extension FEAT_FP16 (enabled, for example,
with -march=armv8.2-a+fp16, or
-march=armv8.1-m.main+mve.fp) defines data processing
instructions that only support the ‘ieee’ format. The compiler
rejects attempts to use the ‘alternative’ format when this
architecture extension is enabled.
Note that the ACLE has deprecated use of the ‘alternative’ format and recommends that only the ‘ieee’ format be used.
The default is to compile with -mfp16-format=ieee.
In C and C++ there are two related data types:
__fp16, as defined by the Arm C-Language Extensions (ACLE).
This can be used to hold either format;
_Float16, which is defined by ISO/IEC TS 18661-3:2015. This is
only defined when the format selected is ‘ieee’.
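Code that must build under more than one setting can test for the selected format at compile time. The sketch below assumes the ACLE macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE and the GCC predefined macro __FLT16_MANT_DIG__. The enum and function names are hypothetical.

```c
/* Hypothetical names; the macros tested are assumed to be predefined
   as described by ACLE (format) and by GCC (_Float16 availability). */
typedef enum { FP16_FMT_NONE, FP16_FMT_IEEE, FP16_FMT_ALTERNATIVE } fp16_fmt;

/* Report the __fp16 format selected on an Arm target.
   FP16_FMT_NONE also covers targets that are not Arm at all. */
static fp16_fmt fp16_format(void)
{
#if defined(__ARM_FP16_FORMAT_IEEE)
    return FP16_FMT_IEEE;
#elif defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    return FP16_FMT_ALTERNATIVE;
#else
    return FP16_FMT_NONE;
#endif
}

/* Return 1 if _Float16 is available in this compilation, else 0. */
static int fp16_has_float16(void)
{
#ifdef __FLT16_MANT_DIG__
    return 1;
#else
    return 0;
#endif
}
```

Consistent with the description above, fp16_has_float16 is not expected to return 1 when fp16_format returns FP16_FMT_ALTERNATIVE.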
The GCC port for AArch64 supports only the IEEE 754-2008 format and does not provide the -mfp16-format command-line option.
On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
floating point via the _Float16 type. For C++, x86 provides a
built-in type named _Float16 that has the same data format as in C.
On x86 targets with SSE2 enabled, without -mavx512fp16,
all operations are emulated in software using float
instructions. With the default FLT_EVAL_METHOD, the intermediate
result of each operation is kept in 32-bit precision. This may lead to
inconsistent behavior between software emulation and AVX512-FP16 instructions.
Using -fexcess-precision=16 forces rounding back to 16-bit precision
after each operation.
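The difference between the two evaluation strategies can be reproduced with explicit casts, independently of the options in effect. The following is a sketch only: the names half_t, HAVE_FLOAT16, sum_round_each_op and sum_float_intermediate are hypothetical, and the float fallback (assumed to be needed when __FLT16_MANT_DIG__ is not predefined) makes both functions behave identically.

```c
/* Assumption: __FLT16_MANT_DIG__ is predefined when _Float16 exists. */
#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_t;
#define HAVE_FLOAT16 1
#else
typedef float half_t;          /* fallback: no 16-bit rounding occurs */
#define HAVE_FLOAT16 0
#endif

/* Round to half_t after every addition, as when each operation is
   rounded back to 16-bit precision. */
static float sum_round_each_op(float a, float b, float c)
{
    half_t x = (half_t)a, y = (half_t)b, z = (half_t)c;
    half_t t = (half_t)((float)x + (float)y);
    return (float)(half_t)((float)t + (float)z);
}

/* Keep the intermediate sum in float and round once at the end, as
   when intermediate results are kept in 32-bit precision. */
static float sum_float_intermediate(float a, float b, float c)
{
    half_t x = (half_t)a, y = (half_t)b, z = (half_t)c;
    return (float)(half_t)((float)x + (float)y + (float)z);
}
```

With _Float16 available, the inputs 2048, 1 and 1 give 2048 when rounding after each operation, because 2049 is not representable and rounds to 2048 both times, but give 2050 when the intermediate sum is kept in float.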
Using -mavx512fp16 generates AVX512-FP16 instructions instead of
software emulation. In this case the default behavior of FLT_EVAL_METHOD
is to round after each operation. The same is true with
-fexcess-precision=standard and -mfpmath=sse. If -mfpmath=sse is not
given, -fexcess-precision=standard alone behaves as before. This is
useful for code that does not use _Float16 and runs on the x87
FPU.