LoongArch Options (Using the GNU Compiler Collection (GCC))

3.20.23 LoongArch Options

These command-line options are defined for LoongArch targets:

-march=arch-type

Generate instructions for the machine type arch-type. -march=arch-type allows GCC to generate code that may not run at all on processors other than the one indicated.

The choices for arch-type are:

‘native’: Local processor type detected by the native compiler.
‘loongarch64’: Generic LoongArch 64-bit processor.
‘la464’: LoongArch LA464-based processor with LSX, LASX.
‘la664’: LoongArch LA664-based processor with LSX, LASX and all LoongArch v1.1 instructions.
‘la64v1.0’: LoongArch64 ISA version 1.0.
‘la64v1.1’: LoongArch64 ISA version 1.1.
‘la32v1.0’: LoongArch32 ISA version 1.0.
‘la32rv1.0’: LoongArch32 Reduced ISA version 1.0.

More information about LoongArch ISA versions can be found at https://github.com/loongson/la-toolchain-conventions.
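As a rough illustration of how the selected -march value can be observed from C, the sketch below reads the __loongarch_arch preprocessor macro. It assumes, based on the toolchain conventions document, that this macro expands to a string naming the target architecture on LoongArch targets. The helper name target_arch_name is hypothetical, and the macro's availability should be checked against the compiler in use.

```c
/* Hypothetical helper: report the architecture GCC was asked to target.
   __loongarch_arch is assumed to expand to a string literal such as
   "la464" on LoongArch targets; on other targets it is undefined. */
const char *target_arch_name(void)
{
#if defined(__loongarch_arch)
    return __loongarch_arch;
#else
    return "non-LoongArch target";
#endif
}
```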

-mtune=tune-type

Optimize the generated code for the given processor target.

The choices for tune-type are:

‘native’: Local processor type detected by the native compiler.
‘generic’: Generic LoongArch processor.
‘loongarch64’: Generic LoongArch 64-bit processor.
‘la464’: LoongArch LA464 core.
‘la664’: LoongArch LA664 core.
‘loongarch32’: Generic LoongArch 32-bit processor.

-mabi=base-abi-type

Generate code for the specified calling convention. base-abi-type can be one of:

‘lp64d’: Uses 64-bit general purpose registers and 32/64-bit floating-point registers for parameter passing. Data model is LP64, where ‘int’ is 32 bits, while ‘long int’ and pointers are 64 bits.
‘lp64f’: Uses 64-bit general purpose registers and 32-bit floating-point registers for parameter passing. Data model is LP64, where ‘int’ is 32 bits, while ‘long int’ and pointers are 64 bits.
‘lp64s’: Uses 64-bit general purpose registers and no floating-point registers for parameter passing. Data model is LP64, where ‘int’ is 32 bits, while ‘long int’ and pointers are 64 bits.
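The LP64 data model shared by the three ABI types can be checked from C. The following is a minimal sketch; the struct and function names are hypothetical and only serve the illustration.

```c
#include <limits.h>

/* Widths, in bits, of the three types that define a C data model. */
struct data_model {
    unsigned int_bits;
    unsigned long_bits;
    unsigned pointer_bits;
};

/* Measure the data model of whatever target this file is compiled for. */
struct data_model current_data_model(void)
{
    struct data_model m;
    m.int_bits = (unsigned)(sizeof(int) * CHAR_BIT);
    m.long_bits = (unsigned)(sizeof(long) * CHAR_BIT);
    m.pointer_bits = (unsigned)(sizeof(void *) * CHAR_BIT);
    return m;
}

/* LP64: 32-bit int, 64-bit long int and pointers. */
int is_lp64(struct data_model m)
{
    return m.int_bits == 32 && m.long_bits == 64 && m.pointer_bits == 64;
}
```

Calling is_lp64(current_data_model()) in code built with any of the ABI types listed above would be expected to return 1.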

-mfpu=fpu-type

Generate code for the specified FPU type, which can be one of:

‘64’: Allow the use of hardware floating-point instructions for 32-bit and 64-bit operations.
‘32’: Allow the use of hardware floating-point instructions for 32-bit operations.
‘none’, ‘0’: Prevent the use of hardware floating-point instructions.

-msimd=simd-type

Enable generation of LoongArch SIMD instructions for vectorization and via builtin functions. The value can be one of:

‘lasx’: Enable generating instructions from the 256-bit LoongArch Advanced SIMD Extension (LASX) and the 128-bit LoongArch SIMD Extension (LSX).
‘lsx’: Enable generating instructions from the 128-bit LoongArch SIMD Extension (LSX).
‘none’: No LoongArch SIMD instruction may be generated.
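To make the vectorization case concrete, here is a small loop of the kind GCC's auto-vectorizer can usually handle. Whether LSX or LASX instructions are actually generated depends on the optimization level and the -msimd setting; the function name is hypothetical.

```c
#include <stddef.h>

/* A simple element-wise loop that an auto-vectorizer can typically
   handle. `restrict` tells the compiler the arrays do not overlap. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With -msimd=none the same source still compiles, but only scalar instructions may be used for the loop.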

-msoft-float

Force -mfpu=none and prevent the use of floating-point registers for parameter passing. This option may change the target ABI.

-msingle-float

Force -mfpu=32 and allow the use of 32-bit floating-point registers for parameter passing. This option may change the target ABI.

-mdouble-float

Force -mfpu=64 and allow the use of 32/64-bit floating-point registers for parameter passing. This option may change the target ABI.
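The effect of the floating-point options can be inspected at compile time. The sketch below assumes the predefined macros __loongarch_double_float, __loongarch_single_float, __loongarch_soft_float and __loongarch_frlen described in the LoongArch toolchain conventions; verify them against the compiler in use. The helper names are hypothetical.

```c
/* Hypothetical helper: describe the floating-point ABI in effect. */
const char *float_abi_name(void)
{
#if defined(__loongarch_double_float)
    return "double-float";
#elif defined(__loongarch_single_float)
    return "single-float";
#elif defined(__loongarch_soft_float)
    return "soft-float";
#else
    return "unknown";
#endif
}

/* FPU register width in bits (assumed to be 0, 32 or 64 on LoongArch),
   or -1 when the macro is not defined. */
int fpu_register_bits(void)
{
#if defined(__loongarch_frlen)
    return __loongarch_frlen;
#else
    return -1;
#endif
}
```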

-mlasx

-mno-lasx

-mlsx

-mno-lsx

Incrementally adjust the scope of the SIMD extensions (none / LSX / LASX) that can be used by the compiler for code generation. Enabling LASX with -mlasx automatically enables LSX, and disabling LSX with -mno-lsx automatically disables LASX. These driver-only options act upon the final -msimd configuration state and make incremental changes in the order they appear on the GCC driver’s command line, deriving the final / canonicalized -msimd option that is passed to the compiler proper.
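The left-to-right processing described above can be modelled as a small state machine. This is a sketch of one reading of the text, not the driver's actual implementation: the handling of -mlsx when LASX is already enabled, and of -mno-lasx when only LSX is enabled, is an assumption, and all names are hypothetical.

```c
#include <string.h>

enum simd_level { SIMD_NONE = 0, SIMD_LSX = 1, SIMD_LASX = 2 };

/* Apply one driver option to the current SIMD level. Unrecognized
   options leave the level unchanged. */
enum simd_level apply_simd_option(enum simd_level cur, const char *opt)
{
    if (strcmp(opt, "-mlasx") == 0)
        return SIMD_LASX;                          /* implies LSX */
    if (strcmp(opt, "-mno-lsx") == 0)
        return SIMD_NONE;                          /* also drops LASX */
    if (strcmp(opt, "-mlsx") == 0)
        return cur == SIMD_NONE ? SIMD_LSX : cur;  /* assumed: keeps LASX */
    if (strcmp(opt, "-mno-lasx") == 0)
        return cur == SIMD_LASX ? SIMD_LSX : cur;  /* assumed: keeps LSX */
    return cur;
}

/* Fold a sequence of options, left to right, into a final level. */
enum simd_level canonical_simd(enum simd_level start,
                               const char *const *opts, int count)
{
    enum simd_level cur = start;
    for (int i = 0; i < count; i++)
        cur = apply_simd_option(cur, opts[i]);
    return cur;
}
```

Under this model, "-mlasx -mno-lsx" ends with no SIMD extension enabled, while "-mno-lsx -mlasx" ends with LASX (and therefore LSX) enabled.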

-mbranch-cost=n

Set the cost of branches to roughly n instructions.

-maddr-reg-reg-cost=n

Set the cost of ADDRESS_REG_REG to the value calculated by n.

-mcheck-zero-division

-mno-check-zero-division

Trap (do not trap) on integer division by zero. The default is -mcheck-zero-division for -O0 or -Og, and -mno-check-zero-division for other optimization levels.
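Because the trap is not enabled by default at most optimization levels, and integer division by zero is undefined behavior in C regardless, portable code usually guards the division explicitly. A minimal sketch, with a hypothetical function name:

```c
#include <limits.h>
#include <stdbool.h>

/* Divide a by b, reporting failure instead of invoking undefined
   behavior. Returns true and stores the quotient on success. */
bool checked_div(int a, int b, int *quotient)
{
    if (b == 0)
        return false;               /* division by zero */
    if (a == INT_MIN && b == -1)
        return false;               /* quotient would overflow */
    *quotient = a / b;
    return true;
}
```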

-mbreak-code=code

For irrecoverable traps, whether from __builtin_trap or inserted by the compiler (for example, an erroneous path isolated with -fisolate-erroneous-paths-dereference), emit a break code instruction. If code is negative or greater than 32767, emit an amswap.w $r0, $r1, $r0 instruction instead, which causes the hardware to trigger an Instruction Not-defined Exception. The default is -1, meaning to use the amswap.w instruction.
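The range rule for code, and a typical source of such traps, can be sketched as follows. The function names are hypothetical; __builtin_trap is the GCC builtin named above.

```c
#include <stddef.h>

/* Models the rule stated above: a break instruction is used only when
   code is in the range 0..32767; otherwise amswap.w is used. */
int trap_uses_break(long code)
{
    return code >= 0 && code <= 32767;
}

/* Return element i of table, trapping on an out-of-range index.
   __builtin_trap() never returns. */
int element_or_trap(const int *table, size_t len, size_t i)
{
    if (i >= len)
        __builtin_trap();
    return table[i];
}
```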

-mcond-move-int

-mno-cond-move-int

Conditional moves for integral data in general-purpose registers are enabled (disabled). The default is -mcond-move-int.

-mcond-move-float

-mno-cond-move-float

Conditional moves for floating-point registers are enabled (disabled). The default is -mcond-move-float.
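Source code of the following shape is a common candidate for conditional moves. This is only a sketch with hypothetical names: whether the compiler emits a conditional move or a branch depends on the optimization level and its cost model, in addition to these options.

```c
/* Each function selects one of two values based on a comparison,
   which a compiler may lower to a conditional move instead of a branch. */
int select_max_int(int a, int b)
{
    return a > b ? a : b;
}

double select_max_double(double a, double b)
{
    return a > b ? a : b;
}
```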

-mmemcpy

-mno-memcpy

Force (do not force) the use of memcpy for non-trivial block moves. The default is -mno-memcpy, which allows GCC to inline most constant-sized copies. Setting the optimization level to -Os also forces the use of memcpy, but -mno-memcpy may override this behavior if explicitly specified, regardless of the order of these options on the command line.

-mstrict-align

-mno-strict-align

Avoid or allow generating memory accesses that may not be aligned on a natural object boundary as described in the architecture specification. The default is -mno-strict-align.
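A typical place where this option matters is reading a multi-byte value from an address with no alignment guarantee. The sketch below uses memcpy, which keeps the access well-defined in C and leaves the choice of load and store instructions to the compiler; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store counterpart, used to round-trip a value. */
void store_u32_unaligned(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

With -mstrict-align the compiler must not assume that a single 32-bit access at p is acceptable, so it may fall back to narrower accesses.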

-G num

Put global and static data smaller than num bytes into a small data section. The default value is 0.

-mmax-inline-memcpy-size=n

Inline all block moves (such as calls to memcpy or structure copies) less than or equal to n bytes. The default value of n is 1024.
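Both kinds of block move mentioned above appear in the following sketch: a structure assignment and a memcpy call with a compile-time constant size. The struct layout and function names are hypothetical; whether each copy is inlined depends on -mmemcpy, the optimization level, and the size limit.

```c
#include <string.h>

struct packet {
    unsigned char header[16];
    unsigned char payload[48];
};

/* Structure assignment: a constant-sized (64-byte) block move. */
void copy_packet(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}

/* Explicit memcpy with a compile-time constant size (48 bytes). */
void copy_payload(struct packet *dst, const struct packet *src)
{
    memcpy(dst->payload, src->payload, sizeof dst->payload);
}
```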

-mcmodel=code-model

Set the code model to one of:

‘tiny-static’: Not implemented yet.
‘tiny’: Not implemented yet.
‘normal’: The text segment must be within 128MB addressing space. The data segment must be within 2GB addressing space.
‘medium’: The text segment and data segment must be within 2GB addressing space. This is the default code model unless GCC has been configured with --with-cmodel= specifying a different default code model.
‘large’: Not implemented yet.
‘extreme’: This mode does not limit the size of the code segment and data segment. The -mcmodel=extreme option is incompatible with -fplt and/or -mexplicit-relocs=none.

-mexplicit-relocs=style

-mexplicit-relocs

-mno-explicit-relocs

Set when to use assembler relocation operators when dealing with symbolic addresses. The alternative is to use assembler macros, which may limit instruction scheduling but allow linker relaxation. With -mexplicit-relocs=none, the assembler macros are always used; with -mexplicit-relocs=always, the assembler relocation operators are always used; and with -mexplicit-relocs=auto, the compiler uses the relocation operators where linker relaxation is impossible, to improve the code quality, and macros elsewhere.

The default value for the option is determined with the assembler capability detected during GCC build-time and the setting of -mrelax: -mexplicit-relocs=none if the assembler does not support relocation operators at all, -mexplicit-relocs=always if the assembler supports relocation operators but -mrelax is not enabled, -mexplicit-relocs=auto if the assembler supports relocation operators and -mrelax is enabled.

For backward compatibility, -mexplicit-relocs is equivalent to -mexplicit-relocs=always, while -mno-explicit-relocs is equivalent to -mexplicit-relocs=none.
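The rule for the default value can be written as a small decision function. This merely restates the description above in code; the function and parameter names are hypothetical.

```c
/* Returns the default -mexplicit-relocs style as described above.
   Both parameters are booleans (nonzero means true). */
const char *default_explicit_relocs(int as_has_reloc_operators,
                                    int relax_enabled)
{
    if (!as_has_reloc_operators)
        return "none";
    return relax_enabled ? "auto" : "always";
}
```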

-mdirect-extern-access

-mno-direct-extern-access

Control use of the GOT to access external symbols. The default is -mno-direct-extern-access: the GOT is used for external symbols with default visibility, but not used for other external symbols.

With -mdirect-extern-access, the GOT is not used and all external symbols are addressed PC-relatively. This is only suitable for environments where no dynamic linking is performed, such as firmware, OS kernels, and executables linked with -static or -static-pie. -mdirect-extern-access is not compatible with -fPIC or -fpic.

-mrelax

-mno-relax

Take (do not take) advantage of linker relaxations. If -mpass-mrelax-to-as is enabled, this option is also passed to the assembler. The default is determined during GCC build-time by detecting corresponding assembler support: -mrelax if the assembler supports both the -mrelax option and conditional branch relaxation, -mno-relax otherwise. Conditional branch relaxation is required because, without it, the .align directives and conditional branch instructions in the assembly code output by GCC may be rejected by the assembler because of a relocation overflow.

-mpass-mrelax-to-as

-mno-pass-mrelax-to-as

Pass (do not pass) the -mrelax or -mno-relax option to the assembler. The default is determined during GCC build-time by detecting corresponding assembler support: -mpass-mrelax-to-as if the assembler supports the -mrelax option, -mno-pass-mrelax-to-as otherwise. This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one.

-mrecip

This option enables use of the reciprocal estimate and reciprocal square root estimate instructions with additional Newton-Raphson steps to increase precision, instead of doing a divide or a square root and divide, for floating-point arguments. These instructions are generated only when -funsafe-math-optimizations is enabled together with -ffinite-math-only and -fno-trapping-math. This option is off by default. Before you use this option, you must make sure the target CPU supports the frecipe and frsqrte instructions. Note that while the throughput of the sequence is higher than the throughput of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (for example, the inverse of 1.0 equals 0.99999994).
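The idea of refining a coarse estimate with Newton-Raphson steps can be sketched in portable C. This is the textbook iteration for 1/a, shown for illustration; it is not claimed to be the exact instruction sequence GCC emits, and the function names are hypothetical. In hardware, the initial estimate would come from an instruction such as frecipe; here the caller supplies it.

```c
/* One Newton-Raphson refinement of an estimate x of 1/a:
   x' = x * (2 - a * x). When x is already close to 1/a, each step
   roughly squares the relative error. */
float refine_reciprocal(float a, float x)
{
    return x * (2.0f - a * x);
}

/* Approximate 1/a from a coarse estimate using `steps` refinements. */
float approx_reciprocal(float a, float estimate, int steps)
{
    float x = estimate;
    for (int i = 0; i < steps; i++)
        x = refine_reciprocal(a, x);
    return x;
}
```

Starting from an estimate of 0.24 for 1/4, two steps bring the result to within about one part in a million of 0.25, although the last bits may still differ from a true division.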

-mrecip=opt

This option controls which reciprocal estimate instructions may be used. opt is a comma-separated list of options, which may be preceded by a ‘!’ to invert the option:

‘all’: Enable all estimate instructions.
‘default’: Enable the default instructions, equivalent to -mrecip.
‘none’: Disable all estimate instructions, equivalent to -mno-recip.
‘div’: Enable the approximation for scalar division.
‘vec-div’: Enable the approximation for vectorized division.
‘sqrt’: Enable the approximation for scalar square root.
‘vec-sqrt’: Enable the approximation for vectorized square root.
‘rsqrt’: Enable the approximation for scalar reciprocal square root.
‘vec-rsqrt’: Enable the approximation for vectorized reciprocal square root.

So, for example, -mrecip=all,!sqrt enables all of the reciprocal approximations, except for scalar square root.
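The way such a list is applied, left to right with ‘!’ clearing rather than setting, can be modelled with a bit mask. This is a sketch under assumptions, not GCC's parser: the bit assignments and names are hypothetical, and ‘default’ is deliberately left out because the text above does not say which instructions it enables (unknown names are ignored).

```c
#include <string.h>

enum {
    RECIP_DIV = 1 << 0, RECIP_VEC_DIV = 1 << 1,
    RECIP_SQRT = 1 << 2, RECIP_VEC_SQRT = 1 << 3,
    RECIP_RSQRT = 1 << 4, RECIP_VEC_RSQRT = 1 << 5,
    RECIP_ALL = (1 << 6) - 1
};

/* Map one option name of length len to its bit mask; 0 if unknown. */
static unsigned recip_mask_for(const char *name, size_t len)
{
    static const struct { const char *name; unsigned mask; } table[] = {
        { "all", RECIP_ALL }, { "div", RECIP_DIV },
        { "vec-div", RECIP_VEC_DIV }, { "sqrt", RECIP_SQRT },
        { "vec-sqrt", RECIP_VEC_SQRT }, { "rsqrt", RECIP_RSQRT },
        { "vec-rsqrt", RECIP_VEC_RSQRT },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen(table[i].name) == len
            && strncmp(table[i].name, name, len) == 0)
            return table[i].mask;
    return 0;
}

/* Apply a comma-separated list such as "all,!sqrt" to a starting mask.
   A leading '!' clears the named bits instead of setting them. */
unsigned parse_recip(const char *list, unsigned mask)
{
    while (*list) {
        size_t len = strcspn(list, ",");
        const char *name = list;
        size_t name_len = len;
        int invert = 0;
        if (name_len > 0 && name[0] == '!') {
            invert = 1;
            name++;
            name_len--;
        }
        if (name_len == 4 && strncmp(name, "none", 4) == 0) {
            /* "none" is treated as the inverse of "all". */
            invert = !invert;
            name = "all";
            name_len = 3;
        }
        unsigned bits = recip_mask_for(name, name_len);
        mask = invert ? (mask & ~bits) : (mask | bits);
        list += len;
        if (*list == ',')
            list++;
    }
    return mask;
}
```

In this model, parse_recip("all,!sqrt", 0) yields every bit except RECIP_SQRT, matching the example above.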

-mfrecipe

-mno-frecipe

Use (do not use) frecipe.{s/d} and frsqrte.{s/d} instructions. When compiling with -march=la664, it is enabled by default. Otherwise the default is -mno-frecipe.

-mdiv32

-mno-div32

Use (do not use) div.w[u] and mod.w[u] instructions with input not sign-extended. When compiling with -march=la664, it is enabled by default. Otherwise the default is -mno-div32.

-mlam-bh

-mno-lam-bh

Use (do not use) am{swap/add}[_db].{b/h} instructions. When compiling with -march=la664, it is enabled by default. Otherwise the default is -mno-lam-bh.

-mlamcas

-mno-lamcas

Use (do not use) amcas[_db].{b/h/w/d} instructions. When compiling with -march=la664, it is enabled by default. Otherwise the default is -mno-lamcas.
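C11 atomic operations on narrow types and compare-and-swap are the kind of source code these two options are likely to affect. The sketch below uses only standard <stdatomic.h> calls; which instructions are selected for them is up to the compiler, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add to a 16-bit counter and return the previous value. */
uint16_t bump_u16(_Atomic uint16_t *counter, uint16_t delta)
{
    return atomic_fetch_add(counter, delta);
}

/* Atomically replace *slot with desired if it still holds expected.
   Returns nonzero on success. */
int try_claim(_Atomic uint32_t *slot, uint32_t expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```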

-mld-seq-sa

-mno-ld-seq-sa

Control whether a same-address load-load barrier (dbar 0x700) is needed. With -mld-seq-sa, the barrier is not needed; with -mno-ld-seq-sa, it is needed. When compiling with -march=la664, -mld-seq-sa is enabled by default. Otherwise the default is -mno-ld-seq-sa.

-mscq

-mno-scq

Use (do not use) the 16-byte conditional store instruction sc.q. The default is -mscq if the machine type specified with -march= supports this instruction, -mno-scq otherwise.

-mtls-dialect=opt

This option controls which TLS dialect may be used for general dynamic and local dynamic TLS models. The opt argument can be one of:

‘trad’: Use traditional TLS. This is the default.
‘desc’: Use TLS descriptors.
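For context, a thread-local variable in C looks like the following. The general dynamic and local dynamic models are typically used when such code is built as position-independent code, for example in a shared library, and that is where the chosen dialect would apply. The names are hypothetical.

```c
/* One counter per thread; _Thread_local is the C11 keyword. */
_Thread_local int per_thread_counter;

/* Increment the calling thread's counter and return the new value. */
int bump_per_thread_counter(void)
{
    return ++per_thread_counter;
}
```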

-mannotate-tablejump

-mno-annotate-tablejump

Create an annotation section .discard.tablejump_annotate to correlate the jirl instruction and the jump table when a jump table is used to optimize the switch statement. Some external tools, for example objtool in the Linux kernel build system, need the annotation to analyze the control flow. The default is -mno-annotate-tablejump.
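A dense switch statement such as the one below is a typical candidate for a jump table, although the compiler is free to choose another strategy. The function name and opcode numbering are hypothetical.

```c
/* Dispatch on a small, dense range of opcodes. */
int apply_op(int op, int a, int b)
{
    switch (op) {
    case 0:  return a + b;
    case 1:  return a - b;
    case 2:  return a * b;
    case 3:  return a & b;
    case 4:  return a | b;
    case 5:  return a ^ b;
    default: return 0;      /* unknown opcode */
    }
}
```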

--param loongarch-vect-unroll-limit=n

The vectorizer uses available tuning information to determine whether it would be beneficial to unroll the main vectorized loop and by how much. This parameter sets the upper bound of how much the vectorizer unrolls the main loop. The default value is six.