6.4.1.13 LoongArch Function Attributes

The following attributes are supported by the LoongArch back end:

target (option,...)

The following target-specific function attributes are available for the LoongArch target. These options mirror the behavior of similar command-line options (see LoongArch Options), but on a per-function basis.

strict-align
no-strict-align

strict-align indicates that the compiler should not assume that unaligned memory references are handled by the system. To allow the compiler to assume that unaligned memory references are handled by the system, the inverse attribute no-strict-align can be specified. The behavior is the same as for the command-line options -mstrict-align and -mno-strict-align.
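As a rough illustration of how these two attributes might be applied, the sketch below marks one helper strict-align and another no-strict-align. The LA_TARGET macro and both function names are hypothetical and not part of GCC; the macro expands to nothing on non-LoongArch targets so that the file still compiles there. It also assumes a GCC release that accepts the target attribute on LoongArch. Whether the generated code actually differs depends on the compiler version and optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro (not part of GCC): apply the target
   attribute only when compiling for LoongArch.  */
#if defined(__loongarch__)
# define LA_TARGET(opts) __attribute__ ((target (opts)))
#else
# define LA_TARGET(opts)
#endif

/* Reads 32 bits from a possibly unaligned address.  With strict-align
   the compiler must not assume that an unaligned load is handled.  */
LA_TARGET ("strict-align") uint32_t
load_u32_strict (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Same operation, but the compiler may assume that unaligned
   accesses are handled by the system.  */
LA_TARGET ("no-strict-align") uint32_t
load_u32_relaxed (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

Both helpers return the same value; only the instructions the compiler is allowed to select differ.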

cmodel=

Indicates that code should be generated for a particular code model for this function. The behavior and permissible arguments are the same as for the command-line option -mcmodel=.

arch=

Specifies the architecture version and architectural extensions to use for this function. The behavior and permissible arguments are the same as for the -march= command-line option.

tune=

Specifies the core for which to tune the performance of this function. The behavior and permissible arguments are the same as for the -mtune= command-line option.

lsx
no-lsx

lsx indicates that LSX vector instruction generation is allowed (not allowed) when compiling the function. The behavior is the same as for the command-line options -mlsx and -mno-lsx.
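A minimal sketch of per-function use, assuming a file that is otherwise compiled without -mlsx: only the function carrying the attribute may use LSX instructions. The LA_TARGET macro and the function name are hypothetical; the macro expands to nothing on non-LoongArch targets.

```c
/* Hypothetical helper macro (not part of GCC): apply the target
   attribute only when compiling for LoongArch.  */
#if defined(__loongarch__)
# define LA_TARGET(opts) __attribute__ ((target (opts)))
#else
# define LA_TARGET(opts)
#endif

/* 128-bit vector of four ints, using the GCC vector extension.  */
typedef int v4i32 __attribute__ ((vector_size (16)));

/* LSX instructions may be generated for this function only.  */
LA_TARGET ("lsx") void
add_v4i32 (v4i32 *dst, const v4i32 *a, const v4i32 *b)
{
  *dst = *a + *b;
}
```

The vectors are passed by pointer here simply to keep the sketch independent of how vector arguments are passed on a given target.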

lasx
no-lasx

lasx indicates that LASX instruction generation is allowed (not allowed) when compiling the function. The behavior of no-lasx differs slightly from that of the command-line option -mno-lasx. Example:

test.c:
typedef int v4i32 __attribute__ ((vector_size(16), aligned(16)));

v4i32 a, b, c;
#ifdef WITH_ATTR
__attribute__ ((target("no-lasx"))) void
#else
void
#endif
test ()
{
  c = a + b;
}

$ gcc test.c -S -o test.s -O2 -mlasx -DWITH_ATTR

When compiled as above, 128-bit vectorization is still possible. With the following command, however, 128-bit vectorization is not performed:

$ gcc test.c -S -o test.s -O2 -mlasx -mno-lasx

recipe
no-recipe

recipe indicates that frecipe.{s/d} and frsqrte.{s/d} instruction generation is allowed (not allowed) when compiling the function. The behavior is the same as for the command-line options -mfrecipe and -mno-frecipe.

div32
no-div32

div32 determines whether div.w[u] and mod.w[u] instructions on 64-bit machines are evaluated based only on the lower 32 bits of the input registers. The behavior is the same as for the command-line options -mdiv32 and -mno-div32.
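The following sketch shows where this attribute could matter: 32-bit division and remainder in a function marked div32. The LA_TARGET macro and function names are hypothetical, and the macro expands to nothing on non-LoongArch targets. As with -mdiv32, the attribute is presumably only appropriate when the target hardware actually provides this behavior.

```c
#include <stdint.h>

/* Hypothetical helper macro (not part of GCC): apply the target
   attribute only when compiling for LoongArch.  */
#if defined(__loongarch__)
# define LA_TARGET(opts) __attribute__ ((target (opts)))
#else
# define LA_TARGET(opts)
#endif

/* Signed 32-bit division; a candidate for div.w.  */
LA_TARGET ("div32") int32_t
div_i32 (int32_t a, int32_t b)
{
  return a / b;
}

/* Unsigned 32-bit remainder; a candidate for mod.wu.  */
LA_TARGET ("div32") uint32_t
mod_u32 (uint32_t a, uint32_t b)
{
  return a % b;
}
```

The C-level results are the same with or without the attribute; it only changes what the compiler may assume about the upper bits of the input registers.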

lam-bh
no-lam-bh

lam-bh indicates that am{swap/add}[_db].{b/h} instruction generation is allowed (not allowed) when compiling the function. The behavior is the same as for the command-line options -mlam-bh and -mno-lam-bh.

lamcas
no-lamcas

lamcas indicates that amcas[_db].{b/h/w/d} instruction generation is allowed (not allowed) when compiling the function. The behavior is the same as for the command-line options -mlamcas and -mno-lamcas.
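As a hedged illustration, a compare-and-swap helper is the kind of function for which the compiler might select amcas instructions when lamcas is enabled. The LA_TARGET macro and the function name are hypothetical; __atomic_compare_exchange_n is the standard GCC atomic built-in. Which instructions are actually emitted is up to the compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper macro (not part of GCC): apply the target
   attribute only when compiling for LoongArch.  */
#if defined(__loongarch__)
# define LA_TARGET(opts) __attribute__ ((target (opts)))
#else
# define LA_TARGET(opts)
#endif

/* Atomically replaces *ptr with desired if it currently equals
   expected.  Returns true when the exchange took place.  */
LA_TARGET ("lamcas") bool
cas_u32 (uint32_t *ptr, uint32_t expected, uint32_t desired)
{
  return __atomic_compare_exchange_n (ptr, &expected, desired,
                                      false /* strong CAS */,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```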

scq
no-scq

scq indicates that sc.q instruction generation is allowed (not allowed) when compiling the function. The behavior is the same as for the command-line options -mscq and -mno-scq.

ld-seq-sa
no-ld-seq-sa

ld-seq-sa determines whether load-load barriers (dbar 0x700) are needed when compiling the function. The behavior is the same as for the command-line options -mld-seq-sa and -mno-ld-seq-sa.

Multiple target function attributes can be specified by separating them with a comma. For example:

__attribute__((target("arch=la64v1.1,lasx")))
int
foo (int a)
{
  return a + 5;
}

is valid and compiles function foo for LA64V1.1 with lasx.

Inlining rules

Specifying target attributes on individual functions or performing link-time optimization across translation units compiled with different target options can affect function inlining rules:

In particular, a caller function can inline a callee function only if the architectural features available to the callee are a subset of the features available to the caller.

Note that when the callee function does not have the always_inline attribute, it will not be inlined if the code model of the caller function is different from the code model of the callee function.
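The subset rule can be sketched as follows, assuming that neither LSX nor LASX is enabled on the command line. The LA_TARGET macro and all function names are hypothetical, and the macro expands to nothing on non-LoongArch targets. The comments describe what the rule above permits, not a guarantee about what the optimizer will do.

```c
/* Hypothetical helper macro (not part of GCC): apply the target
   attribute only when compiling for LoongArch.  */
#if defined(__loongarch__)
# define LA_TARGET(opts) __attribute__ ((target (opts)))
#else
# define LA_TARGET(opts)
#endif

/* Callee with no extra features: its feature set is a subset of
   caller_lsx's, so it may be inlined there.  */
static inline int
add5 (int a)
{
  return a + 5;
}

/* Callee that requests LASX: a caller without LASX cannot inline it,
   so an ordinary call is expected instead.  */
static LA_TARGET ("lasx") int
add5_lasx (int a)
{
  return a + 5;
}

/* Caller with LSX enabled; inlining add5 is permitted.  */
LA_TARGET ("lsx") int
caller_lsx (int a)
{
  return add5 (a) * 2;
}

/* Caller without LASX; add5_lasx stays an out-of-line call.  */
int
caller_plain (int a)
{
  return add5_lasx (a) * 2;
}
```

Both callers compute the same result either way; the rule only affects whether the call is expanded inline.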

target_clones (string,...)

Like attribute target, these options also reflect the behavior of similar command-line options.

Note that this attribute requires glibc 2.38 or newer with HWCAP support.

string can take the following values:

  • default
  • strict-align
  • arch=
  • lsx
  • lasx
  • frecipe
  • div32
  • lam-bh
  • lamcas
  • scq
  • ld-seq-sa

You can set the priority of attributes in target_clones (except default). For example:

__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1")))
int
foo (int a)
{
  return a + 5;
}

The default priority order, from low to high, is:

  • default
  • arch=loongarch64
  • strict-align
  • frecipe = div32 = lam-bh = lamcas = scq = ld-seq-sa
  • lsx
  • arch=la64v1.0
  • arch=la64v1.1
  • lasx

Note that the option values on the gcc command line are not considered when calculating the priority.

If a priority is set for a feature in target_clones, then the priority of this feature will be higher than lasx.

For example:

__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1")))
int
foo (int a)
{
  return a + 5;
}

In this test case, the priority of lsx is higher than that of arch=la64v1.1.

If the same priority is explicitly set for two features, the priority is still calculated according to the priority list above.

For example:

__attribute__((target_clones ("default","arch=la64v1.1;priority=1","lsx;priority=1")))
int
foo (int a)
{
  return a + 5;
}

In this test case, the priority of arch=la64v1.1;priority=1 is higher than that of lsx;priority=1.

target_version (string)

The supported attributes and priorities are the same as for target_clones. Note that this attribute requires glibc 2.38 or newer with HWCAP support.

For example:

test1.C:

__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1")))
int
foo (int a)
{
  return a + 5;
}

test2.C:

__attribute__((target_version ("default")))
int
foo (int a)
{
  return a + 5;
}
__attribute__((target_version ("arch=la64v1.1")))
int
foo (int a)
{
  return a + 5;
}
__attribute__((target_version ("lsx;priority=1")))
int
foo (int a)
{
  return a + 5;
}

The implementations of test1.C and test2.C are equivalent.