18.10.2 Standard Pattern Names for Vectorization

vec_load_lanesmn

Perform an interleaved load of several vectors from memory operand 1 into register operand 0. Both operands have mode m. The register operand is viewed as holding consecutive vectors of mode n, while the memory operand is a flat array that contains the same number of elements. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  for (i = 0; i < c; i++)
    operand0[i][j] = operand1[j * c + i];

For example, ‘vec_load_lanestiv4hi’ loads 8 16-bit values from memory into a register of mode ‘TI’. The register contains two consecutive vectors of mode ‘V4HI’.

This pattern can only be used if:

TARGET_ARRAY_MODE_SUPPORTED_P (n, c)

is true. GCC assumes that, if a target supports this kind of instruction for some mode n, it also supports unaligned loads for vectors of mode n.

This pattern is not allowed to FAIL.
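
As an illustration of the ‘vec_load_lanestiv4hi’ case, the de-interleaving can be sketched in plain C. This is only a scalar model under assumed names (model_load_lanes, LANES_C and LANES_N are hypothetical and do not exist in GCC), with c = 2 and four 16-bit elements per vector:

```c
#include <stdint.h>

/* Hypothetical sizes for the TI/V4HI example: c = 2 vectors of 4 elements.  */
enum { LANES_C = 2, LANES_N = 4 };

/* Scalar model of the interleaved load: memory element j * c + i
   goes to element j of destination vector i.  */
static void
model_load_lanes (int16_t dst[LANES_C][LANES_N], const int16_t *src)
{
  for (int j = 0; j < LANES_N; j++)
    for (int i = 0; i < LANES_C; i++)
      dst[i][j] = src[j * LANES_C + i];
}
```

With memory holding 0 through 7, the first vector would receive the even-indexed values and the second the odd-indexed ones.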

vec_mask_load_lanesmn

Like ‘vec_load_lanesmn’, but takes an additional mask operand (operand 2) that specifies which elements of the destination vectors should be loaded. Other elements of the destination vectors are taken from operand 3, which is an else operand in the subvector mode n, similar to the one in maskload. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  if (operand2[j])
    for (i = 0; i < c; i++)
      operand0[i][j] = operand1[j * c + i];
  else
    for (i = 0; i < c; i++)
      operand0[i][j] = operand3[j];

This pattern is not allowed to FAIL.

vec_mask_len_load_lanesmn

Like ‘vec_load_lanesmn’, but takes an additional mask operand (operand 2), length operand (operand 4) and bias operand (operand 5) that together specify which elements of the destination vectors should be loaded. Other elements of the destination vectors are taken from operand 3, which is an else operand similar to the one in maskload. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < operand4 + operand5; j++)
  for (i = 0; i < c; i++)
    if (operand2[j])
      operand0[i][j] = operand1[j * c + i];
    else
      operand0[i][j] = operand3[j];

This pattern is not allowed to FAIL.

vec_store_lanesmn

Equivalent to ‘vec_load_lanesmn’, with the memory and register operands reversed. That is, the instruction is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  for (i = 0; i < c; i++)
    operand0[j * c + i] = operand1[i][j];

for a memory operand 0 and register operand 1.

This pattern is not allowed to FAIL.

vec_mask_store_lanesmn

Like ‘vec_store_lanesmn’, but takes an additional mask operand (operand 2) that specifies which elements of the source vectors should be stored. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  if (operand2[j])
    for (i = 0; i < c; i++)
      operand0[j * c + i] = operand1[i][j];

This pattern is not allowed to FAIL.

vec_mask_len_store_lanesmn

Like ‘vec_store_lanesmn’, but takes an additional mask operand (operand 2), length operand (operand 3) and bias operand (operand 4) that together specify which elements of the source vectors should be stored. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < operand3 + operand4; j++)
  if (operand2[j])
    for (i = 0; i < c; i++)
      operand0[j * c + i] = operand1[i][j];

This pattern is not allowed to FAIL.

gather_loadmn

Load several separate memory locations into a vector of mode m. Operand 1 is a scalar base address and operand 2 is a vector of mode n containing offsets from that base. Operand 0 is a destination vector with the same number of elements as n. For each element index i:

  • extend the offset element i to address width, using zero extension if operand 3 is 1 and sign extension if operand 3 is zero;
  • multiply the extended offset by operand 4;
  • add the result to the base; and
  • load the value at that address into element i of operand 0.

The value of operand 3 does not matter if the offsets are already address width.
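
The per-element steps above can be sketched as a scalar loop. This is a minimal model, not GCC code: the name model_gather_load and its parameters are hypothetical, and it assumes 32-bit data elements, 32-bit offsets and a scale expressed in bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a gather load.  ZERO_EXTEND plays the role
   of operand 3 and SCALE the role of operand 4.  */
static void
model_gather_load (int32_t *dst, const unsigned char *base,
                   const int32_t *offsets, int zero_extend,
                   ptrdiff_t scale, size_t nelems)
{
  for (size_t i = 0; i < nelems; i++)
    {
      /* Extend offset element i to address width.  */
      ptrdiff_t off = zero_extend
                      ? (ptrdiff_t) (uint32_t) offsets[i]
                      : (ptrdiff_t) offsets[i];
      /* Scale it, add the base, and load element i from that address.  */
      memcpy (&dst[i], base + off * scale, sizeof dst[i]);
    }
}
```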

mask_gather_loadmn

Like ‘gather_loadmn’, but takes an extra mask operand as operand 5 and an else operand as operand 6, similar to the one in maskload. Bit i of the mask is set if element i of the result should be loaded from memory and clear if element i of the result should be set to element i of operand 6.

mask_len_gather_loadmn

Like ‘gather_loadmn’, but takes an extra mask operand (operand 5) and an else operand (operand 6) as well as a len operand (operand 7) and a bias operand (operand 8).

Similar to mask_len_load, the instruction loads at most (operand 7 + operand 8) elements from memory. Bit i of the mask is set if element i of the result should be loaded from memory and clear if element i of the result should be set to element i of operand 6. Mask elements i with i >= (operand 7 + operand 8) are ignored.

mask_len_strided_loadm

Load several separate memory locations into a destination vector of mode m. Operand 0 is the destination vector. Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode. Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand. The instruction can be seen as a special case of mask_len_gather_loadmn with an offset vector that is a vec_series with zero as base and operand 2 as step. For each element index i the load address is operand 1 + i * operand 2. Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory. Element i of the mask (operand 3) is set if element i of the result should be loaded from memory and clear if element i of the result should be zero. Mask elements i with i >= (operand 4 + operand 5) are ignored.

scatter_storemn

Store a vector of mode m into several distinct memory locations. Operand 0 is a scalar base address and operand 1 is a vector of mode n containing offsets from that base. Operand 4 is the vector of values that should be stored, which has the same number of elements as n. For each element index i:

  • extend the offset element i to address width, using zero extension if operand 2 is 1 and sign extension if operand 2 is zero;
  • multiply the extended offset by operand 3;
  • add the result to the base; and
  • store element i of operand 4 to that address.

The value of operand 2 does not matter if the offsets are already address width.

mask_scatter_storemn

Like ‘scatter_storemn’, but takes an extra mask operand as operand 5. Bit i of the mask is set if element i of operand 4 should be stored to memory.

mask_len_scatter_storemn

Like ‘scatter_storemn’, but takes an extra mask operand (operand 5), a len operand (operand 6) and a bias operand (operand 7). The instruction stores at most (operand 6 + operand 7) elements of operand 4 to memory. Bit i of the mask is set if element i of operand 4 should be stored. Mask elements i with i >= (operand 6 + operand 7) are ignored.

mask_len_strided_storem

Store a vector of mode m into several distinct memory locations. Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode. Operand 2 is the vector of values that should be stored, which is of mode m. Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand. The instruction can be seen as a special case of mask_len_scatter_storemn with an offset vector that is a vec_series with zero as base and operand 1 as step. For each element index i the store address is operand 0 + i * operand 1. Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of operand 2 to memory. Element i of the mask (operand 3) is set if element i of operand 2 should be stored. Mask elements i with i >= (operand 4 + operand 5) are ignored.

while_ultmn

Set operand 0 to a mask that is true while incrementing operand 1 gives a value that is less than operand 2, for a vector length up to operand 3. Operand 0 has mode n and operands 1 and 2 are scalar integers of mode m. Operand 3 should be omitted when n is a vector mode, and a CONST_INT otherwise. The operation for vector modes is equivalent to:

operand0[0] = operand1 < operand2;
for (i = 1; i < GET_MODE_NUNITS (n); i++)
  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);

And for non-vector modes the operation is equivalent to:

operand0[0] = operand1 < operand2;
for (i = 1; i < operand3; i++)
  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);

select_vlmn

Set operand 0 (of mode n) to the number of scalar iterations that should be handled by one iteration of a vector loop. Operand 1 is the total number of scalar iterations that the loop needs to process and operand 2 is a maximum bound on the result (also known as the maximum “vectorization factor”). Operand 3 (of mode m) is a dummy parameter to pass the vector mode to be used.

The maximum value of operand 0 is given by:

operand0 = MIN (operand1, operand2)

However, targets might choose a lower value than this, based on target-specific criteria. Each iteration of the vector loop might therefore process a different number of scalar iterations, which in turn means that induction variables will have a variable step. Because of this, it is generally not useful to define this instruction if it will always calculate the maximum value.

This optab is only useful on targets that implement ‘len_load_m’ and/or ‘len_store_m’ or the associated ‘_len’ variants.
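
To illustrate why the step becomes variable, here is a small sketch of a vector loop driven by a length-selection function. Everything in it is hypothetical: model_select_vl stands in for a target's policy (here, splitting the last two iterations evenly, which stays within the MIN bound), and model_vector_loop only counts iterations. It assumes max_vf is nonzero.

```c
#include <stddef.h>

/* Hypothetical target policy: never exceeds MIN (remaining, max_vf), but
   balances the final two iterations instead of leaving a short tail.  */
static size_t
model_select_vl (size_t remaining, size_t max_vf)
{
  if (remaining <= max_vf)
    return remaining;
  if (remaining < 2 * max_vf)
    return (remaining + 1) / 2;
  return max_vf;
}

/* Run the loop structure only: return the number of vector iterations
   and record the length chosen for the first one.  */
static size_t
model_vector_loop (size_t total, size_t max_vf, size_t *first_vl)
{
  size_t iterations = 0;
  for (size_t done = 0; done < total; iterations++)
    {
      size_t vl = model_select_vl (total - done, max_vf);
      if (iterations == 0)
        *first_vl = vl;
      done += vl;  /* The induction step varies per iteration.  */
    }
  return iterations;
}
```

Under this assumed policy, 10 scalar iterations with a maximum vectorization factor of 8 would be processed as two vector iterations of 5 rather than 8 followed by 2.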

vec_setm

Set a given field in the vector value. Operand 0 is the vector to modify, operand 1 is the new value of the field and operand 2 specifies the field index.

This pattern is not allowed to FAIL.

vec_extractmn

Extract a given field from the vector value. Operand 1 is the vector, operand 2 specifies the field index and operand 0 is the place to store the value into. The n mode is the mode of the field or vector of fields that should be extracted; it should be either the element mode of the vector mode m, or a vector mode with the same element mode and a smaller number of elements. If n is a vector mode, the index is counted in multiples of mode n.

This pattern is not allowed to FAIL.

vec_initmn

Initialize the vector to given values. Operand 0 is the vector to initialize and operand 1 is a parallel containing values for individual fields. The n mode is the mode of the elements; it should be either the element mode of the vector mode m, or a vector mode with the same element mode and a smaller number of elements.

vec_duplicatem

Initialize vector output operand 0 so that each element has the value given by scalar input operand 1. The vector has mode m and the scalar has the mode appropriate for one element of m.

This pattern only handles duplicates of non-constant inputs. Constant vectors go through the movm pattern instead.

This pattern is not allowed to FAIL.

vec_seriesm

Initialize vector output operand 0 so that element i is equal to operand 1 plus i times operand 2. In other words, create a linear series whose base value is operand 1 and whose step is operand 2.

The vector output has mode m and the scalar inputs have the mode appropriate for one element of m. This pattern is not used for floating-point vectors, in order to avoid having to specify the rounding behavior for i > 1.

This pattern is not allowed to FAIL.

check_raw_ptrsm

Check whether, given two pointers a and b and a length len, a write of len bytes at a followed by a read of len bytes at b can be split into interleaved byte accesses ‘a[0], b[0], a[1], b[1], …’ without affecting the dependencies between the bytes. Set operand 0 to true if the split is possible and false otherwise.

Operands 1, 2 and 3 provide the values of a, b and len respectively. Operand 4 is a constant integer that provides the known common alignment of a and b. All inputs have mode m.

This split is possible if:

a == b || a + len <= b || b + len <= a

You should only define this pattern if the target has a way of accelerating the test without having to do the individual comparisons.
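
For reference, the individual comparisons that the pattern is meant to accelerate can be written out directly. This is a sketch with a hypothetical name (model_check_raw_ptrs); it treats the pointers as integers and assumes that a + len and b + len do not wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar form of the read-after-write check: true if the
   two len-byte regions are identical or do not overlap at all.  */
static bool
model_check_raw_ptrs (uintptr_t a, uintptr_t b, uintptr_t len)
{
  return a == b || a + len <= b || b + len <= a;
}
```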

check_war_ptrsm

Like ‘check_raw_ptrsm’, but with the read and write swapped round. The split is possible in this case if:

b <= a || a + len <= b

vec_cmpmn

Output a vector comparison. Operand 0, of mode n, is the destination. Operand 1 is the predicate, a signed vector comparison whose inputs are operands 2 and 3, both of mode m. The result is computed by elementwise evaluation of the vector comparison, with a true value of all-ones and a false value of all-zeros.

vec_cmpumn

Similar to vec_cmpmn but perform unsigned vector comparison.

vec_cmpeqmn

Similar to vec_cmpmn but perform equality or non-equality vector comparison only. If vec_cmpmn or vec_cmpumn instruction pattern is supported, it will be preferred over vec_cmpeqmn, so there is no need to define this instruction pattern if the others are supported.

vcond_mask_mn

Output a conditional vector move. Operand 0 is the destination to receive a combination of operand 1 and operand 2, depending on the mask in operand 3. Operands 0, 1, and 2 have mode m while operand 3 has mode n.

Suppose that m has e elements. There are then two supported forms of n. The first form is an integer or boolean vector that also has e elements. In this case, each element is -1 or 0, with -1 selecting elements from operand 1 and 0 selecting elements from operand 2. The second supported form of n is a scalar integer that has at least e bits. A set bit then selects from operand 1 and a clear bit selects from operand 2. Bits e and above have no effect.

Subject to those restrictions, the behavior is equivalent to:

for (i = 0; i < e; i++)
  op0[i] = op3[i] ? op1[i] : op2[i];

vcond_mask_len_mn

Set each element of operand 0 to the corresponding element of operand 2 or operand 3. Choose operand 2 if both the element index is less than operand 4 plus operand 5 and the corresponding element of operand 1 is nonzero:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = i < op4 + op5 && op1[i] ? op2[i] : op3[i];

Operands 0, 2 and 3 have mode m. Operand 1 has mode n. Operands 4 and 5 have a target-dependent scalar integer mode.

maskloadmn

Perform a masked load of vector from memory operand 1 of mode m into register operand 0. The mask is provided in register operand 2 of mode n. Operand 3 (the “else value”) is of mode m and specifies which value is loaded when the mask is unset. The predicate of operand 3 must only accept the else values that the target actually supports. Currently three values are attempted, zero, -1, and undefined. GCC handles an else value of zero more efficiently than -1 or undefined.

This pattern is not allowed to FAIL.

maskstoremn

Perform a masked store of a vector from register operand 1 of mode m into memory operand 0. The mask is provided in register operand 2 of mode n.

This pattern is not allowed to FAIL.

len_load_m

Load (operand 3 + operand 4) elements from memory operand 1 into vector register operand 0. Operands 0 and 1 have mode m, which must be a vector mode. Operand 3 has whichever integer mode the target prefers. Operand 2 (the “else value”) is of mode m and specifies which value is loaded for the remaining elements. The predicate of operand 2 must only accept the else values that the target actually supports. Operand 4 conceptually has mode QI.

Operand 3 can be a variable or a constant amount. Operand 4 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 4 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 3 + operand 4) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.
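
The interaction of length, bias and else value can be sketched with a scalar model. The name model_len_load and the explicit element count are hypothetical, and the sketch assumes 32-bit elements and an else value supplied per element:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of len_load: LEN is operand 3, BIAS is
   operand 4 (0 or -1) and ELSE_VAL is operand 2.  NELEMS is the number
   of elements in the vector mode.  */
static void
model_len_load (int32_t *dst, const int32_t *src, const int32_t *else_val,
                size_t len, int bias, size_t nelems)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (size_t i = 0; i < nelems; i++)
    /* Only the first ACTIVE elements are read from memory.  */
    dst[i] = (ptrdiff_t) i < active ? src[i] : else_val[i];
}
```

With a bias of -1, a length operand of 4 loads three elements, which is the form some targets prefer for their length registers.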

len_store_m

Store (operand 2 + operand 3) vector elements from vector register operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged. Operands 0 and 1 have mode m, which must be a vector mode. Operand 2 has whichever integer mode the target prefers. Operand 3 conceptually has mode QI.

Operand 2 can be a variable or a constant amount. Operand 3 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 2 + operand 3) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.

mask_len_loadmn

Perform a masked load from the memory location pointed to by operand 1 into register operand 0. (operand 4 + operand 5) elements are loaded from memory and other elements in operand 0 are set to undefined values. This is a combination of len_load and maskload. Operands 0 and 1 have mode m, which must be a vector mode. Operand 4 has whichever integer mode the target prefers. A mask is specified in operand 2, which must be of mode n. The mask has lower precedence than the length and is itself subject to length masking, i.e. only mask indices < (operand 4 + operand 5) are used. Operand 3 is an else operand similar to the one in maskload. Operand 5 conceptually has mode QI.

Operand 4 can be a variable or a constant amount. Operand 5 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 5 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 4 + operand 5) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.

mask_len_storemn

Perform a masked store from vector register operand 1 into memory operand 0. (operand 3 + operand 4) elements are stored to memory, leaving the other elements of operand 0 unchanged. This is a combination of len_store and maskstore. Operands 0 and 1 have mode m, which must be a vector mode. Operand 3 has whichever integer mode the target prefers. A mask is specified in operand 2, which must be of mode n. The mask has lower precedence than the length and is itself subject to length masking, i.e. only mask indices < (operand 3 + operand 4) are used. Operand 4 conceptually has mode QI.

Operand 3 can be a variable or a constant amount. Operand 4 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 4 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 3 + operand 4) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.

vec_permm

Output a (variable) vector permutation. Operand 0 is the destination to receive elements from operand 1 and operand 2, which are of mode m. Operand 3 is the selector. It is an integral mode vector of the same width and number of elements as mode m.

The input elements are numbered from 0 in operand 1 through 2*N-1 in operand 2, where N is the number of elements in mode m. The elements of the selector must be computed modulo 2*N. Note that if rtx_equal_p(operand1, operand2), this can be implemented with just operand 1 and selector elements modulo N.

In order to make things easy for a number of targets, if there is no ‘vec_perm’ pattern for mode m, but there is for mode q where q is a vector of QImode of the same width as m, the middle-end will lower the mode m VEC_PERM_EXPR to mode q.

See also TARGET_VECTORIZER_VEC_PERM_CONST, which performs the analogous operation for constant selectors.
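
The element numbering and the modulo rule can be sketched as follows. This is a scalar model with a hypothetical name (model_vec_perm), assuming 32-bit elements and a destination that does not alias the inputs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a two-input permutation of N elements.
   Selector values are reduced modulo 2*N; indices 0..N-1 pick from OP1
   and indices N..2*N-1 pick from OP2.  */
static void
model_vec_perm (int32_t *dst, const int32_t *op1, const int32_t *op2,
                const uint32_t *sel, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t idx = sel[i] % (2 * n);
      dst[i] = idx < n ? op1[idx] : op2[idx - n];
    }
}
```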

reduc_smin_scal_m, reduc_smax_scal_m

Find the signed minimum/maximum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector.

reduc_umin_scal_m, reduc_umax_scal_m

Find the unsigned minimum/maximum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector.

reduc_fmin_scal_m, reduc_fmax_scal_m

Find the floating-point minimum/maximum of the elements of a vector, using the same rules as fminm3 and fmaxm3. Operand 1 is a vector of mode m and operand 0 is the scalar result, which has mode GET_MODE_INNER (m).

reduc_plus_scal_m

Compute the sum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector.

reduc_and_scal_m
reduc_ior_scal_m
reduc_xor_scal_m

Compute the bitwise AND/IOR/XOR reduction of the elements of a vector of mode m. Operand 1 is the vector input and operand 0 is the scalar result. The mode of the scalar result is the same as one element of m.

reduc_sbool_and_scal_m
reduc_sbool_ior_scal_m
reduc_sbool_xor_scal_m

Compute the bitwise AND/IOR/XOR reduction of the elements of a vector boolean of mode m. Operand 1 is the vector input and operand 0 is the scalar result. The mode of the scalar result is QImode with its value either zero or one. If mode m is a scalar integer mode then operand 2 is the number of elements in the input vector to provide disambiguation for the case m is ambiguous.

extract_last_m

Find the last set bit in mask operand 1 and extract the associated element of vector operand 2. Store the result in scalar operand 0. Operand 2 has vector mode m while operand 0 has the mode appropriate for one element of m. Operand 1 has the usual mask mode for vectors of mode m; see TARGET_VECTORIZE_GET_MASK_MODE.

fold_extract_last_m

If any bits of mask operand 2 are set, find the last set bit, extract the associated element from vector operand 3, and store the result in operand 0. Store operand 1 in operand 0 otherwise. Operand 3 has mode m and operands 0 and 1 have the mode appropriate for one element of m. Operand 2 has the usual mask mode for vectors of mode m; see TARGET_VECTORIZE_GET_MASK_MODE.
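
A scalar sketch of this behavior, under assumed names (model_fold_extract_last is hypothetical) and with the mask modelled as an array of booleans:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: return the element selected by the last set
   mask bit, or FALLBACK (operand 1) if no mask bit is set.  */
static int32_t
model_fold_extract_last (int32_t fallback, const bool *mask,
                         const int32_t *vec, size_t n)
{
  int32_t result = fallback;
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      result = vec[i];  /* Later set bits override earlier ones.  */
  return result;
}
```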

len_fold_extract_last_m

Like ‘fold_extract_last_m’, but takes an extra length operand as operand 4 and an extra bias operand as operand 5. Only elements with index i < len (operand 4) + bias (operand 5) are candidates for extraction.

fold_left_plus_m

Take scalar operand 1 and successively add each element from vector operand 2. Store the result in scalar operand 0. The vector has mode m and the scalars have the mode appropriate for one element of m. The operation is strictly in-order: there is no reassociation.
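
The in-order requirement corresponds to a plain left-to-right accumulation. The sketch below uses a hypothetical name (model_fold_left_plus) and assumes float elements and a compiler that is not permitted to reassociate floating-point additions:

```c
#include <stddef.h>

/* Hypothetical scalar model: start from INIT (operand 1) and add the
   elements of VEC (operand 2) strictly from element 0 upwards.  */
static float
model_fold_left_plus (float init, const float *vec, size_t n)
{
  float acc = init;
  for (size_t i = 0; i < n; i++)
    acc += vec[i];
  return acc;
}
```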

mask_fold_left_plus_m

Like ‘fold_left_plus_m’, but takes an additional mask operand (operand 3) that specifies which elements of the source vector should be added.

mask_len_fold_left_plus_m

Like ‘fold_left_plus_m’, but takes an additional mask operand (operand 3), len operand (operand 4) and bias operand (operand 5). It performs the following operations strictly in-order (no reassociation):

operand0 = operand1;
for (i = 0; i < operand4 + operand5; i++)
  if (operand3[i])
    operand0 += operand2[i];

sdot_prodmn

Multiply operand 1 by operand 2 without loss of precision, given that both operands contain signed elements. Add each product to the overlapping element of operand 3 and store the result in operand 0. Operands 0 and 3 have mode m and operands 1 and 2 have mode n, with n having narrower elements than m.
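
One way to picture the accumulation is the scalar sketch below. It is an illustration under stated assumptions rather than a definition: the names are hypothetical, the inputs have 8-bit elements and the accumulator 32-bit elements, and each accumulator element is assumed to receive the four consecutive products that occupy the same bits.

```c
#include <stdint.h>

/* Hypothetical sizes: 8 narrow inputs feed 2 wide accumulators.  */
enum { DOT_NARROW = 8, DOT_WIDE = 2, DOT_GROUP = DOT_NARROW / DOT_WIDE };

/* Scalar model of a signed dot product: widen, multiply, and add each
   product to the accumulator element it overlaps (assumed grouping).  */
static void
model_sdot_prod (int32_t *dst, const int8_t *op1, const int8_t *op2,
                 const int32_t *acc)
{
  for (int k = 0; k < DOT_WIDE; k++)
    {
      int32_t sum = acc[k];
      for (int g = 0; g < DOT_GROUP; g++)
        {
          int idx = k * DOT_GROUP + g;
          sum += (int32_t) op1[idx] * (int32_t) op2[idx];
        }
      dst[k] = sum;
    }
}
```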

Semantically the expressions perform the multiplication with the following signedness:

sdot<signed op0, signed op1, signed op2, signed op3> ==
   op0 = sign-ext (op1) * sign-ext (op2) + op3
...

udot_prodmn

Multiply operand 1 by operand 2 without loss of precision, given that both operands contain unsigned elements. Add each product to the overlapping element of operand 3 and store the result in operand 0. Operands 0 and 3 have mode m and operands 1 and 2 have mode n, with n having narrower elements than m.

Semantically the expressions perform the multiplication with the following signedness:

udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
   op0 = zero-ext (op1) * zero-ext (op2) + op3
...

usdot_prodmn

Compute the sum of the products of elements of different signs. Multiply operand 1 by operand 2 without loss of precision, given that operand 1 is unsigned and operand 2 is signed. Add each product to the overlapping element of operand 3 and store the result in operand 0. Operands 0 and 3 have mode m and operands 1 and 2 have mode n, with n having narrower elements than m.

Semantically the expressions perform the multiplication with the following signedness:

usdot<signed op0, unsigned op1, signed op2, signed op3> ==
   op0 = ((signed-conv) zero-ext (op1)) * sign-ext (op2) + op3
...

vec_shl_insert_m

Shift the elements in vector input operand 1 left one element (i.e. away from element 0) and fill the vacated element 0 with the scalar in operand 2. Store the result in vector output operand 0. Operands 0 and 1 have mode m and operand 2 has the mode appropriate for one element of m.
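
In scalar terms the operation looks like the sketch below. The name model_vec_shl_insert is hypothetical and the element type is assumed to be 32-bit; the loop runs from the top element down so that it also works when the input and output are the same array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: move every element one position away from
   element 0, then place SCALAR (operand 2) in element 0.  */
static void
model_vec_shl_insert (int32_t *dst, const int32_t *src, int32_t scalar,
                      size_t n)
{
  if (n == 0)
    return;
  for (size_t i = n - 1; i > 0; i--)
    dst[i] = src[i - 1];
  dst[0] = scalar;
}
```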

vec_shl_m

Whole vector left shift in bits, i.e. away from element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits. Operand 0 is where the resulting shifted vector is stored. The output and input vectors should have the same modes.

vec_shr_m

Whole vector right shift in bits, i.e. towards element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits. Operand 0 is where the resulting shifted vector is stored. The output and input vectors should have the same modes.

vec_pack_trunc_m

Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral or floating point elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated after narrowing them down using truncation.
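
A scalar sketch of the narrowing and concatenation, assuming 32-bit unsigned inputs narrowed to 16 bits and assuming that the elements of operand 1 come first in the result (the name model_vec_pack_trunc is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: truncate each element of OP1 and OP2 to
   half its width and concatenate the 2*N narrowed elements.  */
static void
model_vec_pack_trunc (uint16_t *dst, const uint32_t *op1,
                      const uint32_t *op2, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      dst[i] = (uint16_t) op1[i];      /* Keeps the low 16 bits.  */
      dst[n + i] = (uint16_t) op2[i];
    }
}
```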

vec_pack_sbool_trunc_m

Narrow and merge the elements of two vectors. Operands 1 and 2 are vectors of the same type having N boolean elements. Operand 0 is the resulting vector in which 2*N elements are concatenated. The last operand (operand 3) is the number of elements in the output vector 2*N as a CONST_INT. This instruction pattern is used when all the vector input and output operands have the same scalar mode m and thus using vec_pack_trunc_m would be ambiguous.

vec_pack_ssat_m, vec_pack_usat_m

Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral elements of size S. Operand 0 is the resulting vector in which the elements of the two input vectors are concatenated after narrowing them down using signed/unsigned saturating arithmetic.

vec_pack_sfix_trunc_m, vec_pack_ufix_trunc_m

Narrow, convert to signed/unsigned integral type and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N floating point elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated.

vec_packs_float_m, vec_packu_float_m

Narrow, convert to floating point type and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N signed/unsigned integral elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated.

vec_unpacks_hi_m, vec_unpacks_lo_m

Extract and widen (promote) the high/low part of a vector of signed integral or floating point elements. The input vector (operand 1) has N elements of size S. Widen (promote) the high/low elements of the vector using signed or floating point extension and place the resulting N/2 values of size 2*S in the output vector (operand 0).

vec_unpacku_hi_m, vec_unpacku_lo_m

Extract and widen (promote) the high/low part of a vector of unsigned integral elements. The input vector (operand 1) has N elements of size S. Widen (promote) the high/low elements of the vector using zero extension and place the resulting N/2 values of size 2*S in the output vector (operand 0).

vec_unpacks_sbool_hi_m, vec_unpacks_sbool_lo_m

Extract the high/low part of a vector of boolean elements that have scalar mode m. The input vector (operand 1) has N elements, the output vector (operand 0) has N/2 elements. The last operand (operand 2) is the number of elements of the input vector N as a CONST_INT. These patterns are used if both the input and output vectors have the same scalar mode m and thus using vec_unpacks_hi_m or vec_unpacks_lo_m would be ambiguous.

vec_unpacks_float_hi_m, vec_unpacks_float_lo_m
vec_unpacku_float_hi_m, vec_unpacku_float_lo_m

Extract, convert to floating point type and widen the high/low part of a vector of signed/unsigned integral elements. The input vector (operand 1) has N elements of size S. Convert the high/low elements of the vector using floating point conversion and place the resulting N/2 values of size 2*S in the output vector (operand 0).

vec_unpack_sfix_trunc_hi_m
vec_unpack_sfix_trunc_lo_m
vec_unpack_ufix_trunc_hi_m
vec_unpack_ufix_trunc_lo_m

Extract, convert to signed/unsigned integer type and widen the high/low part of a vector of floating point elements. The input vector (operand 1) has N elements of size S. Convert the high/low elements of the vector to integers and place the resulting N/2 values of size 2*S in the output vector (operand 0).

vec_widen_umult_hi_m, vec_widen_umult_lo_m
vec_widen_smult_hi_m, vec_widen_smult_lo_m
vec_widen_umult_even_m, vec_widen_umult_odd_m
vec_widen_smult_even_m, vec_widen_smult_odd_m

Signed/Unsigned widening multiplication. The two inputs (operands 1 and 2) are vectors with N signed/unsigned elements of size S. Multiply the high/low or even/odd elements of the two vectors, and put the N/2 products of size 2*S in the output vector (operand 0). A target shouldn’t implement the even/odd pattern pair if it is less efficient than the lo/hi one.
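
The even/odd variants can be sketched in scalar form as below, assuming unsigned 16-bit inputs widened to 32 bits. The function names are hypothetical. The hi/lo variants are not modelled here because which half counts as high depends on the target's element ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the even variant: multiply elements
   0, 2, 4, ... of A and B at double width.  */
static void
model_widen_umult_even (uint32_t *dst, const uint16_t *a,
                        const uint16_t *b, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    dst[i] = (uint32_t) a[2 * i] * (uint32_t) b[2 * i];
}

/* Hypothetical scalar model of the odd variant: elements 1, 3, 5, ...  */
static void
model_widen_umult_odd (uint32_t *dst, const uint16_t *a,
                       const uint16_t *b, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    dst[i] = (uint32_t) a[2 * i + 1] * (uint32_t) b[2 * i + 1];
}
```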

vec_widen_ushiftl_hi_m, vec_widen_ushiftl_lo_m
vec_widen_sshiftl_hi_m, vec_widen_sshiftl_lo_m

Signed/Unsigned widening shift left. The first input (operand 1) is a vector with N signed/unsigned elements of size S. Operand 2 is a constant. Shift the high/low elements of operand 1, and put the N/2 results of size 2*S in the output vector (operand 0).

vec_widen_uaddl_hi_m, vec_widen_uaddl_lo_m
vec_widen_saddl_hi_m, vec_widen_saddl_lo_m

Signed/Unsigned widening add long. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Add the high/low elements of 1 and 2 together, widen the resulting elements and put the N/2 results of size 2*S in the output vector (operand 0).

vec_widen_usubl_hi_m, vec_widen_usubl_lo_m
vec_widen_ssubl_hi_m, vec_widen_ssubl_lo_m

Signed/Unsigned widening subtract long. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Subtract the high/low elements of 2 from 1 and widen the resulting elements. Put the N/2 results of size 2*S in the output vector (operand 0).

vec_widen_uabd_hi_m, vec_widen_uabd_lo_m
vec_widen_uabd_odd_m, vec_widen_uabd_even_m
vec_widen_sabd_hi_m, vec_widen_sabd_lo_m
vec_widen_sabd_odd_m, vec_widen_sabd_even_m

Signed/Unsigned widening absolute difference. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Find the absolute difference between operands 1 and 2 and widen the resulting elements. Put the N/2 results of size 2*S in the output vector (operand 0).

vec_trunc_add_highm

Signed or unsigned addition of two input integer vectors of mode m, then extracts the most significant half of each result element and narrows it to elements of half the original width.

Concretely, it computes: (bits(a)/2)((a + b) >> bits(a)/2)

where bits(a) is the width in bits of each input element.

Operands 1 and 2 are of integer vector mode m and contain the same number of signed or unsigned integral elements. The result (operand 0) is of an integer vector mode with the same number of elements, each half the width of the elements of mode m.

This operation is currently only used for early break result compression when the result of a vector boolean can be represented as 0 or -1.
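
A minimal scalar sketch of the formula above, assuming four 32-bit input elements narrowed to 16 bits. The function name and element count are illustrative. Unsigned arithmetic is used so that the addition wraps; after truncation the signed and unsigned forms yield the same bits.

```c
#include <stdint.h>

#define N 4 /* Illustrative element count.  */

/* Add 32-bit elements with wraparound, then keep the upper 16 bits
   of each sum.  */
void
trunc_add_high (uint16_t out[N], const uint32_t a[N], const uint32_t b[N])
{
  for (int i = 0; i < N; i++)
    out[i] = (uint16_t) ((a[i] + b[i]) >> 16);
}
```

When every input element is a boolean 0 or -1, a result element is nonzero exactly when at least one of the two corresponding inputs is -1, which is what makes the operation usable for compressing two masks into one narrower vector.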

vec_addsubm3

Alternating subtract and add, with even lanes performing subtraction and odd lanes performing addition. Operands 1 and 2 and the output operand are vectors with mode m.
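
In scalar terms this could be sketched as follows, assuming four double-precision lanes; the function name is hypothetical.

```c
#define N 4 /* Illustrative lane count.  */

/* Even lanes compute a - b, odd lanes compute a + b.  */
void
addsub (double out[N], const double a[N], const double b[N])
{
  for (int i = 0; i < N; i++)
    out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}
```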

vec_fmaddsubm4

Alternating multiply-subtract and multiply-add, with even lanes subtracting the third operand from, and odd lanes adding it to, the product of the first two operands. Operands 1, 2 and 3 and the output operand are vectors with mode m.

vec_fmsubaddm4

Alternating multiply-add and multiply-subtract, with even lanes adding the third operand to, and odd lanes subtracting it from, the product of the first two operands. Operands 1, 2 and 3 and the output operand are vectors with mode m.

These instructions are not allowed to FAIL.
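
The lane selection of the two fused patterns can be sketched as below, assuming four double-precision lanes. The function names are hypothetical, and the sketch uses a separate multiply and add, so it does not model whatever rounding behaviour a real fused instruction has.

```c
#define N 4 /* Illustrative lane count.  */

/* Models vec_fmaddsub: even lanes a * b - c, odd lanes a * b + c.  */
void
fmaddsub (double out[N], const double a[N], const double b[N],
          const double c[N])
{
  for (int i = 0; i < N; i++)
    out[i] = (i % 2 == 0) ? a[i] * b[i] - c[i] : a[i] * b[i] + c[i];
}

/* Models vec_fmsubadd: even lanes a * b + c, odd lanes a * b - c.  */
void
fmsubadd (double out[N], const double a[N], const double b[N],
          const double c[N])
{
  for (int i = 0; i < N; i++)
    out[i] = (i % 2 == 0) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}
```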

cadd90m3

Perform vector add and subtract on even/odd number pairs. The operation being matched is semantically described as

  for (int i = 0; i < N; i += 2)
    {
      c[i] = a[i] - b[i+1];
      c[i+1] = a[i+1] + b[i];
    }

This operation is semantically equivalent to performing a vector addition of complex numbers in operand 1 with operand 2 rotated by 90 degrees in the Argand plane and storing the result in operand 0.

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
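
The equivalence between the lane-wise loop and the complex form can be checked with a small sketch. It assumes two complex numbers per vector and relies on the C rule that a complex value is laid out as its real part followed by its imaginary part, which matches the even/odd lane ordering described above. The function names are hypothetical.

```c
#include <complex.h>
#include <string.h>

#define N 4 /* Two complex numbers as interleaved real/imag lanes.  */

/* Lane-wise form, as in the pattern description.  */
void
cadd90_lanes (double c[N], const double a[N], const double b[N])
{
  for (int i = 0; i < N; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

/* Complex form: a + b * i, that is, b rotated by 90 degrees.  */
void
cadd90_complex (double c[N], const double a[N], const double b[N])
{
  double complex za[N / 2], zb[N / 2], zc[N / 2];
  memcpy (za, a, sizeof za);
  memcpy (zb, b, sizeof zb);
  for (int i = 0; i < N / 2; i++)
    zc[i] = za[i] + zb[i] * I;
  memcpy (c, zc, sizeof zc);
}
```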

cadd270m3

Perform vector add and subtract on even/odd number pairs. The operation being matched is semantically described as

  for (int i = 0; i < N; i += 2)
    {
      c[i] = a[i] + b[i+1];
      c[i+1] = a[i+1] - b[i];
    }

This operation is semantically equivalent to performing a vector addition of complex numbers in operand 1 with operand 2 rotated by 270 degrees in the Argand plane and storing the result in operand 0.

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

cmlam4

Perform a vector multiply and accumulate that is semantically the same as a multiply and accumulate of complex numbers.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * op2[i] + op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
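
Expanded into individual lanes, with real parts in even lanes and imaginary parts in odd lanes, the operation could be sketched as follows. The function name and the choice of double-precision lanes are assumptions, and the sketch ignores any special handling of infinities or NaNs.

```c
#define N 4 /* Two complex numbers as interleaved real/imag lanes.  */

/* out = a * b + acc on interleaved complex data.  */
void
cmla_lanes (double out[N], const double a[N], const double b[N],
            const double acc[N])
{
  for (int i = 0; i < N; i += 2)
    {
      /* Real part: ar * br - ai * bi + acc_r.  */
      out[i] = a[i] * b[i] - a[i + 1] * b[i + 1] + acc[i];
      /* Imaginary part: ar * bi + ai * br + acc_i.  */
      out[i + 1] = a[i] * b[i + 1] + a[i + 1] * b[i] + acc[i + 1];
    }
}
```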

cmla_conjm4

Perform a vector multiply by conjugate and accumulate that is semantically the same as a multiply and accumulate of complex numbers where the second multiply argument is conjugated.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * conj (op2[i]) + op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
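
Conjugating the second multiply argument only flips the sign of its imaginary lanes. A lane-level sketch under the same assumptions as before (hypothetical function name, double-precision lanes, no special handling of infinities or NaNs):

```c
#define N 4 /* Two complex numbers as interleaved real/imag lanes.  */

/* out = a * conj (b) + acc on interleaved complex data.  */
void
cmla_conj_lanes (double out[N], const double a[N], const double b[N],
                 const double acc[N])
{
  for (int i = 0; i < N; i += 2)
    {
      /* Real part: ar * br + ai * bi + acc_r.  */
      out[i] = a[i] * b[i] + a[i + 1] * b[i + 1] + acc[i];
      /* Imaginary part: ai * br - ar * bi + acc_i.  */
      out[i + 1] = a[i + 1] * b[i] - a[i] * b[i + 1] + acc[i + 1];
    }
}
```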

cmlsm4

Perform a vector multiply and subtract that is semantically the same as a multiply and subtract of complex numbers.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * op2[i] - op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

cmls_conjm4

Perform a vector multiply by conjugate and subtract that is semantically the same as a multiply and subtract of complex numbers where the second multiply argument is conjugated.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * conj (op2[i]) - op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

cmulm4

Perform a vector multiply that is semantically the same as multiply of complex numbers.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * op2[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

cmul_conjm4

Perform a vector multiply by conjugate that is semantically the same as a multiply of complex numbers where the second multiply argument is conjugated.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * conj (op2[i]);
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

cond_negmode
cond_one_cmplmode
cond_sqrtmode
cond_ceilmode
cond_floormode
cond_roundmode
cond_rintmode

When operand 1 is true, perform an operation on operand 2 and store the result in operand 0, otherwise store operand 3 in operand 0. The operation works elementwise if the operands are vectors.

The scalar case is equivalent to:

op0 = op1 ? op op2 : op3;

while the vector case is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = op1[i] ? op op2[i] : op3[i];

where, for example, op is ~ for ‘cond_one_cmplmode’.

When defined for floating-point modes, the contents of ‘op2[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, and 3 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE.

‘cond_opmode’ generally corresponds to a conditional form of ‘opmode2’.
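
As a concrete instance, the vector form of ‘cond_negmode’ might be modelled as below. The four-element size, the function name and the use of a bool array for the mask are assumptions made for illustration; the real mask mode is whatever TARGET_VECTORIZE_GET_MASK_MODE returns.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4 /* Illustrative element count.  */

/* Negate op2 where the mask is set; take op3 elsewhere.  */
void
cond_neg (int32_t op0[N], const bool op1[N], const int32_t op2[N],
          const int32_t op3[N])
{
  for (int i = 0; i < N; i++)
    op0[i] = op1[i] ? -op2[i] : op3[i];
}
```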

cond_addmode
cond_submode
cond_mulmode
cond_divmode
cond_udivmode
cond_modmode
cond_umodmode
cond_andmode
cond_iormode
cond_xormode
cond_sminmode
cond_smaxmode
cond_uminmode
cond_umaxmode
cond_copysignmode
cond_fminmode
cond_fmaxmode
cond_ashlmode
cond_ashrmode
cond_lshrmode

When operand 1 is true, perform an operation on operands 2 and 3 and store the result in operand 0, otherwise store operand 4 in operand 0. The operation works elementwise if the operands are vectors.

The scalar case is equivalent to:

op0 = op1 ? op2 op op3 : op4;

while the vector case is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = op1[i] ? op2[i] op op3[i] : op4[i];

where, for example, op is + for ‘cond_addmode’.

When defined for floating-point modes, the contents of ‘op3[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, 3 and 4 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE.

‘cond_opmode’ generally corresponds to a conditional form of ‘opmode3’. As an exception, the vector forms of shifts correspond to patterns like vashlmode3 rather than patterns like ashlmode3.

‘cond_copysignmode’ is only defined for floating-point modes.
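
For instance, the vector form of ‘cond_addmode’ might be modelled as follows, assuming four 32-bit integer elements and a bool array standing in for the mask (both assumptions; the function name is hypothetical).

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4 /* Illustrative element count.  */

/* Add op2 and op3 where the mask is set; take op4 elsewhere.  */
void
cond_add (int32_t op0[N], const bool op1[N], const int32_t op2[N],
          const int32_t op3[N], const int32_t op4[N])
{
  for (int i = 0; i < N; i++)
    op0[i] = op1[i] ? op2[i] + op3[i] : op4[i];
}
```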

cond_fmamode
cond_fmsmode
cond_fnmamode
cond_fnmsmode

Like ‘cond_addmode’, except that the conditional operation takes three operands rather than two. For example, the vector form of ‘cond_fmamode’ is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];

cond_len_negmode
cond_len_one_cmplmode
cond_len_sqrtmode
cond_len_ceilmode
cond_len_floormode
cond_len_roundmode
cond_len_rintmode

When operand 1 is true and element index < operand 4 + operand 5, perform an operation on operand 2 and store the result in operand 0, otherwise store operand 3 in operand 0. The operation only works when the operands are vectors.

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = (i < op4 + op5 && op1[i]
            ? op op2[i]
            : op3[i]);

where, for example, op is ~ for ‘cond_len_one_cmplmode’.

When defined for floating-point modes, the contents of ‘op2[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, and 3 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE. Operand 4 has whichever integer mode the target prefers.

‘cond_len_opmode’ generally corresponds to a conditional form of ‘opmode2’.

cond_len_addmode
cond_len_submode
cond_len_mulmode
cond_len_divmode
cond_len_udivmode
cond_len_modmode
cond_len_umodmode
cond_len_andmode
cond_len_iormode
cond_len_xormode
cond_len_sminmode
cond_len_smaxmode
cond_len_uminmode
cond_len_umaxmode
cond_len_copysignmode
cond_len_fminmode
cond_len_fmaxmode
cond_len_ashlmode
cond_len_ashrmode
cond_len_lshrmode

When operand 1 is true and element index < operand 5 + operand 6, perform an operation on operands 2 and 3 and store the result in operand 0, otherwise store operand 4 in operand 0. The operation only works when the operands are vectors.

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = (i < op5 + op6 && op1[i]
            ? op2[i] op op3[i]
            : op4[i]);

where, for example, op is + for ‘cond_len_addmode’.

When defined for floating-point modes, the contents of ‘op3[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, 3 and 4 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE. Operand 5 has whichever integer mode the target prefers.

‘cond_len_opmode’ generally corresponds to a conditional form of ‘opmode3’. As an exception, the vector forms of shifts correspond to patterns like vashlmode3 rather than patterns like ashlmode3.

‘cond_len_copysignmode’ is only defined for floating-point modes.
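
The interaction of the mask with the length and bias operands can be sketched for ‘cond_len_addmode’ as follows. The element count, the function name, the bool mask and the plain int types for the length and bias are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4 /* Illustrative element count.  */

/* An element is active only if its index is below len + bias AND its
   mask bit is set; inactive elements take the value from op4.  */
void
cond_len_add (int32_t op0[N], const bool op1[N], const int32_t op2[N],
              const int32_t op3[N], const int32_t op4[N], int len, int bias)
{
  for (int i = 0; i < N; i++)
    op0[i] = (i < len + bias && op1[i]) ? op2[i] + op3[i] : op4[i];
}
```

With a length of 3 and a bias of 0, element 3 is inactive regardless of its mask bit; with a bias of -1 the same length makes element 2 inactive as well.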

cond_len_fmamode
cond_len_fmsmode
cond_len_fnmamode
cond_len_fnmsmode

Like ‘cond_len_addmode’, except that the conditional operation takes three operands rather than two. For example, the vector form of ‘cond_len_fmamode’ is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = (i < op6 + op7 && op1[i]
            ? fma (op2[i], op3[i], op4[i])
            : op5[i]);

cbranchmode4

Conditional branch instruction combined with a compare instruction. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the code_label to jump to. For vectors this optab is only used for comparisons of VECTOR_BOOLEAN_TYPE_P values and it is never called for data registers. Data vector operands should use one of the patterns below instead.

vec_cbranch_anymode

Conditional branch instruction based on a vector compare that branches when at least one of the elementwise comparisons of the two input vectors is true. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the code_label to jump to.

vec_cbranch_allmode

Conditional branch instruction based on a vector compare that branches when all of the elementwise comparisons of the two input vectors are true. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the code_label to jump to.
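
The branch decisions of the “any” and “all” forms can be modelled as predicates. The sketch below fixes the comparison operator to < and uses four 32-bit elements; both choices and the function names are illustrative. A return value of true stands for “the branch to the label is taken”.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4 /* Illustrative element count.  */

/* Branch if at least one elementwise comparison a[i] < b[i] holds.  */
bool
cbranch_any_lt (const int32_t a[N], const int32_t b[N])
{
  for (int i = 0; i < N; i++)
    if (a[i] < b[i])
      return true;
  return false;
}

/* Branch only if every elementwise comparison a[i] < b[i] holds.  */
bool
cbranch_all_lt (const int32_t a[N], const int32_t b[N])
{
  for (int i = 0; i < N; i++)
    if (!(a[i] < b[i]))
      return false;
  return true;
}
```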

cond_vec_cbranch_anymode

Masked conditional branch instruction based on a vector compare that branches when at least one of the elementwise comparisons of the two input vectors is true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 5 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.

cond_vec_cbranch_allmode

Masked conditional branch instruction based on a vector compare that branches when all of the elementwise comparisons of the two input vectors are true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 5 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.
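
A sketch of how the mask restricts the decision to active lanes, again with the comparison operator fixed to < and a bool array standing in for the mask (both assumptions; the function names are hypothetical). The description does not say what the “all” form does when no lane is active; the sketch returns true in that case, which is purely an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4 /* Illustrative element count.  */

/* Branch if some ACTIVE lane satisfies a[i] < b[i].  */
bool
cond_cbranch_any_lt (const bool mask[N], const int32_t a[N],
                     const int32_t b[N])
{
  for (int i = 0; i < N; i++)
    if (mask[i] && a[i] < b[i])
      return true;
  return false;
}

/* Branch if every ACTIVE lane satisfies a[i] < b[i]; inactive lanes
   are skipped and so cannot prevent the branch.  */
bool
cond_cbranch_all_lt (const bool mask[N], const int32_t a[N],
                     const int32_t b[N])
{
  for (int i = 0; i < N; i++)
    if (mask[i] && !(a[i] < b[i]))
      return false;
  return true;
}
```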

cond_len_vec_cbranch_anymode

Length-based conditional branch instruction based on a vector compare that branches when at least one of the elementwise comparisons of the two input vectors is true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 4 is the len operand and operand 5 is the bias operand. Operand 6 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.

cond_len_vec_cbranch_allmode

Length-based conditional branch instruction based on a vector compare that branches when all of the elementwise comparisons of the two input vectors are true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 4 is the len operand and operand 5 is the bias operand. Operand 6 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.