Standard Pattern Names for Vectorization (GNU Compiler Collection (GCC) Internals)

18.10.2 Standard Pattern Names for Vectorization ¶

‘vec_load_lanesmn’ ¶

Perform an interleaved load of several vectors from memory operand 1 into register operand 0. Both operands have mode m. The register operand is viewed as holding consecutive vectors of mode n, while the memory operand is a flat array that contains the same number of elements. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  for (i = 0; i < c; i++)
    operand0[i][j] = operand1[j * c + i];

For example, ‘vec_load_lanestiv4hi’ loads 8 16-bit values from memory into a register of mode ‘TI’. The register contains two consecutive vectors of mode ‘V4HI’.

This pattern can only be used if:

TARGET_ARRAY_MODE_SUPPORTED_P (n, c)

is true. GCC assumes that, if a target supports this kind of instruction for some mode n, it also supports unaligned loads for vectors of mode n.

This pattern is not allowed to FAIL.
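
As an illustration only, the ‘vec_load_lanestiv4hi’ example can be modelled in plain C. The names load_lanes, LANES_C and LANES_N are hypothetical and exist only for this sketch; the loop body is the equivalence given above with c = 2 and four elements per vector.

```c
#include <stdint.h>

/* Hypothetical constants for the TI/V4HI example: two vectors of four
   16-bit elements each.  */
enum { LANES_C = 2, LANES_N = 4 };

/* Deinterleave the flat array MEM into LANES_C vectors, following
   operand0[i][j] = operand1[j * c + i].  */
static void
load_lanes (int16_t reg[LANES_C][LANES_N], const int16_t *mem)
{
  for (int j = 0; j < LANES_N; j++)
    for (int i = 0; i < LANES_C; i++)
      reg[i][j] = mem[j * LANES_C + i];
}
```

With memory contents 0 through 7, the first vector in this model receives the even-indexed values and the second the odd-indexed values.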

‘vec_mask_load_lanesmn’ ¶

Like ‘vec_load_lanesmn’, but takes an additional mask operand (operand 2) that specifies which elements of the destination vectors should be loaded. Other elements of the destination vectors are taken from operand 3, which is an else operand in the subvector mode n, similar to the one in maskload. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  if (operand2[j])
    for (i = 0; i < c; i++)
      operand0[i][j] = operand1[j * c + i];
  else
    for (i = 0; i < c; i++)
      operand0[i][j] = operand3[j];

This pattern is not allowed to FAIL.

‘vec_mask_len_load_lanesmn’ ¶

Like ‘vec_load_lanesmn’, but takes an additional mask operand (operand 2), length operand (operand 4) as well as bias operand (operand 5) that specifies which elements of the destination vectors should be loaded. Other elements of the destination vectors are taken from operand 3, which is an else operand similar to the one in maskload. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < operand4 + operand5; j++)
  for (i = 0; i < c; i++)
    if (operand2[j])
      operand0[i][j] = operand1[j * c + i];
    else
      operand0[i][j] = operand3[j];

This pattern is not allowed to FAIL.

‘vec_store_lanesmn’ ¶

Equivalent to ‘vec_load_lanesmn’, with the memory and register operands reversed. That is, the instruction is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  for (i = 0; i < c; i++)
    operand0[j * c + i] = operand1[i][j];

for a memory operand 0 and register operand 1.

This pattern is not allowed to FAIL.

‘vec_mask_store_lanesmn’ ¶

Like ‘vec_store_lanesmn’, but takes an additional mask operand (operand 2) that specifies which elements of the source vectors should be stored. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < GET_MODE_NUNITS (n); j++)
  if (operand2[j])
    for (i = 0; i < c; i++)
      operand0[j * c + i] = operand1[i][j];

This pattern is not allowed to FAIL.

‘vec_mask_len_store_lanesmn’ ¶

Like ‘vec_store_lanesmn’, but takes an additional mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4) that specifies which elements of the source vectors should be stored. The operation is equivalent to:

int c = GET_MODE_SIZE (m) / GET_MODE_SIZE (n);
for (j = 0; j < operand3 + operand4; j++)
  if (operand2[j])
    for (i = 0; i < c; i++)
      operand0[j * c + i] = operand1[i][j];

This pattern is not allowed to FAIL.

‘gather_loadmn’ ¶

Load several separate memory locations into a vector of mode m. Operand 1 is a scalar base address and operand 2 is a vector of mode n containing offsets from that base. Operand 0 is a destination vector with the same number of elements as n. For each element index i:

extend the offset element i to address width, using zero extension if operand 3 is 1 and sign extension if operand 3 is zero;
multiply the extended offset by operand 4;
add the result to the base; and
load the value at that address into element i of operand 0.

The value of operand 3 does not matter if the offsets are already address width.
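
The per-element steps above can be sketched as a scalar loop. This is a model under stated assumptions rather than anything GCC generates: the function name gather_load_model, the choice of 32-bit offsets and 32-bit data elements, and the parameter names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a gather load.  ZERO_EXTEND_P plays the
   role of operand 3 and SCALE the role of operand 4.  */
static void
gather_load_model (int32_t *dest, const unsigned char *base,
                   const int32_t *offsets, int zero_extend_p,
                   ptrdiff_t scale, size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    {
      /* Extend offset element i to address width.  */
      ptrdiff_t ext = zero_extend_p
                      ? (ptrdiff_t) (uint32_t) offsets[i]
                      : (ptrdiff_t) offsets[i];
      /* Scale the offset, add it to the base and load element i.  */
      memcpy (&dest[i], base + ext * scale, sizeof dest[i]);
    }
}
```

The extension step only changes the result when an offset has its top bit set and the offsets are narrower than an address, which matches the remark that operand 3 is irrelevant for address-width offsets.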

‘mask_gather_loadmn’ ¶

Like ‘gather_loadmn’, but takes an extra mask operand as operand 5. Other elements of the destination vectors are taken from operand 6, which is an else operand similar to the one in maskload. Bit i of the mask is set if element i of the result should be loaded from memory and clear if element i of the result should be set to operand 6.

‘mask_len_gather_loadmn’ ¶

Like ‘gather_loadmn’, but takes an extra mask operand (operand 5) and an else operand (operand 6) as well as a len operand (operand 7) and a bias operand (operand 8).

Similar to mask_len_load, the instruction loads at most (operand 7 + operand 8) elements from memory. Bit i of the mask is set if element i of the result should be loaded from memory and clear if element i of the result should be set to element i of operand 6. Mask elements i with i >= (operand 7 + operand 8) are ignored.

‘mask_len_strided_loadm’ ¶

Load several separate memory locations into a destination vector of mode m. Operand 0 is the destination vector. Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode. Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand. The instruction can be seen as a special case of mask_len_gather_loadmn with an offset vector that is a vec_series with zero as base and operand 2 as step. For each element index i, the load address is operand 1 + i * operand 2. Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory. Element i of the mask (operand 3) is set if element i of the result should be loaded from memory and clear if element i of the result should be zero. Mask elements i with i >= (operand 4 + operand 5) are ignored.
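
A scalar sketch of the strided form may help. It assumes 32-bit elements and a stride measured in bytes, and it leaves elements at or beyond the length untouched because the text does not say what they contain. The name strided_load_model and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: BASE is operand 1, STRIDE operand 2,
   MASK operand 3, LEN operand 4 and BIAS operand 5.  */
static void
strided_load_model (int32_t *dest, const unsigned char *base,
                    ptrdiff_t stride, const bool *mask, int len, int bias)
{
  for (int i = 0; i < len + bias; i++)
    {
      if (mask[i])
        /* Active element: load from base + i * stride.  */
        memcpy (&dest[i], base + i * stride, sizeof dest[i]);
      else
        /* Masked-off element within the length: set to zero.  */
        dest[i] = 0;
    }
}
```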

‘scatter_storemn’ ¶

Store a vector of mode m into several distinct memory locations. Operand 0 is a scalar base address and operand 1 is a vector of mode n containing offsets from that base. Operand 4 is the vector of values that should be stored, which has the same number of elements as n. For each element index i:

extend the offset element i to address width, using zero extension if operand 2 is 1 and sign extension if operand 2 is zero;
multiply the extended offset by operand 3;
add the result to the base; and
store element i of operand 4 to that address.

The value of operand 2 does not matter if the offsets are already address width.

‘mask_scatter_storemn’ ¶

Like ‘scatter_storemn’, but takes an extra mask operand as operand 5. Bit i of the mask is set if element i of operand 4 should be stored to memory.

‘mask_len_scatter_storemn’ ¶

Like ‘scatter_storemn’, but takes an extra mask operand (operand 5), a len operand (operand 6) as well as a bias operand (operand 7). The instruction stores at most (operand 6 + operand 7) elements of operand 4 to memory. Bit i of the mask is set if element i of operand 4 should be stored. Mask elements i with i >= (operand 6 + operand 7) are ignored.

‘mask_len_strided_storem’ ¶

Store a vector of mode m into several distinct memory locations. Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode. Operand 2 is the vector of values that should be stored, which is of mode m. Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand. The instruction can be seen as a special case of mask_len_scatter_storemn with an offset vector that is a vec_series with zero as base and operand 1 as step. For each element index i, the store address is operand 0 + i * operand 1. Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of operand 2 to memory. Element i of the mask (operand 3) is set if element i of operand 2 should be stored. Mask elements i with i >= (operand 4 + operand 5) are ignored.

while_ultmn ¶

Set operand 0 to a mask that is true while incrementing operand 1 gives a value that is less than operand 2, for a vector length up to operand 3. Operand 0 has mode n and operands 1 and 2 are scalar integers of mode m. Operand 3 should be omitted when n is a vector mode, and a CONST_INT otherwise. The operation for vector modes is equivalent to:

operand0[0] = operand1 < operand2;
for (i = 1; i < GET_MODE_NUNITS (n); i++)
  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);

And for non-vector modes the operation is equivalent to:

operand0[0] = operand1 < operand2;
for (i = 1; i < operand3; i++)
  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
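
The chained ‘&&’ matters when operand 1 + i wraps around. The following self-contained C sketch models the mask as an array of bool; the name while_ult_model and the choice of 64-bit scalars are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of while_ult: START is operand 1, LIMIT is
   operand 2 and NUNITS is the number of mask elements.  */
static void
while_ult_model (bool *mask, uint64_t start, uint64_t limit, int nunits)
{
  mask[0] = start < limit;
  for (int i = 1; i < nunits; i++)
    /* Once an element is false, all later elements stay false, even if
       start + i wraps around to a small value.  */
    mask[i] = mask[i - 1] && (start + i < limit);
}
```

For start 5 and limit 8 with four elements, the model produces three true elements followed by one false element.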

select_vlmn ¶

Set operand 0 (of mode n) to the number of scalar iterations that should be handled by one iteration of a vector loop. Operand 1 is the total number of scalar iterations that the loop needs to process and operand 2 is a maximum bound on the result (also known as the maximum “vectorization factor”). Operand 3 (of mode m) is a dummy parameter to pass the vector mode to be used.

The maximum value of operand 0 is given by:

operand0 = MIN (operand1, operand2)

However, targets might choose a lower value than this, based on target-specific criteria. Each iteration of the vector loop might therefore process a different number of scalar iterations, which in turn means that induction variables will have a variable step. Because of this, it is generally not useful to define this instruction if it will always calculate the maximum value.

This optab is only useful on targets that implement ‘len_load_m’ and/or ‘len_store_m’ or the associated ‘_len’ variants.
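
To make the variable step concrete, here is a sketch of one policy a target might use. The policy itself (splitting the final two iterations evenly) is purely hypothetical, as are the names select_vl_model and count_vector_iterations; the only property taken from the text is that the result never exceeds MIN (operand1, operand2).

```c
#include <stddef.h>

/* Hypothetical target policy.  REMAINING is operand 1 and MAX_VF is
   operand 2; the result is at most MIN (remaining, max_vf).  */
static size_t
select_vl_model (size_t remaining, size_t max_vf)
{
  if (remaining > max_vf && remaining < 2 * max_vf)
    return (remaining + 1) / 2;	/* Balance the last two iterations.  */
  return remaining < max_vf ? remaining : max_vf;
}

/* Count the vector iterations needed to process N scalar iterations.
   The step changes from one iteration to the next.  */
static size_t
count_vector_iterations (size_t n, size_t max_vf)
{
  size_t iterations = 0;
  for (size_t done = 0; done < n; iterations++)
    done += select_vl_model (n - done, max_vf);
  return iterations;
}
```

Under this policy, 20 scalar iterations with a maximum of 8 are processed in steps of 8, 6 and 6, whereas always returning the maximum would give 8, 8 and 4.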

‘vec_setm’ ¶

Set the given field in the vector value. Operand 0 is the vector to modify, operand 1 is the new value of the field and operand 2 specifies the field index.

This pattern is not allowed to FAIL.

‘vec_extractmn’ ¶

Extract the given field from the vector value. Operand 1 is the vector, operand 2 specifies the field index and operand 0 is the place to store the value. Mode n is the mode of the field or vector of fields that should be extracted; it should be either the element mode of the vector mode m, or a vector mode with the same element mode and a smaller number of elements. If n is a vector mode, the index is counted in multiples of mode n.

This pattern is not allowed to FAIL.

‘vec_initmn’ ¶

Initialize the vector to the given values. Operand 0 is the vector to initialize and operand 1 is a parallel containing values for the individual fields. Mode n is the mode of the elements; it should be either the element mode of the vector mode m, or a vector mode with the same element mode and a smaller number of elements.

‘vec_duplicatem’ ¶

Initialize vector output operand 0 so that each element has the value given by scalar input operand 1. The vector has mode m and the scalar has the mode appropriate for one element of m.

This pattern only handles duplicates of non-constant inputs. Constant vectors go through the movm pattern instead.

This pattern is not allowed to FAIL.

‘vec_seriesm’ ¶

Initialize vector output operand 0 so that element i is equal to operand 1 plus i times operand 2. In other words, create a linear series whose base value is operand 1 and whose step is operand 2.

The vector output has mode m and the scalar inputs have the mode appropriate for one element of m. This pattern is not used for floating-point vectors, in order to avoid having to specify the rounding behavior for i > 1.

This pattern is not allowed to FAIL.

‘check_raw_ptrsm’ ¶

Check whether, given two pointers a and b and a length len, a write of len bytes at a followed by a read of len bytes at b can be split into interleaved byte accesses ‘a[0], b[0], a[1], b[1], …’ without affecting the dependencies between the bytes. Set operand 0 to true if the split is possible and false otherwise.

Operands 1, 2 and 3 provide the values of a, b and len respectively. Operand 4 is a constant integer that provides the known common alignment of a and b. All inputs have mode m.

This split is possible if:

a == b || a + len <= b || b + len <= a

You should only define this pattern if the target has a way of accelerating the test without having to do the individual comparisons.

‘check_war_ptrsm’ ¶

Like ‘check_raw_ptrsm’, but with the read and write swapped round. The split is possible in this case if:

b <= a || a + len <= b
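
For reference, the two conditions can be written as plain C predicates. The function names are hypothetical, and the sketch assumes that a + len and b + len do not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Write of LEN bytes at A followed by a read of LEN bytes at B.  */
static bool
check_raw_ptrs_model (uintptr_t a, uintptr_t b, uintptr_t len)
{
  return a == b || a + len <= b || b + len <= a;
}

/* Read of LEN bytes at A followed by a write of LEN bytes at B.  */
static bool
check_war_ptrs_model (uintptr_t a, uintptr_t b, uintptr_t len)
{
  return b <= a || a + len <= b;
}
```

These are the individual comparisons that a target-specific pattern is expected to improve on.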

‘vec_cmpmn’ ¶

Output a vector comparison. Operand 1 is a signed vector comparison whose operands, of mode m, are operands 2 and 3. Operand 0, of mode n, is the destination for the result. The result is computed by evaluating the comparison elementwise, with all-ones as the true value and all-zeros as the false value.

‘vec_cmpumn’ ¶

Similar to vec_cmpmn but perform unsigned vector comparison.

‘vec_cmpeqmn’ ¶

Similar to vec_cmpmn but perform equality or non-equality vector comparison only. If vec_cmpmn or vec_cmpumn instruction pattern is supported, it will be preferred over vec_cmpeqmn, so there is no need to define this instruction pattern if the others are supported.

‘vcond_mask_mn’ ¶

Output a conditional vector move. Operand 0 is the destination to receive a combination of operand 1 and operand 2, depending on the mask in operand 3. Operands 0, 1, and 2 have mode m while operand 3 has mode n.

Suppose that m has e elements. There are then two supported forms of n. The first form is an integer or boolean vector that also has e elements. In this case, each element is -1 or 0, with -1 selecting elements from operand 1 and 0 selecting elements from operand 2. The second supported form of n is a scalar integer that has at least e bits. A set bit then selects from operand 1 and a clear bit selects from operand 2. Bits e and above have no effect.

Subject to those restrictions, the behavior is equivalent to:

for (i = 0; i < e; i++)
  op0[i] = op3[i] ? op1[i] : op2[i];
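
When n is a scalar integer, op3[i] in the loop above stands for bit i of the mask. A sketch of that second form, assuming e = 4, 32-bit elements and an 8-bit mask (all hypothetical choices, as is the function name):

```c
#include <stdint.h>

enum { VCOND_E = 4 };	/* Hypothetical element count e.  */

/* Scalar-integer mask form: bit i of OP3 selects between OP1 and OP2.  */
static void
vcond_mask_scalar_model (int32_t *op0, const int32_t *op1,
                         const int32_t *op2, uint8_t op3)
{
  for (int i = 0; i < VCOND_E; i++)
    op0[i] = ((op3 >> i) & 1) ? op1[i] : op2[i];
}
```

Because the loop only inspects bits 0 to e - 1, any higher bits of the mask have no effect in this model.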

‘vcond_mask_len_mn’ ¶

Set each element of operand 0 to the corresponding element of operand 2 or operand 3. Choose operand 2 if both the element index is less than operand 4 plus operand 5 and the corresponding element of operand 1 is nonzero:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = i < op4 + op5 && op1[i] ? op2[i] : op3[i];

Operands 0, 2 and 3 have mode m. Operand 1 has mode n. Operands 4 and 5 have a target-dependent scalar integer mode.

‘maskloadmn’ ¶

Perform a masked load of a vector from memory operand 1 of mode m into register operand 0. The mask is provided in register operand 2 of mode n. Operand 3 (the “else value”) is of mode m and specifies which value is loaded when the mask is unset. The predicate of operand 3 must only accept the else values that the target actually supports. Currently three values are attempted: zero, -1, and undefined. GCC handles an else value of zero more efficiently than -1 or undefined.

This pattern is not allowed to FAIL.

‘maskstoremn’ ¶

Perform a masked store of a vector from register operand 1 of mode m into memory operand 0. The mask is provided in register operand 2 of mode n.

This pattern is not allowed to FAIL.

‘len_load_m’ ¶

Load (operand 3 + operand 4) elements from memory operand 1 into vector register operand 0. Operands 0 and 1 have mode m, which must be a vector mode. Operand 3 has whichever integer mode the target prefers. Operand 2 (the “else value”) is of mode m and specifies which value is loaded for the remaining elements. The predicate of operand 2 must only accept the else values that the target actually supports. Operand 4 conceptually has mode QI.

Operand 3 can be a variable or a constant amount. Operand 4 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 4 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 3 + operand 4) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.
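
The role of the bias can be seen in a small scalar model. This sketch assumes a vector of eight QI elements and represents the else value as a single scalar that is broadcast to the remaining elements; both are simplifications for illustration, and len_load_model and LEN_NUNITS are hypothetical names.

```c
#include <stdint.h>

enum { LEN_NUNITS = 8 };	/* Hypothetical number of elements in mode m.  */

/* LEN is operand 3 and BIAS is operand 4 (0 or -1).  LEN + BIAS must
   not exceed LEN_NUNITS.  */
static void
len_load_model (int8_t *dest, const int8_t *mem, int8_t else_value,
                int len, int bias)
{
  int active = len + bias;
  for (int i = 0; i < LEN_NUNITS; i++)
    /* Only the first ACTIVE elements are read from memory.  */
    dest[i] = i < active ? mem[i] : else_value;
}
```

A target whose instruction encodes the length minus one would accept a bias of -1, so that a length of 4 with bias -1 loads three elements.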

‘len_store_m’ ¶

Store (operand 2 + operand 3) vector elements from vector register operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged. Operands 0 and 1 have mode m, which must be a vector mode. Operand 2 has whichever integer mode the target prefers. Operand 3 conceptually has mode QI.

Operand 2 can be a variable or a constant amount. Operand 3 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 2 + operand 3) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.

‘mask_len_loadmn’ ¶

Perform a masked load from the memory location pointed to by operand 1 into register operand 0. (operand 4 + operand 5) elements are loaded from memory and other elements in operand 0 are set to undefined values. This is a combination of len_load and maskload. Operands 0 and 1 have mode m, which must be a vector mode. A mask is specified in operand 2, which must be of mode n. The mask has lower precedence than the length and is itself subject to length masking, i.e. only mask indices < (operand 4 + operand 5) are used. Operand 3 is an else operand similar to the one in maskload. Operand 4 has whichever integer mode the target prefers. Operand 5 conceptually has mode QI.

Operand 4 can be a variable or a constant amount. Operand 5 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 5 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 4 + operand 5) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.

‘mask_len_storemn’ ¶

Perform a masked store from vector register operand 1 into memory operand 0. (operand 3 + operand 4) elements are stored to memory, leaving the other elements of operand 0 unchanged. This is a combination of len_store and maskstore. Operands 0 and 1 have mode m, which must be a vector mode. Operand 3 has whichever integer mode the target prefers. A mask is specified in operand 2, which must be of mode n. The mask has lower precedence than the length and is itself subject to length masking, i.e. only mask indices < (operand 3 + operand 4) are used. Operand 4 conceptually has mode QI.

Operand 3 can be a variable or a constant amount. Operand 4 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 4 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 3 + operand 4) exceeds the number of elements in mode m, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of QI elements.

This pattern is not allowed to FAIL.

‘vec_permm’ ¶

Output a (variable) vector permutation. Operand 0 is the destination to receive elements from operand 1 and operand 2, which are of mode m. Operand 3 is the selector. It is an integral mode vector of the same width and number of elements as mode m.

The input elements are numbered from 0 in operand 1 through 2*N-1 in operand 2. The elements of the selector must be computed modulo 2*N. Note that if rtx_equal_p(operand1, operand2), this can be implemented with just operand 1 and selector elements modulo N.

In order to make things easy for a number of targets, if there is no ‘vec_perm’ pattern for mode m, but there is for mode q where q is a vector of QImode of the same width as m, the middle-end will lower the mode m VEC_PERM_EXPR to mode q.

See also TARGET_VECTORIZER_VEC_PERM_CONST, which performs the analogous operation for constant selectors.
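
The element numbering and the modulo rule can be sketched in C as follows. The sketch assumes N = 4 and 32-bit elements, and assumes the destination does not alias either input; vec_perm_model and PERM_N are hypothetical names.

```c
#include <stdint.h>

enum { PERM_N = 4 };	/* Hypothetical number of elements N.  */

/* Selector values 0..N-1 pick from OP1 and N..2*N-1 pick from OP2.  */
static void
vec_perm_model (int32_t *op0, const int32_t *op1, const int32_t *op2,
                const uint32_t *sel)
{
  for (int i = 0; i < PERM_N; i++)
    {
      uint32_t k = sel[i] % (2 * PERM_N);
      op0[i] = k < PERM_N ? op1[k] : op2[k - PERM_N];
    }
}
```

A selector element of 15 is reduced to 7 and therefore picks the last element of operand 2.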

‘reduc_smin_scal_m’, ‘reduc_smax_scal_m’ ¶

Find the signed minimum/maximum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector.

‘reduc_umin_scal_m’, ‘reduc_umax_scal_m’ ¶

Find the unsigned minimum/maximum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector.

‘reduc_fmin_scal_m’, ‘reduc_fmax_scal_m’ ¶

Find the floating-point minimum/maximum of the elements of a vector, using the same rules as fminm3 and fmaxm3. Operand 1 is a vector of mode m and operand 0 is the scalar result, which has mode GET_MODE_INNER (m).

‘reduc_plus_scal_m’ ¶

Compute the sum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector.

‘reduc_and_scal_m’ ¶

‘reduc_ior_scal_m’

‘reduc_xor_scal_m’

Compute the bitwise AND/IOR/XOR reduction of the elements of a vector of mode m. Operand 1 is the vector input and operand 0 is the scalar result. The mode of the scalar result is the same as one element of m.

‘reduc_sbool_and_scal_m’ ¶

‘reduc_sbool_ior_scal_m’

‘reduc_sbool_xor_scal_m’

Compute the bitwise AND/IOR/XOR reduction of the elements of a vector boolean of mode m. Operand 1 is the vector input and operand 0 is the scalar result. The mode of the scalar result is QImode with its value either zero or one. If mode m is a scalar integer mode then operand 2 is the number of elements in the input vector to provide disambiguation for the case m is ambiguous.

extract_last_m ¶

Find the last set bit in mask operand 1 and extract the associated element of vector operand 2. Store the result in scalar operand 0. Operand 2 has vector mode m while operand 0 has the mode appropriate for one element of m. Operand 1 has the usual mask mode for vectors of mode m; see TARGET_VECTORIZE_GET_MASK_MODE.

fold_extract_last_m ¶

If any bits of mask operand 2 are set, find the last set bit, extract the associated element from vector operand 3, and store the result in operand 0. Store operand 1 in operand 0 otherwise. Operand 3 has mode m and operands 0 and 1 have the mode appropriate for one element of m. Operand 2 has the usual mask mode for vectors of mode m; see TARGET_VECTORIZE_GET_MASK_MODE.

len_fold_extract_last_m ¶

Like ‘fold_extract_last_m’, but takes an extra length operand as operand 4 and an extra bias operand as operand 5. The extracted element must have an index i < len (operand 4) + bias (operand 5).

fold_left_plus_m ¶

Take scalar operand 1 and successively add each element from vector operand 2. Store the result in scalar operand 0. The vector has mode m and the scalars have the mode appropriate for one element of m. The operation is strictly in-order: there is no reassociation.
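
The in-order requirement is observable for floating-point modes, where addition is not associative. A minimal sketch, assuming float elements (the function name is hypothetical):

```c
/* INIT is operand 1 and VEC is operand 2.  Elements are added strictly
   from index 0 upwards.  */
static float
fold_left_plus_model (float init, const float *vec, int nunits)
{
  float acc = init;
  for (int i = 0; i < nunits; i++)
    acc += vec[i];
  return acc;
}
```

Adding 1e20f, -1e20f and 1.0f in that order gives 1.0f, while adding 1e20f, 1.0f and -1e20f in that order gives 0.0f, so an implementation that reorders the additions could change the result.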

mask_fold_left_plus_m ¶

Like ‘fold_left_plus_m’, but takes an additional mask operand (operand 3) that specifies which elements of the source vector should be added.

mask_len_fold_left_plus_m ¶

Like ‘fold_left_plus_m’, but takes an additional mask operand (operand 3), len operand (operand 4) and bias operand (operand 5). It performs the following operation strictly in-order (no reassociation):

operand0 = operand1;
for (i = 0; i < operand4 + operand5; i++)
  if (operand3[i])
    operand0 += operand2[i];

‘sdot_prodmn’ ¶

Multiply operand 1 by operand 2 without loss of precision, given that both operands contain signed elements. Add each product to the overlapping element of operand 3 and store the result in operand 0. Operands 0 and 3 have mode m and operands 1 and 2 have mode n, with n having narrower elements than m.

Semantically, the operation performs the multiplication with the following signedness:

sdot<signed op0, signed op1, signed op2, signed op3> ==
   op0 = sign-ext (op1) * sign-ext (op2) + op3
...

‘udot_prodmn’ ¶

Multiply operand 1 by operand 2 without loss of precision, given that both operands contain unsigned elements. Add each product to the overlapping element of operand 3 and store the result in operand 0. Operands 0 and 3 have mode m and operands 1 and 2 have mode n, with n having narrower elements than m.

Semantically, the operation performs the multiplication with the following signedness:

udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
   op0 = zero-ext (op1) * zero-ext (op2) + op3
...

‘usdot_prodmn’ ¶

Compute the sum of the products of elements of different signs. Multiply operand 1 by operand 2 without loss of precision, given that operand 1 is unsigned and operand 2 is signed. Add each product to the overlapping element of operand 3 and store the result in operand 0. Operands 0 and 3 have mode m and operands 1 and 2 have mode n, with n having narrower elements than m.

Semantically, the operation performs the multiplication with the following signedness:

usdot<signed op0, unsigned op1, signed op2, signed op3> ==
   op0 = ((signed-conv) zero-ext (op1)) * sign-ext (op2) + op3
...
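
As a rough illustration of the mixed-sign case, the following C sketch assumes sixteen 8-bit inputs and four 32-bit accumulators, and assumes that “overlapping” means each group of four consecutive narrow elements feeds the wide element occupying the same bytes. That grouping is an assumption of this sketch; the names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical shape: V16QI inputs and V4SI accumulators.  */
enum { DOT_WIDE = 4, DOT_RATIO = 4 };

/* OP1 is unsigned and zero-extended, OP2 is signed and sign-extended.  */
static void
usdot_prod_model (int32_t *op0, const uint8_t *op1, const int8_t *op2,
                  const int32_t *op3)
{
  for (int i = 0; i < DOT_WIDE; i++)
    {
      int32_t sum = op3[i];
      for (int k = 0; k < DOT_RATIO; k++)
        sum += (int32_t) op1[i * DOT_RATIO + k]
               * (int32_t) op2[i * DOT_RATIO + k];
      op0[i] = sum;
    }
}
```

Widening before the multiplication is what keeps the product exact: 255 * -128 does not fit in 8 bits but is computed without loss in 32 bits.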

‘vec_shl_insert_m’ ¶

Shift the elements in vector input operand 1 left one element (i.e. away from element 0) and fill the vacated element 0 with the scalar in operand 2. Store the result in vector output operand 0. Operands 0 and 1 have mode m and operand 2 has the mode appropriate for one element of m.

‘vec_shl_m’ ¶

Whole vector left shift in bits, i.e. away from element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits. Operand 0 is where the resulting shifted vector is stored. The output and input vectors should have the same modes.

‘vec_shr_m’ ¶

Whole vector right shift in bits, i.e. towards element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits. Operand 0 is where the resulting shifted vector is stored. The output and input vectors should have the same modes.

‘vec_pack_trunc_m’ ¶

Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral or floating point elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated after narrowing them down using truncation.
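
A scalar model of the integer case, assuming N = 4, 32-bit inputs and 16-bit outputs, with the elements of operand 1 placed before those of operand 2. The ordering of the concatenation and the names used are assumptions of this sketch.

```c
#include <stdint.h>

enum { PACK_N = 4 };	/* Hypothetical number of input elements N.  */

/* Truncate each 32-bit element to 16 bits and concatenate the results.  */
static void
vec_pack_trunc_model (uint16_t *op0, const uint32_t *op1,
                      const uint32_t *op2)
{
  for (int i = 0; i < PACK_N; i++)
    {
      op0[i] = (uint16_t) op1[i];
      op0[PACK_N + i] = (uint16_t) op2[i];
    }
}
```

Truncation simply discards the upper half of each element, so 0x12345678 becomes 0x5678; the saturating variants differ in how they treat values that do not fit.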

‘vec_pack_sbool_trunc_m’ ¶

Narrow and merge the elements of two vectors. Operands 1 and 2 are vectors of the same type having N boolean elements. Operand 0 is the resulting vector in which 2*N elements are concatenated. The last operand (operand 3) is the number of elements in the output vector 2*N as a CONST_INT. This instruction pattern is used when all the vector input and output operands have the same scalar mode m and thus using vec_pack_trunc_m would be ambiguous.

‘vec_pack_ssat_m’, ‘vec_pack_usat_m’ ¶

Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral elements of size S. Operand 0 is the resulting vector in which the elements of the two input vectors are concatenated after narrowing them down using signed/unsigned saturating arithmetic.

‘vec_pack_sfix_trunc_m’, ‘vec_pack_ufix_trunc_m’ ¶

Narrow, convert to signed/unsigned integral type and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N floating point elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated.

‘vec_packs_float_m’, ‘vec_packu_float_m’ ¶

Narrow, convert to floating point type and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N signed/unsigned integral elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated.

‘vec_unpacks_hi_m’, ‘vec_unpacks_lo_m’ ¶

Extract and widen (promote) the high/low part of a vector of signed integral or floating point elements. The input vector (operand 1) has N elements of size S. Widen (promote) the high/low elements of the vector using signed or floating point extension and place the resulting N/2 values of size 2*S in the output vector (operand 0).

‘vec_unpacku_hi_m’, ‘vec_unpacku_lo_m’ ¶

Extract and widen (promote) the high/low part of a vector of unsigned integral elements. The input vector (operand 1) has N elements of size S. Widen (promote) the high/low elements of the vector using zero extension and place the resulting N/2 values of size 2*S in the output vector (operand 0).

‘vec_unpacks_sbool_hi_m’, ‘vec_unpacks_sbool_lo_m’ ¶

Extract the high/low part of a vector of boolean elements that have scalar mode m. The input vector (operand 1) has N elements, the output vector (operand 0) has N/2 elements. The last operand (operand 2) is the number of elements of the input vector N as a CONST_INT. These patterns are used if both the input and output vectors have the same scalar mode m and thus using vec_unpacks_hi_m or vec_unpacks_lo_m would be ambiguous.

‘vec_unpacks_float_hi_m’, ‘vec_unpacks_float_lo_m’ ¶

‘vec_unpacku_float_hi_m’, ‘vec_unpacku_float_lo_m’

Extract, convert to floating point type and widen the high/low part of a vector of signed/unsigned integral elements. The input vector (operand 1) has N elements of size S. Convert the high/low elements of the vector using floating point conversion and place the resulting N/2 values of size 2*S in the output vector (operand 0).

‘vec_unpack_sfix_trunc_hi_m’ ¶

‘vec_unpack_sfix_trunc_lo_m’

‘vec_unpack_ufix_trunc_hi_m’

‘vec_unpack_ufix_trunc_lo_m’

Extract, convert to signed/unsigned integer type and widen the high/low part of a vector of floating point elements. The input vector (operand 1) has N elements of size S. Convert the high/low elements of the vector to integers and place the resulting N/2 values of size 2*S in the output vector (operand 0).

‘vec_widen_umult_hi_m’, ‘vec_widen_umult_lo_m’ ¶

‘vec_widen_smult_hi_m’, ‘vec_widen_smult_lo_m’

‘vec_widen_umult_even_m’, ‘vec_widen_umult_odd_m’

‘vec_widen_smult_even_m’, ‘vec_widen_smult_odd_m’

Signed/Unsigned widening multiplication. The two inputs (operands 1 and 2) are vectors with N signed/unsigned elements of size S. Multiply the high/low or even/odd elements of the two vectors, and put the N/2 products of size 2*S in the output vector (operand 0). A target shouldn’t implement the even/odd pattern pair if it is less efficient than the lo/hi pair.

‘vec_widen_ushiftl_hi_m’, ‘vec_widen_ushiftl_lo_m’ ¶

‘vec_widen_sshiftl_hi_m’, ‘vec_widen_sshiftl_lo_m’

Signed/Unsigned widening shift left. The first input (operand 1) is a vector with N signed/unsigned elements of size S. Operand 2 is a constant. Shift the high/low elements of operand 1, and put the N/2 results of size 2*S in the output vector (operand 0).

‘vec_widen_uaddl_hi_m’, ‘vec_widen_uaddl_lo_m’ ¶

‘vec_widen_saddl_hi_m’, ‘vec_widen_saddl_lo_m’

Signed/Unsigned widening add long. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Add the high/low elements of 1 and 2 together, widen the resulting elements and put the N/2 results of size 2*S in the output vector (operand 0).

‘vec_widen_usubl_hi_m’, ‘vec_widen_usubl_lo_m’ ¶

‘vec_widen_ssubl_hi_m’, ‘vec_widen_ssubl_lo_m’

Signed/Unsigned widening subtract long. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Subtract the high/low elements of 2 from 1 and widen the resulting elements. Put the N/2 results of size 2*S in the output vector (operand 0).

‘vec_widen_uabd_hi_m’, ‘vec_widen_uabd_lo_m’ ¶

‘vec_widen_uabd_odd_m’, ‘vec_widen_uabd_even_m’

‘vec_widen_sabd_hi_m’, ‘vec_widen_sabd_lo_m’

‘vec_widen_sabd_odd_m’, ‘vec_widen_sabd_even_m’

Signed/unsigned widening absolute difference. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Find the absolute difference between the high/low or even/odd elements of operands 1 and 2 and widen the resulting elements. Put the N/2 results of size 2*S in the output vector (operand 0).

‘vec_trunc_add_highm’ ¶

Perform a signed or unsigned addition of two input integer vectors of mode m, then extract the most significant half of each result element and narrow it to an element of half the original width.

Concretely, it computes: (bits(a)/2)((a + b) >> bits(a)/2)

where bits(a) is the width in bits of each input element.

Operand 1 and 2 are of integer vector mode m containing the same number of signed or unsigned integral elements. The result (operand 0) is of an integer vector mode with the same number of elements but elements of half of the width of those of mode m.

This operation is currently only used for early break result compression, when the result of a vector boolean can be represented as 0 or -1.
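
A scalar model for 16-bit input elements may make the formula easier to follow. This is a sketch under stated assumptions: the function name is hypothetical, and the sum is assumed to wrap at the 16-bit element width before its upper half is taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the operation for 16-bit input elements.
   Assumption: the sum wraps at the element width before the most
   significant half is extracted.  */
void
trunc_add_high_u16 (const uint16_t *a, const uint16_t *b, uint8_t *out,
                    size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint16_t sum = (uint16_t) (a[i] + b[i]);
      out[i] = (uint8_t) (sum >> 8);
    }
}
```

Under this model, adding a boolean vector to itself maps each -1 (0xFFFF) element to 0xFF and each 0 element to 0x00, so the narrowed result is still a 0 or -1 boolean at half the width.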

‘vec_addsubm3’ ¶

Alternating subtract and add, with even lanes doing subtraction and odd lanes doing addition. Operands 1 and 2 and the output operand are vectors with mode m.

‘vec_fmaddsubm4’ ¶

Alternating multiply-subtract and multiply-add: even lanes subtract the third operand from the product of the first two operands, and odd lanes add the third operand to it. Operands 1, 2 and 3 and the output operand are vectors with mode m.

‘vec_fmsubaddm4’ ¶

Alternating multiply-add and multiply-subtract: even lanes add the third operand to the product of the first two operands, and odd lanes subtract the third operand from it. Operands 1, 2 and 3 and the output operand are vectors with mode m.

These instructions are not allowed to FAIL.
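
The lane behaviour of the three alternating patterns above can be sketched in scalar C. The function names are hypothetical, and the multiply and add are written as separate operations, so any single-rounding (fused) behaviour of a real instruction is not modelled.

```c
#include <stddef.h>

/* Even lanes subtract, odd lanes add.  */
void
addsub_model (const double *a, const double *b, double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}

/* Even lanes compute a*b - c, odd lanes compute a*b + c.  */
void
fmaddsub_model (const double *a, const double *b, const double *c,
                double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 2 == 0) ? a[i] * b[i] - c[i] : a[i] * b[i] + c[i];
}

/* Even lanes compute a*b + c, odd lanes compute a*b - c.  */
void
fmsubadd_model (const double *a, const double *b, const double *c,
                double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 2 == 0) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}
```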

‘cadd90m3’ ¶

Perform vector add and subtract on even/odd number pairs. The operation being matched is semantically described as

  for (int i = 0; i < N; i += 2)
    {
      c[i] = a[i] - b[i+1];
      c[i+1] = a[i+1] + b[i];
    }

This operation is semantically equivalent to performing a vector addition of the complex numbers in operand 1 and the complex numbers in operand 2 rotated by 90 degrees in the Argand plane, storing the result in operand 0.

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
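
The correspondence between the lane-wise loop and complex arithmetic can be checked with C99 complex types: rotating a complex number by 90 degrees is a multiplication by the imaginary unit. The sketch below uses hypothetical function names and takes the number of lanes as a parameter.

```c
#include <complex.h>
#include <stddef.h>

/* Lane-wise form, following the loop above.  N is the number of lanes,
   i.e. twice the number of complex values.  */
void
cadd90_lanes (const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

/* The same operation on one complex value: a + i*b.  */
double complex
cadd90_complex (double complex a, double complex b)
{
  return a + I * b;
}
```

For a = 1 + 2i and b = 3 + 4i, both forms give -3 + 5i.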

‘cadd270m3’ ¶

Perform vector add and subtract on even/odd number pairs. The operation being matched is semantically described as

  for (int i = 0; i < N; i += 2)
    {
      c[i] = a[i] + b[i+1];
      c[i+1] = a[i+1] - b[i];
    }

This operation is semantically equivalent to performing a vector addition of the complex numbers in operand 1 and the complex numbers in operand 2 rotated by 270 degrees in the Argand plane, storing the result in operand 0.

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
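
A rotation by 270 degrees is a multiplication by -i, so the lane-wise loop for this pattern matches a - i*b on complex values. The sketch below illustrates this with hypothetical function names.

```c
#include <complex.h>
#include <stddef.h>

/* Lane-wise form, following the loop above.  N is the number of lanes.  */
void
cadd270_lanes (const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] + b[i + 1];
      c[i + 1] = a[i + 1] - b[i];
    }
}

/* The same operation on one complex value: a + (-i)*b.  */
double complex
cadd270_complex (double complex a, double complex b)
{
  return a - I * b;
}
```

For a = 1 + 2i and b = 3 + 4i, both forms give 5 - i.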

‘cmlam4’ ¶

Perform a vector multiply and accumulate that is semantically the same as a multiply and accumulate of complex numbers.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * op2[i] + op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
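
Expanded into lanes, with real parts in even lanes and imaginary parts in odd lanes, the complex multiply and accumulate looks roughly like this. The function name is hypothetical, and the sketch uses the textbook product formula without any special treatment of infinities or NaNs.

```c
#include <stddef.h>

/* Lane-wise model of op0 = op1 * op2 + op3 on interleaved complex data.
   N is the number of lanes, i.e. twice the number of complex values.  */
void
cmla_lanes (const double *op1, const double *op2, const double *op3,
            double *op0, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      /* (a + bi) * (c + di) = (ac - bd) + (ad + bc)i  */
      double re = op1[i] * op2[i] - op1[i + 1] * op2[i + 1];
      double im = op1[i] * op2[i + 1] + op1[i + 1] * op2[i];
      op0[i] = re + op3[i];
      op0[i + 1] = im + op3[i + 1];
    }
}
```

For example, (1 + 2i) * (3 + 4i) + (10 + 20i) yields 5 + 30i.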

‘cmla_conjm4’ ¶

Perform a vector multiply by conjugate and accumulate that is semantically the same as a multiply and accumulate of complex numbers where the second multiply argument is conjugated.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * conj (op2[i]) + op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

‘cmlsm4’ ¶

Perform a vector multiply and subtract that is semantically the same as a multiply and subtract of complex numbers.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * op2[i] - op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

‘cmls_conjm4’ ¶

Perform a vector multiply by conjugate and subtract that is semantically the same as a multiply and subtract of complex numbers where the second multiply argument is conjugated.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  complex TYPE op3[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * conj (op2[i]) - op3[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

‘cmulm4’ ¶

Perform a vector multiply that is semantically the same as multiply of complex numbers.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * op2[i];
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.

‘cmul_conjm4’ ¶

Perform a vector multiply by conjugate that is semantically the same as a multiply of complex numbers where the second multiply argument is conjugated.

  complex TYPE op0[N];
  complex TYPE op1[N];
  complex TYPE op2[N];
  for (int i = 0; i < N; i += 1)
    {
      op0[i] = op1[i] * conj (op2[i]);
    }

In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes.

The operation is only supported for vector modes m.

This pattern is not allowed to FAIL.
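
In lane form, conjugating the second argument flips the sign of its imaginary (odd) lane before the multiplication. A minimal sketch with a hypothetical function name, again using the textbook product formula with no special handling of infinities or NaNs:

```c
#include <stddef.h>

/* Lane-wise model of op0 = op1 * conj (op2) on interleaved complex data.
   N is the number of lanes, i.e. twice the number of complex values.  */
void
cmul_conj_lanes (const double *op1, const double *op2, double *op0, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      /* (a + bi) * (c - di) = (ac + bd) + (bc - ad)i  */
      op0[i] = op1[i] * op2[i] + op1[i + 1] * op2[i + 1];
      op0[i + 1] = op1[i + 1] * op2[i] - op1[i] * op2[i + 1];
    }
}
```

For example, (1 + 2i) * conj (3 + 4i) yields 11 + 2i.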

‘cond_negmode’ ¶

‘cond_one_cmplmode’

‘cond_sqrtmode’

‘cond_ceilmode’

‘cond_floormode’

‘cond_roundmode’

‘cond_rintmode’

When operand 1 is true, perform an operation on operand 2 and store the result in operand 0, otherwise store operand 3 in operand 0. The operation works elementwise if the operands are vectors.

The scalar case is equivalent to:

op0 = op1 ? op op2 : op3;

while the vector case is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = op1[i] ? op op2[i] : op3[i];

where, for example, op is ~ for ‘cond_one_cmplmode’.

When defined for floating-point modes, the contents of ‘op2[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, and 3 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE.

‘cond_opmode’ generally corresponds to a conditional form of ‘opmode2’.
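
As a concrete instance of the vector case, the following sketch models ‘cond_one_cmplmode’ on 32-bit elements. The function name is hypothetical, and the mask is represented as an array of bool purely for illustration; real mask modes are target-specific.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a conditional bitwise complement.  Active lanes get
   ~op2[i]; inactive lanes get the else value op3[i].  */
void
cond_one_cmpl_u32 (uint32_t *op0, const bool *op1, const uint32_t *op2,
                   const uint32_t *op3, size_t n)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] ? ~op2[i] : op3[i];
}
```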

‘cond_addmode’ ¶

‘cond_submode’

‘cond_mulmode’

‘cond_divmode’

‘cond_udivmode’

‘cond_modmode’

‘cond_umodmode’

‘cond_andmode’

‘cond_iormode’

‘cond_xormode’

‘cond_sminmode’

‘cond_smaxmode’

‘cond_uminmode’

‘cond_umaxmode’

‘cond_copysignmode’

‘cond_fminmode’

‘cond_fmaxmode’

‘cond_ashlmode’

‘cond_ashrmode’

‘cond_lshrmode’

When operand 1 is true, perform an operation on operands 2 and 3 and store the result in operand 0, otherwise store operand 4 in operand 0. The operation works elementwise if the operands are vectors.

The scalar case is equivalent to:

op0 = op1 ? op2 op op3 : op4;

while the vector case is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = op1[i] ? op2[i] op op3[i] : op4[i];

where, for example, op is + for ‘cond_addmode’.

When defined for floating-point modes, the contents of ‘op3[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, 3 and 4 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE.

‘cond_opmode’ generally corresponds to a conditional form of ‘opmode3’. As an exception, the vector forms of shifts correspond to patterns like vashlmode3 rather than patterns like ashlmode3.

‘cond_copysignmode’ is only defined for floating point modes.
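
The binary form can be modelled in the same way. The sketch below shows ‘cond_addmode’ on 32-bit integers; the function name is hypothetical and the mask is again an array of bool for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a conditional addition.  Active lanes get
   op2[i] + op3[i]; inactive lanes get the else value op4[i].  */
void
cond_add_i32 (int32_t *op0, const bool *op1, const int32_t *op2,
              const int32_t *op3, const int32_t *op4, size_t n)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] ? op2[i] + op3[i] : op4[i];
}
```

Passing the same array for op2 and op4 models a conditional accumulation in which inactive lanes keep their previous value.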

‘cond_fmamode’ ¶

‘cond_fmsmode’

‘cond_fnmamode’

‘cond_fnmsmode’

Like ‘cond_addmode’, except that the conditional operation takes three operands rather than two. For example, the vector form of ‘cond_fmamode’ is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];

‘cond_len_negmode’ ¶

‘cond_len_one_cmplmode’

‘cond_len_sqrtmode’

‘cond_len_ceilmode’

‘cond_len_floormode’

‘cond_len_roundmode’

‘cond_len_rintmode’

When operand 1 is true and the element index is less than operand 4 + operand 5, perform an operation on operand 2 and store the result in operand 0, otherwise store operand 3 in operand 0. The operation is only defined when the operands are vectors.

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = (i < op4 + op5 && op1[i]
            ? op op2[i]
            : op3[i]);

where, for example, op is ~ for ‘cond_len_one_cmplmode’.

When defined for floating-point modes, the contents of ‘op2[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, and 3 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE. Operand 4 has whichever integer mode the target prefers.

‘cond_len_opmode’ generally corresponds to a conditional form of ‘opmode2’.

‘cond_len_addmode’ ¶

‘cond_len_submode’

‘cond_len_mulmode’

‘cond_len_divmode’

‘cond_len_udivmode’

‘cond_len_modmode’

‘cond_len_umodmode’

‘cond_len_andmode’

‘cond_len_iormode’

‘cond_len_xormode’

‘cond_len_sminmode’

‘cond_len_smaxmode’

‘cond_len_uminmode’

‘cond_len_umaxmode’

‘cond_len_copysignmode’

‘cond_len_fminmode’

‘cond_len_fmaxmode’

‘cond_len_ashlmode’

‘cond_len_ashrmode’

‘cond_len_lshrmode’

When operand 1 is true and the element index is less than operand 5 + operand 6, perform an operation on operands 2 and 3 and store the result in operand 0, otherwise store operand 4 in operand 0. The operation is only defined when the operands are vectors.

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = (i < op5 + op6 && op1[i]
            ? op2[i] op op3[i]
            : op4[i]);

where, for example, op is + for ‘cond_len_addmode’.

When defined for floating-point modes, the contents of ‘op3[i]’ are not interpreted if ‘op1[i]’ is false, just like they would not be in a normal C ‘?:’ condition.

Operands 0, 2, 3 and 4 all have mode m. Operand 1 is a scalar integer if m is scalar, otherwise it has the mode returned by TARGET_VECTORIZE_GET_MASK_MODE. Operand 5 has whichever integer mode the target prefers.

‘cond_len_opmode’ generally corresponds to a conditional form of ‘opmode3’. As an exception, the vector forms of shifts correspond to patterns like vashlmode3 rather than patterns like ashlmode3.

‘cond_len_copysignmode’ is only defined for floating point modes.
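
The combined effect of the mask, length and bias operands can be sketched in scalar C for ‘cond_len_addmode’. The function name is hypothetical, the mask is an array of bool, and the length and bias are passed as plain signed integers; the values used to exercise it are chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a length- and mask-controlled addition.  A lane is
   active only if its index is below LEN + BIAS and its mask bit is set;
   every other lane takes the else value op4[i].  */
void
cond_len_add_i32 (int32_t *op0, const bool *op1, const int32_t *op2,
                  const int32_t *op3, const int32_t *op4,
                  ptrdiff_t len, ptrdiff_t bias, size_t n)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = ((ptrdiff_t) i < len + bias && op1[i]
              ? op2[i] + op3[i]
              : op4[i]);
}
```

With a mask of {1, 0, 1, 1}, a length of 3 and a bias of 0, lanes 0 and 2 are active; lane 1 is masked off and lane 3 lies beyond the length.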

‘cond_len_fmamode’ ¶

‘cond_len_fmsmode’

‘cond_len_fnmamode’

‘cond_len_fnmsmode’

Like ‘cond_len_addmode’, except that the conditional operation takes three operands rather than two. For example, the vector form of ‘cond_len_fmamode’ is equivalent to:

for (i = 0; i < GET_MODE_NUNITS (m); i++)
  op0[i] = (i < op6 + op7 && op1[i]
            ? fma (op2[i], op3[i], op4[i])
            : op5[i]);

‘cbranchmode4’ ¶

Conditional branch instruction combined with a compare instruction. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the code_label to jump to. For vectors this optab is only used for comparisons of VECTOR_BOOLEAN_TYPE_P values and it is never called for data registers. Data vector operands should use one of the patterns below instead.

‘vec_cbranch_anymode’ ¶

Conditional branch instruction based on a vector compare that branches when at least one of the elementwise comparisons of the two input vectors is true. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the code_label to jump to.

‘vec_cbranch_allmode’ ¶

Conditional branch instruction based on a vector compare that branches when all of the elementwise comparisons of the two input vectors are true. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the code_label to jump to.
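
The difference between the ‘any’ and ‘all’ forms can be modelled as two predicates that say whether the branch is taken. The sketch below fixes the comparison operator to "less than" on 32-bit integers; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Branch is taken if at least one lane satisfies a[i] < b[i].  */
bool
branch_any_lt (const int32_t *a, const int32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] < b[i])
      return true;
  return false;
}

/* Branch is taken only if every lane satisfies a[i] < b[i].  */
bool
branch_all_lt (const int32_t *a, const int32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!(a[i] < b[i]))
      return false;
  return true;
}
```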

‘cond_vec_cbranch_anymode’ ¶

Masked conditional branch instruction based on a vector compare that branches when at least one of the elementwise comparisons of the two input vectors is true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 5 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.

‘cond_vec_cbranch_allmode’ ¶

Masked conditional branch instruction based on a vector compare that branches when all of the elementwise comparisons of the two input vectors are true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 5 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.

‘cond_len_vec_cbranch_anymode’ ¶

Length-based conditional branch instruction based on a vector compare that branches when at least one of the elementwise comparisons of the two input vectors is true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 4 is the len operand and operand 5 is the bias operand. Operand 6 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.

‘cond_len_vec_cbranch_allmode’ ¶

Length-based conditional branch instruction based on a vector compare that branches when all of the elementwise comparisons of the two input vectors are true. Operand 0 is a comparison operator. Operand 1 is the mask operand. Operand 2 and operand 3 are the first and second operands of the comparison, respectively. Operand 4 is the len operand and operand 5 is the bias operand. Operand 6 is the code_label to jump to. Inactive lanes in the mask operand should not influence the decision to branch.
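
One way to picture how inactive lanes are excluded from the branch decision is the scalar model below. It rests on several assumptions: the function names are hypothetical, the comparison is fixed to "less than", and a lane is treated as active when its index is below len + bias and its mask bit is set, by analogy with the ‘cond_len_’ arithmetic patterns. The behaviour of the ‘all’ form when no lane is active is not specified in the text above; this model returns true in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed activity rule: index below LEN + BIAS and mask bit set.  */
static bool
lane_active (const bool *mask, size_t i, ptrdiff_t len, ptrdiff_t bias)
{
  return (ptrdiff_t) i < len + bias && mask[i];
}

/* Branch is taken if some active lane satisfies a[i] < b[i].  */
bool
cond_len_branch_any_lt (const bool *mask, const int32_t *a,
                        const int32_t *b, ptrdiff_t len, ptrdiff_t bias,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (lane_active (mask, i, len, bias) && a[i] < b[i])
      return true;
  return false;
}

/* Branch is taken if every active lane satisfies a[i] < b[i].
   Assumption: vacuously true when no lane is active.  */
bool
cond_len_branch_all_lt (const bool *mask, const int32_t *a,
                        const int32_t *b, ptrdiff_t len, ptrdiff_t bias,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (lane_active (mask, i, len, bias) && !(a[i] < b[i]))
      return false;
  return true;
}
```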