author: Mingming Liu <mingmingl@google.com> 2025-09-10 15:25:31 -0700
committer: GitHub <noreply@github.com> 2025-09-10 15:25:31 -0700
commit: 1417dafa1db9cb1b2b09438aa9f53ea5ab6e36e2
tree: 57f4b1f313c8cf74eed8819870f39c36ea263c68 /clang/docs/LanguageExtensions.rst
parent: 898b813bc8a6d0276bf0f4769f5f2f64b34e632d
parent: b8cefcb601ddaa18482555c4ff363c01a270c2fe
Merge branch 'main' into users/mingmingl-llvm/samplefdo-profile-format
Diffstat (limited to 'clang/docs/LanguageExtensions.rst')
-rw-r--r-- clang/docs/LanguageExtensions.rst | 61
1 files changed, 53 insertions, 8 deletions
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3c6c97bb1fa1..ad190eace5b0 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -875,12 +875,14 @@ of different sizes and signs is forbidden in binary and ternary builtins.
for the comparison.
T __builtin_elementwise_fshl(T x, T y, T z) perform a funnel shift left. Concatenate x and y (x is the most integer types
significant bits of the wide value), the combined value is shifted
- left by z, and the most significant bits are extracted to produce
+ left by z (modulo the bit width of the original arguments),
+ and the most significant bits are extracted to produce
a result that is the same size as the original arguments.
T __builtin_elementwise_fshr(T x, T y, T z) perform a funnel shift right. Concatenate x and y (x is the most integer types
significant bits of the wide value), the combined value is shifted
- right by z, and the least significant bits are extracted to produce
+ right by z (modulo the bit width of the original arguments),
+ and the least significant bits are extracted to produce
a result that is the same size as the original arguments.
T __builtin_elementwise_ctlz(T x[, T y]) return the number of leading 0 bits in the first argument. If integer types
the first argument is 0 and an optional second argument is provided,
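Note on the funnel-shift rows in the hunk above: the added "modulo the bit width" wording can be illustrated with a portable scalar model. This is only a sketch for 32-bit operands; ``fshl32`` and ``fshr32`` are hypothetical helper names, not Clang builtins, and the model makes no claim about how the builtins are lowered.

```cpp
#include <cstdint>

// Hypothetical scalar model of a 32-bit funnel shift left.
// x supplies the most significant half of the 64-bit concatenation.
inline std::uint32_t fshl32(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
  const unsigned s = z % 32u;  // shift amount is taken modulo the bit width
  const std::uint64_t wide = (static_cast<std::uint64_t>(x) << 32) | y;
  // Shift left, then keep the most significant 32 bits.
  return static_cast<std::uint32_t>((wide << s) >> 32);
}

// Hypothetical scalar model of a 32-bit funnel shift right.
inline std::uint32_t fshr32(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
  const unsigned s = z % 32u;  // shift amount is taken modulo the bit width
  const std::uint64_t wide = (static_cast<std::uint64_t>(x) << 32) | y;
  // Shift right, then keep the least significant 32 bits.
  return static_cast<std::uint32_t>(wide >> s);
}
```

Under this model a shift amount of 40 behaves like a shift amount of 8, and a shift amount of 32 behaves like 0.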
@@ -946,7 +948,14 @@ Let ``VT`` be a vector type and ``ET`` the element type of ``VT``.
Each builtin accesses memory according to a provided boolean mask. These are
provided as ``__builtin_masked_load`` and ``__builtin_masked_store``. The first
-argument is always boolean mask vector.
+argument is always boolean mask vector. The ``__builtin_masked_load`` builtin
+takes an optional third vector argument that will be used for the result of the
+masked-off lanes. These builtins assume the memory is always aligned.
+
+The ``__builtin_masked_expand_load`` and ``__builtin_masked_compress_store``
+builtins have the same interface but store the result in consecutive indices.
+Effectively this performs the ``if (mask[i]) val[i] = ptr[j++]`` and ``if
+(mask[i]) ptr[j++] = val[i]`` pattern respectively.
Example:
@@ -955,9 +964,19 @@ Example:
using v8b = bool [[clang::ext_vector_type(8)]];
using v8i = int [[clang::ext_vector_type(8)]];
- v8i load(v8b m, v8i *p) { return __builtin_masked_load(m, p); }
-
- void store(v8b m, v8i v, v8i *p) { __builtin_masked_store(m, v, p); }
+ v8i load(v8b mask, v8i *ptr) { return __builtin_masked_load(mask, ptr); }
+
+ v8i load_expand(v8b mask, v8i *ptr) {
+ return __builtin_masked_expand_load(mask, ptr);
+ }
+
+ void store(v8b mask, v8i val, v8i *ptr) {
+ __builtin_masked_store(mask, val, ptr);
+ }
+
+ void store_compress(v8b mask, v8i val, v8i *ptr) {
+ __builtin_masked_compress_store(mask, val, ptr);
+ }
Matrix Types
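Note on the masked-memory hunks above: the ``if (mask[i]) val[i] = ptr[j++]`` and ``if (mask[i]) ptr[j++] = val[i]`` patterns quoted in the added text can be written out as scalar loops. The sketch below is a plain C++ model using ``std::array``. All names in it (``kLanes``, ``MaskModel``, ``VecModel``, and the ``*_model`` functions) are hypothetical, and the explicit ``fill`` parameter for masked-off lanes of the expand load is an assumption made to keep the model deterministic.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 8;             // hypothetical lane count
using MaskModel = std::array<bool, kLanes>;   // stands in for the boolean mask vector
using VecModel = std::array<int, kLanes>;     // stands in for the data vector

// Masked load: lane i reads mem[i] when mask[i] is set; otherwise it keeps
// passthru[i], modelling the optional third argument described above.
VecModel masked_load_model(const MaskModel &mask, const VecModel &mem,
                           const VecModel &passthru) {
  VecModel out = passthru;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i]) out[i] = mem[i];
  return out;
}

// Expand load: enabled lanes consume consecutive elements from ptr.
// Masked-off lanes take fill[i] (an assumption of this model).
VecModel expand_load_model(const MaskModel &mask, const int *ptr,
                           const VecModel &fill) {
  VecModel out = fill;
  std::size_t j = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i]) out[i] = ptr[j++];
  return out;
}

// Compress store: enabled lanes are written to consecutive elements of ptr.
// Returns the number of elements written.
std::size_t compress_store_model(const MaskModel &mask, const VecModel &val,
                                 int *ptr) {
  std::size_t j = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i]) ptr[j++] = val[i];
  return j;
}
```

The difference from the plain masked forms is that memory is indexed by the running counter ``j`` rather than by the lane index ``i``, so only as many memory elements are touched as there are set mask bits.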
@@ -2032,6 +2051,9 @@ The following type trait primitives are supported by Clang. Those traits marked
Returns true if a reference ``T`` can be copy-initialized from a temporary of type
a non-cv-qualified ``U``.
* ``__underlying_type`` (C++, GNU, Microsoft)
+* ``__builtin_lt_synthesises_from_spaceship``, ``__builtin_gt_synthesises_from_spaceship``,
+ ``__builtin_le_synthesises_from_spaceship``, ``__builtin_ge_synthesises_from_spaceship`` (Clang):
+ These builtins can be used to determine whether the corresponding operator is synthesised from a spaceship operator.
In addition, the following expression traits are supported:
@@ -4182,7 +4204,7 @@ builtin, the mangler emits their usual pattern without any special treatment.
-----------------------
``__builtin_popcountg`` returns the number of 1 bits in the argument. The
-argument can be of any unsigned integer type.
+argument can be of any unsigned integer type or fixed boolean vector.
**Syntax**:
@@ -4214,7 +4236,13 @@ such as ``unsigned __int128`` and C23 ``unsigned _BitInt(N)``.
``__builtin_clzg`` (respectively ``__builtin_ctzg``) returns the number of
leading (respectively trailing) 0 bits in the first argument. The first argument
-can be of any unsigned integer type.
+can be of any unsigned integer type or fixed boolean vector.
+
+For boolean vectors, these builtins interpret the vector like a bit-field where
+the ith element of the vector is bit i of the bit-field, counting from the
+least significant end. ``__builtin_clzg`` returns the number of zero elements at
+the end of the vector, while ``__builtin_ctzg`` returns the number of zero
+elements at the start of the vector.
If the first argument is 0 and an optional second argument of ``int`` type is
provided, then the second argument is returned. If the first argument is 0, but
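Note on the boolean-vector wording in the hunks above: treating element ``i`` as bit ``i`` of a bit-field means that "leading" zeros sit at the end of the vector and "trailing" zeros at the start. The sketch below models that reading with ``std::array<bool, N>``. The ``*_model`` names are hypothetical, and the ``fallback`` parameter stands in for the optional second argument that is returned when the input is all zero.

```cpp
#include <array>
#include <cstddef>

// Number of set elements, modelling __builtin_popcountg on a boolean vector.
template <std::size_t N>
int popcount_model(const std::array<bool, N> &v) {
  int count = 0;
  for (bool b : v) count += b ? 1 : 0;
  return count;
}

// Zero elements at the start of the vector (bit 0 upward),
// modelling __builtin_ctzg. Returns fallback if no element is set.
template <std::size_t N>
int ctz_model(const std::array<bool, N> &v, int fallback) {
  for (std::size_t i = 0; i < N; ++i)
    if (v[i]) return static_cast<int>(i);
  return fallback;
}

// Zero elements at the end of the vector (highest index downward),
// modelling __builtin_clzg. Returns fallback if no element is set.
template <std::size_t N>
int clz_model(const std::array<bool, N> &v, int fallback) {
  for (std::size_t i = N; i > 0; --i)
    if (v[i - 1]) return static_cast<int>(N - i);
  return fallback;
}
```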
@@ -5154,6 +5182,23 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
+__builtin_amdgcn_ballot_w{32,64}
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_ballot_w{32,64}`` returns a bitmask that contains its
+boolean argument as a bit for every lane of the current wave that is currently
+active (i.e., that is converged with the executing thread), and a 0 bit for
+every lane that is not active.
+
+The result is uniform, i.e. it is the same in every active thread of the wave.
+
+__builtin_amdgcn_inverse_ballot_w{32,64}
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Given a wave-uniform bitmask, ``__builtin_amdgcn_inverse_ballot_w{32,64}(mask)``
+returns the bit at the position of the current lane. It is almost equivalent to
+``(mask & (1 << lane_id)) != 0``, except that its behavior is only defined if
+the given mask has the same value for all active lanes of the current wave.
ARM/AArch64 Language Extensions
-------------------------------
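Note on the AMDGPU ballot hunk above: the relationship between ballot and inverse ballot can be sketched on the host with a simple model of a 32-lane wave. Everything here is hypothetical (``kWaveSize``, ``LaneFlags``, and both ``*_model`` functions); it only mirrors the prose of the added section and does not call any AMDGPU builtin. The inverse model assumes ``lane_id`` is less than 32.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWaveSize = 32;            // hypothetical wave32 model
using LaneFlags = std::array<bool, kWaveSize>;   // one flag per lane

// Ballot: bit i of the result is lane i's predicate if lane i is active,
// and 0 if lane i is inactive. Every active lane sees the same result.
std::uint32_t ballot_w32_model(const LaneFlags &active, const LaneFlags &pred) {
  std::uint32_t mask = 0;
  for (std::size_t lane = 0; lane < kWaveSize; ++lane)
    if (active[lane] && pred[lane]) mask |= std::uint32_t{1} << lane;
  return mask;
}

// Inverse ballot: each lane reads back its own bit of a wave-uniform mask.
// Assumes lane_id < kWaveSize.
bool inverse_ballot_w32_model(std::uint32_t mask, unsigned lane_id) {
  return ((mask >> lane_id) & 1u) != 0;
}
```

A 64-lane variant of this model would need a 64-bit mask and a 64-bit shift operand, since shifting a plain ``int`` constant by 32 or more is undefined in C++.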