| author | Mingming Liu <mingmingl@google.com> | 2025-09-10 15:25:31 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-09-10 15:25:31 -0700 |
| commit | 1417dafa1db9cb1b2b09438aa9f53ea5ab6e36e2 (patch) | |
| tree | 57f4b1f313c8cf74eed8819870f39c36ea263c68 /clang/docs/LanguageExtensions.rst | |
| parent | 898b813bc8a6d0276bf0f4769f5f2f64b34e632d (diff) | |
| parent | b8cefcb601ddaa18482555c4ff363c01a270c2fe (diff) | |
Merge branch 'main' into users/mingmingl-llvm/samplefdo-profile-format
Diffstat (limited to 'clang/docs/LanguageExtensions.rst')
| -rw-r--r-- | clang/docs/LanguageExtensions.rst | 61 |
1 files changed, 53 insertions, 8 deletions
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3c6c97bb1fa1..ad190eace5b0 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -875,12 +875,14 @@ of different sizes and signs is forbidden in binary and ternary builtins.
                                                 for the comparison.
  T __builtin_elementwise_fshl(T x, T y, T z)    perform a funnel shift left. Concatenate x and y (x is the most        integer types
                                                 significant bits of the wide value), the combined value is shifted
-                                                left by z, and the most significant bits are extracted to produce
+                                                left by z (modulo the bit width of the original arguments),
+                                                and the most significant bits are extracted to produce
                                                 a result that is the same size as the original arguments.
  T __builtin_elementwise_fshr(T x, T y, T z)    perform a funnel shift right. Concatenate x and y (x is the most       integer types
                                                 significant bits of the wide value), the combined value is shifted
-                                                right by z, and the least significant bits are extracted to produce
+                                                right by z (modulo the bit width of the original arguments),
+                                                and the least significant bits are extracted to produce
                                                 a result that is the same size as the original arguments.
  T __builtin_elementwise_ctlz(T x[, T y])       return the number of leading 0 bits in the first argument. If          integer types
                                                 the first argument is 0 and an optional second argument is provided,
@@ -946,7 +948,14 @@ Let ``VT`` be a vector type and ``ET`` the element type of ``VT``.
 
 Each builtin accesses memory according to a provided boolean mask. These are
 provided as ``__builtin_masked_load`` and ``__builtin_masked_store``. The first
-argument is always boolean mask vector.
+argument is always boolean mask vector. The ``__builtin_masked_load`` builtin
+takes an optional third vector argument that will be used for the result of the
+masked-off lanes. These builtins assume the memory is always aligned.
+
+The ``__builtin_masked_expand_load`` and ``__builtin_masked_compress_store``
+builtins have the same interface but store the result in consecutive indices.
+Effectively this performs the ``if (mask[i]) val[i] = ptr[j++]`` and ``if
+(mask[i]) ptr[j++] = val[i]`` pattern respectively.
 
 Example:
 
@@ -955,9 +964,19 @@ Example:
   using v8b = bool [[clang::ext_vector_type(8)]];
   using v8i = int [[clang::ext_vector_type(8)]];
 
-  v8i load(v8b m, v8i *p) { return __builtin_masked_load(m, p); }
-
-  void store(v8b m, v8i v, v8i *p) { __builtin_masked_store(m, v, p); }
+  v8i load(v8b mask, v8i *ptr) { return __builtin_masked_load(mask, ptr); }
+
+  v8i load_expand(v8b mask, v8i *ptr) {
+    return __builtin_masked_expand_load(mask, ptr);
+  }
+
+  void store(v8b mask, v8i val, v8i *ptr) {
+    __builtin_masked_store(mask, val, ptr);
+  }
+
+  void store_compress(v8b mask, v8i val, v8i *ptr) {
+    __builtin_masked_compress_store(mask, val, ptr);
+  }
 
 
 Matrix Types
@@ -2032,6 +2051,9 @@ The following type trait primitives are supported by Clang. Those traits marked
     Returns true if a reference ``T`` can be copy-initialized from a temporary of type
     a non-cv-qualified ``U``.
 * ``__underlying_type`` (C++, GNU, Microsoft)
+* ``__builtin_lt_synthesises_from_spaceship``, ``__builtin_gt_synthesises_from_spaceship``,
+  ``__builtin_le_synthesises_from_spaceship``, ``__builtin_ge_synthesises_from_spaceship`` (Clang):
+  These builtins can be used to determine whether the corresponding operator is synthesised from a spaceship operator.
 
 In addition, the following expression traits are supported:
 
@@ -4182,7 +4204,7 @@ builtin, the mangler emits their usual pattern without any special treatment.
 -----------------------
 
 ``__builtin_popcountg`` returns the number of 1 bits in the argument. The
-argument can be of any unsigned integer type.
+argument can be of any unsigned integer type or fixed boolean vector.
 
 **Syntax**:
 
@@ -4214,7 +4236,13 @@ such as ``unsigned __int128`` and C23 ``unsigned _BitInt(N)``.
 
 ``__builtin_clzg`` (respectively ``__builtin_ctzg``) returns the number of
 leading (respectively trailing) 0 bits in the first argument. The first argument
-can be of any unsigned integer type.
+can be of any unsigned integer type or fixed boolean vector.
+
+For boolean vectors, these builtins interpret the vector like a bit-field where
+the ith element of the vector is bit i of the bit-field, counting from the
+least significant end. ``__builtin_clzg`` returns the number of zero elements at
+the end of the vector, while ``__builtin_ctzg`` returns the number of zero
+elements at the start of the vector.
 
 If the first argument is 0 and an optional second argument of ``int`` type is
 provided, then the second argument is returned. If the first argument is 0, but
@@ -5154,6 +5182,23 @@ If no address spaces names are provided, all address spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_ballot_w{32,64}
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_ballot_w{32,64}`` returns a bitmask that contains its
+boolean argument as a bit for every lane of the current wave that is currently
+active (i.e., that is converged with the executing thread), and a 0 bit for
+every lane that is not active.
+
+The result is uniform, i.e. it is the same in every active thread of the wave.
+
+__builtin_amdgcn_inverse_ballot_w{32,64}
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Given a wave-uniform bitmask, ``__builtin_amdgcn_inverse_ballot_w{32,64}(mask)``
+returns the bit at the position of the current lane. It is almost equivalent to
+``(mask & (1 << lane_id)) != 0``, except that its behavior is only defined if
+the given mask has the same value for all active lanes of the current wave.
 
 ARM/AArch64 Language Extensions
 -------------------------------
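Reader's note on the expand-load and compress-store hunk: the two patterns quoted in the diff, ``if (mask[i]) val[i] = ptr[j++]`` and ``if (mask[i]) ptr[j++] = val[i]``, can be written out as a plain scalar reference model. The sketch below is portable C++ and does not call the Clang builtins. The names `expand_load_model`, `compress_store_model`, `Mask`, `Vec`, `kLanes`, and the `passthru` parameter are hypothetical, chosen only for illustration, and the treatment of masked-off lanes in the expand-load model is an assumption based on the "same interface" wording.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 8-lane types standing in for the v8b / v8i vectors in the diff.
constexpr std::size_t kLanes = 8;
using Mask = std::array<bool, kLanes>;
using Vec = std::array<int, kLanes>;

// Scalar model of the expand-load pattern: if (mask[i]) val[i] = ptr[j++].
// Active lanes read consecutive elements from memory; masked-off lanes keep
// the value from `passthru` (an assumed stand-in for the optional argument).
Vec expand_load_model(const Mask &mask, const int *ptr, Vec passthru = {}) {
  Vec val = passthru;
  std::size_t j = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      val[i] = ptr[j++];
  return val;
}

// Scalar model of the compress-store pattern: if (mask[i]) ptr[j++] = val[i].
// Active lanes are written to consecutive memory slots; the return value is
// the number of elements written.
std::size_t compress_store_model(const Mask &mask, const Vec &val, int *ptr) {
  std::size_t j = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      ptr[j++] = val[i];
  return j;
}
```

With a mask that enables lanes 0, 2 and 5, the expand-load model reads three consecutive elements and scatters them into those lanes, while the compress-store model gathers those lanes into three consecutive slots.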

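Reader's note on the funnel-shift hunk: the added wording says the shift amount is taken modulo the bit width of the original arguments. A minimal scalar sketch of that behaviour for 32-bit operands is shown below. It is portable C++ rather than a call to ``__builtin_elementwise_fshl`` or ``__builtin_elementwise_fshr``, and the function names `fshl32_model` and `fshr32_model` are hypothetical.

```cpp
#include <cstdint>

// Funnel shift left, modelled on 32-bit operands: concatenate x (high half)
// and y (low half), shift left by z modulo 32, keep the most significant 32 bits.
constexpr std::uint32_t fshl32_model(std::uint32_t x, std::uint32_t y,
                                     std::uint32_t z) {
  const std::uint64_t wide = (std::uint64_t{x} << 32) | y;
  const unsigned shift = z % 32; // shift amount is reduced modulo the bit width
  return static_cast<std::uint32_t>((wide << shift) >> 32);
}

// Funnel shift right: same concatenation, shift right by z modulo 32,
// keep the least significant 32 bits.
constexpr std::uint32_t fshr32_model(std::uint32_t x, std::uint32_t y,
                                     std::uint32_t z) {
  const std::uint64_t wide = (std::uint64_t{x} << 32) | y;
  const unsigned shift = z % 32; // shift amount is reduced modulo the bit width
  return static_cast<std::uint32_t>(wide >> shift);
}
```

Under this model a shift by 40 gives the same result as a shift by 8, and a shift by 32 behaves like a shift by 0, returning `x` for the left variant and `y` for the right variant.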