| Age | Commit message | Author |
|
VPSHUFBITQMB intrinsics to be used in constexpr (#168100)
Resolves #161337
|
|
VPERMILPD/S variable mask intrinsics to be used in constexpr (#168861)
Allowing VPERMILPD/S intrinsics to be used in constexpr
Closes #167878
|
|
evaluation (#168206)
Fixes #167681
|
|
AVX512 mask predicate intrinsics to be used in constexpr (#165054)
Enables constexpr evaluation for the following AVX512 intrinsics:
```
_mm_movepi8_mask _mm256_movepi8_mask _mm512_movepi8_mask
_mm_movepi16_mask _mm256_movepi16_mask _mm512_movepi16_mask
_mm_movepi32_mask _mm256_movepi32_mask _mm512_movepi32_mask
_mm_movepi64_mask _mm256_movepi64_mask _mm512_movepi64_mask
```
Part of #162072
|
|
Resolves #166529
|
|
The option -falloc-token-max=0 is supposed to be usable to override
previous settings back to the target default max tokens (SIZE_MAX).
This did not work for the builtin:
```
| executed command: clang -cc1 [..] -nostdsysteminc -triple x86_64-linux-gnu -std=c++23 -fsyntax-only -verify clang/test/SemaCXX/alloc-token.cpp -falloc-token-max=0
| clang: llvm/lib/Support/AllocToken.cpp:38: std::optional<uint64_t> llvm::getAllocToken(AllocTokenMode, const AllocTokenMetadata &, uint64_t): Assertion `MaxTokens && "Must provide non-zero max tokens"' failed.
```
Fix it by also picking the default if "0" is passed.
Also clarify in the documentation what the value "0" means.
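
The intended behaviour can be sketched as follows. This is only an illustration: `effectiveMaxTokens` is a hypothetical helper name, not the function changed by the patch, and the sketch assumes the default is `SIZE_MAX` as stated above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the actual Clang/LLVM code): maps the value
// passed to -falloc-token-max to the limit that ends up being used.
constexpr std::uint64_t effectiveMaxTokens(std::uint64_t Requested) {
  // 0 means "no explicit limit": fall back to the target default (SIZE_MAX)
  // instead of forwarding a zero that would trip the assertion.
  return Requested == 0 ? static_cast<std::uint64_t>(SIZE_MAX) : Requested;
}

static_assert(effectiveMaxTokens(0) == SIZE_MAX, "0 selects the default");
static_assert(effectiveMaxTokens(64) == 64, "non-zero values are kept");
```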
|
|
constexpr (#162816)
This PR resolves only the ss/sd part of the AVX512 masked arithmetic intrinsics tracked in #160559.
|
|
Recent commits (7fe069121b57a, 53ddeb493529a) marked several x86
intrinsics as constexpr in headers without providing the necessary
constant evaluation support in the compiler backend. This caused
compilation failures when attempting to use these intrinsics in constant
expressions.
Resolves #166814
Resolves #161203
|
|
Resolves #166976
|
|
The pointer needs to point to a record.
Fixes https://github.com/llvm/llvm-project/issues/166371
|
|
Add a new builtin function __builtin_bswapg. It works on any integral
type whose width is a multiple of 16 bits, as well as on a single byte.
Closes #160266
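
A portable model of a width-generic byte swap is sketched below for readers without the builtin. `bswapModel` is a hypothetical name, the sketch covers only the standard unsigned types, and it makes no claim about how the builtin itself is implemented.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical portable model of a generic byte swap. A single-byte value
// is returned unchanged; wider values have their bytes reversed.
template <typename T>
constexpr T bswapModel(T Value) {
  static_assert(std::is_unsigned<T>::value, "model covers unsigned types only");
  T Result = 0;
  for (unsigned I = 0; I < sizeof(T); ++I) {
    // Move the lowest remaining byte of Value into the low end of Result.
    Result = static_cast<T>((Result << 8) | (Value & 0xFF));
    Value = static_cast<T>(Value >> 8);
  }
  return Result;
}

static_assert(bswapModel<std::uint8_t>(0xAB) == 0xAB, "one byte is a no-op");
static_assert(bswapModel<std::uint16_t>(0x1234) == 0x3412, "16-bit swap");
static_assert(bswapModel<std::uint32_t>(0x12345678u) == 0x78563412u, "32-bit swap");
```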
|
|
Resolves #167476
|
|
permute shuffles. (#167236)
This patch extends `interp__builtin_ia32_shuffle_generic` and `evalShuffleGeneric` to handle both 2-argument and 3-argument patterns, replacing specialized shuffle functions with the unified handler.
Resolves #166342
|
|
PALIGNR byte shift intrinsics to be used in constexpr (#162005)
Fixes #160509
|
|
AVX512 KTEST/KORTEST intrinsics to be used in constexpr (#166103)
Allow AVX512 KTEST/KORTEST intrinsics to be used in constexpr.
Fixes #162051
|
|
This patch enables compile-time evaluation of AVX512 permutex2var
intrinsics in constexpr contexts.
Extend shuffle generic to handle both integer immediate and vector mask
operands.
Resolves #161335
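
For reference, a scalar model of a two-source permute with a vector index operand is sketched below for a 4-element vector (the shape of `_mm256_permutex2var_epi64`). The type and function names are hypothetical. The index layout (low bits pick the element, the next bit picks the source) follows the documented instruction behaviour, not the patch itself.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t NumElts = 4; // must be a power of two

struct Vec4i64 { std::int64_t e[NumElts]; };

// Hypothetical scalar model: each index element selects one element from
// either A or B. Bits above the selector bit are ignored.
constexpr Vec4i64 permutex2varModel(Vec4i64 A, Vec4i64 Idx, Vec4i64 B) {
  Vec4i64 Out{};
  for (std::size_t I = 0; I < NumElts; ++I) {
    std::uint64_t Sel = static_cast<std::uint64_t>(Idx.e[I]);
    std::size_t Offset = Sel & (NumElts - 1);           // element within source
    Out.e[I] = (Sel & NumElts) ? B.e[Offset] : A.e[Offset]; // source selector
  }
  return Out;
}

constexpr Vec4i64 Result = permutex2varModel(
    Vec4i64{{10, 11, 12, 13}}, Vec4i64{{0, 4, 3, 7}}, Vec4i64{{20, 21, 22, 23}});
static_assert(Result.e[0] == 10 && Result.e[1] == 20, "A[0], then B[0]");
static_assert(Result.e[2] == 13 && Result.e[3] == 23, "A[3], then B[3]");
```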
|
|
insertps intrinsic to be used in constexpr (#165513)
Resolves #165161
|
|
Fixes https://github.com/llvm/llvm-project/issues/165372
|
|
Evaluation (#164026)
Enables constexpr evaluation for the following AVX512 Integer Comparison Intrinsics:
```
_mm_cmp_epi8_mask _mm_cmp_epu8_mask
_mm_cmp_epi16_mask _mm_cmp_epu16_mask
_mm_cmp_epi32_mask _mm_cmp_epu32_mask
_mm_cmp_epi64_mask _mm_cmp_epu64_mask
_mm256_cmp_epi8_mask _mm256_cmp_epu8_mask
_mm256_cmp_epi16_mask _mm256_cmp_epu16_mask
_mm256_cmp_epi32_mask _mm256_cmp_epu32_mask
_mm256_cmp_epi64_mask _mm256_cmp_epu64_mask
_mm512_cmp_epi8_mask _mm512_cmp_epu8_mask
_mm512_cmp_epi16_mask _mm512_cmp_epu16_mask
_mm512_cmp_epi32_mask _mm512_cmp_epu32_mask
_mm512_cmp_epi64_mask _mm512_cmp_epu64_mask
```
Part 1 of #162054
|
|
We can't save the result in a non-block pointer.
Fixes https://github.com/llvm/llvm-project/issues/165076
|
|
We can't read from non-block pointers anyway.
Fixes https://github.com/llvm/llvm-project/issues/165061
|
|
shufps/pd shuffles intrinsics to be used in constexpr (#164078)
Resolves #161208
|
|
constexpr (#164166)
Support constexpr usage for SLLDQ/SRLDQ byte shift intrinsics.
This PR adds support for using the following SLLDQ/SRLDQ intrinsics in
constant expressions:
- _mm_srli_si128
- _mm256_srli_si256
- _mm_slli_si128
- _mm256_slli_si256
Relevant tests are included.
Fixes #156494
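
The byte-shift semantics of the 128-bit forms can be modelled in scalar code as below. The names are hypothetical and this is a sketch of the instruction behaviour, not the evaluator code from the PR. The 256-bit forms apply the same shift to each 128-bit lane independently.

```cpp
#include <cstdint>

struct Bytes16 { std::uint8_t b[16]; }; // b[0] is the least significant byte

// Hypothetical model of _mm_slli_si128: shift left by Imm bytes.
constexpr Bytes16 slliSi128Model(Bytes16 A, unsigned Imm) {
  Bytes16 Out{}; // zero-filled, so shifted-in bytes stay 0
  for (unsigned I = 0; I < 16; ++I)
    if (I >= Imm)
      Out.b[I] = A.b[I - Imm];
  return Out;
}

// Hypothetical model of _mm_srli_si128: shift right by Imm bytes.
constexpr Bytes16 srliSi128Model(Bytes16 A, unsigned Imm) {
  Bytes16 Out{};
  for (unsigned I = 0; I < 16; ++I)
    if (I + Imm < 16)
      Out.b[I] = A.b[I + Imm];
  return Out;
}

constexpr Bytes16 Iota{{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};
static_assert(slliSi128Model(Iota, 3).b[2] == 0, "low bytes are zero-filled");
static_assert(slliSi128Model(Iota, 3).b[4] == 1, "byte 1 moves to byte 4");
static_assert(srliSi128Model(Iota, 3).b[0] == 3, "byte 3 moves to byte 0");
static_assert(srliSi128Model(Iota, 16).b[0] == 0, "shifts above 15 give zero");
```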
|
|
(#163639)
Implement the constexpr evaluation for `__builtin_infer_alloc_token()`
in Clang's constant expression evaluators (both in ExprConstant and the
new bytecode interpreter).
The constant evaluation is only supported for stateless (hash-based)
token modes. If a stateful mode like `increment` is used, the evaluation
fails, as the token value is not deterministic at compile time.
|
|
MMX/SSE/AVX2 PSIGN intrinsics to be used in constexpr (#163685)
Fix #155812
|
|
interp__builtin_elementwise_int_binop callbacks (#164679)
Following the discussion in #162346, this PR removes the trailing type from the 'interp__builtin_elementwise_int_binop' callbacks.
|
|
Get the zero-extended truncated desired value in that case. Add only one RUN
line to the constexpr-string.cpp test case, so as not to increase the runtime
of that test too much.
|
|
This drastically reduces the preprocessed size of Context.cpp and
InterpBuiltin.cpp.
|
|
AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 (#162836)
**This PR supersedes and replaces PR #158853**
The original branch diverged too far from the main branch, resulting in
significant merge conflicts that were difficult to resolve cleanly. To
provide a clean and reviewable history, this new PR was created by
cherry-picking the necessary commits onto a fresh branch based on the
latest `main`.
---
*(Original Description)*
This patch enables the use of AVX/AVX512 subvector extraction intrinsics
within `constexpr` functions. This is achieved by implementing the
evaluation logic for these intrinsics in
`VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.
The original discussion and review comments can be found in the previous
pull request for context: #158853
Fixes #157712
|
|
(#161914)
Fix #154520
|
|
phminposuw intrinsic to be used in constexpr (#163041)
Fix #161336
|
|
MMX/SSE/AVX/AVX512 PMULHRSW intrinsics to be used in constexpr (#160636)
This PR resolves #155805 and updates the following builtins to handle
constant expressions:
```
_mm_mulhrs_pi16
_mm_mulhrs_epi16 _mm256_mulhrs_epi16 _mm512_mulhrs_epi16
```
|
|
interp__builtin_elementwise_int_unaryop callbacks (#163905)
Following the discussion in #162346, this PR removes the trailing type from the 'interp__builtin_elementwise_int_unaryop' callbacks.
|
|
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]],
introduced as part of C++17.
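
A minimal illustration of the attribute on a variable that is read only by an assertion is sketched below. `checkedHalf` is a made-up example function, not code from the patch.

```cpp
#include <cassert>

// Made-up example: IsEven is only read inside assert(), so a build with
// NDEBUG defined would otherwise warn about an unused variable.
constexpr int checkedHalf(int Value) {
  [[maybe_unused]] bool IsEven = (Value % 2) == 0;
  assert(IsEven && "expected an even value");
  return Value / 2;
}

static_assert(checkedHalf(8) == 4, "half of 8");
```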
|
|
conflict intrinsics to be used in constexpr (#163293)
Resolves #160524
|
|
(#163148)
The PSHUFB instruction shuffles bytes within each 128-bit lane: for each
control byte, if bit 7 is set, the output byte is zeroed; otherwise, the
low 4 bits select a source byte (0–15) from the same lane.
Note: the _mm_shuffle_pi8 function had to change because __anyext128 used
negative indices, which are invalid in a constant expression context.
Fixes #156612
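
The per-lane rule described above can be written as a scalar model. This is a sketch with hypothetical names covering a single 128-bit lane, not the evaluator code from the patch.

```cpp
#include <cstdint>

struct Bytes16 { std::uint8_t b[16]; };

// Hypothetical model of PSHUFB for one 128-bit lane.
constexpr Bytes16 pshufbLaneModel(Bytes16 Src, Bytes16 Ctrl) {
  Bytes16 Out{};
  for (unsigned I = 0; I < 16; ++I) {
    std::uint8_t C = Ctrl.b[I];
    // Bit 7 set: zero the output byte. Otherwise the low 4 bits pick a
    // source byte from the same lane.
    Out.b[I] = (C & 0x80) ? std::uint8_t{0} : Src.b[C & 0x0F];
  }
  return Out;
}

constexpr Bytes16 Src{{100, 101, 102, 103, 104, 105, 106, 107,
                       108, 109, 110, 111, 112, 113, 114, 115}};
constexpr Bytes16 Ctrl{{15, 0x80, 0x1F, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF}};
constexpr Bytes16 Shuffled = pshufbLaneModel(Src, Ctrl);
static_assert(Shuffled.b[0] == 115, "index 15 selects the last byte");
static_assert(Shuffled.b[1] == 0, "bit 7 set zeroes the byte");
static_assert(Shuffled.b[2] == 115, "bits 4 to 6 are ignored");
static_assert(Shuffled.b[3] == 103, "index 3 selects byte 3");
```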
|
|
AVX/AVX512 IFMA madd52 intrinsics to be used in constexpr (#161056)
Resolves #160498
|
|
constexpr (#156822)
[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - add MMX/SSE/AVX PHADD/SUB & HADDPS/D intrinsics to be used in constexpr
Fixes #155395
Covered functions:
_mm_hadd_pi16 _mm_hadd_epi16 _mm256_hadd_epi16
_mm_hadd_pi32 _mm_hadd_epi32 _mm256_hadd_epi32
_mm_hadds_pi16 _mm_hadds_epi16 _mm256_hadds_epi16
_mm_hsub_pi16 _mm_hsub_epi16 _mm256_hsub_epi16
_mm_hsub_pi32 _mm_hsub_epi32 _mm256_hsub_epi32
_mm_hsubs_pi16 _mm_hsubs_epi16 _mm256_hsubs_epi16
_mm_hadd_pd _mm256_hadd_pd
_mm_hadd_ps _mm256_hadd_ps
_mm_hsub_pd _mm256_hsub_pd
_mm_hsub_ps _mm256_hsub_ps
---------
Co-authored-by: whyuuwang <whyuuwang@tencent.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Co-authored-by: Simon Pilgrim <git@redking.me.uk>
|
|
SSE/AVX VPTEST/VTESTPD/VTESTPS intrinsics to be used in constexpr (#160428)
Fix #158653
Add handling for:
```
ptestz128 / ptestz256 → (a & b) == 0
ptestc128 / ptestc256 → (~a & b) == 0
ptestnzc128 / ptestnzc256 → (a & b) != 0 AND (~a & b) != 0
vtestzps / vtestzps256 → (S(a) & S(b)) == 0
vtestcps / vtestcps256 → (~S(a) & S(b)) == 0
vtestnzcps / vtestnzcps256 → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0
vtestzpd / vtestzpd256 → (S(a) & S(b)) == 0
vtestcpd / vtestcpd256 → (~S(a) & S(b)) == 0
vtestnzcpd / vtestnzcpd256 → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0
```
Add corresponding test cases for:
```
int _mm_test_all_ones (__m128i a)
int _mm_test_all_zeros (__m128i mask, __m128i a)
int _mm_test_mix_ones_zeros (__m128i mask, __m128i a)
int _mm_testc_pd (__m128d a, __m128d b)
int _mm256_testc_pd (__m256d a, __m256d b)
int _mm_testc_ps (__m128 a, __m128 b)
int _mm256_testc_ps (__m256 a, __m256 b)
int _mm_testc_si128 (__m128i a, __m128i b)
int _mm256_testc_si256 (__m256i a, __m256i b)
int _mm_testnzc_pd (__m128d a, __m128d b)
int _mm256_testnzc_pd (__m256d a, __m256d b)
int _mm_testnzc_ps (__m128 a, __m128 b)
int _mm256_testnzc_ps (__m256 a, __m256 b)
int _mm_testnzc_si128 (__m128i a, __m128i b)
int _mm256_testnzc_si256 (__m256i a, __m256i b)
int _mm_testz_pd (__m128d a, __m128d b)
int _mm256_testz_pd (__m256d a, __m256d b)
int _mm_testz_ps (__m128 a, __m128 b)
int _mm256_testz_ps (__m256 a, __m256 b)
int _mm_testz_si128 (__m128i a, __m128i b)
int _mm256_testz_si256 (__m256i a, __m256i b)
```
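
The predicates listed in the first block can be modelled on a 128-bit value held as two 64-bit halves. This is a sketch with hypothetical names. For the `pd` variant each half is one element, and S(x) is assumed to mean keeping only the sign bit of each element.

```cpp
#include <cstdint>

struct Bits128 { std::uint64_t lo, hi; };

// Hypothetical models of the 128-bit integer predicates.
constexpr bool ptestzModel(Bits128 A, Bits128 B) {
  return ((A.lo & B.lo) | (A.hi & B.hi)) == 0; // (a & b) == 0
}
constexpr bool ptestcModel(Bits128 A, Bits128 B) {
  return ((~A.lo & B.lo) | (~A.hi & B.hi)) == 0; // (~a & b) == 0
}
constexpr bool ptestnzcModel(Bits128 A, Bits128 B) {
  return !ptestzModel(A, B) && !ptestcModel(A, B);
}

// vtestzpd: same test, restricted to the sign bit of each 64-bit element.
constexpr std::uint64_t SignBit = 0x8000000000000000ULL;
constexpr bool vtestzpdModel(Bits128 A, Bits128 B) {
  return (((A.lo & B.lo) | (A.hi & B.hi)) & SignBit) == 0;
}

static_assert(ptestzModel({0xF0, 0}, {0x0F, 0}), "no common bits");
static_assert(ptestcModel({0xFF, 0}, {0x0F, 0}), "b is a subset of a");
static_assert(ptestnzcModel({0x3, 0}, {0x6, 0}), "partial overlap");
static_assert(vtestzpdModel({0x1, 0}, {0x1, 0}), "non-sign bits are ignored");
```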
|
|
PrimType variables end in T, not PT. Remove const from local primitive
variables.
|
|
interp__builtin_ia32_pmul/interp__builtin_ia32_pmadd implementations (#162504)
The interp__builtin_ia32_pmadd implementation can also be used correctly for
PMULDQ/PMULUDQ evaluation, since we ignore the "hi" integers in
each pair.
I've replaced the PMULDQ/PMULUDQ evaluation with callbacks and renamed
interp__builtin_ia32_pmadd to interp__builtin_ia32_pmul for consistency.
|
PMADDWD/PMADDUBSW intrinsics (#161563)
This PR updates the PMADDWD/PMADDUBSW builtins to support constant
expression handling, by extending the VectorExprEvaluator::VisitCallExpr
that handles interp__builtin_ia32_pmadd builtins.
Closes #155392
|
|
with static bool interp__builtin_elementwise_int_unaryop callback (#162346)
Fixes #160288
|
|
The previous `ByteOffset` computation only makes sense if `Ptr` points
into an array.
|
|
VPTERNLOGD/VPTERNLOGQ intrinsics to be used in constexpr (#158703)
Fix #157698
Add handling for `__builtin_ia32_pternlog[d/q][128/256/512]_mask[z]` intrinsics to `VectorExprEvaluator::VisitCallExpr` and `InterpBuiltin.cpp` with the corresponding test coverage:
```
_mm_mask_ternarylogic_epi32
_mm_maskz_ternarylogic_epi32
_mm_ternarylogic_epi32
_mm256_mask_ternarylogic_epi32
_mm256_maskz_ternarylogic_epi32
_mm256_ternarylogic_epi32
_mm512_mask_ternarylogic_epi32
_mm512_maskz_ternarylogic_epi32
_mm512_ternarylogic_epi32
_mm_mask_ternarylogic_epi64
_mm_maskz_ternarylogic_epi64
_mm_ternarylogic_epi64
_mm256_mask_ternarylogic_epi64
_mm256_maskz_ternarylogic_epi64
_mm256_ternarylogic_epi64
_mm512_mask_ternarylogic_epi64
_mm512_maskz_ternarylogic_epi64
_mm512_ternarylogic_epi64
```
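
As background, the per-bit lookup these intrinsics perform can be modelled on one 32-bit element as below. `ternlogModel` is a hypothetical name and the sketch follows the documented instruction behaviour, not the evaluator code; masking is left out.

```cpp
#include <cstdint>

// Hypothetical model of an unmasked ternary logic operation on one element:
// the three input bits at each position form an index into the 8-bit
// truth table Imm8.
constexpr std::uint32_t ternlogModel(std::uint32_t A, std::uint32_t B,
                                     std::uint32_t C, std::uint8_t Imm8) {
  std::uint32_t Out = 0;
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    unsigned Index = (((A >> Bit) & 1u) << 2) | (((B >> Bit) & 1u) << 1) |
                     ((C >> Bit) & 1u);
    Out |= static_cast<std::uint32_t>((Imm8 >> Index) & 1u) << Bit;
  }
  return Out;
}

// 0x96 is the truth table of a three-way XOR; 0xCA is "A ? B : C" per bit.
static_assert(ternlogModel(0xF0F0F0F0u, 0xCCCCCCCCu, 0xAAAAAAAAu, 0x96) ==
                  (0xF0F0F0F0u ^ 0xCCCCCCCCu ^ 0xAAAAAAAAu),
              "three-way XOR");
static_assert(ternlogModel(0xFFFF0000u, 0x12345678u, 0x9ABCDEF0u, 0xCA) ==
                  0x1234DEF0u,
              "bitwise select");
```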
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
|
|
interp__builtin_elementwise_int_binop callback #160289 (#161924)
Fixes #160289
|
|
reuse of PSHUFD/LW/HW mask decode. NFC (#162006)
Removes the need to offset the PSHUFHW lane index to extract the shuffle mask element.
|
|
The i16/i32 shuffle intrinsics (`pshufw`, `pshuflw`, `pshufhw`,
`pshufd`) currently cannot be used in constant expressions. This patch
adds support in both the bytecode interpreter (InterpBuiltin.cpp) and the
constant evaluator (ExprConstant.cpp) for pshuf intrinsics, enabling their
use in constant expressions.
## Intrinsics covered
- `_mm_shuffle_pi16` (MMX `pshufw`)
- `_mm_shufflelo_epi16` / `_mm_shufflehi_epi16`
- `_mm_shuffle_epi32`
- Their AVX2/AVX512 vector-width variants
- Masked and maskz forms (handled indirectly via
`__builtin_ia32_select*`)
Fixes #156611
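
The immediate decoding shared by these shuffles can be modelled for the 4 x i32 case (`_mm_shuffle_epi32`) as below. This is a sketch with hypothetical names, not the code added by the patch. The i16 forms use the same two-bit fields within the low or high four words of each lane.

```cpp
#include <cstdint>

struct Vec4i32 { std::int32_t e[4]; };

// Hypothetical model of PSHUFD on one 128-bit lane: each 2-bit field of the
// immediate selects the source element for one destination element.
constexpr Vec4i32 pshufdModel(Vec4i32 A, unsigned Imm8) {
  Vec4i32 Out{};
  for (unsigned I = 0; I < 4; ++I)
    Out.e[I] = A.e[(Imm8 >> (2 * I)) & 0x3];
  return Out;
}

constexpr Vec4i32 Reversed = pshufdModel(Vec4i32{{10, 11, 12, 13}}, 0x1B);
static_assert(Reversed.e[0] == 13 && Reversed.e[1] == 12, "0x1B reverses");
static_assert(Reversed.e[2] == 11 && Reversed.e[3] == 10, "0x1B reverses");
```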
|
|
element extraction/insertion intrinsics to be used in constexpr #159753 (#161302)
FIXES: #159753
Enables constexpr evaluation for X86 vector element extract/insert builtins and adds corresponding tests.
The index is masked with `(Idx & (NumElts - 1))`, matching existing CodeGen.
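
The effect of the index masking can be sketched for a 4-element vector as below. The names are hypothetical and the code is a model of the wrap-around behaviour, not the evaluator itself.

```cpp
#include <cstdint>

struct Vec4i32 { std::int32_t e[4]; };
constexpr unsigned NumElts = 4; // power of two, so the mask wraps the index

// Hypothetical models: out-of-range indices wrap instead of being rejected.
constexpr std::int32_t extractModel(Vec4i32 V, unsigned Idx) {
  return V.e[Idx & (NumElts - 1)];
}
constexpr Vec4i32 insertModel(Vec4i32 V, std::int32_t X, unsigned Idx) {
  V.e[Idx & (NumElts - 1)] = X;
  return V;
}

static_assert(extractModel(Vec4i32{{10, 11, 12, 13}}, 2) == 12, "in range");
static_assert(extractModel(Vec4i32{{10, 11, 12, 13}}, 5) == 11, "5 wraps to 1");
static_assert(insertModel(Vec4i32{{10, 11, 12, 13}}, 99, 7).e[3] == 99,
              "7 wraps to 3");
```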
|
|
interp__builtin_elementwise_int_binop (#160362)
Fixes #160281
|