| Age | Commit message | Author |
|
VPSHUFBITQMB intrinsics to be used in constexpr (#168100)
Resolves #161337
|
|
VPERMILPD/S variable mask intrinsics to be used in constexpr (#168861)
Allowing VPERMILPD/S intrinsics to be used in constexpr
Closes #167878
|
|
evaluation (#168206)
Fixes #167681
|
|
AVX512 mask predicate intrinsics to be used in constexpr (#165054)
Enables constexpr evaluation for the following AVX512 intrinsics:
```
_mm_movepi8_mask _mm256_movepi8_mask _mm512_movepi8_mask
_mm_movepi16_mask _mm256_movepi16_mask _mm512_movepi16_mask
_mm_movepi32_mask _mm256_movepi32_mask _mm512_movepi32_mask
_mm_movepi64_mask _mm256_movepi64_mask _mm512_movepi64_mask
```
Part of #162072
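For reference, the `movepi*_mask` family collects the most significant bit of every vector element into a mask. The sketch below models the 128-bit byte variant in portable C++; `Bytes16` and `movepi8_mask_model` are hypothetical names used only for illustration and are not part of the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of 16 signed bytes.
struct Bytes16 { std::int8_t v[16]; };

// Bit i of the result is the sign bit (bit 7) of element i.
constexpr std::uint16_t movepi8_mask_model(Bytes16 a) {
  std::uint16_t mask = 0;
  for (int i = 0; i < 16; ++i)
    if (a.v[i] < 0)
      mask = static_cast<std::uint16_t>(mask | (1u << i));
  return mask;
}

// Elements 0 and 3 are negative; the remaining elements are zero-initialized.
constexpr Bytes16 kSample = {{-1, 0, 5, -128}};
static_assert(movepi8_mask_model(kSample) == 0x0009, "bits 0 and 3 are set");
```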
|
|
Resolves #166529
|
|
The option -falloc-token-max=0 is supposed to be usable to override
previous settings back to the target default max tokens (SIZE_MAX).
This did not work for the builtin:
```
| executed command: clang -cc1 [..] -nostdsysteminc -triple x86_64-linux-gnu -std=c++23 -fsyntax-only -verify clang/test/SemaCXX/alloc-token.cpp -falloc-token-max=0
| clang: llvm/lib/Support/AllocToken.cpp:38: std::optional<uint64_t> llvm::getAllocToken(AllocTokenMode, const AllocTokenMetadata &, uint64_t): Assertion `MaxTokens && "Must provide non-zero max tokens"' failed.
```
Fix it by also picking the default if "0" is passed.
Improve the documentation to be clearer what the value of "0" means.
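A minimal sketch of the intended behavior, assuming a hypothetical helper `effectiveMaxTokens` (the name and signature are illustrative and do not come from the Clang sources): a value of 0 is mapped back to the target default instead of being passed on as a zero limit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 0 means "use the target default" (SIZE_MAX),
// any other value is taken as the explicit maximum.
constexpr std::uint64_t effectiveMaxTokens(std::uint64_t requested) {
  return requested == 0 ? static_cast<std::uint64_t>(SIZE_MAX) : requested;
}

static_assert(effectiveMaxTokens(0) == SIZE_MAX, "0 selects the default");
static_assert(effectiveMaxTokens(64) == 64, "explicit values are kept");
```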
|
|
constexpr (#162816)
This PR resolves only the ss/sd part of the AVX512 masked arithmetic intrinsics in #160559.
|
|
Recent commits (7fe069121b57a, 53ddeb493529a) marked several x86
intrinsics as constexpr in headers without providing the necessary
constant evaluation support in the compiler backend. This caused
compilation failures when attempting to use these intrinsics in constant
expressions.
Resolves #166814
Resolves #161203
|
|
Resolves #166976
|
|
Add a new builtin function __builtin_bswapg. It works on any integral
type whose width is a multiple of 16 bits, as well as on single-byte types.
Closes #160266
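The byte-reversal semantics can be modeled in portable C++ as below. This is only a sketch of the expected results, restricted to unsigned types for simplicity; `bswap_model` is a hypothetical name, not the builtin itself or its implementation.

```cpp
#include <cstdint>
#include <type_traits>

// Reverses the byte order of an unsigned integer of any width.
// A single-byte value is returned unchanged.
template <typename T>
constexpr T bswap_model(T value) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
  T result = 0;
  for (unsigned i = 0; i < sizeof(T); ++i) {
    result = static_cast<T>((result << 8) | (value & 0xFF));
    value = static_cast<T>(value >> 8);
  }
  return result;
}

static_assert(bswap_model(std::uint16_t{0x1234}) == 0x3412, "16-bit swap");
static_assert(bswap_model(std::uint32_t{0x11223344u}) == 0x44332211u, "32-bit swap");
static_assert(bswap_model(std::uint8_t{0xAB}) == 0xAB, "single byte is unchanged");
```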
|
|
Without this, GCC warns:
../../clang/lib/AST/ExprConstant.cpp:4091:63: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
4091 | (SrcVal.isVector() && SrcVal.getVectorLength() == 1) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
4092 | "Not a valid HLSLAggregateSplatCast.");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
Resolves #167476
|
|
permute shuffles. (#167236)
This patch extends `interp__builtin_ia32_shuffle_generic` and `evalShuffleGeneric` to handle both 2-argument and 3-argument patterns, replacing specialized shuffle functions with the unified handler.
Resolves #166342
|
|
PALIGNR byte shift intrinsics to be used in constexpr (#162005)
Fixes #160509
|
|
AVX512 KTEST/KORTEST intrinsics to be used in constexpr (#166103)
Add AVX512 KTEST/KORTEST intrinsics to be used in constexpr.
Fixes #162051
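As a reference for what the evaluator has to compute, the flag results of KORTEST and KTEST can be modeled on 16-bit masks as follows. The function names are hypothetical and the masks are represented as plain integers; this is a sketch of the instruction semantics, not the code added by the patch.

```cpp
#include <cstdint>

// KORTEST: ZF is set when (a | b) is all zeros, CF when it is all ones.
constexpr bool kortestz_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a | b) == 0;
}
constexpr bool kortestc_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a | b) == 0xFFFF;
}

// KTEST: ZF is set when (a & b) is zero, CF when (~a & b) is zero.
constexpr bool ktestz_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a & b) == 0;
}
constexpr bool ktestc_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(~a & b) == 0;
}

static_assert(kortestc_model(0xFF00, 0x00FF), "the OR covers every bit");
static_assert(ktestz_model(0xF0F0, 0x0F0F), "no common bits");
```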
|
|
constant expression evaluator (#164700)
Add support to handle these casts in the constant expression evaluator.
- HLSLAggregateSplatCast
- HLSLElementwiseCast
- HLSLArrayRValue
Add tests
Closes #125766
Closes #125321
|
|
By rejecting them.
Fixes https://github.com/llvm/llvm-project/issues/165555
|
|
This patch enables compile-time evaluation of AVX512 permutex2var
intrinsics in constexpr contexts.
Extend shuffle generic to handle both integer immediate and vector mask
operands.
Resolves #161335
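To illustrate the vector-mask case, here is a portable model of the 4 x 32-bit two-source permute: the low two bits of each index select an element and the next bit selects the source vector. `Int4` and `permutex2var_model` are hypothetical names for this sketch, and wider variants would use more index bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of four 32-bit integers.
struct Int4 { std::int32_t v[4]; };

// For each lane: bits [1:0] of idx pick the element, bit 2 picks a (0) or b (1).
constexpr Int4 permutex2var_model(Int4 a, Int4 idx, Int4 b) {
  Int4 r = {};
  for (int i = 0; i < 4; ++i) {
    unsigned sel = static_cast<unsigned>(idx.v[i]);
    unsigned elem = sel & 3u;
    r.v[i] = (sel & 4u) ? b.v[elem] : a.v[elem];
  }
  return r;
}

constexpr Int4 kA = {{10, 11, 12, 13}};
constexpr Int4 kB = {{20, 21, 22, 23}};
constexpr Int4 kIdx = {{0, 4, 3, 7}};
static_assert(permutex2var_model(kA, kIdx, kB).v[1] == 20, "index 4 is b[0]");
```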
|
|
insertps intrinsic to be used in constexpr (#165513)
Resolves #165161
|
|
Evaluation (#164026)
Enables constexpr evaluation for the following AVX512 Integer Comparison Intrinsics:
```
_mm_cmp_epi8_mask _mm_cmp_epu8_mask
_mm_cmp_epi16_mask _mm_cmp_epu16_mask
_mm_cmp_epi32_mask _mm_cmp_epu32_mask
_mm_cmp_epi64_mask _mm_cmp_epu64_mask
_mm256_cmp_epi8_mask _mm256_cmp_epu8_mask
_mm256_cmp_epi16_mask _mm256_cmp_epu16_mask
_mm256_cmp_epi32_mask _mm256_cmp_epu32_mask
_mm256_cmp_epi64_mask _mm256_cmp_epu64_mask
_mm512_cmp_epi8_mask _mm512_cmp_epu8_mask
_mm512_cmp_epi16_mask _mm512_cmp_epu16_mask
_mm512_cmp_epi32_mask _mm512_cmp_epu32_mask
_mm512_cmp_epi64_mask _mm512_cmp_epu64_mask
```
Part 1 of #162054
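The 3-bit immediate of these intrinsics selects the comparison predicate, and the result has one mask bit per element. A portable sketch for the signed 4 x 32-bit case follows; the `epu` variants would compare the same bits as unsigned values. `Int4`, `cmp_predicate_model` and `cmp_epi32_mask_model` are hypothetical names used only here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of four 32-bit integers.
struct Int4 { std::int32_t v[4]; };

// Predicate encoding of the immediate (low three bits).
constexpr bool cmp_predicate_model(std::int32_t a, std::int32_t b, unsigned imm) {
  switch (imm & 7u) {
  case 0: return a == b; // EQ
  case 1: return a < b;  // LT
  case 2: return a <= b; // LE
  case 3: return false;  // FALSE
  case 4: return a != b; // NE
  case 5: return a >= b; // NLT
  case 6: return a > b;  // NLE
  default: return true;  // TRUE
  }
}

// Bit i of the mask is the predicate result for element i.
constexpr std::uint8_t cmp_epi32_mask_model(Int4 a, Int4 b, unsigned imm) {
  std::uint8_t mask = 0;
  for (int i = 0; i < 4; ++i)
    if (cmp_predicate_model(a.v[i], b.v[i], imm))
      mask = static_cast<std::uint8_t>(mask | (1u << i));
  return mask;
}

constexpr Int4 kLhs = {{1, 5, -3, 7}};
constexpr Int4 kRhs = {{1, 2, 4, 7}};
static_assert(cmp_epi32_mask_model(kLhs, kRhs, 0) == 0x9, "elements 0 and 3 are equal");
```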
|
|
shufps/pd shuffles intrinsics to be used in constexpr (#164078)
Resolves #161208
|
|
constexpr (#164166)
Support constexpr usage for SLLDQ/SRLDQ byte shift intrinsics
This draft PR adds support for using the following SLLDQ/SRLDQ intrinsics in
constant expressions:
- _mm_srli_si128
- _mm256_srli_si256
- _mm_slli_si128
- _mm256_slli_si256
Relevant tests are included.
Fixes #156494
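The expected results for one 128-bit lane can be modeled as below: bytes move toward higher (SLLDQ) or lower (SRLDQ) indices, zeros are shifted in, and a count above 15 clears the lane. `Bytes16` and the `*_model` functions are hypothetical names for this sketch; the 256-bit forms apply the same operation to each lane independently.

```cpp
#include <cstdint>

// Hypothetical stand-in for one 128-bit lane of 16 bytes.
struct Bytes16 { std::uint8_t v[16]; };

// Byte-wise left shift: result[i] = a[i - n], zero where i < n.
constexpr Bytes16 slli_si128_model(Bytes16 a, unsigned n) {
  Bytes16 r = {};
  for (unsigned i = 0; i < 16; ++i)
    if (i >= n)
      r.v[i] = a.v[i - n];
  return r;
}

// Byte-wise right shift: result[i] = a[i + n], zero where i + n > 15.
constexpr Bytes16 srli_si128_model(Bytes16 a, unsigned n) {
  Bytes16 r = {};
  for (unsigned i = 0; i < 16; ++i)
    if (n < 16 && i + n < 16)
      r.v[i] = a.v[i + n];
  return r;
}

constexpr Bytes16 kSeq = {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}};
static_assert(slli_si128_model(kSeq, 1).v[1] == 1, "byte 0 moved to byte 1");
static_assert(srli_si128_model(kSeq, 1).v[0] == 2, "byte 1 moved to byte 0");
```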
|
|
(#163639)
Implement the constexpr evaluation for `__builtin_infer_alloc_token()`
in Clang's constant expression evaluators (both in ExprConstant and the
new bytecode interpreter).
The constant evaluation is only supported for stateless (hash-based)
token modes. If a stateful mode like `increment` is used, the evaluation
fails, as the token value is not deterministic at compile time.
|
|
MMX/SSE/AVX2 PSIGN intrinsics to be used in constexpr (#163685)
Fix #155812
|
|
AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 (#162836)
**This PR supersedes and replaces PR #158853**
The original branch diverged too far from the main branch, resulting in
significant merge conflicts that were difficult to resolve cleanly. To
provide a clean and reviewable history, this new PR was created by
cherry-picking the necessary commits onto a fresh branch based on the
latest `main`.
---
*(Original Description)*
This patch enables the use of AVX/AVX512 subvector extraction intrinsics
within `constexpr` functions. This is achieved by implementing the
evaluation logic for these intrinsics in
`VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.
The original discussion and review comments can be found in the previous
pull request for context: #158853
Fixes #157712
|
|
(#161914)
Fix #154520
|
|
phminposuw intrinsic to be used in constexpr (#163041)
Fix #161336
|
|
MMX/SSE/AVX/AVX512 PMULHRSW intrinsics to be used in constexpr (#160636)
This PR resolves #155805 and updates the following builtins to handle
constant expressions:
```
_mm_mulhrs_pi16
_mm_mulhrs_epi16 _mm256_mulhrs_epi16 _mm512_mulhrs_epi16
```
|
|
conflict intrinsics to be used in constexpr (#163293)
Resolves #160524
|
|
This rename was made as part of
https://github.com/llvm/llvm-project/pull/147835 in order to ease
rebasing the PR, and give a nice window for other patches to get rebased
as well.
It has been a while already, so let's go ahead and rename it back.
|
|
(#163148)
The PSHUFB instruction shuffles bytes within each 128-bit lane: for each
control byte, if bit 7 is set, the output byte is zeroed; otherwise, the
low 4 bits select a source byte (0–15) from the same lane.
Note: the _mm_shuffle_pi8 function had to change because __anyext128 used
negative indices, which are invalid in a constant expression context.
Fixes #156612
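The per-lane rule described above can be written as a small portable model. This is a sketch of the semantics for a single 128-bit lane, with hypothetical names (`Bytes16`, `pshufb_model`); it is not the evaluator code from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for one 128-bit lane of 16 bytes.
struct Bytes16 { std::uint8_t v[16]; };

// For each control byte: bit 7 set -> output 0,
// otherwise the low 4 bits index into the source lane.
constexpr Bytes16 pshufb_model(Bytes16 src, Bytes16 ctrl) {
  Bytes16 r = {};
  for (int i = 0; i < 16; ++i) {
    std::uint8_t c = ctrl.v[i];
    r.v[i] = (c & 0x80) ? std::uint8_t{0} : src.v[c & 0x0F];
  }
  return r;
}

constexpr Bytes16 kSrc = {{10, 11, 12, 13, 14, 15, 16, 17,
                           18, 19, 20, 21, 22, 23, 24, 25}};
// Remaining control bytes are zero, so they select source byte 0.
constexpr Bytes16 kCtrl = {{0x0F, 0x80, 0x03, 0x13}};
static_assert(pshufb_model(kSrc, kCtrl).v[0] == 25, "index 15 selects the last byte");
static_assert(pshufb_model(kSrc, kCtrl).v[1] == 0, "bit 7 zeroes the output");
```

Bits 4 to 6 of a control byte are ignored, so a control value of 0x13 selects the same source byte as 0x03.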
|
|
AVX/AVX512 IFMA madd52 intrinsics to be used in constexpr (#161056)
Resolves #160498
|
|
constexpr (#156822)
[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - add MMX/SSE/AVX PHADD/SUB & HADDPS/D intrinsics to be used in constexpr
Fixes #155395
Covered functions:
_mm_hadd_pi16 _mm_hadd_epi16 _mm256_hadd_epi16
_mm_hadd_pi32 _mm_hadd_epi32 _mm256_hadd_epi32
_mm_hadds_pi16 _mm_hadds_epi16 _mm256_hadds_epi16
_mm_hsub_pi16 _mm_hsub_epi16 _mm256_hsub_epi16
_mm_hsub_pi32 _mm_hsub_epi32 _mm256_hsub_epi32
_mm_hsubs_pi16 _mm_hsubs_epi16 _mm256_hsubs_epi16
_mm_hadd_pd _mm256_hadd_pd
_mm_hadd_ps _mm256_hadd_ps
_mm_hsub_pd _mm256_hsub_pd
_mm_hsub_ps _mm256_hsub_ps
---------
Co-authored-by: whyuuwang <whyuuwang@tencent.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Co-authored-by: Simon Pilgrim <git@redking.me.uk>
|
|
To address @AaronBallman's feedback from
https://github.com/llvm/llvm-project/pull/143785 this patch implements
an explicit opt-out for `-fconstexpr-steps` by setting
`-fconstexpr-steps=0`.
This does not change any defaults, but gives users an easy way to opt
out of this limit altogether (and instead let the compiler reach the
system's resource limits).
Currently users set `constexpr-steps` to some arbitrary high number (and
I mean _arbitrary_ - see the tables in the previous PR). This isn't
actually opting out of the limit though - you're still bound by the
upper bound of the counter's type. If you have enough resources to
evaluate more than 18446744073709551615 steps that's bad news.
In any case, `=0` conveys the intent more clearly. This is in line with how
we handle other flags, e.g. `-ftemplate-backtrace-limit` or
`-ferror-limit`.
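A minimal sketch of the opt-out convention, assuming a hypothetical `StepBudget` counter (the type and function names are illustrative and are not Clang's implementation): a limit of 0 disables the check, any other value caps the number of steps.

```cpp
#include <cstdint>

// Hypothetical step counter; Limit == 0 means "no limit".
struct StepBudget {
  std::uint64_t Limit;
  std::uint64_t Used;
};

// Returns false once a non-zero limit has been exhausted.
constexpr bool consumeStep(StepBudget &B) {
  if (B.Limit != 0 && B.Used >= B.Limit)
    return false;
  ++B.Used;
  return true;
}

// Tries to take N steps under the given limit.
constexpr bool runSteps(std::uint64_t Limit, std::uint64_t N) {
  StepBudget B = {Limit, 0};
  for (std::uint64_t I = 0; I < N; ++I)
    if (!consumeStep(B))
      return false;
  return true;
}

static_assert(runSteps(10, 10), "exactly at the limit is fine");
static_assert(!runSteps(10, 11), "one step past the limit fails");
static_assert(runSteps(0, 1000), "0 opts out of the limit");
```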
|
|
SSE/AVX VPTEST/VTESTPD/VTESTPS intrinsics to be used in constexpr (#160428)
Fix #158653
Add handling for:
```
ptestz128 / ptestz256 → (a & b) == 0.
ptestc128 / ptestc256 → (~a & b) == 0
ptestnzc128 / ptestnzc256 → (a & b) != 0 AND (~a & b) != 0.
vtestzps / vtestzps256 → (S(a) & S(b)) == 0.
vtestcps / vtestcps256 → (~S(a) & S(b)) == 0.
vtestnzcps / vtestnzcps256 → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0.
vtestzpd / vtestzpd256 → (S(a) & S(b)) == 0.
vtestcpd / vtestcpd256 → (~S(a) & S(b)) == 0.
vtestnzcpd / vtestnzcpd256 → (S(a) & S(b)) != 0 AND (~S(a) & S(b)) != 0.
```
Add corresponding test cases for:
```
int _mm_test_all_ones (__m128i a)
int _mm_test_all_zeros (__m128i mask, __m128i a)
int _mm_test_mix_ones_zeros (__m128i mask, __m128i a)
int _mm_testc_pd (__m128d a, __m128d b)
int _mm256_testc_pd (__m256d a, __m256d b)
int _mm_testc_ps (__m128 a, __m128 b)
int _mm256_testc_ps (__m256 a, __m256 b)
int _mm_testc_si128 (__m128i a, __m128i b)
int _mm256_testc_si256 (__m256i a, __m256i b)
int _mm_testnzc_pd (__m128d a, __m128d b)
int _mm256_testnzc_pd (__m256d a, __m256d b)
int _mm_testnzc_ps (__m128 a, __m128 b)
int _mm256_testnzc_ps (__m256 a, __m256 b)
int _mm_testnzc_si128 (__m128i a, __m128i b)
int _mm256_testnzc_si256 (__m256i a, __m256i b)
int _mm_testz_pd (__m128d a, __m128d b)
int _mm256_testz_pd (__m256d a, __m256d b)
int _mm_testz_ps (__m128 a, __m128 b)
int _mm256_testz_ps (__m256 a, __m256 b)
int _mm_testz_si128 (__m128i a, __m128i b)
int _mm256_testz_si256 (__m256i a, __m256i b)
```
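The integer formulas listed above translate directly into portable C++. The sketch below models a 128-bit value as two 64-bit halves; `U128` and the `*_model` functions are hypothetical names, and the floating-point forms would apply the same tests to the sign bits only.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit integer vector.
struct U128 { std::uint64_t lo, hi; };

// ptestz: (a & b) == 0
constexpr bool ptestz_model(U128 a, U128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}
// ptestc: (~a & b) == 0
constexpr bool ptestc_model(U128 a, U128 b) {
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}
// ptestnzc: (a & b) != 0 AND (~a & b) != 0
constexpr bool ptestnzc_model(U128 a, U128 b) {
  return !ptestz_model(a, b) && !ptestc_model(a, b);
}

static_assert(ptestz_model(U128{0x0F, 0}, U128{0xF0, 0}), "no common bits");
static_assert(ptestc_model(U128{0xFF, 0}, U128{0x0F, 0}), "b is a subset of a");
static_assert(ptestnzc_model(U128{0x3, 0}, U128{0x6, 0}), "partial overlap");
```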
|
|
PMADDWD/PMADDUBSW intrinsics (#161563)
This PR updates the PMADDWD/PMADDUBSW builtins to support constant
expression handling, by extending the VectorExprEvaluator::VisitCallExpr
that handles interp__builtin_ia32_pmadd builtins.
Closes #155392
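For context, PMADDWD multiplies corresponding signed 16-bit elements and adds each adjacent pair of products into one 32-bit result. A portable sketch of that behavior for the 128-bit form follows; `Short8`, `Int4` and `pmaddwd_model` are hypothetical names, and PMADDUBSW (unsigned by signed bytes with saturation) is not modeled here.

```cpp
#include <cstdint>

// Hypothetical stand-ins for 128-bit vectors.
struct Short8 { std::int16_t v[8]; };
struct Int4 { std::int32_t v[4]; };

// result[i] = a[2i] * b[2i] + a[2i+1] * b[2i+1]
constexpr Int4 pmaddwd_model(Short8 a, Short8 b) {
  Int4 r = {};
  for (int i = 0; i < 4; ++i) {
    std::int64_t lo = std::int64_t{a.v[2 * i]} * b.v[2 * i];
    std::int64_t hi = std::int64_t{a.v[2 * i + 1]} * b.v[2 * i + 1];
    // The sum is truncated to 32 bits; it only exceeds the signed range
    // when all four inputs are -32768.
    r.v[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(lo + hi));
  }
  return r;
}

constexpr Short8 kX = {{1, 2, 3, 4, -1, -2, 0, 0}};
constexpr Short8 kY = {{5, 6, 7, 8, 3, 4, 0, 0}};
static_assert(pmaddwd_model(kX, kY).v[0] == 17, "1*5 + 2*6");
static_assert(pmaddwd_model(kX, kY).v[2] == -11, "-1*3 + -2*4");
```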
|
|
builtins, call rotl/rotr directly (#162113)
Fixes #162046
|
|
VPTERNLOGD/VPTERNLOGQ intrinsics to be used in constexpr (#158703)
Fix #157698
Add handling for `__builtin_ia32_pternlog[d/q][128/256/512]_mask[z]` intrinsics to `VectorExprEvaluator::VisitCallExpr` and `InterpBuiltin.cpp` with the corresponding test coverage:
```
_mm_mask_ternarylogic_epi32
_mm_maskz_ternarylogic_epi32
_mm_ternarylogic_epi32
_mm256_mask_ternarylogic_epi32
_mm256_maskz_ternarylogic_epi32
_mm256_ternarylogic_epi32
_mm512_mask_ternarylogic_epi32
_mm512_maskz_ternarylogic_epi32
_mm512_ternarylogic_epi32
_mm_mask_ternarylogic_epi64
_mm_maskz_ternarylogic_epi64
_mm_ternarylogic_epi64
_mm256_mask_ternarylogic_epi64
_mm256_maskz_ternarylogic_epi64
_mm256_ternarylogic_epi64
_mm512_mask_ternarylogic_epi64
_mm512_maskz_ternarylogic_epi64
_mm512_ternarylogic_epi64
```
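The ternary-logic immediate is an 8-entry truth table: for every bit position, the three input bits form an index into `imm8`. A portable sketch for a single 32-bit element is shown below; `ternlog_model` is a hypothetical name and masking is left out.

```cpp
#include <cstdint>

// For each bit position, index = (a_bit << 2) | (b_bit << 1) | c_bit,
// and the result bit is bit `index` of imm8.
constexpr std::uint32_t ternlog_model(std::uint32_t a, std::uint32_t b,
                                      std::uint32_t c, unsigned imm8) {
  std::uint32_t r = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    unsigned idx = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) |
                   ((c >> bit) & 1u);
    r |= ((imm8 >> idx) & 1u) << bit;
  }
  return r;
}

constexpr std::uint32_t kTa = 0xF0F0F0F0u, kTb = 0xCCCCCCCCu, kTc = 0xAAAAAAAAu;
static_assert(ternlog_model(kTa, kTb, kTc, 0x96) == (kTa ^ kTb ^ kTc), "0x96 is XOR3");
static_assert(ternlog_model(kTa, kTb, kTc, 0x80) == (kTa & kTb & kTc), "0x80 is AND3");
```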
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
|
|
The i16/i32 shuffle intrinsics (`pshufw`, `pshuflw`, `pshufhw`,
`pshufd`) currently cannot be used in constant expressions. This patch
adds support in both the bytecode interpreter (InterpBuiltin.cpp) and the
constant evaluator (ExprConstant.cpp) for pshuf intrinsics, enabling their
use in constant expressions.
## Intrinsics covered
- `_mm_shuffle_pi16` (MMX `pshufw`)
- `_mm_shufflelo_epi16` / `_mm_shufflehi_epi16`
- `_mm_shuffle_epi32`
- Their AVX2/AVX512 vector-width variants
- Masked and maskz forms (handled indirectly via
`__builtin_ia32_select*`)
Fixes #156611
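As an illustration of the immediate encoding, here is a portable model of the 128-bit `pshufd` form: each 2-bit field of the immediate selects the source element for one result lane. `Int4` and `pshufd_model` are hypothetical names for this sketch; the 16-bit forms use the same encoding on the low or high four words of each lane.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of four 32-bit integers.
struct Int4 { std::int32_t v[4]; };

// result[i] = a[(imm8 >> (2 * i)) & 3]
constexpr Int4 pshufd_model(Int4 a, unsigned imm8) {
  Int4 r = {};
  for (unsigned i = 0; i < 4; ++i)
    r.v[i] = a.v[(imm8 >> (2 * i)) & 3u];
  return r;
}

constexpr Int4 kVec = {{10, 11, 12, 13}};
static_assert(pshufd_model(kVec, 0x1B).v[0] == 13, "0x1B reverses the elements");
static_assert(pshufd_model(kVec, 0xE4).v[2] == 12, "0xE4 is the identity");
```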
|
|
element extraction/insertion intrinsics to be used in constexpr #159753 (#161302)
FIXES: #159753
Enable constexpr evaluation for X86 vector element extract/insert builtins and add corresponding tests.
Index is masked with `(Idx & (NumElts - 1))`, matching existing CodeGen.
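The index wrapping can be sketched in portable C++ as follows, assuming a power-of-two element count. `Int4`, `extract_model` and `insert_model` are hypothetical names for illustration only.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector of four 32-bit integers.
struct Int4 { std::int32_t v[4]; };

constexpr unsigned kNumElts = 4;

// Out-of-range indices wrap around via (Idx & (NumElts - 1)).
constexpr std::int32_t extract_model(Int4 vec, unsigned idx) {
  return vec.v[idx & (kNumElts - 1)];
}

constexpr Int4 insert_model(Int4 vec, std::int32_t value, unsigned idx) {
  vec.v[idx & (kNumElts - 1)] = value;
  return vec;
}

constexpr Int4 kElems = {{10, 11, 12, 13}};
static_assert(extract_model(kElems, 5) == 11, "index 5 wraps to 1");
static_assert(insert_model(kElems, 99, 6).v[2] == 99, "index 6 wraps to 2");
```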
|
|
Fixes #157492
|
|
pack intrinsics to be used in constexpr (#156003)
Fixes #154283
|
|
(#159998)
Fixes #158646
|
|
Summary:
The added bit counting builtins for vectors used `cttz` and `ctlz`,
which is consistent with the LLVM naming convention. However, these are
clang builtins and implement exactly the `__builtin_ctzg` and
`__builtin_clzg` behavior. It is confusing to people familiar with
other builtins that these are the only bit counting intrinsics named
differently. This includes the additional operation for the undefined
zero case, which was added as a `clzg` extension.
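The zero-input handling mentioned here can be modeled per element in portable C++: a caller-supplied fallback is returned instead of an undefined result. `ctz_model` and `clz_model` are hypothetical names for this sketch and do not call the builtins themselves.

```cpp
#include <cstdint>

// Count trailing zeros; returns `fallback` when x is 0.
constexpr int ctz_model(std::uint32_t x, int fallback) {
  if (x == 0)
    return fallback;
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Count leading zeros; returns `fallback` when x is 0.
constexpr int clz_model(std::uint32_t x, int fallback) {
  if (x == 0)
    return fallback;
  int n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

static_assert(ctz_model(8, 32) == 3, "0b1000 has three trailing zeros");
static_assert(clz_model(0, -1) == -1, "zero input yields the fallback");
```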
|
|
AVX/AVX512 subvector insertion intrinsics to be used in constexpr #157709 (#158778)
AVX/AVX512 vector insert intrinsics now support constexpr evaluation in both the AST evaluator and bytecode interpreter paths.
FIXES: #157709
|
|
If it's truly a known const int, it won't emit any diagnostics anyway.
And if it did, we wouldn't notice because no call site passed something
non-null.
|
|
NFC. (#159330)
This avoids the following warnings:
../../clang/lib/AST/ExprConstant.cpp: In member function ‘bool {anonymous}::IntExprEvaluator::VisitBuiltinCallExpr(const clang::CallExpr*, unsigned int)’:
../../clang/lib/AST/ExprConstant.cpp:14104:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
14104 | }
| ^
../../clang/lib/AST/ExprConstant.cpp:14105:3: note: here
14105 | case Builtin::BIstrlen:
| ^~~~
../../clang/lib/Driver/ToolChains/CommonArgs.cpp: In function ‘std::string clang::driver::tools::complexRangeKindToStr(clang::LangOptionsBase::ComplexRangeKind)’:
../../clang/lib/Driver/ToolChains/CommonArgs.cpp:3523:1: warning: control reaches end of non-void function [-Wreturn-type]
3523 | }
| ^
|
|
Instead of having `State::getLangOpts()`, which does a virtual call to
`getASTContext()` to call `getLangOpts()` on that, just move
`getLangOpts()` to the subclasses so we can do that without the virtual
call. We never call `getLangOpts()` in `State.cpp`, so it's not needed
in the base class.
|
|
Both the expression (the initializer) as well as the VarDecl can't be
null here. Assert that.
|
|
This is not implemented at compile time and asserts in assertion builds,
so reject it here.
Fixed the coding style in `BuiltinShuffleVector` at the same time.
Fixes #158471
|