| Age | Commit message (Collapse) | Author |
|
These aren't AVX512VL tests
|
|
VPSHUFBITQMB intrinsics to be used in constexpr (#168100)
Resolves #161337
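As a rough illustration of what a constant evaluator has to compute for VPSHUFBITQMB, here is a scalar model of the 128-bit form (the shape of `_mm_bitshuffle_epi64_mask`). The name `bitshuffle_epi64_mask_model` and the array-based vector types are hypothetical; this is a sketch of the instruction semantics, not the code added by the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: for each 64-bit element of `data`, the eight
// corresponding control bytes each select one bit (index taken modulo 64),
// and the selected bit becomes one bit of the result mask.
constexpr std::uint16_t bitshuffle_epi64_mask_model(
    const std::array<std::uint64_t, 2> &data,
    const std::array<std::uint8_t, 16> &ctrl) {
  std::uint16_t mask = 0;
  for (std::size_t q = 0; q < 2; ++q) {
    for (std::size_t j = 0; j < 8; ++j) {
      const unsigned bit = ctrl[q * 8 + j] & 63u; // only the low 6 bits count
      if ((data[q] >> bit) & 1u)
        mask = static_cast<std::uint16_t>(mask | (1u << (q * 8 + j)));
    }
  }
  return mask;
}
```

Because the model is `constexpr`, it can be checked with `static_assert`, which is the same kind of check the constexpr-enabled intrinsic allows.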
|
|
VPERMILPD/S variable mask intrinsics to be used in constexpr (#168861)
Allowing VPERMILPD/S intrinsics to be used in constexpr
Closes #167878
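To make the variable-mask behaviour concrete, the following is a scalar model of the 128-bit VPERMILPD form (the shape of `_mm_permutevar_pd`). `V2d`, `V2q`, and `permutevar_pd_model` are hypothetical names used only for this sketch; it is not the evaluator code from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 2 x double vector and a 2 x 64-bit control vector.
using V2d = std::array<double, 2>;
using V2q = std::array<std::uint64_t, 2>;

// Scalar model: bit 1 of each 64-bit control element chooses between the
// low (0) and high (1) double of the source vector.
constexpr V2d permutevar_pd_model(const V2d &a, const V2q &ctrl) {
  V2d r{};
  for (std::size_t i = 0; i < 2; ++i)
    r[i] = a[(ctrl[i] >> 1) & 1];
  return r;
}

// Control value 2 has bit 1 set, so element 0 takes the high double.
static_assert(permutevar_pd_model(V2d{1.0, 2.0}, V2q{2, 0})[0] == 2.0, "");
```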
|
|
evaluation (#168206)
Fixes #167681
|
|
'target' is not one of the features recognized by clang tests, and the
test doesn't require X86 backend to be built. Specify the target
explicitly instead. Remove duplicate `-fsanitize=type` as well.
|
|
|
|
(#166170)
…nd update docs
|
|
AVX512 mask predicate intrinsics to be used in constexpr (#165054)
Enables constexpr evaluation for the following AVX512 intrinsics:
```
_mm_movepi8_mask _mm256_movepi8_mask _mm512_movepi8_mask
_mm_movepi16_mask _mm256_movepi16_mask _mm512_movepi16_mask
_mm_movepi32_mask _mm256_movepi32_mask _mm512_movepi32_mask
_mm_movepi64_mask _mm256_movepi64_mask _mm512_movepi64_mask
```
Part of #162072
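The `movepi*_mask` family collects the sign bit of every element into a mask register. A minimal scalar sketch of that computation, with the hypothetical name `movepi_mask_model` and `std::array` standing in for the vector types, could look like this:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical scalar model: bit i of the result is the sign bit of element i.
template <typename T, std::size_t N>
constexpr std::uint64_t movepi_mask_model(const std::array<T, N> &v) {
  static_assert(std::is_signed<T>::value, "sketch expects signed elements");
  static_assert(N <= 64, "mask must fit in 64 bits");
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (v[i] < 0)
      mask |= std::uint64_t{1} << i;
  return mask;
}

constexpr std::array<std::int32_t, 4> kInput{-1, 5, -7, 0};
static_assert(movepi_mask_model(kInput) == 0b0101,
              "elements 0 and 2 are negative");
```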
|
|
The clang side of the calling convention code for arm64 vs. arm64ec is
close enough that this isn't really noticeable in most cases, but the
rule for choosing whether to pass a struct directly or indirectly is
significantly different.
(Adapted from my old patch https://reviews.llvm.org/D125419 .)
Fixes #89615.
|
|
for complex compound assignment (#166798)
- Fixes https://github.com/llvm/llvm-project/issues/166512
- `ComplexExprEmitter::EmitCompoundAssignLValue` is calling
`EmitLoadOfScalar(LValue, SourceLocation)` to load the LHS value in the
case that it's non-complex, however this function requires that the
value is a simple LValue - issue occurred because the LValue in question
was a bitfield LValue. I changed it to use this function which seems to
handle all of the different cases (deferring to the original
`EmitLoadOfScalar` if it's a simple LValue)
|
|
Resolves #166529
|
|
|
|
This change adds intrinsics and clang builtins for the remaining float
to fp16 conversions. This includes the following conversions:
- float to bf16x2 - satfinite variants
- float to f16x2 - satfinite variants
- float to bf16 - satfinite variants
- float to f16 - all variants
Tests are added in `convert-sm80.ll` and `convert-sm80-sf.ll` for the
intrinsics and in `builtins-nvptx.c` for the clang builtins.
|
|
This follows the list of names used by GCC.
|
|
|
|
This patch introduces preliminary support for additional memory
locations.
They are: target_mem0 and target_mem1 and they model memory locations
that cannot be represented with existing memory locations.
This solution was suggested in:
https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6
Currently, these locations are not yet target-specific. The goal is to
enable the compiler to express read/write effects on these resources.
|
|
used in constexpr (#168496)
### Summary
This PR resolves #160559, covering the other pd/ps/epi/epu part of the AVX512 masked arithmetic intrinsics.
|
|
constexpr (#162816)
This PR resolves only the ss/sd part of the AVX512 masked arithmetic intrinsics tracked in #160559.
|
|
This change fixes the SM requirement of the f32 to tf32 conversion with
`rna` rounding mode and `.satfinite` modifier. The current requirement
specified is `sm_89` but this conversion is supported from `sm_80`
onwards after it was added in PTX 8.1.
PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
|
|
Recent commits (7fe069121b57a, 53ddeb493529a) marked several x86
intrinsics as constexpr in headers without providing the necessary
constant evaluation support in the compiler backend. This caused
compilation failures when attempting to use these intrinsics in constant
expressions.
Resolves #166814
Resolves #161203
|
|
Currently, only __builtin_elementwise_sqrt emits constrained FP intrinsics
and propagates FP options.
This commit adds that support to the rest of the elementwise builtins.
|
|
Resolves #166976
|
|
The change made in #162433 exposed a weakness in this test that showed
different results on different archs that were not caught on the CI
bots. This expands the tests to cover more archs, and out of necessity
moves the os_log test into a separate test file.
|
|
intrinsics tests (#168274)
|
|
This updates the test to avoid inclusion of int128 bswapg tests on
targets that don't support int128 at all.
This fixes failures introduced by #162433
|
|
Add a new builtin function __builtin_bswapg. It works on any integral
type whose width is a multiple of 16 bits, as well as on a single byte.
Closes #160266
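For readers unfamiliar with a width-generic byte swap, the following portable sketch shows the intended behaviour on standard unsigned types. `bswap_model` is a hypothetical helper written for illustration; it is not the builtin and does not cover every type the builtin accepts.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical portable model of a generic byte swap: reverses the byte
// order of any unsigned integer type. A single byte is returned unchanged.
template <typename T>
constexpr T bswap_model(T value) {
  static_assert(std::is_unsigned<T>::value,
                "sketch handles unsigned types only");
  T result = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    // Move the lowest remaining byte of `value` to the bottom of `result`.
    result = static_cast<T>((result << 8) | (value & 0xFF));
    value = static_cast<T>(value >> 8);
  }
  return result;
}

static_assert(bswap_model<std::uint8_t>(0xAB) == 0xAB, "");
static_assert(bswap_model<std::uint16_t>(0x1234) == 0x3412, "");
static_assert(bswap_model<std::uint32_t>(0x11223344u) == 0x44332211u, "");
```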
|
|
As in title. AVX10.x doesn't distinguish between available vector
lengths.
`-mattr=avx10.x-512` and the definition of the `_512` macros are kept for compatibility.
Bit positions of the avx10.1/2 features in compiler-rt and X86TargetParser
are synced to match those in GCC.
|
|
__builtin_elementwise_sqrt (#168057)
Followup to #165682
|
|
Non-binary output files from the compiler need the `OF_Text` flag set
for encoding conversion to be performed correctly on z/OS.
---------
Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
|
|
The Transactional Memory Extension (TME) was introduced as part of
Armv9-A but has not been adopted by the ecosystem. This mirrors what
Arm has observed with similar extensions in other architectures.
Therefore, remove FEAT_TME assembly and ACLE code from llvm, because
support for TME has now been officially withdrawn, as noted here:
```
FEAT_TME is withdrawn from all future versions of Arm®
Architecture Reference Manual for A-profile architecture.
```
referenced in Known Issue D24093, documented here:
https://developer.arm.com/documentation/102105/lb-05/
|
|
_mm_sqrt_sh / _mm512_sqrt_ph - these were missed from #167692
|
|
Help to unblock #165682
I have the avx10_2 bf16 test coverage as well, but it's currently
breaking as we're missing bf16 strict_fsqrt lowering in the backend.
|
|
Resolves #167476
|
|
__builtin_elementwise_fma and support constexpr (#154731)
Now that #152455 is done, we can make all the scalar fma intrinsics
wrap __builtin_elementwise_fma, which also allows constexpr.
The main difference is that FMA4 intrinsics guarantee that the upper
elements are zero, while FMA3 passes through the destination register
elements like older scalar instructions.
Fixes #154555
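The FMA3/FMA4 difference in the upper elements can be sketched with a scalar model. The names `V4f`, `fmadd_ss_model` (FMA3-style, in the shape of `_mm_fmadd_ss`), and `macc_ss_model` (FMA4-style, in the shape of `_mm_macc_ss`) are hypothetical, and the model uses `std::fma` rather than the builtin named above.

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for a 4 x float vector.
using V4f = std::array<float, 4>;

// FMA3-style scalar op: element 0 is the fused a*b+c, the upper elements
// pass through unchanged from the first operand.
inline V4f fmadd_ss_model(V4f a, const V4f &b, const V4f &c) {
  a[0] = std::fma(a[0], b[0], c[0]);
  return a;
}

// FMA4-style scalar op: element 0 is the fused a*b+c, the upper elements
// are zeroed.
inline V4f macc_ss_model(const V4f &a, const V4f &b, const V4f &c) {
  return V4f{std::fma(a[0], b[0], c[0]), 0.0f, 0.0f, 0.0f};
}
```

With `a = {2, 10, 20, 30}`, `b[0] = 3`, and `c[0] = 4`, both models compute 10 in element 0; only the FMA3-style model keeps 10, 20, 30 in the upper elements.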
|
|
multiversion resolvers (#167516)
- Fixes https://github.com/llvm/llvm-project/issues/163369
- Segmentation fault occurred because resolver was calling TSan
instrumentation functions (__tsan_func_entry, __tsan_func_exit) but as
the resolver is run by the dynamic linker at load time, TSan is not
initialized yet so the current thread pointer is null.
- This PR adds the DisableSanitizerInstrumentation attribute to the
multiversion function resolvers to avoid issues like this.
- Added regression test for TSan segfault.
|
|
PALIGNR byte shift intrinsics to be used in constexpr (#162005)
Fixes #160509
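For reference, the byte-level behaviour of a 128-bit PALIGNR (the shape of `_mm_alignr_epi8`) can be modelled as below. `V16u8`, `iota16`, and `alignr_model` are hypothetical names for this sketch; it illustrates the instruction semantics only and is not the evaluator code from the patch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 16 x 8-bit vector.
using V16u8 = std::array<std::uint8_t, 16>;

// Helper for examples: bytes start, start+1, ..., start+15.
constexpr V16u8 iota16(std::uint8_t start) {
  V16u8 v{};
  for (unsigned i = 0; i < 16; ++i)
    v[i] = static_cast<std::uint8_t>(start + i);
  return v;
}

// Scalar model: concatenate a (high half) and b (low half), shift right by
// `imm` bytes, and keep the low 16 bytes. Bytes shifted in from beyond the
// 32-byte concatenation are zero.
constexpr V16u8 alignr_model(const V16u8 &a, const V16u8 &b, unsigned imm) {
  V16u8 r{};
  for (unsigned i = 0; i < 16; ++i) {
    const unsigned src = i + imm;
    if (src < 16)
      r[i] = b[src];
    else if (src < 32)
      r[i] = a[src - 16];
    // otherwise the byte stays zero
  }
  return r;
}

static_assert(alignr_model(iota16(16), iota16(0), 4)[12] == 16, "");
```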
|
|
as noop. (#167359)
|
|
Fixes https://github.com/llvm/llvm-project/issues/163256
|
|
Added CONSTEXPR macro and test for the following intrinsics:
-- _mm_mask_adds_epi16 _mm_maskz_adds_epi16
-- _mm_mask_adds_epi8 _mm_maskz_adds_epi8
-- _mm_mask_adds_epu16 _mm_maskz_adds_epu16
-- _mm_mask_adds_epu8 _mm_maskz_adds_epu8
-- _mm_mask_broadcastb_epi8 _mm_maskz_broadcastb_epi8
-- _mm_mask_broadcastw_epi16 _mm_maskz_broadcastw_epi16
-- _mm_mask_cvtepi8_epi16 _mm_maskz_cvtepi8_epi16
-- _mm_mask_cvtepu8_epi16 _mm_maskz_cvtepu8_epi16
-- _mm_mask_packs_epi16 _mm_maskz_packs_epi16
-- _mm_mask_packs_epi32 _mm_maskz_packs_epi32
-- _mm_mask_packus_epi16 _mm_maskz_packus_epi16
-- _mm_mask_packus_epi32 _mm_maskz_packus_epi32
-- _mm_mask_set1_epi16 _mm_maskz_set1_epi16
-- _mm_mask_set1_epi8 _mm_maskz_set1_epi8
-- _mm_mask_slli_epi16 _mm_maskz_slli_epi16
-- _mm_mask_subs_epi16 _mm_maskz_subs_epi16
-- _mm_mask_subs_epi8 _mm_maskz_subs_epi8
-- _mm_mask_subs_epu16 _mm_maskz_subs_epu16
-- _mm_mask_subs_epu8 _mm_maskz_subs_epu8
-- _mm_mask_unpackhi_epi16 _mm_maskz_unpackhi_epi16
-- _mm_mask_unpackhi_epi8 _mm_maskz_unpackhi_epi8
-- _mm_mask_unpacklo_epi16 _mm_maskz_unpacklo_epi16
-- _mm_mask_unpacklo_epi8 _mm_maskz_unpacklo_epi8
-- _mm256_mask_adds_epi16 _mm256_maskz_adds_epi16
-- _mm256_mask_adds_epi8 _mm256_maskz_adds_epi8
-- _mm256_mask_adds_epu16 _mm256_maskz_adds_epu16
-- _mm256_mask_adds_epu8 _mm256_maskz_adds_epu8
-- _mm256_mask_broadcastb_epi8 _mm256_maskz_broadcastb_epi8
-- _mm256_mask_broadcastw_epi16 _mm256_maskz_broadcastw_epi16
-- _mm256_mask_cvtepi8_epi16 _mm256_maskz_cvtepi8_epi16
-- _mm256_mask_cvtepu8_epi16 _mm256_maskz_cvtepu8_epi16
-- _mm256_mask_packs_epi16 _mm256_maskz_packs_epi16
-- _mm256_mask_packs_epi32 _mm256_maskz_packs_epi32
-- _mm256_mask_packus_epi16 _mm256_maskz_packus_epi16
-- _mm256_mask_packus_epi32 _mm256_maskz_packus_epi32
-- _mm256_mask_set1_epi16 _mm256_maskz_set1_epi16
-- _mm256_mask_set1_epi8 _mm256_maskz_set1_epi8
-- _mm256_mask_slli_epi16 _mm256_maskz_slli_epi16
-- _mm256_mask_subs_epi16 _mm256_maskz_subs_epi16
-- _mm256_mask_subs_epi8 _mm256_maskz_subs_epi8
-- _mm256_mask_subs_epu16 _mm256_maskz_subs_epu16
-- _mm256_mask_subs_epu8 _mm256_maskz_subs_epu8
-- _mm256_mask_unpackhi_epi16 _mm256_maskz_unpackhi_epi16
-- _mm256_mask_unpackhi_epi8 _mm256_maskz_unpackhi_epi8
-- _mm256_mask_unpacklo_epi16 _mm256_maskz_unpacklo_epi16
-- _mm256_mask_unpacklo_epi8 _mm256_maskz_unpacklo_epi8
-- _mm512_mask_adds_epi16 _mm512_maskz_adds_epi16
-- _mm512_mask_adds_epi8 _mm512_maskz_adds_epi8
-- _mm512_mask_adds_epu16 _mm512_maskz_adds_epu16
-- _mm512_mask_adds_epu8 _mm512_maskz_adds_epu8
-- _mm512_mask_broadcastb_epi8 _mm512_maskz_broadcastb_epi8
-- _mm512_mask_broadcastw_epi16 _mm512_maskz_broadcastw_epi16
-- _mm512_mask_mov_epi16 _mm512_maskz_mov_epi16
-- _mm512_mask_mov_epi8 _mm512_maskz_mov_epi8
-- _mm512_mask_packs_epi16 _mm512_maskz_packs_epi16
-- _mm512_mask_packs_epi32 _mm512_maskz_packs_epi32
-- _mm512_mask_packus_epi16 _mm512_maskz_packus_epi16
-- _mm512_mask_packus_epi32 _mm512_maskz_packus_epi32
-- _mm512_mask_set1_epi16 _mm512_maskz_set1_epi16
-- _mm512_mask_set1_epi8 _mm512_maskz_set1_epi8
-- _mm512_mask_subs_epi16 _mm512_maskz_subs_epi16
-- _mm512_mask_subs_epi8 _mm512_maskz_subs_epi8
-- _mm512_mask_subs_epu16 _mm512_maskz_subs_epu16
-- _mm512_mask_subs_epu8 _mm512_maskz_subs_epu8
-- _mm512_mask_unpackhi_epi16 _mm512_maskz_unpackhi_epi16
-- _mm512_mask_unpackhi_epi8 _mm512_maskz_unpackhi_epi8
-- _mm512_mask_unpacklo_epi16 _mm512_maskz_unpacklo_epi16
-- _mm512_mask_unpacklo_epi8 _mm512_maskz_unpacklo_epi8
closes #162070
|
|
(#166615)
Resolves https://github.com/llvm/llvm-project/issues/166057
---------
Co-authored-by: Phoebe Wang <phoebe.wang@intel.com>
|
|
Fix AArch64 argument passing for C++ empty classes with large explicitly specified alignment
reproducer: https://godbolt.org/z/qsze8fqra
rel issue: https://github.com/llvm/llvm-project/issues/69872
rel commit: https://github.com/llvm/llvm-project/commit/1711cc930bda8d27e87a2092bd220c18e4600c98
|
|
(#165431)
Add support for the following new AArch64 Neon intrinsics:
```
float16x8_t vmmlaq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t);
float32x4_t vmmlaq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t);
```
|
|
Add support for these new vcvt* intrinsics:
```
int64_t vcvts_s64_f32(float32_t);
uint64_t vcvts_u64_f32(float32_t);
int32_t vcvtd_s32_f64(float64_t);
uint32_t vcvtd_u32_f64(float64_t);
int64_t vcvtns_s64_f32(float32_t);
uint64_t vcvtns_u64_f32(float32_t);
int32_t vcvtnd_s32_f64(float64_t);
uint32_t vcvtnd_u32_f64(float64_t);
int64_t vcvtms_s64_f32(float32_t);
uint64_t vcvtms_u64_f32(float32_t);
int32_t vcvtmd_s32_f64(float64_t);
uint32_t vcvtmd_u32_f64(float64_t);
int64_t vcvtps_s64_f32(float32_t);
uint64_t vcvtps_u64_f32(float32_t);
int32_t vcvtpd_s32_f64(float64_t);
uint32_t vcvtpd_u32_f64(float64_t);
int64_t vcvtas_s64_f32(float32_t);
uint64_t vcvtas_u64_f32(float32_t);
int32_t vcvtad_s32_f64(float64_t);
uint32_t vcvtad_u32_f64(float64_t);
```
|
|
This reverts commit b1e511bf5a4c702ace445848b30070ac2e021241.
https://github.com/llvm/llvm-project/issues/160243
Reverting because the GCC C front end is incorrect.
---------
Co-authored-by: Jim Lin <jim@andestech.com>
|
|
AVX512 KTEST/KORTEST intrinsics to be used in constexpr (#166103)
Add AVX512 KTEST/KORTEST intrinsics to be used in constexpr.
Fixes #162051
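The predicates behind KTEST and KORTEST are simple bit tests on mask registers. A scalar sketch for 16-bit masks follows; the `*_model` function names are hypothetical, and the code models the flag results (zero flag and carry flag) rather than reproducing the patch.

```cpp
#include <cstdint>

// KORTEST: tests the bitwise OR of the two masks.
constexpr bool kortestz_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a | b) == 0; // all zeros
}
constexpr bool kortestc_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a | b) == 0xFFFF; // all ones
}

// KTEST: tests the bitwise AND of the two masks.
constexpr bool ktestz_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a & b) == 0; // no common bits
}
constexpr bool ktestc_model(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(~a & b) == 0; // b has no bits outside a
}

static_assert(kortestc_model(0xFF00, 0x00FF), "OR is all ones");
static_assert(ktestz_model(0xFF00, 0x00FF), "no common bits");
```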
|
|
The counted_by attribute currently rejects void* members because void
has no defined size. However, the sized_by attribute accepts void* since
it explicitly measures bytes. As a GNU extension, void pointer
arithmetic treats void as having size 1 byte, so counted_by on void*
should behave identically to sized_by (treating the count as bytes).
Allow counted_by on void* as a GNU extension. The implementation
validates this only at declaration time in SemaBoundsSafety.cpp,
emitting a -Wpointer-arith warning that the attribute is treated as a
GNU extension equivalent to sized_by. Both use-site validation and code
generation trust this earlier validation, avoiding redundant checks.
In CodeGen, __builtin_dynamic_object_size now correctly handles
counted_by on void* by treating any CountAttributedType with zero
element size as having 1-byte elements, matching the GNU void pointer
arithmetic semantics.
Add tests validating both Sema diagnostics and CodeGen behavior (correct
byte counts from __builtin_dynamic_object_size). Update existing
counted_by tests to explicitly use -Wpointer-arith to preserve their
original intent of rejecting void* in strict C mode.
|
|
This PR adds __builtin_elementwise_ldexp. It can be used for
implementing OpenCL ldexp builtin with vector inputs.
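As a portable illustration of what an elementwise `ldexp` computes, the sketch below applies `std::ldexp` to each element pair. `elementwise_ldexp_model` is a hypothetical helper, and `std::array` stands in for the vector types the builtin would operate on.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical model: result[i] = x[i] * 2^e[i] for every element.
template <std::size_t N>
std::array<float, N> elementwise_ldexp_model(const std::array<float, N> &x,
                                             const std::array<int, N> &e) {
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = std::ldexp(x[i], e[i]);
  return r;
}
```

For example, inputs `{1.0f, 3.0f, -0.5f}` with exponents `{3, -1, 2}` give `{8.0f, 1.5f, -2.0f}`.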
|
|
Currently, the ARM backend incorrectly parses every `arm` prefixed arch
to be non-thumb, but `armv6m` is THUMB and doesn't have ARM ops, causing
the test to fail when compiling to assembly and not LLVM IR: `error:
Function 'foo' uses ARM instructions, but the target does not support
ARM mode execution.` This only happens when invoking cc1 directly and
not the Clang driver.
As a quick triage, this patch changes the tests to use `thumb`.
Uncovered by https://github.com/llvm/llvm-project/pull/151404
|
|
This patch enables compile-time evaluation of AVX512 permutex2var
intrinsics in constexpr contexts.
Extend shuffle generic to handle both integer immediate and vector mask
operands.
Resolves #161335
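A two-source permute picks each result element from either input according to a per-element index. The following scalar model covers the four-element 32-bit case (the shape of `_mm_permutex2var_epi32`); `V4i` and `permutex2var_model` are hypothetical names and the code is a sketch of the semantics, not the shuffle helper from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4 x 32-bit integer vector.
using V4i = std::array<std::int32_t, 4>;

// Scalar model: the low 2 bits of each index pick an element, and bit 2
// picks the source (0 selects a, 1 selects b). Higher index bits are ignored.
constexpr V4i permutex2var_model(const V4i &a, const V4i &idx, const V4i &b) {
  V4i r{};
  for (std::size_t i = 0; i < 4; ++i) {
    const auto sel = static_cast<std::uint32_t>(idx[i]);
    const std::size_t off = sel & 3u;
    r[i] = (sel & 4u) ? b[off] : a[off];
  }
  return r;
}

static_assert(permutex2var_model(V4i{10, 11, 12, 13}, V4i{0, 4, 7, 2},
                                 V4i{20, 21, 22, 23})[2] == 23, "");
```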
|
|
Fixes https://github.com/llvm/llvm-project/issues/114402.
This patch accepts empty enums in C as a Microsoft extension and introduces
a new warning, `-Wmicrosoft-empty-enum`.
---------
Signed-off-by: yicuixi <qin_17914@126.com>
Co-authored-by: Erich Keane <ekeane@nvidia.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
|