| Age | Commit message (Collapse) | Author |
|
|
|
This is a follow-up patch to #157680 that allows SIMD fpcvt instructions
to be generated from fptoi(_sat) nodes.
|
|
Post cleanup for #164534.
|
|
This commit adds support for the EXT-INT64 extension added
to the specification here:
https://github.com/arm/tosa-specification/commit/1b690f8e120de2cc9b28a23b9f607225aedafdce
|
|
postinc. (#164810)
We might be looking at a different use, for example in the uses of an
i32,i64,ch preindex load.
Fixes #164775
|
|
InstSimplifyFolder can fold binary intrinsics, so take the opportunity
to unify code with getOpcodeOrIntrinsicID, and handle the case. The
additional handling of WidenGEP is non-functional, as the GEP is
simplified before it is widened, as the included test shows.
|
|
Currently, when RDSVL is followed by a constant multiplication, no specific
optimization exists that would leverage the immediate multiplication
operand to generate simpler assembly. This patch adds such an optimization
and allows rewrites like these if certain conditions are met:
`(mul (srl (rdsvl 1), 3), x) -> (shl (rdsvl y), z) `
|
|
|
|
#155348 (#164833)
This fixes strict weak ordering checks violations from #155348 when
running these two tests:
mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir
Sample error:
/stable/src/libcxx/include/__debug_utils/strict_weak_ordering_check.h:50: libc++ Hardening assertion !__comp(*__first + __a), *(__first + __b)) failed: Your comparator is not a valid strict-weak ordering
This is because (x < x) should be false, not true, to meet the
irreflexivity property. (Note that .dominates(x, x) returns true.)
I'm afraid that even after this commit we can't guarantee a strict weak
ordering, because we can't guarantee transitivity of equivalence by
sorting with a strict dominance function. However the tests are not
failing anymore, and I am not at all familiar with this code so I will
leave this concern up to the original author for consideration. (Ideas
without any further context: I would consider a topological sort or
walking a dominator tree.)
Reference on std::sort and strict weak ordering:
https://danlark.org/2022/04/20/changing-stdsort-at-googles-scale-and-beyond/
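The two concerns above (irreflexivity, and transitivity of equivalence) can be illustrated with a minimal sketch. It assumes a toy dominator tree stored as a parent array; `ToyDomTree`, `makeExampleTree`, and `incomparable` are hypothetical names for illustration and are not the MLIR dominance API.

```cpp
#include <vector>

// Hypothetical toy dominator tree: parent[i] is the immediate dominator of
// node i, and the root is its own parent.
struct ToyDomTree {
  std::vector<int> parent;

  // Reflexive dominance: every node dominates itself.
  bool dominates(int a, int b) const {
    while (true) {
      if (a == b) return true;
      if (parent[b] == b) return false;  // reached the root without meeting a
      b = parent[b];
    }
  }

  // Strict dominance: irreflexive, so comp(x, x) is false.
  bool properlyDominates(int a, int b) const {
    return a != b && dominates(a, b);
  }
};

// Tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
inline ToyDomTree makeExampleTree() { return ToyDomTree{{0, 0, 0, 1}}; }

// Two nodes are "equivalent" for std::sort when neither compares less.
inline bool incomparable(const ToyDomTree &t, int a, int b) {
  return !t.properlyDominates(a, b) && !t.properlyDominates(b, a);
}
```

In this tree, nodes 1 and 2 are incomparable and so are 2 and 3, yet 1 strictly dominates 3. Equivalence is therefore not transitive, which is why strict dominance alone does not give a strict weak ordering.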
|
|
Reverts llvm/llvm-project#163806 due to linking errors on the function
`mlir::scf::computeUbMinusLb`
|
|
#148410 (#164551)
This PR reapplies the changes previously introduced in #148410.
It introduces a redesigned and rebuilt Cling-based auto-loading
workaround that enables scanning libraries and resolving unresolved
symbols within those libraries.
|
|
(#163786)
getIntrinsicInstrCost should halve the cost returned by getTypeLegalizationCost
when the return type requires splitting, but we know that the whilelo
(predicate pair) instruction can be used.
When splitting is still required, the cost of get_active_lane_mask should also
reflect the additional saturating add required to increment the start value.
|
|
%t is currently documented as:
temporary file name unique to the test
https://llvm.org/docs/CommandGuide/lit.html#substitutions
Which I take to mean if the path is a/b/c/tempfile, then %t would be
tempfile. It is not, it's the whole path.
(which is hinted at by %basename_t, but why would you read that if you
didn't need to use it)
As seen in #164396 this can create confusion when people use it as if it
were just the file name.
Make it clear in the docs that this is a unique path, which can be used
to make files or folders.
|
|
- In the SCF Utils, add the `parallelLoopUnrollByFactors()` function
to unroll scf::ParallelOp loops according to the specified unroll factors
- Add a test pass "TestParallelLoopUnrolling" and the related LIT test
- Expose `mlir::parallelLoopUnrollByFactors()`, `mlir::generateUnrolledLoop()`,
and `mlir::scf::computeUbMinusLb()` functions in the
mlir/Dialect/SCF/Utils/Utils.h header to make them available
to other passes.
- In `mlir::generateUnrolledLoop()`, also add an optional
`IRMapping *clonedToSrcOpsMap` argument to map the new cloned
operations to their original ones.
In the function body, change the default `AnnotateFn` type to
`static const` to silence potential warnings about dangling references
when a function_ref is assigned to a variable with automatic storage.
Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
|
|
Support the ExplicitCast for ConstantExpr
|
|
clang (#152724)
Define `_LIBCPP_HAS_C8RTOMB_MBRTOC8` to `1` if compiling with clang.
Some tests involving functionality from `uchar.h`/`cuchar` fail when the
platform or the supporting C library does not provide support for the
corresponding features. These have been xfailed.
This patch will enable the adoption of newer picolibc versions.
|
|
Tests in "fold_maskedload_to_load_all_true_dynamic" exercise folders
for:
* vector.maskedload, vector.maskedstore, vector.scatter,
vector.gather, vector.compressstore, vector.expandload.
This patch renames and documents these tests in accordance with:
* https://mlir.llvm.org/getting_started/TestingGuide/
Note: the updated tests are referenced in the Test Formatting Best
Practices section of the MLIR testing guide:
* https://mlir.llvm.org/getting_started/TestingGuide/#test-formatting-best-practices
Keeping them aligned with the guidelines ensures consistency and clarity
across MLIR’s test suite.
|
|
constexpr (#164166)
Support constexpr usage for SLLDQ/SRLDQ byte shift intrinsics
This draft PR adds support for using the following SLLDQ/SRLDQ intrinsics in
constant expressions:
- _mm_srli_si128
- _mm256_srli_si256
- _mm_slli_si128
- _mm256_slli_si256
Relevant tests are included.
Fixes #156494
|
|
Fix #163125
This PR enhances `combineX86AddSub` so that it can handle `X86ISD::SUB(X,Constant)` with `add(X,-Constant)` and other similar cases:
- `X86ISD::ADD(LHS, C)` will fold `sub(-C, LHS)`
- `X86ISD::SUB(LHS, C)` will fold `add(LHS, -C)`
- `X86ISD::SUB(C, RHS)` will fold `add(RHS, -C)`
`CodeGen/X86/dag-update-nodetomatch.ll` is updated because the following IR is folded:
```llvm
for.body2:
; ......
; This generates `add t6, Constant:i64<1>`
%indvars.iv.next = add nsw i64 %indvars.iv, 1;
; This generates `X86ISD::SUB t6, Constant:i64<-1>` and folds the previous `add`
%cmp = icmp slt i64 %indvars.iv, -1;
br i1 %cmp, label %for.body2, label %for.cond1.for.inc3_crit_edge.loopexit
```
```diff
- ; CHECK-NEXT: movq (%r15), %rax
- ; CHECK-NEXT: movq %rax, (%r12,%r13,8)
- ; CHECK-NEXT: leaq 1(%r13), %rdx
- ; CHECK-NEXT: cmpq $-1, %r13
- ; CHECK-NEXT: movq %rdx, %r13
+ ; CHECK-NEXT: movq (%r12), %rax
+ ; CHECK-NEXT: movq %rax, (%r13,%r9,8)
+ ; CHECK-NEXT: incq %r9
```
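The arithmetic behind the three bullets can be checked with a small sketch in wrapping 64-bit arithmetic. It only models the computed values: the flags produced by the `X86ISD` nodes and the exact conditions used by `combineX86AddSub` are not modeled, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Two's-complement negation on an unsigned 64-bit value (wraps modulo 2^64).
constexpr std::uint64_t neg(std::uint64_t v) { return 0 - v; }

// SUB(LHS, C) computes the same value as add(LHS, -C).
constexpr bool subMatchesAdd(std::uint64_t lhs, std::uint64_t c) {
  return lhs - c == lhs + neg(c);
}

// ADD(LHS, C) is the negation of sub(-C, LHS).
constexpr bool addIsNegatedSub(std::uint64_t lhs, std::uint64_t c) {
  return lhs + c == neg(neg(c) - lhs);
}

// SUB(C, RHS) is the negation of add(RHS, -C).
constexpr bool subIsNegatedAdd(std::uint64_t c, std::uint64_t rhs) {
  return c - rhs == neg(rhs + neg(c));
}

// The case from the IR above: SUB(x, -1) has the same value as add(x, 1).
static_assert(subMatchesAdd(41, neg(1)), "x - (-1) == x + 1");
static_assert(addIsNegatedSub(5, 7), "x + c == -(-c - x)");
static_assert(subIsNegatedAdd(7, 5), "c - x == -(x + (-c))");
```

Only the second bullet is a value-for-value match; in the other two the values differ by a negation, so how the combine accounts for that is left to the patch itself.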
|
|
|
|
`include_next` doesn't work very well with the C++03 headers and
modules. Since these specific headers are very self-contained there
isn't much of a reason to split them into C++03/non-C++03 headers, so
let's just remove them. The few C wrapper headers that aren't as
self-contained will be refactored in a separate patch.
|
|
This patch improves constant folding through `llvm.vector.insert`. It
does not change anything for fixed-length vectors (which can already be
folded to ConstantVectors for these cases), but folds scalable vectors
that otherwise would not be folded.
These folds preserve the destination vector (which could be undef or
poison), giving targets more freedom in lowering the operations.
|
|
I'm not sure if this is the best way forward or not, but we keep running
into issues from forgetting that shuffle_vectors can be scalar. (There is
another example in the recently added known-bits code.) As a scalar-dst
shuffle vector is just an extract, and a scalar-source shuffle vector is
just a build vector, this patch makes scalar shuffle vectors illegal and
adjusts the irbuilder to create the correct node as required.
Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole; it just requires that
transforms that create shuffles of new sizes account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
|
|
Post cleanup for #164534.
|
|
and WTF::makeVisitor (#161926)
Lambdas passed to WTF::ScopeExit / WTF::makeScopeExit and
WTF::makeVisitor should be ignored by the lambda captures checker so
long as the resulting object doesn't escape the current scope.
Unfortunately, recognizing this pattern in general is too hard, so
directly hard-code these two function names in the checker.
|
|
We failed to check for null and non-block pointers.
Fixes https://github.com/llvm/llvm-project/issues/152952
|
|
|
|
This is a thing apparently.
Fixes https://github.com/llvm/llvm-project/issues/153803
|
|
In certain cases the context/size info we use for reporting of hinted
bytes in the LTO link was being dropped when we re-constructed context
tries and memprof metadata after inlining. This only affected cases
where we were using the -memprof-min-percent-max-cold-size option to
only keep that information for the largest cold contexts, and where the
pre-LTO compile did *not* specify -memprof-report-hinted-sizes.
The issue is that we don't have a MaxSize, which is only available
during the profile matching step. Use an existing bool indicating that
we are redoing this from existing metadata to always propagate any
context size metadata in that case.
|
|
Replace a loop over all summary copies with a simple check for a single
externally available copy of a symbol. The usage of this result has
changed since it was added and we now only need to know if there is a
single one.
|
|
We could inadvertently create new entries in the PrevailingModuleForGUID
map during lookup, which was always using operator[]. In most cases we
will have an entry for external symbols, but not in cases where the
prevailing copy is in a native object, or if this happened to be looked
up for a local.
Make the map private and create and use accessors.
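A minimal sketch of the pitfall and of the accessor approach, using `std::map` as a stand-in for the real container. The class and method names here (`PrevailingMap`, `set`, `lookup`) are hypothetical and are not the ones used in LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a GUID -> prevailing-module map.
class PrevailingMap {
  std::map<std::uint64_t, std::string> Map;  // private: no direct operator[]

public:
  void set(std::uint64_t GUID, std::string Module) {
    Map[GUID] = std::move(Module);
  }

  // Lookup via find() never inserts; returns nullptr when the key is absent.
  const std::string *lookup(std::uint64_t GUID) const {
    auto It = Map.find(GUID);
    return It == Map.end() ? nullptr : &It->second;
  }

  std::size_t size() const { return Map.size(); }
};

// The pitfall on a bare std::map: operator[] default-constructs a value for
// a missing key, so a "lookup" grows the map.
inline std::size_t sizeAfterBracketLookup() {
  std::map<std::uint64_t, std::string> Raw;
  (void)Raw[42];
  return Raw.size();
}
```

Keeping the container private means callers can only reach it through accessors that do not insert on a miss.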
|
|
Selection DAG has a more sophisticated execution order representation
than the simple sequence used in IR, so building the DAG can take into
account specific properties of the nodes to better express possible
parallelism. The existing implementation does this for constrained
function calls: some of them are considered independent, which can
potentially improve the generated code. However, this mechanism
incorrectly implies that calls with exception behavior 'ebIgnore'
cannot raise floating-point exceptions. The purpose of this change is to
fix the implementation.
In the current implementation, constrained function calls don't
immediately update the DAG root. Instead, the DAG builder collects their
output chains and flushes them when the root is required. Constrained
function calls cannot be moved across calls to external functions and
intrinsics that access the floating-point environment; these work as
barriers. Between the barriers, constrained function calls can be
reordered, since they may be considered independent from the viewpoint of
raising exceptions. For strictfp functions this is possible only if
floating-point trapping is disabled.
This change introduces a new restriction: calls with default
exception handling cannot be moved between strictfp function calls.
Otherwise the exceptions raised by such a call can disturb the expected
exception sequence. It means that constrained function calls with strict
exception behavior act as barriers for the calls with non-strict
behavior and vice versa. Effectively, the entire sequence
of constrained calls in IR is split into "strict" and "non-strict"
regions, in which restrictions on the order of constrained calls are
relaxed, but moving from one region to another is not allowed. This agrees
with the representation of strictfp code in high-level languages. For
example, C/C++ strictfp code corresponds to blocks where pragma `STDC
FENV_ACCESS ON` is in effect, so this restriction should help preserve
the intended semantics.
When floating-point exception trapping is enabled, constrained
intrinsics with 'ebStrict' cannot be reordered; their sequence must be
identical to the original source order. The current implementation does
not distinguish between strictfp modes with trapping and without it.
This change makes the assumption that trapping is disabled. This is not
correct in the general case, but is compatible with the existing
implementation.
|
|
https://github.com/llvm/llvm-project/pull/164906 converted a
-Wpointer-bool-conversion warning into a -Wtautological-pointer-compare
warning. Avoid both by using the bool cast.
|
|
Corruption can occur with passing parameters on the stack when under register pressure.
Fixes #163015.
|
|
`Module::setModuleFlag` is supposed to change a single module. However,
when an `MDNode` has the same value in more than one module in the same
`LLVMContext`, such `MDNode` is shared (uniqued) across all of them.
Therefore `MDNode::replaceOperandWith` changes all modules that share
the same `MDNode`.
This used to cause problems for #86212, where a module is marked as
"upgraded" via a module flag. When this flag is shared across multiple
modules, all of them are marked, yet some may not have been processed at
all.
After the patch we now construct a new `MDNode` and replace the old one.
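The difference between updating a shared node in place and replacing it can be sketched with a simplified model. This is an analogy only and does not use LLVM's API: `Node`, `Module`, and both setter functions are hypothetical, with `std::shared_ptr` standing in for a uniqued `MDNode` shared through one context.

```cpp
#include <memory>

// Hypothetical model: one shared node per distinct value, and each "module"
// holds a pointer to such a node.
struct Node {
  int value;
};

struct Module {
  std::shared_ptr<Node> flag;
};

// In-place update: every module sharing the node observes the change.
inline void setFlagInPlace(Module &m, int v) { m.flag->value = v; }

// Replacement: only this module is repointed to a freshly built node.
inline void setFlagByReplacing(Module &m, int v) {
  m.flag = std::make_shared<Node>(Node{v});
}
```

With two modules sharing one node, the in-place setter changes both, while the replacing setter leaves the other module untouched.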
|
|
This matches what it expands to. The P extension adds a proper ABSW
instruction so being precise is important to avoid confusion.
|
|
The current code may trigger a compiler warning:
```
address of function 'wcsnlen' will always evaluate to 'true' [-Wpointer-bool-conversion]
```
Fix this by comparing to nullptr. The same fix is applied to strnlen for
future-proofing.
|
|
`FEAT_FPRCVT` is moved from being mandatory in Armv9.6-A to Armv9.7-A
`FEAT_SVE2p2` is removed from being mandatory in Armv9.6-A
|
|
(#163645)
It was noted in a code-review for earlier changes in this stack
that some of the new 9.7 entries were mis-aligned. But actually,
many of the entries were, so I've tidied them all up.
|
|
Remove `AArch64::FeatureMPAM` guards from some MPAM system registers,
since these system registers are not under any feature guard in GCC.
|
|
instructions (#163165)
Add support for new Advanced SIMD (Neon) instructions:
- FDOT (half-precision to single-precision, by element)
- FDOT (half-precision to single-precision, vector)
- FMMLA (half-precision, non-widening)
- FMMLA (widening, half-precision to single-precision)
as documented here:
* https://developer.arm.com/documentation/ddi0602/2025-09/
* https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
Co-authored-by: Kerry McLaughlin <kerry.mclaughlin@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
|
|
Add instructions for SVE2p3 LUTI6 operations:
- LUTI6 (16-bit)
- LUTI6 (8-bit)
- LUTI6 (vector, 16-bit)
- LUTI6 (table, four registers, 8-bit)
- LUTI6 (table, single, 8-bit)
as documented here:
* https://developer.arm.com/documentation/ddi0602/2025-09/
* https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
|
|
Add instructions for SVE2p3 shift operations:
- SQRSHRN
- SQRSHRUN
- SQSHRN
- SQSHRUN
- UQRSHRN
- UQSHRN
as documented here:
* https://developer.arm.com/documentation/ddi0602/2025-09/
* https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
|
|
In normal circumstances we can never get to this point, as earlier Sema
checks will already have prevented us from making these queries.
However, in some cases, for example after a sufficiently large number of
errors, clang can start allowing incomplete types in records.
This means a number of the internal interfaces can end up performing type
trait queries that require querying the pointer authentication
properties of types that contain incomplete types. While the trait
queries attempt to guard against incomplete types, those tests fail in
this case as the incomplete types are actually nested in the seemingly
complete parent type.
|
|
Currently only regions with a single block are supported by the legality
checks.
|
|
Add instructions for SVE2p3 CVT operations:
- FCVTZSN
- FCVTZUN
- SCVTF
- SCVTFLT
- UCVTF
- UCVTFLT
as documented here:
* https://developer.arm.com/documentation/ddi0602/2025-09/
* https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
|
|
(#163161)
Add instructions for SVE2p3 DOT and MLA operations:
- BFMMLA (non-widening)
- FMMLA (non-widening)
- SDOT (2-way, vectors)
- SDOT (2-way, indexed)
- UDOT (2-way, vectors)
- UDOT (2-way, indexed)
as documented here:
* https://developer.arm.com/documentation/ddi0602/2025-09/
* https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
|
|
(#163160)
Add instructions for SVE2p3 arithmetic operations:
- `ADDQP` (add pairwise within quadword vector segments)
- `ADDSUBP` (add subtract pairwise)
- `SABAL` (two-way signed absolute difference sum and accumulate long)
- `SUBP` (subtract pairwise)
- `UABAL` (two-way unsigned absolute difference sum and accumulate long)
as documented here:
* https://developer.arm.com/documentation/ddi0602/2025-09/
* https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
|
|
We will need the full 16-bit range of the operand to record
the previous mode.
|
|
The intermediate result is in fact the add with saturation
regardless of the clamp bit.
|