AgeCommit message (Collapse)Author
2025-10-26Testusers/guy-david/machine-licm-implicit-defGuy David
2025-10-24[AArch64][GlobalISel] SIMD fpcvt codegen for fptoi(_sat) (#160831)Lukacma
This is a follow-up patch to #157680, which allows SIMD fpcvt instructions to be generated from fptoi(_sat) nodes.
2025-10-24[test][Transforms] Remove unsafe-fp-math uses part 3 (NFC) (#164787)paperchalice
Post cleanup for #164534.
2025-10-24[mlir][tosa] Add ext-int64 support (#164389)Luke Hutton
This commit adds support for the EXT-INT64 extension added to the specification here: https://github.com/arm/tosa-specification/commit/1b690f8e120de2cc9b28a23b9f607225aedafdce
2025-10-24[DAG][AArch64] Ensure that ResNo is correct for uses of Ptr when considering postinc. (#164810)David Green
We might be looking at a different use, for example in the uses of an i32,i64,ch preindex load. Fixes #164775
2025-10-24[VPlan] Extend tryToFoldLiveIns to fold binary intrinsics (#161703)Ramkumar Ramachandra
InstSimplifyFolder can fold binary intrinsics, so take the opportunity to unify code with getOpcodeOrIntrinsicID, and handle the case. The additional handling of WidenGEP is non-functional, as the GEP is simplified before it is widened, as the included test shows.
2025-10-24[AArch64] Optimized rdsvl followed by constant mul (#162853)Lukacma
Currently, when RDSVL is followed by a constant multiplication, no specific optimization exists that would leverage the immediate multiplication operand to generate simpler assembly. This patch adds such an optimization and allows rewrites like the following if certain conditions are met: `(mul (srl (rdsvl 1), 3), x) -> (shl (rdsvl y), z)`
2025-10-24[gn build] Port 4f53413ff0a5LLVM GN Syncbot
2025-10-24[flang][mlir] fix irreflexibility violation of strict weak ordering in #155348 (#164833)Emilio Cota
This fixes strict weak ordering check violations from #155348 when running these two tests: mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir and mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir
Sample error: /stable/src/libcxx/include/__debug_utils/strict_weak_ordering_check.h:50: libc++ Hardening assertion !__comp(*(__first + __a), *(__first + __b)) failed: Your comparator is not a valid strict-weak ordering
This is because (x < x) should be false, not true, to meet the irreflexivity property. (Note that .dominates(x, x) returns true.)
I'm afraid that even after this commit we can't guarantee a strict weak ordering, because we can't guarantee transitivity of equivalence by sorting with a strict dominance function. However, the tests are not failing anymore, and I am not at all familiar with this code, so I will leave this concern up to the original author for consideration. (Ideas without any further context: I would consider a topological sort or walking a dominator tree.)
Reference on std::sort and strict weak ordering: https://danlark.org/2022/04/20/changing-stdsort-at-googles-scale-and-beyond/
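The irreflexivity requirement mentioned in this message can be illustrated with a small sketch. The names `isIrreflexive`, `dominatesToy`, and `properlyDominatesToy` are hypothetical stand-ins invented for this example; they are not the MLIR dominance API, and the integer relation below only mimics the "x dominates x" behavior described above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if comp(x, x) is false for every x in values, i.e. the
// comparator is irreflexive on that set (one requirement of std::sort).
template <typename T, typename Comp>
bool isIrreflexive(const std::vector<T> &values, Comp comp) {
  return std::none_of(values.begin(), values.end(),
                      [&](const T &x) { return comp(x, x); });
}

// Hypothetical non-strict relation: like a dominance query where a value
// dominates itself, dominatesToy(x, x) is true.
inline bool dominatesToy(int a, int b) { return a <= b; }

// Hypothetical strict variant: rejects the a == b case, so it is irreflexive.
inline bool properlyDominatesToy(int a, int b) {
  return a != b && dominatesToy(a, b);
}
```

Passing `dominatesToy` to `isIrreflexive` reports a violation, while `properlyDominatesToy` passes. As the message notes, irreflexivity alone does not make a comparator a strict weak ordering; this sketch checks only that one property.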
2025-10-24Revert "[mlir][scf] Add parallelLoopUnrollByFactors()" (#164949)fabrizio-indirli
Reverts llvm/llvm-project#163806 due to linking errors on the function `mlir::scf::computeUbMinusLb`
2025-10-24REAPPLY [ORC] Add automatic shared library resolver for unresolved symbols. #148410 (#164551)SahilPatidar
This PR reapplies the changes previously introduced in #148410. It introduces a redesigned and rebuilt Cling-based auto-loading workaround that enables scanning libraries and resolving unresolved symbols within those libraries.
2025-10-24[AArch64][CostModel] Reduce cost of wider than legal get.active.lane.mask (#163786)Kerry McLaughlin
getIntrinsicInstrCost should halve the cost returned by getTypeLegalizationCost when the return type requires splitting, but we know that the whilelo (predicate pair) instruction can be used. When splitting is still required, the cost of get_active_lane_mask should also reflect the additional saturating add required to increment the start value.
2025-10-24[llvm][docs] Correct description of %t lit substitution (#164397)David Spickett
%t is currently documented as "temporary file name unique to the test" (https://llvm.org/docs/CommandGuide/lit.html#substitutions), which I take to mean that if the path is a/b/c/tempfile, then %t would be tempfile. It is not; it's the whole path. (This is hinted at by %basename_t, but why would you read that if you didn't need to use it?) As seen in #164396, this can create confusion when people use it as if it were just the file name. Make it clear in the docs that this is a unique path, which can be used to make files or folders.
2025-10-24[mlir][scf] Add parallelLoopUnrollByFactors() (#163806)fabrizio-indirli
- In the SCF Utils, add the `parallelLoopUnrollByFactors()` function to unroll scf::ParallelOp loops according to the specified unroll factors
- Add a test pass "TestParallelLoopUnrolling" and the related LIT test
- Expose `mlir::parallelLoopUnrollByFactors()`, `mlir::generateUnrolledLoop()`, and `mlir::scf::computeUbMinusLb()` functions in the mlir/Dialect/SCF/Utils/Utils.h header to make them available to other passes.
- In `mlir::generateUnrolledLoop()`, add also an optional `IRMapping *clonedToSrcOpsMap` argument to map the new cloned operations to their original ones. In the function body, change the default `AnnotateFn` type to `static const` to silence potential warnings about dangling references when a function_ref is assigned to a variable with automatic storage.
Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
2025-10-24[CIR] Support ExplicitCast for ConstantExpr (#164783)Amr Hesham
Support the ExplicitCast for ConstantExpr
2025-10-24[libcxx] Define `_LIBCPP_HAS_C8RTOMB_MBRTOC8` to true if compiling with clang (#152724)Victor Campos
Define `_LIBCPP_HAS_C8RTOMB_MBRTOC8` to `1` if compiling with clang. Some tests involving functionality from `uchar.h`/`cuchar` fail when the platform or the supporting C library does not provide support for the corresponding features. These have been xfailed. This patch will enable the adoption of newer picolibc versions.
2025-10-24[mlir][vector][nfc] Update tests for folding mem operations (#164255)Andrzej Warzyński
Tests in "fold_maskedload_to_load_all_true_dynamic" exercise folders for:
* vector.maskedload, vector.maskedstore, vector.scatter, vector.gather, vector.compressstore, vector.expandload.
This patch renames and documents these tests in accordance with:
* https://mlir.llvm.org/getting_started/TestingGuide/
Note: the updated tests are referenced in the Test Formatting Best Practices section of the MLIR testing guide:
* https://mlir.llvm.org/getting_started/TestingGuide/#test-formatting-best-practices
Keeping them aligned with the guidelines ensures consistency and clarity across MLIR’s test suite.
2025-10-24[Headers][X86] Allow SLLDQ/SRLDQ byte shift intrinsics to be used in constexpr (#164166)Ye Tian
Support constexpr usage for SLLDQ/SRLDQ byte shift intrinsics. This draft PR adds support for using the following SLLDQ/SRLDQ intrinsics in constant expressions: - _mm_srli_si128 - _mm256_srli_si256 - _mm_slli_si128 - _mm256_slli_si256 Relevant tests are included. Fixes #156494
2025-10-24[X86] Fold generic ADD/SUB with constants to X86ISD::SUB/ADD (#164316)Brandon
Fix #163125
This PR enhances `combineX86AddSub` so that it can handle `X86ISD::SUB(X,Constant)` with `add(X,-Constant)` and other similar cases:
- `X86ISD::ADD(LHS, C)` will fold `sub(-C, LHS)`
- `X86ISD::SUB(LHS, C)` will fold `add(LHS, -C)`
- `X86ISD::SUB(C, RHS)` will fold `add(RHS, -C)`
`CodeGen/X86/dag-update-nodetomatch.ll` is updated because the following IR is folded:
```llvm
for.body2:
  ; ......
  ; This generates `add t6, Constant:i64<1>`
  %indvars.iv.next = add nsw i64 %indvars.iv, 1;
  ; This generates `X86ISD::SUB t6, Constant:i64<-1>` and folds the previous `add`
  %cmp = icmp slt i64 %indvars.iv, -1;
  br i1 %cmp, label %for.body2, label %for.cond1.for.inc3_crit_edge.loopexit
```
```diff
- ; CHECK-NEXT: movq (%r15), %rax
- ; CHECK-NEXT: movq %rax, (%r12,%r13,8)
- ; CHECK-NEXT: leaq 1(%r13), %rdx
- ; CHECK-NEXT: cmpq $-1, %r13
- ; CHECK-NEXT: movq %rdx, %r13
+ ; CHECK-NEXT: movq (%r12), %rax
+ ; CHECK-NEXT: movq %rax, (%r13,%r9,8)
+ ; CHECK-NEXT: incq %r9
```
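The arithmetic behind the `X86ISD::SUB(LHS, C)` / `add(LHS, -C)` pairing can be checked with a short sketch. It only demonstrates the value identity in 64-bit wrapping arithmetic; it says nothing about flags or about how `combineX86AddSub` is written, and the helper names `subConst` and `addNegConst` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// x - c in 64-bit wrapping (two's-complement style) arithmetic.
inline std::uint64_t subConst(std::uint64_t x, std::uint64_t c) {
  return x - c;
}

// x + (-c), where -c is the wrapping negation of c.
inline std::uint64_t addNegConst(std::uint64_t x, std::uint64_t c) {
  return x + (std::uint64_t{0} - c);
}
```

With `c` equal to the bit pattern of -1, `subConst(x, c)` yields `x + 1`, which is why the subtraction behind the `icmp ... -1` compare and the `add ..., 1` in the IR above produce the same value.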
2025-10-24[gn build] Port 44331d259493LLVM GN Syncbot
2025-10-24[libc++][C++03] Remove some of the C++03-specific C wrapper headers (#163772)Nikolas Klauser
`include_next` doesn't work very well with the C++03 headers and modules. Since these specific headers are very self-contained there isn't much of a reason to split them into C++03/non-C++03 headers, so let's just remove them. The few C wrapper headers that aren't as self-contained will be refactored in a separate patch.
2025-10-24[InstCombine] Constant fold binops through `vector.insert` (#164624)Benjamin Maxwell
This patch improves constant folding through `llvm.vector.insert`. It does not change anything for fixed-length vectors (which can already be folded to ConstantVectors for these cases), but folds scalable vectors that otherwise would not be folded. These folds preserve the destination vector (which could be undef or poison), giving targets more freedom in lowering the operations.
2025-10-24[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508)David Green
I'm not sure if this is the best way forward or not, but we have a lot of issues with forgetting, again and again, that shuffle_vectors can be scalar. (There is another example in the recently added known-bits code.) As a scalar-dst shuffle vector is just an extract, and a scalar-source shuffle vector is just a build vector, this patch makes scalar shuffle vectors illegal and adjusts the irbuilder to create the correct node as required. Most targets do this already through lowering or combines. Making scalar shuffles illegal simplifies gisel as a whole; it just requires that transforms that create shuffles of new sizes account for the scalar shuffle being illegal (mostly IRBuilder and LessElements).
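The claim that a scalar-destination shuffle is an extract and a scalar-source shuffle is a build vector can be seen in a toy model. The function `shuffleToy` below is a hypothetical illustration over `std::vector<int>`; it is not the GlobalISel builder API and ignores undef lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy two-input shuffle: each result lane is picked by index from the
// concatenation of a and b. Out-of-range indices throw via at().
inline std::vector<int> shuffleToy(const std::vector<int> &a,
                                   const std::vector<int> &b,
                                   const std::vector<std::size_t> &mask) {
  std::vector<int> concat(a);
  concat.insert(concat.end(), b.begin(), b.end());
  std::vector<int> result;
  result.reserve(mask.size());
  for (std::size_t idx : mask)
    result.push_back(concat.at(idx));
  return result;
}
```

A one-element mask returns a single lane of the concatenated inputs, which is an element extract. When both inputs have one lane, the result is just those two scalars arranged by the mask, which is a build vector.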
2025-10-24[test][X86] Remove unsafe-fp-math uses (NFC) (#164814)paperchalice
Post cleanup for #164534.
2025-10-23[webkit.UncountedLambdaCapturesChecker] Add the support for WTF::ScopeExit and WTF::makeVisitor (#161926)Ryosuke Niwa
Lambdas passed to WTF::ScopeExit / WTF::makeScopeExit and WTF::makeVisitor should be ignored by the lambda captures checker so long as the resulting object doesn't escape the current scope. Unfortunately, recognizing this pattern in general is too hard, so directly hard-code these two function names into the checker.
2025-10-24[clang][bytecode] Catch placement-new into invalid destination (#164804)Timm Baeder
We failed to check for null and non-block pointers. Fixes https://github.com/llvm/llvm-project/issues/152952
2025-10-24[ARM] Update remaining cost tests with -cost-kind=all. NFCDavid Green
2025-10-24[clang][bytecode] Fix CXXConstructExpr for multidim arrays (#164760)Timm Baeder
This is a thing apparently. Fixes https://github.com/llvm/llvm-project/issues/153803
2025-10-23[MemProf] Fix the propagation of context/size info after inlining (#164872)Teresa Johnson
In certain cases the context/size info we use for reporting of hinted bytes in the LTO link was being dropped when we re-constructed context tries and memprof metadata after inlining. This only affected cases where we were using the -memprof-min-percent-max-cold-size option to only keep that information for the largest cold contexts, and where the pre-LTO compile did *not* specify -memprof-report-hinted-sizes. The issue is that we don't have a MaxSize, which is only available during the profile matching step. Use an existing bool indicating that we are redoing this from existing metadata to always propagate any context size metadata in that case.
2025-10-23[ThinLTO] Simplify checking for single external copy (NFCI) (#164861)Teresa Johnson
Replace a loop over all summary copies with a simple check for a single externally available copy of a symbol. The usage of this result has changed since it was added and we now only need to know if there is a single one.
2025-10-23[ThinLTO] Avoid creating map entries on lookup (NFCI) (#164873)Teresa Johnson
We could inadvertently create new entries in the PrevailingModuleForGUID map during lookup, which was always using operator[]. In most cases we will have an entry for external symbols, but not when the prevailing copy is in a native object, or if the lookup happened to be for a local. Make the map private and create and use accessors.
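The pitfall described here, where `operator[]` inserts a default entry on lookup, can be reproduced with a standard container. This sketch uses `std::map` as a stand-in for whatever map type the patch touches, and `PrevailingModuleTable` and `sizeAfterBracketLookup` are hypothetical names, not the accessors the commit adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical wrapper: the map is private and lookups go through find(),
// so querying a missing key never inserts a default-constructed entry.
class PrevailingModuleTable {
public:
  void set(std::uint64_t guid, std::string module) {
    table_[guid] = std::move(module);
  }
  // Returns an empty string when no entry exists for guid.
  std::string lookup(std::uint64_t guid) const {
    auto it = table_.find(guid);
    return it == table_.end() ? std::string() : it->second;
  }
  std::size_t size() const { return table_.size(); }

private:
  std::map<std::uint64_t, std::string> table_;
};

// Shows the pitfall on a copy of the map: operator[] on a missing key
// inserts an entry, so the size grows even though we only "looked up".
inline std::size_t
sizeAfterBracketLookup(std::map<std::uint64_t, std::string> m,
                       std::uint64_t guid) {
  (void)m[guid];
  return m.size();
}
```

Keeping the container private means callers cannot reach `operator[]` at all, which is the design choice the message describes.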
2025-10-24[SDAG] Fix deferring constrained function calls (#153029)Serge Pavlov
Selection DAG has a more sophisticated execution order representation than the simple sequence used in IR, so building the DAG can take into account specific properties of the nodes to better express possible parallelism. The existing implementation does this for constrained function calls: some of them are considered independent, which can potentially improve the generated code. However, this mechanism incorrectly implies that calls with exception behavior 'ebIgnore' cannot raise floating-point exceptions. The purpose of this change is to fix the implementation.
In the current implementation, constrained function calls don't immediately update the DAG root. Instead, the DAG builder collects their output chains and flushes them when the root is required. Constrained function calls cannot be moved across calls of external functions and intrinsics that access the floating-point environment; these work as barriers. Between the barriers, constrained function calls can be reordered, as they may be considered independent from the viewpoint of raising exceptions. For strictfp functions this is possible only if floating-point trapping is disabled.
This change introduces a new restriction: calls with default exception handling cannot be moved between strictfp function calls. Otherwise the exceptions raised by such a call can disturb the expected exception sequence. It means that constrained function calls with strict exception behavior act as barriers for the calls with non-strict behavior and vice versa. Effectively, the entire sequence of constrained calls in IR is split into "strict" and "non-strict" regions, in which restrictions on the order of constrained calls are relaxed, but moving from one region to another is not allowed. This agrees with the representation of strictfp code in high-level languages. For example, C/C++ strictfp code corresponds to blocks where pragma `STDC FENV_ACCESS ON` is in effect; this restriction should help preserve the intended semantics.
When floating-point exception trapping is enabled, constrained intrinsics with 'ebStrict' cannot be reordered; their sequence must be identical to the original source order. The current implementation does not distinguish between strictfp modes with trapping and without it. This change makes the assumption that trapping is disabled. That is not correct in the general case, but is compatible with the existing implementation.
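The "strict" and "non-strict" regions described in this message can be modelled as maximal runs of calls with the same exception behavior. The sketch below is a toy under that assumption: `ExceptKind`, `assignRegions`, and `mayReorder` are hypothetical names, the model ignores the external-call barriers mentioned above, and it assumes trapping is disabled, as the message does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy classification of a constrained call by its exception behavior.
enum class ExceptKind { Strict, NonStrict };

// Splits a call sequence into maximal runs of the same kind and returns,
// for each call, the index of the run (region) it belongs to.
inline std::vector<std::size_t>
assignRegions(const std::vector<ExceptKind> &calls) {
  std::vector<std::size_t> region(calls.size());
  std::size_t current = 0;
  for (std::size_t i = 0; i < calls.size(); ++i) {
    if (i > 0 && calls[i] != calls[i - 1])
      ++current; // The kind changed, so a new region starts here.
    region[i] = current;
  }
  return region;
}

// In this model two calls may be reordered only if they share a region.
inline bool mayReorder(const std::vector<std::size_t> &region, std::size_t a,
                       std::size_t b) {
  return region.at(a) == region.at(b);
}
```

For a sequence non-strict, non-strict, strict, strict, non-strict, the model yields three regions, and the first and last calls may not be swapped even though both are non-strict, because the strict calls between them act as a barrier.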
2025-10-24[asan] Avoid -Wtautological-pointer-compare (#164918)Thurston Dang
https://github.com/llvm/llvm-project/pull/164906 converted a -Wpointer-bool-conversion warning into a -Wtautological-pointer-compare warning. Avoid both by using the bool cast.
2025-10-24[AVR] Fix occasional corruption in stack passed paramsCarl Peto
Corruption can occur when passing parameters on the stack under register pressure. Fixes #163015.
2025-10-24[IR] Fix Module::setModuleFlag for uniqued metadata (#164580)Andrew Savonichev
`Module::setModuleFlag` is supposed to change a single module. However, when an `MDNode` has the same value in more than one module in the same `LLVMContext`, such `MDNode` is shared (uniqued) across all of them. Therefore `MDNode::replaceOperandWith` changes all modules that share the same `MDNode`. This used to cause problems for #86212, where a module is marked as "upgraded" via a module flag. When this flag is shared across multiple modules, all of them are marked, yet some may not have been processed at all. After the patch we now construct a new `MDNode` and replace the old one.
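The sharing problem described here can be sketched with a toy model of uniqued nodes. Everything below (`ToyNode`, `ToyContext`, `ToyModule`, and the two update functions) is hypothetical and only mimics the idea of value-uniqued nodes shared across modules; it is not the LLVM metadata API.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy model: a context uniques nodes by value, so two modules asking for
// the same value receive the same shared node.
struct ToyNode {
  int value;
};

struct ToyContext {
  std::map<int, std::shared_ptr<ToyNode>> uniqued;
  std::shared_ptr<ToyNode> get(int value) {
    auto &slot = uniqued[value];
    if (!slot)
      slot = std::make_shared<ToyNode>(ToyNode{value});
    return slot;
  }
};

struct ToyModule {
  std::shared_ptr<ToyNode> flag;
};

// Problematic update: mutates the shared node, so every module that holds
// the same node observes the change.
inline void setFlagInPlace(ToyModule &m, int value) { m.flag->value = value; }

// Safer update: obtains a node for the new value and repoints only this
// module, leaving other holders of the old node untouched.
inline void setFlagByReplacing(ToyContext &ctx, ToyModule &m, int value) {
  m.flag = ctx.get(value);
}
```

With two modules created from the same context and the same flag value, `setFlagInPlace` on one changes both, while `setFlagByReplacing` changes only the module it is called on, which mirrors the construct-and-replace approach the message describes.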
2025-10-23[RISCV] Rename RISCVISD::ABSW->NEGW_MAX. NFC (#164909)Craig Topper
This matches what it expands to. The P extension adds a proper ABSW instruction so being precise is important to avoid confusion.
2025-10-23[asan] Avoid -Wpointer-bool-conversion warning by comparing to nullptr (#164906)Thurston Dang
The current code may trigger a compiler warning:
```
address of function 'wcsnlen' will always evaluate to 'true' [-Wpointer-bool-conversion]
```
Fix this by comparing to nullptr. The same fix is applied to strnlen for future-proofing.
2025-10-24[AArch64][llvm] Relax mandatory features for Armv9.6-A (#163973)Jonathan Thackray
`FEAT_FPRCVT` is moved from being mandatory in Armv9.6-A to Armv9.7-A. `FEAT_SVE2p2` is removed from being mandatory in Armv9.6-A.
2025-10-24[AArch64] (NFC) Tidy up alignment/formatting in AArch64/AArch64InstrInfo.td (#163645)Jonathan Thackray
It was noted in a code-review for earlier changes in this stack that some of the new 9.7 entries were mis-aligned. But actually, many of the entries were, so I've tidied them all up.
2025-10-24[AArch64][llvm] Remove FeatureMPAM guards for parity with gcc (#163166)Jonathan Thackray
Remove `AArch64::FeatureMPAM` guards from some MPAM system registers, since these system registers are not under any feature guard in gcc.
2025-10-24[AArch64][llvm] Armv9.7-A: Add support for new Advanced SIMD (Neon) instructions (#163165)Jonathan Thackray
Add support for new Advanced SIMD (Neon) instructions: - FDOT (half-precision to single-precision, by element) - FDOT (half-precision to single-precision, vector) - FMMLA (half-precision, non-widening) - FMMLA (widening, half-precision to single-precision) as documented here: * https://developer.arm.com/documentation/ddi0602/2025-09/ * https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions Co-authored-by: Kerry McLaughlin <kerry.mclaughlin@arm.com> Co-authored-by: Caroline Concatto <caroline.concatto@arm.com> Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
2025-10-24[AArch64][llvm] Armv9.7-A: Add support for SVE2p3 LUTI6 operations (#163164)Jonathan Thackray
Add instructions for SVE2p3 LUTI6 operations: - LUTI6 (16-bit) - LUTI6 (8-bit) - LUTI6 (vector, 16-bit) - LUTI6 (table, four registers, 8-bit) - LUTI6 (table, single, 8-bit) as documented here: * https://developer.arm.com/documentation/ddi0602/2025-09/ * https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
2025-10-23[AArch64][llvm] Armv9.7-A: Add support for SVE2p3 shift operations (#163163)Jonathan Thackray
Add instructions for SVE2p3 shift operations: - SQRSHRN - SQRSHRUN - SQSHRN - SQSHRUN - UQRSHRN - UQSHRN as documented here: * https://developer.arm.com/documentation/ddi0602/2025-09/ * https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
2025-10-23[PAC][clang] Correct handling of ptrauth queries of incomplete types (#164528)Oliver Hunt
In normal circumstances we can never get to this point, as earlier Sema checks will already have prevented us from making these queries. However, in some cases, for example after a sufficiently large number of errors, clang can start allowing incomplete types in records. This means a number of the internal interfaces can end up performing type trait queries that require querying the pointer authentication properties of types that contain incomplete types. While the trait queries attempt to guard against incomplete types, those tests fail in this case, as the incomplete types are actually nested in the seemingly complete parent type.
2025-10-23[VPlan] Limit narrowInterleaveGroups to single block regions for now.Florian Hahn
Currently only regions with a single block are supported by the legality checks.
2025-10-23[AArch64][llvm] Armv9.7-A: Add support for SVE2p3 CVT operations (#163162)Jonathan Thackray
Add instructions for SVE2p3 CVT operations: - FCVTZSN - FCVTZUN - SCVTF - SCVTFLT - UCVTF - UCVTFLT as documented here: * https://developer.arm.com/documentation/ddi0602/2025-09/ * https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
2025-10-23[AArch64][llvm] Armv9.7-A: Add support for SVE2p3 DOT and MLA operations (#163161)Jonathan Thackray
Add instructions for SVE2p3 DOT and MLA operations: - BFMMLA (non-widening) - FMMLA (non-widening) - SDOT (2-way, vectors) - SDOT (2-way, indexed) - UDOT (2-way, vectors) - UDOT (2-way, indexed) as documented here: * https://developer.arm.com/documentation/ddi0602/2025-09/ * https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
2025-10-23[AArch64][llvm] Armv9.7-A: Add support for SVE2p3 arithmetic operations (#163160)Jonathan Thackray
Add instructions for SVE2p3 arithmetic operations: - `ADDQP` (add pairwise within quadword vector segments) - `ADDSUBP` (add subtract pairwise) - `SABAL` (two-way signed absolute difference sum and accumulate long) - `SUBP` (subtract pairwise) - `UABAL` (two-way unsigned absolute difference sum and accumulate long) as documented here: * https://developer.arm.com/documentation/ddi0602/2025-09/ * https://developer.arm.com/documentation/109697/2025_09/2025-Architecture-Extensions
2025-10-23[AMDGPU] Remove validation of s_set_vgpr_msb range (#164888)Stanislav Mekhanoshin
We will need the full 16-bit range of the operand to record the previous mode.
2025-10-23[AMDGPU] Change patterns for v_[pk_]add_{min|max} (#164881)Stanislav Mekhanoshin
The intermediate result is in fact the add with saturation regardless of the clamp bit.