path: root/llvm/test/Transforms
Age | Commit message | Author
2025-11-22 | [DFAJumpThreading] Try harder to avoid cycles in paths. (#169151) | Usman Nadeem
If a threading path contains cycles, the transformation is not correct. This patch fixes a couple of cases that create such cycles. Fixes https://github.com/llvm/llvm-project/issues/166868
2025-11-22 | [InstCombine] Generalize trunc-shift-icmp fold from (1 << Y) to (Pow2 << Y) (#169163) | Pedro Lobo
Extends the `icmp(trunc(shl))` fold to handle any power-of-2 constant as the shift base, not just 1. This generalizes the following patterns by adjusting the comparison offsets by `log2(Pow2)`:
```llvm
(trunc (1 << Y) to iN) == 0    --> Y u>= N
(trunc (1 << Y) to iN) != 0    --> Y u< N
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
; to
(trunc (Pow2 << Y) to iN) == 0    --> Y u>= N - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 0    --> Y u< N - log2(Pow2)
(trunc (Pow2 << Y) to iN) == 2**C --> Y == C - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 2**C --> Y != C - log2(Pow2)
```
Proof: https://alive2.llvm.org/ce/z/2zwTkp
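The generalized fold can also be sanity-checked exhaustively on small widths. The sketch below is illustrative only (the Alive2 link above is the actual proof): it models `shl` and `trunc` with Python integers, assumes `log2(Pow2) < N`, picks an arbitrary source width, and the helper names are hypothetical. The `!=` forms are just the negations of the `==` forms, so only the latter are checked.

```python
def shl(value, amount, width):
    """Model LLVM `shl` on a `width`-bit integer (amount >= width is poison)."""
    assert 0 <= amount < width
    return (value << amount) & ((1 << width) - 1)


def trunc(value, n):
    """Model `trunc` to iN: keep only the low n bits."""
    return value & ((1 << n) - 1)


def check_trunc_shl_fold(width=12):
    """Exhaustively compare both sides of the generalized fold for iN, N < width."""
    for n in range(1, width):
        for k in range(n):                  # Pow2 = 2**k, assuming log2(Pow2) < N
            for y in range(width):          # only non-poison shift amounts
                t = trunc(shl(1 << k, y, width), n)
                # (trunc (Pow2 << Y) to iN) == 0  -->  Y u>= N - log2(Pow2)
                if (t == 0) != (y >= n - k):
                    return False
                # (trunc (Pow2 << Y) to iN) == 2**C  -->  Y == C - log2(Pow2)
                for c in range(n):
                    if (t == (1 << c)) != (y == c - k):
                        return False
    return True


print(check_trunc_shl_fold())  # True
```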
2025-11-21 | [unroll-and-jam] Document dependencies_multidims.ll and fix loop bounds (NFC) (#156578) | Sebastian Pop
Add detailed comments explaining why each function should/shouldn't be unroll-and-jammed based on memory access patterns and dependencies. Fix loop bounds to ensure array accesses are within array bounds:
* sub_sub_less: j starts from 1 (not 0) to ensure j-1 >= 0
* sub_sub_less_3d: k starts from 1 (not 0) to ensure k-1 >= 0
* sub_sub_outer_scalar: j starts from 1 (not 0) to ensure j-1 >= 0
2025-11-21 | AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818) | Nicolai Hähnle
These shuffles can always be implemented using v_perm_b32, and so this rewrites the analysis from the perspective of "how many v_perm_b32s does it take to assemble each register of the result?" The test changes in Transforms/SLPVectorizer/reduction.ll are reasonable: VI (gfx8) has native f16 math, but not packed math.
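To make the "how many v_perm_b32s per result register" question concrete, here is a toy cost model for an i8 shuffle. It is a hypothetical sketch, not the code in this patch: it assumes one v_perm_b32 can select any four bytes from two 32-bit sources, so a result dword whose bytes come from k distinct source dwords costs k - 1 perms, and a single-source dword is treated as free only when every byte is already in place. The function names are illustrative.

```python
def perms_for_dword(byte_srcs):
    """Estimate v_perm_b32 instructions for one 32-bit result register.

    byte_srcs holds up to four (src_dword, src_byte) pairs, one per result
    byte, with None for undef lanes. Assumed model: each v_perm_b32 merges
    two 32-bit sources, so k distinct source dwords need k - 1 perms.
    """
    regs = {s[0] for s in byte_srcs if s is not None}
    if not regs:
        return 0  # all lanes undef: nothing to build
    if len(regs) == 1:
        # Reusing a source register as-is is free if every byte stays in place.
        in_place = all(s is None or s[1] == i for i, s in enumerate(byte_srcs))
        return 0 if in_place else 1
    return len(regs) - 1


def shuffle_cost_i8(mask):
    """Sum per-register costs for an i8 shufflevector mask (-1 = undef).

    Mask index m refers to byte m % 4 of source dword m // 4, counting across
    both concatenated shuffle operands.
    """
    cost = 0
    for base in range(0, len(mask), 4):
        lanes = mask[base:base + 4]
        srcs = [None if m < 0 else (m // 4, m % 4) for m in lanes]
        cost += perms_for_dword(srcs)
    return cost


print(shuffle_cost_i8([0, 1, 2, 3, 4, 5, 6, 7]))  # 0: identity
print(shuffle_cost_i8([3, 2, 1, 0]))              # 1: byte reverse within one dword
print(shuffle_cost_i8([0, 4, 8, 12]))             # 3: bytes from four source dwords
```

A 16-bit shuffle fits the same model by treating each i16 element as a pair of bytes.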
2025-11-21 | [profcheck] Propagate profile metadata to Wrapper function in optimize mode of ExpandVariadic. (#168161) | Jin Huang
This PR fixes the issue where profile metadata (`!prof`) is dropped from the `VariadicWrapper` when `ExpandVariadics` runs in `--expand-variadics-override=optimize` mode.
In optimize mode, the pass splits the original variadic function into two parts:
- A **VariadicWrapper** (retaining the original name) that handles the `va_list` setup.
- A **FixedArityReplacement** (new function) that contains the original core logic.
During this process, the basic blocks and associated metadata are spliced into the `FixedArityReplacement`. Consequently, the `VariadicWrapper`, which serves as the entry point for callers, is left without function entry count metadata.
This change explicitly copies the `MD_prof` metadata from the `FixedArityReplacement` back to the `VariadicWrapper` after the split is defined.
Co-authored-by: Jin Huang <jingold@google.com>
2025-11-21 | [VPlan] Only apply forced cost to recipes with underlying values. (#168372) | Florian Hahn
Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: https://github.com/llvm/llvm-project/pull/168372
2025-11-21 | [LoopCacheAnalysis] Replace delinearization for fixed size array (#164798) | Ryotaro Kasuga
This patch replaces the delinearization function used in LoopCacheAnalysis, switching from one that depends on type information in GEPs to one that does not. Once this patch and https://github.com/llvm/llvm-project/pull/161822 have landed, we can delete `tryDelinearizeFixedSize` from Delinearization, which is an optimization heuristic guided by GEP type information. After Polly eliminates its use of `getIndexExpressionsFromGEP`, we will be able to delete the GEP-driven heuristics from Delinearization completely.
2025-11-21 | [VPlan] Drop poison-generating flags on induction trunc (#168922) | Ramkumar Ramachandra
After truncating an integer induction, neither nuw nor nsw is guaranteed to hold. Fixes #168902.
Co-authored-by: Florian Hahn <flo@fhahn.com>
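A quick way to see why: evaluate the same induction step at full and at truncated width. The sketch below is illustrative and the helper `add_flags` is hypothetical; it reports whether an `add` of the given operands would satisfy `nuw` and `nsw` at a given bit width.

```python
def add_flags(a, b, bits):
    """Return (nuw, nsw): whether `add iBITS a, b` avoids unsigned/signed overflow.

    a and b are interpreted as unsigned bit patterns of width `bits`.
    """
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask

    def as_signed(v):
        return v - (1 << bits) if v >> (bits - 1) else v

    nuw = a + b <= mask
    signed_sum = as_signed(a) + as_signed(b)
    nsw = -(1 << (bits - 1)) <= signed_sum < (1 << (bits - 1))
    return nuw, nsw


# An i64 induction stepping 127 -> 128 and 255 -> 256 overflows neither way ...
print(add_flags(127, 1, 64), add_flags(255, 1, 64))  # (True, True) (True, True)
# ... but the same steps on the i8-truncated induction do.
print(add_flags(127, 1, 8))  # (True, False): signed overflow, so nsw is lost
print(add_flags(255, 1, 8))  # (False, True): unsigned wrap, so nuw is lost
```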
2025-11-20 | [unroll-and-jam] Document dependency patterns in dependencies.ll (NFC) (#156577) | Sebastian Pop
Add detailed comments explaining each function's memory access patterns and why they should/shouldn't be unroll-and-jammed:
- fore_aft_*: Dependencies between fore block and aft block
- fore_sub_*: Dependencies between fore block and sub block
- sub_aft_*: Dependencies between sub block and aft block
- sub_sub_*: Dependencies within sub block
- *_less: Backward dependency (i-1) - safe for fore/aft, fore/sub, sub/aft; unsafe for sub/sub due to jamming conflicts
- *_eq: Same iteration dependency (i+0) - safe due to preserved execution order
- *_more: Forward dependency (i+1) - unsafe due to write-after-write races between unrolled iterations, except sub/sub case creates conflicts
2025-11-20 | [InstSimplify] Extend icmp-of-add simplification to sle/sgt/sge (#168900) | Pedro Lobo
When comparing additions with the same base where one has `nsw`, the following simplification can be performed:
```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2) => icmp slt/sgt/sle/sge C1, C2
```
Previously this was only done for `slt`. This patch extends it to the `sgt`, `sle`, and `sge` predicates when either of the following conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`
This patch also handles the `C1 == C2` case, which was previously excluded.
Proof: https://alive2.llvm.org/ce/z/LtmY4f
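A brute-force check on a small integer width illustrates why the side conditions matter: when C1 lies between 0 and C2, `x + C1` lies between `x` and `x + C2`, so it cannot overflow either. This is a sketch, not the InstSimplify code; it models i6 arithmetic with Python integers and skips inputs where `x +nsw C2` would overflow, since the result is poison there.

```python
BITS = 6  # a small width keeps the exhaustive search fast
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1

PREDS = {
    "slt": lambda a, b: a < b,
    "sgt": lambda a, b: a > b,
    "sle": lambda a, b: a <= b,
    "sge": lambda a, b: a >= b,
}


def wrap(v):
    """Two's-complement wrap of a mathematical integer to iBITS (plain `add`)."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > HI else v


def fold_holds(require_conditions=True):
    """Check icmp P (x + C1), (x +nsw C2) == icmp P C1, C2 for all iBITS values."""
    for c1 in range(LO, HI + 1):
        for c2 in range(LO, HI + 1):
            if require_conditions and not (0 <= c1 <= c2 or c2 <= c1 <= 0):
                continue
            for x in range(LO, HI + 1):
                if not LO <= x + c2 <= HI:
                    continue  # `x +nsw C2` would be poison, so any result is allowed
                lhs, rhs = wrap(x + c1), x + c2
                if any(p(lhs, rhs) != p(c1, c2) for p in PREDS.values()):
                    return False
    return True


print(fold_holds())                          # True
print(fold_holds(require_conditions=False))  # False
```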
2025-11-20 | [LV] Add a low trip count test without folding the tail. | Florian Hahn
Add a low trip count test that is currently vectorized but unprofitable, for https://github.com/llvm/llvm-project/issues/167858.
2025-11-20 | [VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289) | Florian Hahn
Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: https://github.com/llvm/llvm-project/pull/168289
2025-11-20 | [SLP]Check if the non-schedulable phi parent node has unique operands | Alexey Bataev
If the incoming node has copyables and the node is commutative, we need to check whether the non-schedulable phi parent node has unique operands. Otherwise, the dependencies may not be calculated correctly. Fixes #168589
2025-11-20 | [LV] Add tests for loops with low trip counts requiring tail-folding. | Florian Hahn
Add extra tests for over-eager tail-folding for tiny trip-count loops. Reduced from https://github.com/llvm/llvm-project/issues/167858.
2025-11-20 | [profcheck] Exclude `naked`, asm-only functions from profcheck (#168447) | Mircea Trofin
We can't do anything meaningful to such functions: they aren't optimizable, and even if inlined, they would bring no code open to optimization.
2025-11-20 | [LV] Check full partial reduction chains in order. (#168036) | Florian Hahn
https://github.com/llvm/llvm-project/pull/162822 added another validation step to check whether entries in a partial reduction chain have the same scale factor. But the validation still depended on the order of entries in PartialReductionChains and would fail to reject some cases (e.g. if the first link matched the scale of the second link, but the second link is invalidated later). To fix that, group chains by their starting phi nodes, then perform the validation for each chain, and if it fails, invalidate the whole chain for the phi.
Fixes https://github.com/llvm/llvm-project/issues/167243.
Fixes https://github.com/llvm/llvm-project/issues/167867.
PR: https://github.com/llvm/llvm-project/pull/168036
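The order-independence of the new check can be sketched as "group first, then decide per phi". The data model here (`Link`, `accepted_phis`, and the phi names) is hypothetical and far simpler than the real validation, which performs more checks per link; it only shows that grouping by starting phi makes the verdict independent of link order.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    phi: str     # the reduction phi the chain starts from
    scale: int   # scale factor computed for this link
    valid: bool  # whether this link passed its own checks


def accepted_phis(links):
    """Return the phis whose whole partial-reduction chain is valid.

    Links are grouped by starting phi first, so the verdict no longer depends
    on the order in which links appear.
    """
    by_phi = defaultdict(list)
    for link in links:
        by_phi[link.phi].append(link)
    accepted = set()
    for phi, chain in by_phi.items():
        same_scale = len({link.scale for link in chain}) == 1
        if same_scale and all(link.valid for link in chain):
            accepted.add(phi)
    return accepted


links = [Link("%acc", 4, True), Link("%acc", 4, False), Link("%sum", 2, True)]
print(sorted(accepted_phis(links)))            # ['%sum']
print(sorted(accepted_phis(reversed(links))))  # ['%sum']
```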
2025-11-20 | [LoopPeel] Fix BFI when peeling last iteration without guard (#168250) | Joel E. Denny
LoopPeel sometimes proves that, when reached, the original loop always executes at least two iterations. LoopPeel then unconditionally executes both the remaining loop's initial iteration and the peeled final iteration. But that increases the latter's frequency above its frequency in the original loop. To maintain the total frequency, this patch compensates by decreasing the remaining loop's latch probability. This is another step in issue #135812 and was discussed at <https://github.com/llvm/llvm-project/pull/166858#discussion_r2528968542>.
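The compensation can be illustrated with a simplified frequency model. This is an assumption for illustration, not LoopPeel's actual computation: treat the header of a loop whose latch is taken with probability p as running 1 / (1 - p) times per entry. The peeled final iteration now runs once per entry, so the remaining loop must account for one fewer iteration, which lowers its latch probability. The function name is hypothetical.

```python
from fractions import Fraction


def remaining_latch_probability(p_orig):
    """Latch probability for the remaining loop after peeling its last iteration.

    Simplified geometric model (an assumption): a loop whose latch is taken
    with probability p runs its header 1 / (1 - p) times per entry.
    """
    p_orig = Fraction(p_orig)
    trips = 1 / (1 - p_orig)        # expected iterations of the original loop
    assert trips >= 2, "LoopPeel proved at least two iterations"
    return 1 - 1 / (trips - 1)      # solve 1 / (1 - p) == trips - 1 for p


p = remaining_latch_probability(Fraction(9, 10))
print(p)                # 8/9, down from 9/10
print(1 / (1 - p) + 1)  # 10: remaining loop (9) + peeled iteration (1)
```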
2025-11-20 | VectorCombine/AMDGPU: Cleanup a test and add a new one (#168817) | Nicolai Hähnle
The existing, recently added test contains a whole lot of noise in the form of dead instructions. Also, prefer named values. The new test isolates a separate issue with concatenating i8 vectors.
2025-11-20 | [LV] Allow partial reductions with an extended bin op (#165536) | Sam Tebbs
A pattern of the form reduce.add(ext(mul)) is valid for a partial reduction as long as the mul and its operands fulfill the requirements of a normal partial reduction. The mul's extend operands will be optimised to the wider extend, and we already have oneUse checks in place to make sure the mul and operands can be modified safely.
1. -> https://github.com/llvm/llvm-project/pull/165536
2. https://github.com/llvm/llvm-project/pull/165543
2025-11-20 | AMDGPU: Autogenerate checks in a test (#168815) | Nicolai Hähnle
2025-11-19 | Re-land [Transform][LoadStoreVectorizer] allow redundant in Chain (#168135) | Gang Chen
This is the fixed version of https://github.com/llvm/llvm-project/pull/163019
2025-11-19 | [SLP]Fix insertion point for setting for the nodes | Alexey Bataev
Many of the def-use chain problems in the SLP vectorizer stem from some nodes reusing the same instruction as their insertion point. An insertion point is not an instruction but a position between instructions. To set it correctly, it is better to generate a pseudo instruction immediately after the last instruction and use it as the insertion point. This resolves the issues in most cases. Fixes #168512 #168576
2025-11-19 | [ConstantFolding] Add constant folding for scalable vector interleave intrinsics. (#168668) | Craig Topper
We can constant fold interleave of identical splat vectors to a larger splat vector.
2025-11-19 | [ConstantFolding] Generalize constant folding for vector_deinterleave2 to deinterleave3-8. (#168640) | Craig Topper
2025-11-19 | [SLPVectorizer] Widen constant strided loads. (#162324) | Mikhail Gudim
Given a set of pointers, check if they can be rearranged as follows (%s is a constant):
    %b + 0 * %s + 0
    %b + 0 * %s + 1
    %b + 0 * %s + 2
    ...
    %b + 0 * %s + w
    %b + 1 * %s + 0
    %b + 1 * %s + 1
    %b + 1 * %s + 2
    ...
    %b + 1 * %s + w
    ...
If the pointers can be rearranged in the above pattern, the memory can be accessed with strided loads of width `w` and stride `%s`.
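The pattern check can be sketched on constant element offsets relative to the base `%b`. This is a hypothetical helper, not the SLPVectorizer implementation: it normalizes by the smallest offset, takes the first run of consecutive offsets as one row, and accepts the set only if it is exactly `rows` copies of that row spaced `stride` apart. It reports the number of elements per row as `row_len`.

```python
def widened_strided_shape(offsets):
    """Return (stride, row_len) if constant offsets form rows of consecutive elements.

    After subtracting the smallest offset, the offsets must be exactly
    {r * stride + c : 0 <= r < rows, 0 <= c < row_len} for some rows >= 2.
    Otherwise returns None.
    """
    if not offsets or len(set(offsets)) != len(offsets):
        return None
    base = min(offsets)
    offs = sorted(o - base for o in offsets)
    # The first row is the run of consecutive offsets starting at 0.
    row_len = 1
    while row_len < len(offs) and offs[row_len] == row_len:
        row_len += 1
    rows, rem = divmod(len(offs), row_len)
    if rem or rows < 2:
        return None  # ragged rows, or one contiguous run (not strided)
    stride = offs[row_len]  # start of the second row
    expected = [r * stride + c for r in range(rows) for c in range(row_len)]
    return (stride, row_len) if offs == expected else None


print(widened_strided_shape([100, 0, 101, 1, 2, 102, 3, 103]))  # (100, 4)
print(widened_strided_shape([0, 1, 5, 6, 10, 11]))             # (5, 2)
print(widened_strided_shape([0, 1, 2, 3]))                     # None
```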
2025-11-19 | [LV] Simplify existing load/store sink/hoisting tests, extend coverage. | Florian Hahn
Clean up some of the existing predicated load/store sink/hoisting tests and add additional test coverage for more complex cases.
2025-11-19 | [InstSimplify] Add whitespace to struct declarations in vector-calls.ll. NFC | Craig Topper
This matches how IR is printed.
2025-11-19 | [LoopInterchange] Don't consider loops with BTC=0 (#167113) | Sjoerd Meijer
Do not consider loops with a zero backedge-taken count as candidates for interchange. This seems sensible because such a loop never takes its backedge (its body runs only once), so there is no point in interchanging. As a bonus, this seems to avoid triggering an assert about phis and their uses from source code, so this is a partial fix for #163954, but it needs more work to properly fix that.
2025-11-19 | [DA] Replace delinearization for fixed size array (#161822) | Ryotaro Kasuga
This patch replaces the delinearization function used in DA, switching from one that depends on type information in GEPs to one that does not. There are three types of changes in the regression tests: improvements, degradations, and degradations in features that will be removed. Since very few cases fall into the second category, I believe the impact of this change should be practically insignificant.
2025-11-19 | [VPlan] Print debug info for all recipes. (#168454) | Florian Hahn
Use the recently refactored VPRecipeBase::print to print debug location for all recipes. PR: https://github.com/llvm/llvm-project/pull/168454
2025-11-19 | [LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724) | Hassnaa Hamdi
Skip scalable epilogue VFs that are greater than RemainingIterations, as is already done for fixed VFs. Scalable RemainingIterations values are excluded from that comparison because SCEV currently can't evaluate non-canonical vscale-based expressions.
2025-11-18 | [LTT] Mark as unknown weak function tests. (#167399) | Mircea Trofin
We don't have enough information to infer the probability of a weak function pointer being nullptr or not (it is an open question whether we could propagate this from the linker).
Issue #147390
2025-11-18 | [VPlan] VPIRFlags kind for FCmp with predicate + fast-math flags (NFCI). | Florian Hahn
FCmp instructions have both a predicate and fast-math flags. Introduce a new FCmp kind that combines both to model this correctly in the current system. This should be NFC modulo VPlan printing, which now includes the correct fast-math flags.
2025-11-18 | [VPlan] Fix OpType-mismatch in getFlagsFromIndDesc (#168560) | Ramkumar Ramachandra
Follow-up to a CSE OpType-mismatch crash reported after ef023cae388d (Reland [VPlan] Expand WidenInt inductions with nuw/nsw): set the OpType correctly when returning from getFlagsFromIndDesc.
2025-11-18 | [AArch64] - Improve costing for Identity shuffles for SVE targets. (#165375) | Pawan Nirpal
Identity masks can be treated as free when scalable vectorization is possible, making the check agnostic of the fixed/scalable vectorization policy. This allows more aggressive vector combines for identity shuffle masks.
2025-11-18 | [ConstantFolding] Generalize constant folding for vector_interleave2 to interleave3-8. (#168473) | Craig Topper
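For reference, the element semantics these interleave/deinterleave folds must reproduce can be written as a few lines of Python over constant element lists. This is an illustrative model, not the ConstantFolding code: `interleave` mirrors `llvm.vector.interleaveN` (result element i * N + j is element i of operand j) and `deinterleave` is its inverse, returning N vectors. For scalable vectors the element count is unknown at compile time, which is presumably why the scalable fold is limited to identical splats: interleaving copies of one splat is again a splat, whatever the length.

```python
def interleave(*vecs):
    """Model llvm.vector.interleaveN on constant vectors."""
    n, length = len(vecs), len(vecs[0])
    assert all(len(v) == length for v in vecs), "operands must have equal length"
    return [vecs[j][i] for i in range(length) for j in range(n)]


def deinterleave(vec, n):
    """Model llvm.vector.deinterleaveN: split one vector into n vectors."""
    assert len(vec) % n == 0, "length must be a multiple of n"
    return [vec[j::n] for j in range(n)]


print(interleave([0, 3], [1, 4], [2, 5]))   # [0, 1, 2, 3, 4, 5]
print(deinterleave([0, 1, 2, 3, 4, 5], 3))  # [[0, 3], [1, 4], [2, 5]]
print(interleave([7, 7], [7, 7]))           # [7, 7, 7, 7]: splats stay splats
```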
2025-11-19 | [InstCombine] Canonicalize signed saturated additions (#153053) | AZero13
https://alive2.llvm.org/ce/z/YGT5SN
https://alive2.llvm.org/ce/z/PVDxCw
https://alive2.llvm.org/ce/z/8buR2N
This is tricky because adding a positive number only moves a value up, so the only boundary it can hit is signed max. This matters because the intrinsic we use also clamps in the OTHER direction, i.e. to INT_MIN when the result goes that way, and the range checking we do only works for positive numbers. Because of this, we can only do this transform for constants as well.
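The asymmetry described above can be checked exhaustively on i8. The sketch models one plausible shape of the source pattern, a select that clamps only at INT_MAX, as a hypothetical stand-in for the patterns in the Alive2 proofs. It agrees with `llvm.sadd.sat` for every positive constant but not for a negative one, because the intrinsic also clamps at INT_MIN.

```python
BITS = 8
INT_MIN, INT_MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def wrap(v):
    """Two's-complement wrap to iBITS, as a plain `add` would produce."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > INT_MAX else v


def sadd_sat(a, b):
    """llvm.sadd.sat semantics: clamp to [INT_MIN, INT_MAX]."""
    return max(INT_MIN, min(INT_MAX, a + b))


def clamp_at_max(x, c):
    """Hypothetical source pattern: x s> INT_MAX - C ? INT_MAX : x + C."""
    return INT_MAX if x > INT_MAX - c else wrap(x + c)


def equivalent(c):
    return all(clamp_at_max(x, c) == sadd_sat(x, c) for x in range(INT_MIN, INT_MAX + 1))


print(all(equivalent(c) for c in range(1, INT_MAX + 1)))  # True
print(equivalent(-1))                                     # False
```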
2025-11-18 | [VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450) | Florian Hahn
Update VPlan to populate VPIRFlags during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRFlags from the underlying IR instruction each time. The VPRecipeWithIRFlags constructor taking an underlying instruction and setting the flags based on it has been removed. This centralizes initial VPIRFlags creation, ensures flags are consistently available throughout VPlan transformations, and makes sure we don't accidentally re-add flags from the underlying instruction that were already dropped during transformations. Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did the same for VPIRMetadata. Should be NFC w.r.t. the generated IR. PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-18 | [LLVM][InstSimplify] Add folds for SVE integer reduction intrinsics. (#167519) | Paul Walker
[andv, eorv, orv, s/uaddv, s/umaxv, s/uminv]
sve_reduce_##(none, ?) -> op's neutral value
sve_reduce_##(any, neutral) -> op's neutral value
[andv, orv, s/umaxv, s/uminv]
sve_reduce_##(all, splat(X)) -> X
[eorv]
sve_reduce_##(all, splat(X)) -> 0
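These folds follow from each reduction's identity (neutral) element. The sketch below models predicated reductions on i8 lanes; the operation table is a hypothetical mirror of the listed `llvm.aarch64.sve.*` reductions, not the InstSimplify code. `uaddv` is modeled as an unbounded sum (the real intrinsic returns a 64-bit result) and `saddv` is omitted. It uses an even lane count because SVE vector lengths are multiples of 128 bits, so an all-active `eorv` of a splat XORs an even number of equal values to 0.

```python
from functools import reduce
from itertools import product

BITS = 8
MASK = (1 << BITS) - 1


def as_signed(v):
    return v - (1 << BITS) if v >> (BITS - 1) else v


# name: (combine function, neutral value) on unsigned BITS-bit lane values
OPS = {
    "andv": (lambda a, b: a & b, MASK),
    "orv": (lambda a, b: a | b, 0),
    "eorv": (lambda a, b: a ^ b, 0),
    "uaddv": (lambda a, b: a + b, 0),  # unbounded sum; SVE widens to 64 bits
    "umaxv": (max, 0),
    "uminv": (min, MASK),
    "smaxv": (lambda a, b: a if as_signed(a) >= as_signed(b) else b, 1 << (BITS - 1)),
    "sminv": (lambda a, b: a if as_signed(a) <= as_signed(b) else b, MASK >> 1),
}


def sve_reduce(op, pred, vec):
    """Reduce the active lanes of vec; no active lanes yields the neutral value."""
    combine, neutral = OPS[op]
    return reduce(combine, (v for p, v in zip(pred, vec) if p), neutral)


def check_folds(lanes=4):
    for op, (_, neutral) in OPS.items():
        # sve_reduce_##(none, ?) -> neutral
        if sve_reduce(op, [False] * lanes, list(range(lanes))) != neutral:
            return False
        # sve_reduce_##(any, neutral) -> neutral
        for pred in product([False, True], repeat=lanes):
            if sve_reduce(op, pred, [neutral] * lanes) != neutral:
                return False
    for x in range(MASK + 1):
        splat = [x] * lanes
        # sve_reduce_##(all, splat(X)) -> X for and/or/min/max reductions
        for op in ("andv", "orv", "umaxv", "uminv", "smaxv", "sminv"):
            if sve_reduce(op, [True] * lanes, splat) != x:
                return False
        # sve_reduce_##(all, splat(X)) -> 0 for eorv, since the lane count is even
        if sve_reduce("eorv", [True] * lanes, splat) != 0:
            return False
    return True


print(check_folds())  # True
```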
2025-11-18 | [LLVM][AArch64] Mark SVE integer intrinsics as speculatable. (#167915) | Paul Walker
Exceptions include intrinsics that:
* take or return floating point data
* read or write FFR
* read or write memory
* read or write SME state
2025-11-18 | Extend MemoryEffects to Support Target-Specific Memory Locations (#148650) | CarolineConcatto
This patch introduces preliminary support for two additional memory locations, target_mem0 and target_mem1, which model memory locations that cannot be represented with the existing ones. This solution was suggested in: https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6
Currently, these locations are not yet target-specific. The goal is to enable the compiler to express read/write effects on these resources.
2025-11-18 | [VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247) | Florian Hahn
This patch implements a transform that hoists single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads that we can prove do not alias any stores in the loop thanks to the memory runtime checks added during vectorization. PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 | [SLP] Invariant loads cannot have a memory dependency on stores. (#167929) | Michael Bedy
2025-11-17 | InstCombine: Stop transforming EQ/NE of SHR to 0 to ULT/UGT if >1 use | Peter Collingbourne
This is a small code size optimization that lets us avoid both shifting and comparing to a constant if we need the shifted value anyway. On most architectures the zero comparison is cheaper than a constant comparison (or free if the shift sets flags).
Although this change appears to remove the optimization entirely, we continue to do this transform if there is one use because of the code below the removed code that transforms the shift into an and, followed by the PR10267 case in InstCombinerImpl::foldICmpAndConstConst that transforms the and into a ult/ugt. Added a test case to verify this explicitly.
Per [1], this reduces clang .text size by 0.09% and dynamic instruction count by 0.01%.
[1] https://llvm-compile-time-tracker.com/compare.php?from=1f38d49ebe96417e368a567efa4d650b8a9ac30f&to=0873787a12b8f2eab019d8211ace4bccc1807343&stat=size-text
Reviewers: nikic, dtcxzyw
Reviewed By: dtcxzyw
Pull Request: https://github.com/llvm/llvm-project/pull/168007
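The equivalences this commit relies on (a shift compared to zero, the and-form used on the single-use path, and the final ult/ugt) can be checked exhaustively for `lshr` on i8. This is an illustrative sketch, not InstCombine code, and the `ashr` case is not modeled.

```python
BITS = 8
ALL_ONES = (1 << BITS) - 1


def check_lshr_eq_zero():
    """Check (x u>> C) == 0  <->  (x & ~(2**C - 1)) == 0  <->  x u< 2**C on iBITS."""
    for c in range(BITS):  # shift amounts >= BITS would be poison
        low_bits = (1 << c) - 1
        for x in range(ALL_ONES + 1):
            shr_eq = (x >> c) == 0                        # icmp eq (lshr x, C), 0
            and_eq = (x & (ALL_ONES & ~low_bits)) == 0    # icmp eq (and x, ~(2**C-1)), 0
            ult = x < (1 << c)                            # icmp ult x, 2**C
            if not (shr_eq == and_eq == ult):
                return False
            # The NE form is the negation: icmp ugt x, 2**C - 1
            if ((x >> c) != 0) != (x > low_bits):
                return False
    return True


print(check_lshr_eq_zero())  # True
```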
2025-11-17 | [MemProf] Fixup edges for largest N cold contexts (#167599) | Teresa Johnson
We build the callsite graph by first adding nodes and edges for all allocation contexts, then matching the interior callsite nodes onto actual calls (IR or summary), which due to inlining may result in the generation of new nodes representing the inlined context sequence. We attempt to update edges correctly during this process, but with recursion this is not always possible. Specifically, when creating new inlined sequence nodes for stack ids on recursive cycles, we can't always update correctly, because we have lost the original ordering of the context.
This PR introduces a mechanism, guarded by the -memprof-top-n-important= flag, to keep track of extra information for the largest N cold contexts. Another flag, -memprof-fixup-important (enabled by default), performs a more expensive fixup of the edges for those largest N cold contexts by saving and walking the original ordered list of stack ids from the context.
2025-11-17 | [LV] Add test with existing noalias metadata and runtime checks. | Florian Hahn
Add test where we have loads with existing noalias metadata and noalias metadata gets added by loop versioning.
2025-11-17 | Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354) | Ramkumar Ramachandra
Changes: The previous patch had to be reverted due to a mismatching-OpType assert in CSE. The reduced test, corresponding to an RVV pointer induction, has now been added, and the pointer-induction case has been updated to use createOverflowingBinaryOp. While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 | [VPlan] Fix LastActiveLane assertion on scalar VF (#167897) | Luke Lau
For a scalar-only VPlan with tail folding, if it has a phi live-out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:
    <x1> vector loop: {
      vector.body:
        EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
        vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
        EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
        EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
        EMIT branch-on-count vp<%index.next>, vp<%2>
      No successors
    }
    Successor(s): middle.block

    middle.block:
      EMIT vp<%8> = last-active-lane vp<%6>
      EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
    Successor(s): ir-bb<exit>
The verifier complains about this, but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.
Fixes #167813
2025-11-17 | [AArch64] Allow forcing unrolling of small loops (#167488) | Vladi Krapp
- Introduce the -aarch64-force-unroll-threshold option; when a loop’s cost is below this value we set UP.Force = true (default 0 keeps current behaviour)
- Add an AArch64 loop-unroll regression test that runs once at the default threshold and once with the flag raised, confirming forced unrolling
2025-11-16 | [SLP]Do not consider split nodes, when checking parent PHI-based nodes | Alexey Bataev
The compiler should not consider split vectorize nodes when checking for non-schedulable PHI-based parent nodes. Only pure PHI nodes should be considered, since only they can be treated as explicit users; split nodes cannot.
Fixes #168268