summaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Utils/LoopUtils.cpp
AgeCommit message (Collapse)Author
2025-11-03[LoopUnroll] Fix assert fail on zeroed branch weights (#165938)Joel E. Denny
BranchProbability fails an assert when its denominator is zero. Reported at <https://github.com/llvm/llvm-project/pull/159163#pullrequestreview-3406318423>.
2025-10-31[LoopUnroll] Fix block frequencies for epilogue (#159163)Joel E. Denny
As another step in issue #135812, this patch fixes block frequencies for partial loop unrolling with an epilogue remainder loop. It does not fully handle the case when the epilogue loop itself is unrolled. That will be handled in the next patch. For the guard and latch of each of the unrolled loop and epilogue loop, this patch sets branch weights derived directly from the original loop latch branch weights. The total frequency of the original loop body, summed across all its occurrences in the unrolled loop and epilogue loop, is the same as in the original loop. This patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop instead of relying on the epilogue's latch branch weights to imply it. This patch fixes branch weights in tests that PR #157754 adversely affected.
2025-09-18[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)Florian Hahn
After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branch from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-11[PGO] Add llvm.loop.estimated_trip_count metadata (#152775)Joel E. Denny
This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As the RFC explains, that metadata enables future patches, such as PR #128785, to fix block frequency issues without losing estimated trip counts.
2025-09-09[LoopUtils] Simplify expanded RT-checks (#157518)Ramkumar Ramachandra
Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions after expansion.) to hoist the SCEVExapnder::eraseDeadInstructions call from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so that other callers (LoopDistribute and LoopVersioning) can benefit from the patch.
2025-08-24[VectorCombine] New folding pattern for extract/binop/shuffle chains (#145232)Rajveer Singh Bharadwaj
Resolves #144654 Part of #143088 This adds a new `foldShuffleChainsToReduce` for horizontal reduction of patterns like: ```llvm define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 { %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison> %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1) %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison> %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3) %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison> %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5) %7 = extractelement <8 x i16> %6, i64 0 ret i16 %7 } ``` ...which can be reduced to a llvm.vector.reduce.umin.v8i16(%a0) intrinsic call. Similar transformation for other ops when costs permit to do so.
2025-08-12[LV] Create in-loop sub reductions (#147026)Sam Tebbs
This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513
2025-07-31Revert "[PGO] Add `llvm.loop.estimated_trip_count` metadata" (#151585)Joel E. Denny
Reverts llvm/llvm-project#148758 [As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31Revert "[Utils] Fix a warning"Joel E. Denny
This reverts commit 3a18fe33f0763cd9276c99c276448412100f6270. So that we can revert PR #148758.
2025-07-31[Utils] Fix a warningKazu Hirata
This patch fixes: llvm/lib/Transforms/Utils/LoopUtils.cpp:818:28: error: unused function 'operator<<' [-Werror,-Wunused-function]
2025-07-31[PGO] Add `llvm.loop.estimated_trip_count` metadata (#148758)Joel E. Denny
This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As [suggested in the PR #128785 review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036), it does so via a new `PGOEstimateTripCountsPass` pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form. An important observation not previously discussed is that `PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's current `branch_weights` metadata.
2025-07-18[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)Florian Hahn
Update LV to vectorize maxnum/minnum reductions without fast-math flags, by adding an extra check in the loop if any inputs to maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros are already handled consistently by maxnum/minnum. If any input is NaN, *exit the vector loop, *compute the reduction result up to the vector iteration that contained NaN inputs and * resume in the scalar loop New recurrence kinds are added for reductions using maxnum/minnum without fast-math flags. PR: https://github.com/llvm/llvm-project/pull/148239
2025-07-04[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)Austin
Use llvm::fill instead of std::fill
2025-06-29[LV] Add support for cmp reductions with decreasing IVs. (#140451)Florian Hahn
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y) reductions where one of x or y is a decreasing induction, producing a SMin reduction. It uses signed max as sentinel value. PR: https://github.com/llvm/llvm-project/pull/140451
2025-06-23[LV] Extend FindLastIV to unsigned case (#141752)Ramkumar Ramachandra
Split the FindLastIV RecurKind into SMax and UMax variants, depending on the reduction op produced.
2025-06-12[LV] Simplify creation of vp.load/vp.store/vp.reduce intrinsics (#143804)Philip Reames
The use of VectorBuilder here was simply obscuring what was actually going on. For vp.load and vp.store, the resulting code is significantly more idiomatic. For the vp.reduce cases, we remove several layers of indirection, including passing parameters via implicit state on the builder. In both cases, the code is significantly easier to follow.
2025-06-11Reapply 76197ea6f91f after removing an assertionJeremy Morse
Specifically this is the assertion in BasicBlock.cpp. Now that we're not examining or setting that flag consistently (because it'll be deleted in about an hour) there's no need to keep this assertion. Original commit title: [DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
2025-06-11Revert "[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths ↵Jeremy Morse
(#143451)" This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61. /me squints -- this is hitting an assertion I thought had been deleted, will revert and investigate for a bit.
2025-06-11[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)Jeremy Morse
These are opportunistic deletions as more places that make use of the IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code now that `IsNewDbgInfoFormat` should be true everywhere. FastISel: we don't need to do debug-aware instruction counting any more, because there are no debug instructions, Autoupgrade: you can no-longer avoid autoupgrading of intrinsics to records DIBuilder: Delete the code for creating debug intrinsics (!) LoopUtils: No need to handle debug instructions, they don't exist
2025-06-10[llvm] annotate interfaces in llvm/Transforms for DLL export (#143413)Andrew Rogers
## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Transforms` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). The bulk of these changes were generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed formatting with `git clang-format`. The following manual adjustments were also applied after running IDS on Linux: - Removed a redundant `operator<<` from Attributor.h. IDS only auto-annotates the 1st declaration, and the 2nd declaration being un-annotated resulted in an "inconsistent linkage" error on Windows when building LLVM as a DLL. - `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and remove the local declaration of the `vfs::FileSystem` class. This is required because exporting the `PGOInstrumentationUse` constructor requires the class be fully defined because it is used by an argument. - Add #include "llvm/Support/Compiler.h" to files where it was not auto-added by IDS due to no pre-existing block of include statements. - Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported instantiated templates. ## Validation Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang
2025-05-28[LoopUtils] Pass sentinel value directly to createFindLastIVRed (NFC).Florian Hahn
Now that there is only a single FindLastIV recurrence kind, simply pass the sentinel value instead of the full recurrence descriptor to tighten the interface.
2025-05-28[LoopUtils] Pass start value directly to createAnyOfReduction (NFC).Florian Hahn
Now that there is only a single AnyOf recurrence kind, simply pass the start value instead of the full recurrence descriptor, to tighten the interface.
2025-04-28[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335)Florian Hahn
Add a new reduction recurrence kind for reductions with minimumnum/maximumnum. Such reductions can be vectorized without nsz/nnans, same as reductions with maximum/minimum intrinsics. Note that a new reduction kind is needed to make sure partial reductions are also combined with minimumnum/maximumnum. Note that the final reduction to a scalar value is performed with vector.reduce.fmin/fmax. This should be fine, as the results of the partial reductions with maximumnum/minimumnum silences any sNaNs. In-loop and reductions in SLP are not supported yet, as there's no reduction version of maximumnum/minimumnum yet and fmax may be incorrect. PR: https://github.com/llvm/llvm-project/pull/137335
2025-03-27[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690)Florian Hahn
Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690
2025-03-23[llvm] Use range constructors for *Set (NFC) (#132636)Kazu Hirata
2025-03-22[llvm] Use *Set::insert_range (NFC) (#132591)Kazu Hirata
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-20[LV] Get FMFs from VectorBuilder in createSimpleReduction. NFC (#132017)Luke Lau
The other createSimpleReduction takes the FMFs from the IRBuilder, so this aligns the VectorBuilder variant to do the same and reduce the possibility of there being a mismatch in flags.
2025-03-19[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. ↵Luke Lau
NFC (#132014) Split off from #131300, this splits up RecurrenceDescriptor arguments so that arbitrary recurrence kinds may be used down the line.
2025-03-18[VPlan] Remove createReduction. NFCI (#131336)Luke Lau
This is split off from #131300. A VPReductionRecipe will never have a AnyOf or FindLastIV recurrence, so when it calls createReduction it always calls createSimpleReduction. If we replace the call then it leaves createReduction with one user in VPInstruction::ComputeReductionResult, which we can inline and then remove.
2025-03-06[LoopUtils] Rename a var in addDiffRuntimeChecks (NFC) (#130128)Ramkumar Ramachandra
2025-03-05[LoopUtils] Saturate at INT_MAX when estimating TC (#129683)Ramkumar Ramachandra
getLoopEstimatedTripCount returns std::nullopt when the trip count would overflow the return type, but since it is an estimate anyway, we might as well saturate at UINT_MAX, improving results.
2025-03-04[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080)Ramkumar Ramachandra
getLoopEstimatedTripCount returns the trip count based on profiling data, and its documentation says that it could return 0 when the trip count is zero, but this is not the case: a valid trip count can never be zero, and it returns 0 when the unsigned ExitCount is incremented by 1 and wraps. Some callers are careful about checking for this zero value in an std::optional, but it makes for an API with footguns, as a std::optional return value indicates that a non-nullopt value would be a valid trip count. Fix this by explicitly returning std::nullopt when the return value would wrap, and strip additional checks in callers. This also fixes a minor bug in LoopVectorize.
2025-02-22[VectorCombine] Fold binary op of reductions. (#121567)Mikhail Gudim
Replace binary of of two reductions with one reduction of the binary op applied to vectors. For example: ``` %v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0) %v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1) %res = add i32 %v0_red, %v1_red ``` gets transformed to: ``` %1 = add <16 x i32> %v0, %v1 %res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1) ```
2024-12-12[LoopVectorize] Vectorize select-cmp reduction pattern for increasing ↵Mel Chen
integer induction variable (#67812) Consider the following loop: ``` int rdx = init; for (int i = 0; i < n; ++i) rdx = (a[i] > b[i]) ? i : rdx; ``` We can vectorize this loop if `i` is an increasing induction variable. The final reduced value will be the maximum of `i` that the condition `a[i] > b[i]` is satisfied, or the start value `init`. This patch added new RecurKind enums - IFindLastIV and FFindLastIV. --------- Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-09-29[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc (#107190)Joshua Cao
The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over.
2024-09-04Consolidate all IR logic for getting the identity value of a reduction [nfc]Philip Reames
This change merges the three different places (at the IR layer) for finding the identity value of a reduction into a single copy. This depends on several prior commits which fix ommissions and bugs in the distinct copies, but this patch itself should be fully non-functional. As the new comments and naming try to make clear, the identity value is a property of the @llvm.vector.reduce.* intrinsic, not of e.g. the recurrence descriptor. (We still provide an interface for clients using recurrence descriptors, but the implementation simply translates to the intrinsic which each corresponds to.) As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum or fminimum/fmaximum which is why we still need manual logic (but at least only one copy of manual logic) for those cases.
2024-09-03Remove "Target" from createXReduction naming [nfc]Philip Reames
Despite the stale comments, none of these actually use TTI, and they're solely generating standard LLVM IR.
2024-09-03Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)Philip Reames
This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.
2024-08-30Reuse getBinOpIdentity in createAnyOfTargetReduction [nfc]Philip Reames
Consolidating code so that we have one copy instead of multiple reasoning about identity element. Note that we're (deliberately) not passing the FMF flags to common utility to preserve behavior in this change.
2024-08-30Restructure createSimpleTargetReduction to match VP path [NFC]Philip Reames
Reduces code significantly, but more importantly makes it obvious that this variant matches the VP variant just below.
2024-08-02[llvm] Make InstSimplifyFolder constructor explicit (NFC) (#101654)Sergei Barannikov
2024-07-25[VP] Refactor VectorBuilder to avoid layering violation. NFC (#99276)Mel Chen
This patch refactors the handling of reduction to eliminate layering violations. * Introduced `getReductionIntrinsicID` in LoopUtils.h for mapping recurrence kinds to llvm.vector.reduce.* intrinsic IDs. * Updated `VectorBuilder::createSimpleTargetReduction` to accept llvm.vector.reduce.* intrinsic directly. * New function `VPIntrinsic::getForIntrinsic` for mapping intrinsic ID to the same functional VP intrinsic ID.
2024-07-17[TTI][WebAssembly] Pairwise reduction expansion (#93948)Sam Parker
WebAssembly doesn't support horizontal operations nor does it have a way of expressing fast-math or reassoc flags, so runtimes are currently unable to use pairwise operations when generating code from the existing shuffle patterns. This patch allows the backend to select which, arbitary, shuffle pattern to be used per reduction intrinsic. The default behaviour is the same as the existing, which is by splitting the vector into a top and bottom half. The other pattern introduced is for a pairwise shuffle. WebAssembly enables pairwise reductions for int/fp add/sub.
2024-07-16[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)Mel Chen
Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172.
2024-06-27[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)Nikita Popov
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-04[LV] Apply loop guards when checking recur during hoisting RT checks.Florian Hahn
Apply loop guards when checking if the recurrence is non-negative in cases where runtime checks are hoisted out of an inner loop.
2024-06-04[LoopUtils] Simplify code for runtime check generation a bit (NFCI).Florian Hahn
Store getSE result in variable to re-use and use structured bindings when looping over bounds.
2024-05-22Revert "[indvars] Missing variables at Og (#88270)" (#93016)Carlos Alberto Enciso
This reverts commit 89e1f7784be40bea96d5e65919ce8d34151c1d69. https://github.com/llvm/llvm-project/pull/88270#discussion_r1609559724 https://github.com/llvm/llvm-project/pull/88270#discussion_r1609552972 Main concerns from @nikic are the interaction between the 'IndVars' and 'LoopDeletion' passes, increasing build times and adding extra complexity.
2024-05-22[indvars] Missing variables at Og (#88270)Carlos Alberto Enciso
https://bugs.llvm.org/show_bug.cgi?id=51735 https://github.com/llvm/llvm-project/issues/51077 In the given test case: ``` 4 ... 5 void bar() { 6 int End = 777; 7 int Index = 27; 8 char Var = 1; 9 for (; Index < End; ++Index) 10 ; 11 nop(Index); 12 } 13 ... ``` Missing local variable `Index` after loop `Induction Variable Elimination`. When adding a breakpoint at line `11`, LLDB does not have information on the variable. But it has info on `Var` and `End`.
2024-05-04[Transforms] Use StringRef::operator== instead of StringRef::equals (NFC) ↵Kazu Hirata
(#91072) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 31 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".