summaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
AgeCommit message (Collapse)Author
2025-11-19[LoopInterchange] Don't consider loops with BTC=0 (#167113)Sjoerd Meijer
Do not consider loops with a zero backedge taken count as candidates for interchange. This seems like a sensible thing because it suggests the loop doesn't execute and there is no point in interchanging. As a bonus, this seems to avoid triggering an assert about phis and their uses from source code, so this is a partial fix for #163954 but it needs more work to properly fix that.
2025-10-20[LoopInterchange] Add simplifyLCSSAPhis: remove phi from non-exit bb (#160889)Sjoerd Meijer
This deals with a corner case of LCSSA phi nodes in the outer loop latch block: the loop was in LCSSA form, some transformations can come along (e.g. unswitch) and create an empty block: BB4: br label %BB5 BB5: %old.cond.lcssa = phi i16 [ %cond, %BB4 ] br outer.header Interchange then brings it in LCSSA form again and we get: BB4: %new.cond.lcssa = phi i16 [ %cond, %BB3 ] br label %BB5 BB5: %old.cond.lcssa = phi i16 [ %new.cond.lcssa, %BB4 ] Which means that we have a chain of LCSSA phi nodes from %new.cond.lcssa to %old.cond.lcssa. The problem is that interchange can reoder blocks BB4 and BB5 placing the use before the def if we don't check this. The solution is to simplify lcssa phis, and remove them from non-exit blocks if they are 1-input phi nodes. Fixes #160068
2025-09-25[LoopInterchange] Bail out when finding a dependency with all `*` elements ↵Ryotaro Kasuga
(#149049) If a direction vector with all `*` elements, like `[* * *]`, is present, it indicates that none of the loop pairs are legal to interchange. In such cases, continuing the analysis is meaningless. This patch introduces a check to detect such direction vectors and exits early when one is found. This slightly reduces compile time.
2025-07-25[LoopInterchange] Consider forward/backward dependency in vectorize ↵Ryotaro Kasuga
heuristic (#133672) The vectorization heuristic of LoopInterchange attempts to move a vectorizable loop to the innermost position. Before this patch, a loop was deemed vectorizable if there are no loop-carried dependencies induced by the loop. This patch extends the vectorization heuristic by introducing the concept of forward and backward dependencies, inspired by LoopAccessAnalysis. Specifically, an additional element is appended to each direction vector to indicate whether it represents a forward dependency (`<`) or not (`*`). Among these, only the forward dependencies (i.e., those whose last element is `<`) affect the vectorization heuristic. Accordingly, the check is conservative, and dependencies are considered forward only when this can be proven. Currently, we only support perfectly nested loops whose body consists of a single basic block. For other cases, dependencies are pessimistically treated as non-forward.
2025-07-18[LoopInterchange] Ignore the cost-model, force interchange if legal (#148858)Sjoerd Meijer
This is and has been proven useful for testing purposes, to get more test coverage.
2025-07-15[Scalar] Fix a warningKazu Hirata
This patch fixes: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:863:20: error: unused variable 'OpCode' [-Werror,-Wunused-variable]
2025-07-15[LoopInterchange] Drop nuw/nsw flags from reduction ops when interchanging ↵Ryotaro Kasuga
(#148612) Before this patch, when a reduction exists in the loop, the legality check of LoopInterchange only verified if there exists a non-reassociative floating-point instruction in the reduction calculation. However, it is insufficient, because reordering integer reductions can also lead to incorrect transformations. Consider the following example: ```c int A[2][2] = { { INT_MAX, INT_MAX }, { INT_MIN, INT_MIN }, }; int sum = 0; for (int i = 0; i < 2; i++) for (int j = 0; j < 2; j++) sum += A[j][i]; ``` To make this exchange legal, we must drop nuw/nsw flags from the instructions involved in the reduction operations. This patch extends the legality check to correctly handle such cases. In particular, for integer addition and multiplication, it verifies that the nsw and nuw flags are set on involved instructions, and drop them when the transformation actually performed. This patch also introduces explicit checks for the kind of reduction and permits only those that are known to be safe for interchange. Consequently, some "unknown" reductions (at the moment, `FindFirst*` and `FindLast*`) are rejected. Fix #148228
2025-07-08[LoopInterchange] Defer CacheCost calculation until needed (#146874)Ryotaro Kasuga
LoopInterchange currently stop and exit its process when the number of loads/stores in the loop is greater than `MaxMemInstrCount`. This prevents excessive querying to DependenceAnalysis. However, computing `CacheCost` also involves DependenceAnalysis queries, and their number can grow to `O(N^2)` in the worst case, where `N` is the number of loads/stores in the loop. Therefore, we should also avoid calculating it if the loads/stores count exceeds `MaxMemInstrCount`. This patch defers the calculation of `CacheCost` until it is actually needed to reduce compile time. This avoids computing `CacheCost` when the number of loads/stores is large. Additionally, since this patch delays its calculation as much as possible, it is also effective in other scenarios, e.g., when there are no legal loop pairs to exchange.
2025-06-28[LoopInterchange] Modernize loops (NFC) (#146105)Ramkumar Ramachandra
2025-06-27[LoopInterchange] Use ArrayRef in more places (NFC) (#146077)Ramkumar Ramachandra
2025-06-25[Transforms] Use range-based for loops (NFC) (#145252)Kazu Hirata
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-05[LoopInterchange] Handle confused dependence correctly (#140709)Ryotaro Kasuga
This patch fixes the handling of a confused `Dependence` object. Such an object doesn’t contain any information about dependencies, so we must process it conservatively. However, it was converted into a direction vector like `[I I ... I]`. As a result, it was treated as if there are no loop-carried dependencies, which can lead to illegal loop exchanges. Fixes #140238
2025-05-26[Scalar] Use llvm::count (NFC) (#141445)Kazu Hirata
2025-05-13[LoopInterchange] Relax the legality check to accept more patterns (#139690)Ryotaro Kasuga
When proving the legality of exchanging two loops, it doesn't need to check the elements of the direction vectors associated with the loops outside of the two target loops. Before this patch, the legality check looked at all elements of a direction vector to calculate the lexicographically order of the vector, which may reject some legal exchanges. For example, if a direction vector is `[* < =]`, it is safe to swap the last two loops because the corresponding subsequence of the vector (`[< =]`) is lexicographically positive for both before and after the exchange. However, the its order is unknown if we don't drop the prefix since the first element is `*`. This patch improves the logic of legality check to ignore such unrelated prefixes of direction vectors.
2025-05-13[LoopInterchange] Skip legality check if surrounding loops already guarantee ↵Ryotaro Kasuga
dependency (NFC) (#139254) The legality check in LoopInterchange allows two loops to be exchanged if all direction vectors are lexicographically positive (or zero) for both before and after the swap. The current implementation performs this routine naively. However, if a direction vector is lexicographically positive due to an element corresponding to a loop that is outside the given two loops (i.e., if there is an element `<` before the loops we are trying to interchange), then obviously it is also positive after exchanging them. For example, for a direction vector `[< < >]`, swapping the last two elements doesn't make it lexicographically negative because the first element is `<`. This patch adds a code to skip legality check if surrounding loops already guarantee that the direction vector is lexicographically positive. Note that this is only a small improvement on its own, but it's necessary to relax the legality check I'm working on. Split off from #118267 --------- Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
2025-05-04[Transforms] Remove unused local variables (NFC) (#138442)Kazu Hirata
2025-04-18[Transforms] Construct SmallVector with iterator ranges (NFC) (#136259)Kazu Hirata
2025-04-03[LoopInterchange] Fix the vectorizable check for a loop (#133667)Ryotaro Kasuga
In the profitability check for vectorization, the dependency matrix was not handled correctly. This can result to make a wrong decision: It may say "this loop can be vectorized" when in fact it cannot. The root cause of this is that the check process early returns when it finds '=' or 'I' in the dependency matrix. To make sure that we can actually vectorize the loop, we need to check all the rows of the matrix. This patch fixes the process of checking whether we can vectorize the loop or not. Now it won't make a wrong decision for a loop that cannot be vectorized. Related: #131130
2025-04-02[LoopInterchange] Add an option to control the cost heuristics applied (#133664)Ryotaro Kasuga
LoopInterchange has several heuristic functions to determine if exchanging two loops is profitable or not. Whether or not to use each heuristic and the order in which to use them were fixed, but #125830 allows them to be changed internally at will. This patch adds a new option to control them via the compiler option. The previous patch also added an option to prioritize the vectorization heuristic. This patch also removes it to avoid conflicts between it and the newly introduced one, e.g., both `-loop-interchange-prioritize-vectorization=1` and `-loop-interchange-profitabilities='cache,vectorization'` are specified.
2025-03-21[LoopInterchange] Prevent from undoing its own transformation (#127473)Ryotaro Kasuga
LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops `A` and `B`, it calls `isProfitable(A, B)` to determine whether these loops should be exchanged. The result of `isProfitable(A, B)` should be conservative, that is, it should return true only when we are sure exchanging `A` and `B` will improve performance. If it's not conservative enough, it's possible that a subsequent `isProfitable(B, A)` will also return true, in which case LoopInterchange will undo its previous transformation. To avoid such cases, `isProfitable(B, A)` must not return true if `isProfitable(A, B)` returned true in the past. However, the current implementation can be in such a situation. This patch resolves it by correcting the handling of two loops that have the same cache cost. This resolve the problem mentioned in https://github.com/llvm/llvm-project/pull/118267#issuecomment-2510759354.
2025-03-21[LoopInterchange] Add an option to prioritize vectorization (#131988)Ryotaro Kasuga
The LoopInterchange cost-model consists of several decision rules. They are called one by one, and if some rule can determine the profitability, then the subsequent rules aren't called. In the current implementation, the rule for `CacheCostAnalysis` is called first, and if it fails to determine the profitability, then the rule for vectorization is called. However, there are cases where interchanging loops for vectorization makes the code faster even if such exchanges are detrimental to the cache. For example, exchanging the inner two loops in the following example looks about x3 faster in my local (compiled with `-O3 -mcpu=neoverse-v2 -mllvm -cache-line-size=64`), even though it's rejected by the rule based on cache cost. (NOTE: LoopInterchange cannot exchange these loops due to legality checks. This should also be improved.) ```c __attribute__((aligned(64))) float aa[256][256],bb[256][256],cc[256][256], dd[256][256],ee[256][256],ff[256][256]; // Alternative of TSVC s231 with more array accesses than the original. void s231_alternative() { for (int nl = 0; nl < 100*(100000/256); nl++) { for (int i = 0; i < 256; ++i) { for (int j = 1; j < 256; j++) { aa[j][i] = aa[j-1][i] + bb[j][i] + cc[i][j] + dd[i][j] + ee[i][j] + ff[i][j]; } } } } ``` This patch introduces a new option to prioritize the vectorization rule over the cache cost rule. Related issue: #131130 --------- Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-03-12[Transforms] Avoid repeated hash lookups (NFC) (#130890)Kazu Hirata
2025-03-04[LoopInterchange] Move some processes to another function (NFC) (#129514)Ryotaro Kasuga
Some post-processing involved in exchanging a pair of loops has been done separately from `processLoop`, which is a main function that does the transformation. It's better to consolidate these processes into the same function. This patch is a preparation of #127474.
2025-02-11[DependenceAnalysis][NFC] Removing PossiblyLoopIndependent parameter (#124615)Alireza Torabian
Parameter PossiblyLoopIndependent has lost its intended purpose. This flag is always set to true in all cases when depends() is called, hence we want to reconsider the utility of this variable and remove it from the function signature entirely. This is an NFC patch.
2025-02-05[LoopInterchange] Hoist isComputableLoopNest() in the control flow (#124247)Madhur Amilkanthwar
The profiling of the LLVM Test-suite reveals that a significant portion, specifically 14,090 out of 139,323, loop nests were identified as non-viable candidates for transformation, leading to the transform exiting from isComputableLoopNest() without any action. More importantly, dependence information was computed for these loop nests before reaching the function isComputableLoopNest(), which does not require DI and relies solely on scalar evolution (SE). To enhance compile-time efficiency, this patch moves the call to isComputableLoopNest() earlier in the control-flow, thereby avoiding unnecessary dependence calculations. The impact of this change is evident on the compile-time-tracker, with the overall geometric mean improvement recorded at 0.11%, while the lencode benchmark gets a more substantial benefit of 0.44%. This improvement can be tracked in the isc-ln-exp-2 branch under my repo.
2025-01-29[LoopInterchange] Handle LE and GE correctly (#124901)Ryotaro Kasuga
LoopInterchange have converted `DVEntry::LE` and `DVEntry::GE` in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead. Resolve #123920
2025-01-24[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites ↵Jeremy Morse
(#123737) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)Jeremy Morse
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).
2025-01-23[LoopInterchange] Constrain LI within supported loop nest depth (#118656)Madhur Amilkanthwar
This patch is an extension to #115128. After profiling LLVM test-suite, I see a lot of loop nest of depth more than `MaxLoopNestDepth` which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost. Please see 'bound-max-depth' branch on compile-time-tracker.
2025-01-21[LoopInterchange] Constrain number of load/stores in a loop (#118973)Madhur Amilkanthwar
In the current state of the code, the transform computes entries for the dependency matrix until `MaxMemInstrCount` which is 100. After 99th entry, it terminates and thus overall wastes compile-time. It would be nice if we can compute total number of entries upfront and early exit if the number of entries > 100. However, computing the number of entries is not always possible as it depends on two factors: 1. Number of load-store pairs in a loop. 2. Number of common loop levels for each of the pair. This patch constrains the whole computation on the number of loads and stores instructions in the loop. In another approach, I experimented with computing 1 and constraining the number of pairs, but that did not lead to any additional benefit in terms of compile time. However, when other issues are fixed, I can revisit this approach.
2025-01-20[LoopInterchange] Remove 'S' Scalar Dependencies (#119345)Sjoerd Meijer
We are not handling 'S' scalar dependencies correctly and have at least the following miscompiles related to that: [LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176 [LoopInterchange] Interchange breaks program correctness #46867 [LoopInterchange] Loops should not interchanged due to dependencies #47259 [LoopInterchange] Loops should not interchanged due to control flow #47401 This patch does no longer insert the "S" dependency/direction into the dependency matrix, so a dependency is never "S". We seem to have forgotten what the exact meaning is of this dependency type, and don't see why it should be treated differently. We prefer correctness over incorrect and more aggressive results. I.e., this prevents the miscompiles at the expense of handling less cases, i.e. making interchange more pessimistic. However, some of the cases that are now rejected for dependence analysis reasons, were rejected before too but for other reasons (e.g. profitability). So at least for the llvm regression tests, the number of regression are very reasonable. This should be a stopgap. We would like to get interchange enabled by default and thus prefer correctness over unsafe transforms, and later see if we can get solve the regressions.
2024-11-19[LoopInterchange] Make the entries of the Dependency Matrix unique (#116195)Sjoerd Meijer
The entries in the dependency matrix can contain a lot of duplicates, which is unnecessary and results in more checks that we can avoid, and this patch adds that.
2024-11-19[LoopInterchange] Bail out early if minimum loop nest is not met (#115128)Madhur Amilkanthwar
This patch bails out early if minimum depth is not met. As it stands today, the pass computes CacheCost before it attempts to do the transform. This is not needed if minimum depth is not met. This handles basic cases where depth is typically 1. As the patch avoids unnecessary computation, it is aimed to improve compile-time.
2024-11-02[Scalar] Remove unused includes (NFC) (#114645)Kazu Hirata
Identified with misc-include-cleaner.
2024-08-03[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851)Kazu Hirata
2024-04-12Fix typos (#88565)Victor Toni
2023-09-07[NFC][RemoveDIs] Create a new spelling of the moveBefore methodJeremy Morse
As outlined in my proposal of how to get rid of debug intrinsics, this patch adds a moveBefore method that signals the caller /intends/ the order of moved instructions is to stay the same. This semantic difference has an effect on debug-info, as it signals whether debug-info needs to move with instructions or not. The patch just replaces a few calls to moveBefore with calls to moveBeforePreserving -- and the latter just calls the former, so it's all NFC right now. A future patch will add an implementation of moveBeforePreserving that takes action to correctly preserve debug-info, but that's tightly coupled with our non-instruction debug-info representation that's still being reviewed. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D156369
2023-06-05Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"Nikita Popov
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7. In preparation for reverting a dependent revision.
2023-05-25[SCEVExpander] Remember phi nodes inserted by LCSSA constructionNikita Popov
SCEVExpander keeps track of all instructions it inserted. However, it currently misses some phi nodes created during LCSSA construction. Fix this by collecting these into another argument. This also removes the IRBuilder argument, which was added for essentially the same purpose, but only handles the root LCSSA nodes, not those inserted by SSAUpdater. This was reported as a regression on D149344, but the reduced test case also reproduces without it. Differential Revision: https://reviews.llvm.org/D150681
2023-05-02[LCSSA] Remove unused ScalarEvolution argument (NFC)Nikita Popov
After D149435, LCSSA formation no longer needs access to ScalarEvolution, so remove the argument from the utilities.
2023-04-17Remove several no longer needed includes. NFCIBjorn Pettersson
Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer has support for the legacy PM.
2023-04-16[Scalar] Use range-based for loops (NFC)Kazu Hirata
2023-03-14[Transforms] Use *{Set,Map}::contains (NFC)Kazu Hirata
2023-03-03[LoopInterchange] Remove unused RecurrenceDescriptor object. NFCCraig Topper
2023-02-15[LoopInterchange] Remove legacy pass (unused in the pipeline)Fangrui Song
Following recent changes to remove non-core legacy passes.
2023-01-16[LoopInterchange] Correcting the profitability checkRam-NK
Before D135808, There would be endless loop interchange posibility (no proper priority was there in profitability check. Any profitable check may leads to loop-interchange). With this patch, there is no endless interchange (priority in profitable check is defined. Order of decision is 'Cache cost' check, 'InstrOrderCost', 'Vectorization'). Corrected the dependency checking inside isProfitableForVectorization(), corrected the checking of bad order loops in isProfitablePerInstrOrderCost(). Reviewed By: Meinersbur, bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D135808
2023-01-12[NFC][LoopFlatten][LoopInterchange] Do not explicitly forget subloopsJoshua Cao
We don't need to explicitly forget subloops because forgetting parent loops will automatically forget their subloops Differential Revision: https://reviews.llvm.org/D141029
2022-12-01[NFC] Cleanup: Replaces BB->getInstList().splice() with BB->splice().Vasileios Porpodas
This is part of a series of cleanup patches towards making BasicBlock::getInstList() private. Differential Revision: https://reviews.llvm.org/D138979
2022-11-17[LoopInterchange] Refactor and rewrite validDepInterchange()Mengxuan Cai
The current code of validDepInterchange() enumerates cases that are legal for interchange. This could be simplified by checking lexicographically order of the swapped direction matrix. Reviewed By: congzhe, Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D137461
2022-11-04[LoopInterchange] Check phis in all subloopsCongzhe Cao
This is the bugfix to the miscompile mentioned in https://reviews.llvm.org/D132055#3814831. The IR that reproduced the bug is added as the test case in this patch. What this patch does is that, during legality phase instead of checking the phi nodes only in `InnerLoop` and `OuterLoop`, we check phi nodes in all subloops of the `OuterLoop`. Suppose if the loop nest is triply nested, and `InnerLoop` and `OuterLoop` is the middle loop and the outermost loop respectively, we'll check phi nodes in the innermost loop as well, in addition to the ones in the middle and outermost loops. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D134930