path: root/llvm/lib/CodeGen/MachineLICM.cpp
Age  Commit message  Author
2025-11-16[CodeGen] Turn MCRegUnit into an enum class (NFC) (#167943)Sergei Barannikov
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned` and inserts necessary casts. The added `MCRegUnitToIndex` functor is used with `SparseSet`, `SparseMultiSet` and `IndexedMap` in a few places. `MCRegUnit` is opaque to users, so it didn't seem worth making it a full-fledged class like `Register`. Static type checking has detected one issue in `PrologueEpilogueInserter.cpp`, where `BitVector` created for `MCRegister` is indexed by both `MCRegister` and `MCRegUnit`. The number of casts could be reduced by using `IndexedMap` in more places and/or adding a `BitVector` adaptor, but the number of casts *per file* is still small and `IndexedMap` has limitations, so it didn't seem worth the effort. Pull Request: https://github.com/llvm/llvm-project/pull/167943
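The effect of the type change described above can be illustrated with a minimal, self-contained sketch. It does not use LLVM headers: `RegUnit`, `PhysReg`, `RegUnitToIndex` and `RegUnitSet` are hypothetical stand-ins (for `MCRegUnit`, `MCRegister`, `MCRegUnitToIndex` and a `BitVector`-like container respectively), chosen only to show how a scoped enum catches mixed indexing at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MCRegUnit: an opaque, strongly typed index.
enum class RegUnit : unsigned {};

// Hypothetical stand-in for MCRegister, deliberately a different type.
enum class PhysReg : unsigned {};

// Functor converting a RegUnit to a container index. This plays the role the
// commit message assigns to MCRegUnitToIndex; the name here is illustrative.
struct RegUnitToIndex {
  std::size_t operator()(RegUnit U) const { return static_cast<unsigned>(U); }
};

// A bit set that can only be indexed by RegUnit, never by PhysReg.
class RegUnitSet {
  std::vector<bool> Bits;

public:
  explicit RegUnitSet(std::size_t NumUnits) : Bits(NumUnits, false) {}

  void set(RegUnit U) {
    std::size_t Idx = RegUnitToIndex()(U);
    assert(Idx < Bits.size() && "register unit out of range");
    Bits[Idx] = true;
  }

  bool test(RegUnit U) const {
    std::size_t Idx = RegUnitToIndex()(U);
    assert(Idx < Bits.size() && "register unit out of range");
    return Bits[Idx];
  }
};

// Sets unit 3 and reports whether unit 3 is set while unit 4 is not.
inline bool demoRegUnitSet() {
  RegUnitSet S(8);
  S.set(static_cast<RegUnit>(3));
  // S.set(static_cast<PhysReg>(3));  // would not compile: wrong index type
  // S.set(3u);                       // would not compile: no implicit cast
  return S.test(static_cast<RegUnit>(3)) && !S.test(static_cast<RegUnit>(4));
}
```

The explicit `static_cast` at each boundary is the cost the commit message refers to as "necessary casts"; the benefit is that indexing one container with two different kinds of ID becomes a compile error rather than a silent bug.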
2025-11-10CodeGen: Remove TRI argument from getRegClass (#158225)Matt Arsenault
TargetInstrInfo now directly holds a reference to TargetRegisterInfo and does not need TRI passed in anywhere.
2025-10-22[MachineLICM] Use structured bindings for reg pressure cost map. NFC (#164368)Luke Lau
2025-09-24[TII] Split isTrivialReMaterializable into two versions [nfc] (#160377)Philip Reames
This change builds on https://github.com/llvm/llvm-project/pull/160319 which tries to clarify which *callers* (not backends) assume that the result is actually trivial. This change itself should be NFC. Essentially, I'm just renaming the existing isTrivialRematerializable to the non-trivial version and then adding a new trivial version (with the same name as the prior function) and simplifying a few callers which want that semantic. This change does *not* enable non-trivial remat any more broadly than was already done for our targets which were lying through the old APIs; that will come separately. The goal here is simply to make the code easier to follow in terms of what assumptions are being made where. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-09-12CodeGen: Remove MachineFunction argument from getRegClass (#158188)Matt Arsenault
This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.
2025-07-23[llvm] Remove unused includes (NFC) (#150265)Kazu Hirata
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-07-09MachineLICM: Merge logic for implicit and explicit definitions.Peter Collingbourne
Anatoly Trosinenko found that when hasSideEffect was set to 0 in the definition of LOADgotAUTH, the MultiSource/Benchmarks/Ptrdist/ks/ks test from llvm-test-suite started to crash. The issue was traced down to the MachineLICM pass placing LOADgotAUTH right after an unrelated copy to x16, rewriting this code:

```
bb.0:
  renamable $x16 = COPY renamable $x12
  B %bb.1
bb.1:
  ...
  /* use $x16 */
  ...
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  /* use $x20 */
  ...
```

into the following:

```
bb.0:
  renamable $x16 = COPY renamable $x12
  renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
  B %bb.1
bb.1:
  ...
  /* use $x16 */
  ...
  /* use $x20 */
  ...
```

The issue was caused by inconsistent logic between implicit and explicit operand definitions, where the implicit side was incorrectly skipping checking RUDefs for dead operands, leading to RuledOut not being set for the X16 operand. Because there isn't really a semantic difference between implicit and explicit operands at this point, let's remove the isImplicit check and adjust the logic to do the same thing in both cases:
- For implicit operands, we now check and update RUDefs in the same way as explicit operands.
- For explicit operands, we now allow dead operands to be skipped.

Reviewers: arsenm, s-barannikov, atrosinenko
Reviewed By: arsenm, s-barannikov
Pull Request: https://github.com/llvm/llvm-project/pull/147624
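As a rough illustration of why dead implicit definitions must still be checked, here is a simplified model of the "already defined" test. It is not the pass's code: `DefOperand`, `DefTracker` and `loadGotAuthIsRuledOut` are hypothetical names, register units are plain integers, `$nzcv` is omitted for brevity, and the sketch assumes `$x16` has already been recorded as written or live in the loop before LOADgotAUTH is examined.

```cpp
#include <set>
#include <vector>

// Hypothetical model of one register definition operand.
struct DefOperand {
  unsigned Unit;   // register unit written by the operand
  bool IsImplicit; // implicit-def rather than an explicit result
  bool IsDead;     // value is never read after the instruction
};

// Hypothetical tracker for units already written (or live) in the loop.
class DefTracker {
  std::set<unsigned> SeenDefs;

public:
  // Treat a unit that is live into the loop as an existing definition.
  void addLiveIn(unsigned Unit) { SeenDefs.insert(Unit); }

  // Record every definition of one instruction. Returns true if the
  // instruction must be ruled out as a hoisting candidate because it writes a
  // unit that was already recorded. IsImplicit and IsDead are intentionally
  // ignored here: a dead implicit-def still clobbers the register.
  bool recordDefs(const std::vector<DefOperand> &Defs) {
    bool RuledOut = false;
    for (const DefOperand &D : Defs)
      if (!SeenDefs.insert(D.Unit).second) // false: unit was already present
        RuledOut = true;
    return RuledOut;
  }
};

// Models the LOADgotAUTH from the example: explicit def of x20 plus dead
// implicit defs of x16 and x17 (units numbered after the registers).
inline bool loadGotAuthIsRuledOut(bool X16LiveIn) {
  DefTracker T;
  if (X16LiveIn)
    T.addLiveIn(16);
  return T.recordDefs({{20, false, false}, {16, true, true}, {17, true, true}});
}
```

In this model, `loadGotAuthIsRuledOut(true)` rules the instruction out because the dead implicit-def of x16 collides with the live value, which is the outcome the fix is meant to guarantee.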
2025-07-05[MachineLICM] Let targets decide if copy instructions are cheap (#146599)Guy David
When checking whether it is profitable to hoist an instruction, the pass may override a target's ruling because it assumes that all COPY instructions are cheap, and that may not be the case for all micro-architectures (especially when copying between different register classes). On AArch64 there's 0% difference in performance in LLVM's test-suite with this change. Additionally, very few tests were affected, which suggests the special case is not very useful to keep. x86 performance is slightly better (but maybe that's just noise) for an A/B comparison consisting of five iterations on LLVM's test suite (Ryzen 5950X on Ubuntu):

```
$ ./utils/compare.py build-a/results* vs build-b/results* --lhs-name base --rhs-name patch --absolute-diff
Tests: 3341
Metric: exec_time

Program                                       exec_time
                                              base       patch      diff
LoopVector...meChecks4PointersDBeforeA/1000   824613.68  825394.06  780.38
LoopVector...timeChecks4PointersDBeforeA/32   18763.60   19486.02   722.42
LCALS/Subs...test:BM_MAT_X_MAT_LAMBDA/44217   37109.92   37572.52   462.60
LoopVector...ntimeChecks4PointersDAfterA/32   14211.35   14562.14   350.79
LoopVector...timeChecks4PointersDEqualsA/32   14221.44   14562.85   341.40
LoopVector...intersAllDisjointIncreasing/32   14222.73   14562.20   339.47
LoopVector...intersAllDisjointDecreasing/32   14223.85   14563.17   339.32
LoopVector...nLoopFrom_uint32_t_To_uint8_t_   739.60     807.45     67.86
harris/har...est:BENCHMARK_HARRIS/2048/2048   15953.77   15998.94   45.17
LoopVector...nLoopFrom_uint8_t_To_uint16_t_   301.94     331.21     29.27
LCALS/Subs...Raw.test:BM_DISC_ORD_RAW/44217   616.35     637.13     20.78
LCALS/Subs...Raw.test:BM_MAT_X_MAT_RAW/5001   3814.95    3833.70    18.75
LCALS/Subs...Raw.test:BM_HYDRO_2D_RAW/44217   812.98     830.64     17.66
LCALS/Subs...test:BM_IMP_HYDRO_2D_RAW/44217   811.26     828.13     16.87
ImageProce...ENCHMARK_BILATERAL_FILTER/64/4   714.77     726.23     11.46

       exec_time
l/r    base           patch          diff
count  3341.000000    3341.000000    3341.000000
mean   903.866450     899.732349     -4.134101
std    20635.900959   20565.289417   115.346928
min    0.000000       0.000000       -3380.455787
25%    0.000000       0.000000       0.000000
50%    0.000000       0.000000       0.000000
75%    1.806500       1.836397       0.000100
max    824613.680801  825394.062500  780.381699
```
2025-07-04[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)Austin
Use llvm::fill instead of std::fill
2025-05-24[CodeGen] Remove unused includes (NFC) (#141320)Kazu Hirata
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-04-26[CodeGen] Use `TRI::regunits()` (NFC) (#137356)Sergei Barannikov
2025-04-25[MachineLICM] Recognize registers clobbered at EH landing pad entry (#122446)Ulrich Weigand
EH landing pad entry implicitly clobbers target-specific exception pointer and exception selector registers. The post-RA MachineLICM pass needs to take these into account when deciding whether to hoist an instruction out of the loop that initializes one of these registers. Fixes: https://github.com/llvm/llvm-project/issues/122315
2025-04-15[MachineLICM] Remove CurPreheader parameter that is always nullptr (#135554)Sergei Barannikov
Also, rename `getCurPreheader` -> `getOrCreatePreheader` to make it clear that this method may alter the CFG. Update `Changed` if the method modified a loop by splitting a critical edge (this change is not strictly NFC). The removed parameter was probably intended to save compile time by not trying to split a critical edge after the first attempt has failed, but it is only tried once per loop. PR: https://github.com/llvm/llvm-project/pull/135554
2025-04-13[CodeGen] Use llvm::append_range (NFC) (#135567)Kazu Hirata
2025-03-02[MachineLICM] Use Register. NFCCraig Topper
2025-01-13[aarch64][win] Update Called Globals info when updating Call Site info (#122762)Daniel Paoliello
Fixes the "use after poison" issue introduced by #121516 (see <https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>). The root cause of this issue is that #121516 introduced "Called Global" information for call instructions modeling how "Call Site" info is stored in the machine function, HOWEVER it didn't copy the copy/move/erase operations for call site information. The fix is to rename and update the existing copy/move/erase functions so they also take care of Called Global info.
2025-01-10Revert "[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826)"Nikita Popov
This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a. This causes a large compile-time regression.
2025-01-09[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826)Pengcheng Wang
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper around `TargetRegisterInfo::getRegPressureSetLimit` with some logic to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just as the comment "This limit must be adjusted dynamically for reserved registers" says. Separated from https://github.com/llvm/llvm-project/pull/118787
2024-12-13Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)paperchalice
This relands commit #115111. Use traditional way to update post dominator tree, i.e. break critical edge splitting into insert, insert, delete sequence. When splitting critical edges, the post dominator tree may change its root node, and `setNewRoot` only works in normal dominator tree... See https://github.com/llvm/llvm-project/blob/6c7e5827eda26990e872eb7c3f0d7866ee3c3171/llvm/include/llvm/Support/GenericDomTree.h#L684-L687
2024-12-11Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)paperchalice
Reverts llvm/llvm-project#115111 Causes #119511
2024-12-11[DomTreeUpdater] Move critical edge splitting code to updater (#115111)paperchalice
Support critical edge splitting in dominator tree updater. Continue the work in #100856. Compile time check: https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au
2024-11-21[MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987)Florian Hahn
The improvements in 63917e1 / #70796 do not check for memory barriers/unmodelled side effects, which means we may incorrectly hoist loads across memory barriers. Fix this by checking whether any machine instruction in the loop is a load-fold barrier. PR: https://github.com/llvm/llvm-project/pull/116987
2024-11-21[DebugInfo][InstrRef][MIR][GlobalIsel][MachineLICM] NFC Use std::move to avoid copying (#116935)abhishek-kaushik22
2024-11-12[CodeGen] Remove unused includes (NFC) (#115996)Kazu Hirata
Identified with misc-include-cleaner.
2024-11-09[Instrumentation] Support `MachineFunction` in `OptNoneInstrumentation` (#115471)paperchalice
Support `MachineFunction` in `OptNoneInstrumentation`, also add `isRequired` to all necessary passes.
2024-11-05[NFC] Use `std::move` to avoid copy (#113080)abhishek-kaushik22
2024-10-25[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573)Gaëtan Bossu
Both are based on MachineLICMBase, and the functionality there is "switched" based on a PreRegAlloc flag. This commit is simply about trusting the original value of that flag, defined by the `MachineLICM` and `EarlyMachineLICM` classes. The `PreRegAlloc` flag used to be overwritten based on MRI.isSSA(), which is unreliable due to how it is inferred by the MIRParser. I see that we can now define isSSA in MIR (thanks @gargaroff ), meaning the fix isn’t really needed anymore, but redefining that flag still feels wrong. Note that I'm looking into upstreaming more changes to MachineLICM, see [the discourse thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).
2024-09-30[MachineLICM] Avoid repeated hash lookups (NFC) (#110452)Kazu Hirata
2024-09-26[NFC] Reapply 3f37c517f, SmallDenseMap speedupsJeremy Morse
This time with 100% more building unit tests. Original commit message follows. [NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417) If we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic. Discovered by instrumenting DenseMap with some accounting code, then selecting sites where we'll get the most bang for our buck.
2024-09-25Revert "[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)"Jeremy Morse
This reverts commit 3f37c517fbc40531571f8b9f951a8610b4789cd6. Lo and behold, I missed a unit test
2024-09-25[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)Jeremy Morse
If we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic. Discovered by instrumenting DenseMap with some accounting code, then selecting sites where we'll get the most bang for our buck.
2024-09-20[NewPM][CodeGen] Port MachineLICM to NPM (#107376)Akshat Oke
2024-07-26[CodeGen] Remove AA parameter of isSafeToMove (#100691)Pengcheng Wang
This `AA` parameter is not used and for most uses they just pass a nullptr. The use of `AA` was removed since 8d0383e.
2024-07-24[llvm][MachineLICM] Fix a comment typo. NFCJon Roelofs
2024-07-12[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317)paperchalice
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new pass manager migration.
2024-07-11Revert "[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)"Nikita Popov
This reverts commit c5e5088033fed170068d818c54af6862e449b545. Causes large compile-time regressions.
2024-07-11[CodeGen] Remove `applySplitCriticalEdges` in `MachineDominatorTree` (#97055)paperchalice
Summary: - Remove wrappers in `MachineDominatorTree`. - Remove `MachineDominatorTree` update code in `MachineBasicBlock::SplitCriticalEdge`. - Use `MachineDomTreeUpdater` in passes which call `MachineBasicBlock::SplitCriticalEdge` and preserve `MachineDominatorTreeWrapperPass` or CFG analyses. Commit abea99f65a97248974c02a5544eaf25fc4240056 introduced related methods in 2014. Now we have SemiNCA based dominator tree in 2017 and dominator tree updater, the solution adopted here seems a bit outdated.
2024-07-09[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793)paperchalice
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-06-20[MachineLICM] Work-around Incomplete RegUnits (#95926)Pierre van Houtryve
Reverts the behavior introduced by 770393b while keeping the refactored code. Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. #96146 opened to investigate the issue
2024-06-17[CodeGen] Do not include $noreg in any regmask operands. NFCI. (#95775)Jay Foad
Saying that a call preserves $noreg seems weird and required a workaround in MachineLICM.
2024-06-17[MachineLICM] Correctly Apply Register Masks (#95746)Pierre van Houtryve
Fix regression introduced in d4b8b72
2024-06-12[NFC][MachineLICM] Use SmallDenseSet instead of SmallSet (#95201)Pierre van Houtryve
All values are small so no reason to ever use SmallSet really. In large programs we'll end up using std::set which is extremely slow compared to DenseSet. This brings a decent speedup to the pass in large programs.
2024-06-11[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571)paperchalice
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.
2024-06-11[CodeGen][MachineLICM] Use RegUnits in HoistRegionPostRA (#94608)Pierre van Houtryve
Those BitVectors get expensive on targets like AMDGPU with thousands of registers, and RegAliasIterator is also expensive. We can move all liveness calculations to use RegUnits instead to speed it up for targets where RegAliasIterator is expensive, like AMDGPU. On targets where RegAliasIterator is cheap, this alternative can be a little more expensive, but I believe the tradeoff is worth it.
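The idea of tracking liveness per register unit rather than per register can be sketched as follows. This is a toy model, not LLVM's implementation: the register names and unit numbers in `regUnitTable` are a hypothetical x86-flavoured table, and `UnitDefs` is an illustrative class. The only property relied on is that two registers alias exactly when they share at least one unit, so no alias iteration is needed.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using UnitList = std::vector<unsigned>;

// Hypothetical register -> register-unit table. AX is made of the units of
// AL and AH; BL is unrelated.
inline const std::map<std::string, UnitList> &regUnitTable() {
  static const std::map<std::string, UnitList> Table = {
      {"AL", {0}}, {"AH", {1}}, {"AX", {0, 1}}, {"BL", {2}}};
  return Table;
}

// Tracks which units have been written, instead of which registers.
class UnitDefs {
  std::set<unsigned> Units;

public:
  // Record a definition of Reg by marking each of its units.
  void addDef(const std::string &Reg) {
    for (unsigned U : regUnitTable().at(Reg))
      Units.insert(U);
  }

  // True if any previously recorded definition overlaps Reg.
  bool clobbers(const std::string &Reg) const {
    for (unsigned U : regUnitTable().at(Reg))
      if (Units.count(U))
        return true;
    return false;
  }
};
```

After `addDef("AX")`, both `clobbers("AL")` and `clobbers("AH")` hold while `clobbers("BL")` does not. The work per query is proportional to the number of units of one register, which is the trade-off the commit message describes for targets with very large register files.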
2024-05-29[MachineLICM] Hoist copies of constant physical register (#93285)Pengcheng Wang
Previously, we only checked whether the source is a virtual register, which prevented some potential hoists. We can see some improvements in AArch64/RISCV tests.
2024-05-01MachineLICM: Allow hoisting REG_SEQUENCE (#90638)Matt Arsenault
2024-04-30MachineLICM: Remove unnecessary isReg checksMatt Arsenault
COPY operands are always registers.
2024-02-27[MachineLICM] Hoist COPY instruction only when user can be hoisted (#81735)michaelselehov
befa925acac8fd6a9266e introduced preliminary hoisting of COPY instructions when the user of the COPY is inside the same loop. That optimization turned out to be too aggressive: it hoisted too many COPYs, greatly increasing register pressure and causing performance regressions for the AMDGPU target. This is intended to fix the regression by hoisting a COPY instruction only if either:
- The user of the COPY can be hoisted (other args are invariant), or
- Hoisting the COPY doesn't bring high register pressure
2023-11-27[MachineLICM] Fix incorrect CSE on hoisted const load (#73007)Igor Kirillov
When hoisting an invariant load, we should not combine it with an existing load through common subexpression elimination (CSE). This is because there might be memory-changing instructions between the existing load and the end of the block entering the loop. Fixes https://github.com/llvm/llvm-project/issues/72855
2023-11-20[MachineLICM][AArch64] Hoist COPY instructions with other uses in the loop (#71403)Rin
When there is a COPY instruction in the loop with other uses, we want to hoist the COPY, which in turn leads to the users being hoisted as well. Co-authored-by David Green : David.Green@arm.com