path: root/llvm/lib/Transforms/Vectorize/VPlan.cpp
AgeCommit messageAuthor
2025-11-15[llvm] Delete pointers without null checks (NFC) (#168183)Kazu Hirata
Identified with readability-delete-null-pointer.
2025-11-11[VPlan] Remove unneeded getDefiningRecipe with isa/cast/dyn_cast. (NFC)Florian Hahn
Classof for most recipes directly supports VPValue, so there is no need to call getDefiningRecipe when using isa/cast/dyn_cast.
2025-11-10[VPlan] Use getDefiningRecipe instead of directly accessing Def. (NFC)Florian Hahn
Use getDefiningRecipe to future-proof the code. Split off from https://github.com/llvm/llvm-project/pull/156262 as suggested.
2025-11-08[Vectorize] Remove a redundant declaration (NFC) (#167188)Kazu Hirata
EnableVPlanNativePath is declared in LoopVectorizationPlanner.h. Identified with readability-redundant-declaration.
2025-11-05[VPlan] Strip redundant code in VPTransformState::get (NFC) (#166145)Ramkumar Ramachandra
vputils::isSingleScalar is sufficient.
2025-10-22Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"Florian Hahn
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.
2025-10-21[VPlan] Clarify naming for helpers to create loop&replicate regions (NFC)Florian Hahn
Split off to clarify naming, as suggested in https://github.com/llvm/llvm-project/pull/156262.
2025-10-21[VPlan] Move two VPBlockUtils members (NFC) (#162507)Ramkumar Ramachandra
2025-10-21[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)Florian Hahn
Move narrowInterleaveGroups to the general VPlan optimization stage. To do so, narrowInterleaveGroups now has to find a suitable VF where all interleave groups are consecutive and saturate the full vector width. If such a VF is found, the original VPlan is split into 2: a) a new clone which contains all VFs of Plan, except VFToOptimize, and b) the original Plan with VFToOptimize as single VF. The original Plan is then optimized. If a new copy for the other VFs has been created, it is returned and the caller has to add it to the list of candidate plans. Together with https://github.com/llvm/llvm-project/pull/149702, this allows taking the narrowed interleave groups into account when computing costs to choose the best VF and interleave count. One example where we currently miss interleaving/unrolling when narrowing interleave groups is https://godbolt.org/z/Yz77zbacz PR: https://github.com/llvm/llvm-project/pull/149706
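The split described above can be pictured with a small stand-alone sketch. `ToyPlan` and `splitForVF` are hypothetical names invented for illustration and are not part of VPlan; the toy only tracks which VFs a plan covers, and it assumes VFToOptimize is already one of them.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a VPlan: it only records the VFs it covers.
struct ToyPlan {
  std::vector<unsigned> VFs;
};

// Restrict Plan to VFToOptimize. If Plan also covered other VFs, return a
// clone holding those, so the caller can add it to the candidate plans.
std::optional<ToyPlan> splitForVF(ToyPlan &Plan, unsigned VFToOptimize) {
  ToyPlan Rest;
  bool Found = false;
  for (unsigned VF : Plan.VFs) {
    if (VF == VFToOptimize)
      Found = true;
    else
      Rest.VFs.push_back(VF);
  }
  assert(Found && "assumption: VFToOptimize must be one of the plan's VFs");
  (void)Found; // keep release builds free of unused-variable warnings
  Plan.VFs.assign(1, VFToOptimize);
  if (Rest.VFs.empty())
    return std::nullopt; // nothing was split off
  return Rest;
}
```

Returning an empty optional when no other VFs exist mirrors the "if a new copy has been created, it is returned" wording; the real transform works on full plans, not VF lists.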
2025-10-16[VPlan] Improve code around canConstantBeExtended (NFC) (#161652)Ramkumar Ramachandra
Follow up on 7c4f188 ([LV] Support multiplies by constants when forming scaled reductions), introducing m_APInt, and improving code around canConstantBeExtended: we change canConstantBeExtended to take an APInt.
2025-10-13[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721)Ramkumar Ramachandra
2025-10-11[VPlan] Return invalid for scalable VF in VPReplicateRecipe::computeCostFlorian Hahn
Replication is currently not supported for scalable VFs. Make sure VPReplicateRecipe::computeCost returns an invalid cost early, for scalable VFs if the recipe is not a single-scalar. Note that this moves the existing invalid-costs.ll out of the AArch64 subdirectory, as it does not use a target triple. Fixes https://github.com/llvm/llvm-project/issues/160792.
2025-10-06Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#162157)Florian Hahn
This reverts commit f80c0baf058dbdc5 and 94eade61a02ae5. Recommit a small fix for targets using prefersVectorizedAddressing. Original message: Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model:
1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as follow-up.
2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-05Revert "Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)"Alexey Bataev
This reverts commit 8f2466bc72a5ab163621cb1bf4bf53a27f1cefe7 to fix crashes reported in commits
2025-10-02Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)Florian Hahn
This reverts commit f61be4352592639a0903e67a9b5d3ec664ad4d23. Recommit a small fix handling scalarization overhead consistently with legacy cost model if a load is used directly as operand of another memory operation, which fixes https://github.com/llvm/llvm-project/issues/161404. Original message: Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model:
1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as follow-up.
2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-02[LV] Support multiplies by constants when forming scaled reductions. (#161092)Florian Hahn
We can create partial reductions for multiplies with constants, if the constant is small enough to be extended from source to destination type w/o changing the value. This only handles constant on the right side of a multiply, relying on other passes to canonicalize the input. Alive2 Proofs: https://alive2.llvm.org/ce/z/iWRMr6 PR: https://github.com/llvm/llvm-project/pull/161092
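As a rough illustration of "small enough to be extended from source to destination type without changing the value", the check can be sketched on plain 64-bit integers. `fitsAfterExtend` is a hypothetical helper written for this note; the in-tree code works on APInt, and its exact rules are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if C survives a round trip of
// truncation to SrcBits followed by sign- or zero-extension.
bool fitsAfterExtend(int64_t C, unsigned SrcBits, bool IsSigned) {
  assert(SrcBits >= 1 && SrcBits < 64 && "sketch only handles 1..63 bits");
  uint64_t Mask = (uint64_t{1} << SrcBits) - 1;
  uint64_t Trunc = static_cast<uint64_t>(C) & Mask;
  if (!IsSigned)
    return static_cast<uint64_t>(C) == Trunc; // zero-extension round trip
  // Sign-extension round trip: replicate the top source bit upwards.
  uint64_t SignBit = uint64_t{1} << (SrcBits - 1);
  int64_t Sext = static_cast<int64_t>((Trunc ^ SignBit) - SignBit);
  return Sext == C;
}
```

For an 8-bit source, 127 and -128 pass the signed check while 128 does not; 255 passes the unsigned check while 256 and -1 do not.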
2025-10-01[VPlan] Remove VPIRPhis in exit blocks when deleting scalar loop BBs.Florian Hahn
DeleteDeadBlocks will remove single-entry phis. Remove them from the exit VPIRBBs in VPlan as well, otherwise we would retain references to deleted IR instructions. Fixes MSan failures after 8907b6d39 https://lab.llvm.org/buildbot/#/builders/164/builds/14013
2025-10-01[VPlan] Remove original loop blocks if dead. (#155497)Florian Hahn
Build on top of https://github.com/llvm/llvm-project/pull/154510 to completely remove the blocks of dead scalar loops. Depends on https://github.com/llvm/llvm-project/pull/154510. PR: https://github.com/llvm/llvm-project/pull/155497
2025-09-30Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"Florian Hahn
This reverts commit b4be7ecaf06bfcb4aa8d47c4fda1eed9bbe4ae77. See https://github.com/llvm/llvm-project/issues/161404 for a crash exposed by the change. Revert while I investigate.
2025-09-29[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)Florian Hahn
Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model:
1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as follow-up.
2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053
2025-09-28[VPlan] Remove dead code for scalar VFs in VPRegionBlock::cost (NFC).Florian Hahn
The VPlan cost model is not used to compute costs of scalar VFs currently, as conversion to replicate regions makes accurately computing the original scalar cost difficult. Remove left over, dead code.
2025-09-23[LV][EVL] Remove metadata on EVL vectorized loops (#155760)Shih-Po Hung
This patch removes the metadata emission for EVL‑vectorized loops, since there is no current in-tree consumer: 1) after VPlan performs canonical IV replacement #147222 and 2) RISCV dropped EVLIndVarSimplifyPass #151483, which was the only user of this metadata.
2025-09-18[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)Florian Hahn
After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-13[VPlan] Move logic to compute scalarization overhead to cost helper(NFC)Florian Hahn
Extract the logic to compute the scalarization overhead to a helper for easy re-use in the future.
2025-09-12[VPlan] Explicitly replicate VPInstructions by VF. (#155102)Florian Hahn
Extend replicateByVF added in #142433 (aa240293190) to also explicitly unroll replicating VPInstructions. Now the only remaining case where we replicate for all lanes is VPReplicateRecipes in replicate regions. PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-04[VPlan] Consolidate logic to update loop metadata and profile info.Florian Hahn
This patch consolidates updating loop metadata and profile info for both the remainder and vector loops in a single place. This is NFC, modulo consistently applying vectorization specific metadata also in the experimental VPlan-native path. Split off from https://github.com/llvm/llvm-project/pull/154510.
2025-09-01[VPlan] Move runtime check blocks to correct position during exec (NFC).Florian Hahn
Move adjusting the position of completely disconnected IR blocks to VPIRBasicBlock::execute.
2025-09-01[VPlan] Add VPBlockBase::hasPredecessors (NFC).Florian Hahn
Split off from https://github.com/llvm/llvm-project/pull/154510/, add helper to check if a block has any predecessors.
2025-08-26[VPlan] Improve style around container-inserts (NFC) (#155174)Ramkumar Ramachandra
2025-08-18[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487)Florian Hahn
Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code-path altogether. PR: https://github.com/llvm/llvm-project/pull/151487
2025-08-17[VPlan] Remove dead code from GetBroadCastInstr (NFCI).Florian Hahn
All relevant places should already explicitly materialize broadcasts. Remove dead code from VPTransformState::get
2025-08-12[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)Florian Hahn
Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-11[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)Luke Lau
A lot of the time getCanonicalIV() is used to get the canonical IV type, e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext. However, VPTypeAnalysis has a constructor that takes the VPlan directly and there's a method on VPlan to get the LLVMContext directly, so use those instead where possible. This lets us remove a constructor on VPTypeAnalysis. Also remove an unused LLVMContext argument in UnrollState whilst we're here.
2025-08-08[VPlan] Materialize vector trip count using VPInstructions. (#151925)Florian Hahn
Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-07[VPlan] Return invalid cost if any skeleton block has invalid costs. (#151940)Florian Hahn
We need to reject plans that contain recipes with invalid costs. LICM can move recipes with invalid costs out of the loop region, which then get missed by the main cost computation. Extend the logic to check recipes for invalid cost currently only covering the middle block to include all skeleton blocks. Fixes https://github.com/llvm/llvm-project/issues/144358 Fixes https://github.com/llvm/llvm-project/issues/151664 PR: https://github.com/llvm/llvm-project/pull/151940
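A minimal sketch of the rejection rule, assuming an invalid cost is modelled as an empty `std::optional`. `Cost` and `planCost` are hypothetical names, and the sketch assumes the skeleton blocks are scanned for validity only, without adding their costs to the total.

```cpp
#include <optional>
#include <vector>

// An empty optional models an invalid cost in this sketch.
using Cost = std::optional<unsigned>;

// Return the loop cost, unless the loop cost itself or any recipe in any
// skeleton block (preheader, middle block, ...) has an invalid cost.
Cost planCost(const Cost &LoopCost,
              const std::vector<std::vector<Cost>> &SkeletonBlocks) {
  if (!LoopCost)
    return std::nullopt;
  for (const std::vector<Cost> &Block : SkeletonBlocks)
    for (const Cost &RecipeCost : Block)
      if (!RecipeCost)
        return std::nullopt; // e.g. a recipe that was hoisted out of the loop
  return LoopCost;
}
```

Scanning every skeleton block, rather than the middle block alone, is what catches an invalid-cost recipe that has been moved out of the loop region.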
2025-08-05[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)Luke Lau
This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd, to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.
2025-08-03[VPlan] Materialize BackedgeTakenCount using VPInstructions.Florian Hahn
Explicitly compute the backedge-taken count using VPInstruction. This is needed to model the full skeleton in VPlan. NFC modulo some instruction re-ordering.
2025-07-26[VPlan] Materialize constant vector trip counts before final opts. (#142309)Florian Hahn
Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: https://github.com/llvm/llvm-project/pull/142309
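The formula above is plain integer arithmetic and can be checked with a short sketch. `vectorTripCount` is a hypothetical free function; as in the commit message, it ignores tail folding and required scalar epilogues.

```cpp
#include <cassert>
#include <cstdint>

// Round the original trip count TC down to a multiple of VF * UF.
uint64_t vectorTripCount(uint64_t TC, uint64_t VF, uint64_t UF) {
  assert(VF != 0 && UF != 0 && "VF and UF must be non-zero");
  uint64_t Step = VF * UF;
  return (TC / Step) * Step; // integer division drops the remainder
}
```

With TC = 100, VF = 4 and UF = 2 the vector loop covers 96 iterations and the remaining 4 are left to the scalar remainder loop; with TC = 7 the vector loop would not execute at all.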
2025-07-09[VPlan] Connect (MemRuntime|SCEV)Check blocks as VPlan transform (NFC). (#143879)Florian Hahn
Connect SCEV and memory runtime check block directly in VPlan as VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks. The new logic is currently split across LoopVectorizationPlanner::addRuntimeChecks, which collects a list of {Condition, CheckBlock} pairs, performs some checks and emits remarks if needed. The list of checks is then added to VPlan in VPlanTransforms::connectCheckBlocks. PR: https://github.com/llvm/llvm-project/pull/143879
2025-06-26[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker (#139881)Igor Kirillov
Currently, when VPSlotTracker is initialized with a VPlan, its assignName method calls printAsOperand on each underlying instruction. Each such call recomputes slot numbers for the entire function, leading to O(N × M) complexity, where M is the number of instructions in the loop and N is the number of instructions in the function. This results in slow debug output for large loops. For example, printing costs of all instructions becomes O(M² × N), which is especially painful when enabling verbose dumps. This patch improves debugging performance by caching slot numbers using ModuleSlotTracker. It avoids redundant recomputation and makes debug output significantly faster.
2025-06-26[VPlan] Unroll VPReplicateRecipe by VF. (#142433)Florian Hahn
Explicitly unroll VPReplicateRecipes outside replicate regions by VF, replacing them by VF single-scalar recipes. Extracts for operands are added as needed and the scalar results are combined to a vector using a new BuildVector VPInstruction. It also adds a few folds to simplify unnecessary extracts/BuildVectors. It also adds a BuildStructVector opcode for handling of calls that have struct return types. VPReplicateRecipes in replicate regions will be unrolled as follow-up, turning non-single-scalar VPReplicateRecipes into 'abstract', i.e. not executable. PR: https://github.com/llvm/llvm-project/pull/142433
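The extract, per-lane scalar copy and BuildVector steps can be mimicked on plain arrays. This is only an analogy under stated assumptions: `replicateByVF` and `square` are hypothetical names, a `std::array` stands in for a vector value, and nothing here uses VPlan classes.

```cpp
#include <array>
#include <cstddef>

// Example scalar operation to replicate (hypothetical).
int square(int X) { return X * X; }

// Apply ScalarOp once per lane and pack the results, mirroring
// "extract operand -> single-scalar copy -> BuildVector".
template <std::size_t VF>
std::array<int, VF> replicateByVF(const std::array<int, VF> &Operand,
                                  int (*ScalarOp)(int)) {
  std::array<int, VF> Result{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane) {
    int Extracted = Operand[Lane];    // extract for the operand
    int Scalar = ScalarOp(Extracted); // one single-scalar copy per lane
    Result[Lane] = Scalar;            // BuildVector gathers the lanes
  }
  return Result;
}
```

Making each lane explicit is what lets later folds remove an extract that immediately follows a BuildVector of the same lanes.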
2025-06-25[VPlan] Format some print forms.NFC (#144644)LiqinWeng
2025-06-21[VPlan] Update packScalarIntoVector to take and return wide value (NFC)Florian Hahn
Make the function more flexible in preparation for new users.
2025-06-18Revert "[VPlan] Remove unnecessary DomTreeUpdater flush (NFC)." (#144758)Arthur Eubanks
This reverts commit 2e337349f436d75af112c081df5ec683871cbcc8. Causes breakages internally, will post reproducer later.
2025-06-17[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638)Luke Lau
The motivation of this PR is to make #115274 easier to implement, and should allow us to add EVL support by just passing EVL to the VF operand. The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body. The gist of this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:
```
vector.ph:
Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    WIDEN-INDUCTION %i = phi %start, %step, %vf
    ...
    EMIT branch-on-count ...
  No successors
}
```
gets expanded to:
```
vector.ph:
  ...
  vp<%induction.start> = ...
  vp<%induction.increment> = ...
Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
    ...
    vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
    EMIT branch-on-count ...
  No successors
}
```
This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves. After this, #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).
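To make the roles of the start value, the phi and the backedge increment concrete, the following sketch simulates the lane values of a widened integer induction. `simulateWidenedIV` is a hypothetical function, and the sketch assumes the start vector is `Start + Lane * Step` and the increment is a splat of `Step * VF`; the commit message elides both computations.

```cpp
#include <cstddef>
#include <vector>

// Returns the lane values the widened phi holds in each vector iteration.
std::vector<std::vector<long>> simulateWidenedIV(long Start, long Step,
                                                 std::size_t VF,
                                                 std::size_t Iters) {
  // Preheader: analogue of vp<%induction.start> (assumed formula).
  std::vector<long> Phi(VF);
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    Phi[Lane] = Start + static_cast<long>(Lane) * Step;
  // Preheader: analogue of vp<%induction.increment> (assumed formula).
  long Increment = Step * static_cast<long>(VF);

  std::vector<std::vector<long>> Trace;
  for (std::size_t I = 0; I < Iters; ++I) {
    Trace.push_back(Phi); // ir<%i> = WIDEN-PHI start, vec.ind.next
    for (long &V : Phi)
      V += Increment;     // vp<%vec.ind.next> = add ir<%i>, increment
  }
  return Trace;
}
```

Because the backedge value is an ordinary add inside the loop body, swapping `Increment` for a value computed per iteration (such as one derived from EVL) would not change the structure of the loop.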
2025-06-15[VPlan] Mark VPFirstOrderRecurrencePHI as not reading/writing memory.Florian Hahn
First-order recurrence phis don't have side-effects and don't read or write memory. Mark them as such.
2025-06-13[LV] Use getFixedValue instead of getKnownMinValue when appropriate (#143526)David Sherwood
There are many places in VPlan and LoopVectorize where we use getKnownMinValue to discover the number of elements in a vector. Where we expect the vector to have a fixed length, I have used the stronger getFixedValue call. I believe this is clearer and adds extra protection in the form of an assert in getFixedValue that the vector is not scalable. While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also took the liberty of simplifying the code. In theory I believe this patch should be NFC, but I'm reluctant to add that to the title in case we're just missing tests for some of the VPlan changes. I built and ran the LLVM test suite when targeting neoverse-v1 and it seemed ok.
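The difference between the two accessors can be shown with a toy type. `ToyElementCount` is a hypothetical stand-in that borrows the accessor names from the commit message; it is not the LLVM class.

```cpp
#include <cassert>

// Toy element count: MinVal elements, multiplied by an unknown runtime
// factor when Scalable is true.
struct ToyElementCount {
  unsigned MinVal;
  bool Scalable;

  // Always legal, but only a lower bound for scalable vectors.
  unsigned getKnownMinValue() const { return MinVal; }

  // Exact element count; asserts that the vector is not scalable.
  unsigned getFixedValue() const {
    assert(!Scalable && "fixed value requested for a scalable vector");
    return MinVal;
  }
};
```

Calling `getFixedValue()` where a fixed-width vector is expected documents that expectation and, in builds with assertions enabled, traps if a scalable vector ever reaches that code.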
2025-06-12[DLCov][NFC] Propagate annotated DebugLocs through transformations (#138047)Stephen Tozer
Part of the coverage-tracking feature, following #107279. In order for DebugLoc coverage testing to work, we firstly have to set annotations for intentionally-empty DebugLocs, and secondly we have to ensure that we do not drop these annotations as we propagate DebugLocs throughout compilation. As the annotations exist as part of the DebugLoc class, and not the underlying DILocation, they will not survive a DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies a number of places in the compiler to propagate DebugLocs directly rather than via the underlying DILocation. This has no effect on the output of normal builds; it only ensures that during coverage builds, we do not incorrectly drop annotations and therefore create false positives. The bulk of these changes are in replacing DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in changing the IRBuilder to store a DebugLoc directly rather than storing DILocations in its general Metadata array. We also use a new function, `DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair (valid location > annotated > empty), preferring the current DebugLoc on a tie - this encapsulates the existing behaviour at a few sites where we _may_ assign a DebugLoc to an existing instruction, while extending the logic to handle annotation DebugLocs at the same time.
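The selection rule stated for `DebugLoc::orElse` (valid location > annotated > empty, current wins on a tie) can be modelled with a toy type. `ToyLoc`, `LocKind` and the free function `orElse` are hypothetical and only illustrate the ordering; the real method's signature is not shown in this log.

```cpp
// Rank of a toy location: a larger value means a "better" location.
enum class LocKind { Empty = 0, Annotated = 1, Valid = 2 };

struct ToyLoc {
  LocKind Kind;
  int Line; // only meaningful when Kind == LocKind::Valid
};

// Pick the better of the two locations; keep Current on a tie.
ToyLoc orElse(const ToyLoc &Current, const ToyLoc &Other) {
  bool OtherIsBetter =
      static_cast<int>(Other.Kind) > static_cast<int>(Current.Kind);
  return OtherIsBetter ? Other : Current;
}
```

The strict comparison is what implements "preferring the current DebugLoc on a tie": two valid locations leave the current one in place.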
2025-06-05[VPlan] Remove unnecessary DomTreeUpdater flush (NFC).Florian Hahn
The current version does not need the explicit flush at this point.
2025-06-03[VPlan] Remove CanonicalIV when dissolving loop regions (NFC). (#142372)Florian Hahn
Directly replace the canonical IV when we dissolve the containing region. That ensures that it won't get removed before the region gets removed, which would result in an invalid region. This removes the current ordering constraint between convertToConcreteRecipes and dissolving regions. PR: https://github.com/llvm/llvm-project/pull/142372