path: root/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Age | Commit message | Author
2025-11-21[VPlan] Only apply forced cost to recipes with underlying values. (#168372)Florian Hahn
Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: https://github.com/llvm/llvm-project/pull/168372
2025-11-20[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289)Florian Hahn
Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and replace it with `onlyScalarValuesUsed`. This removes the need to carry state from the legacy cost model through VPlan, and the VPlan-based analysis gives more accurate results, avoiding a number of extracts. PR: https://github.com/llvm/llvm-project/pull/168289
2025-11-19[VPlan] Print debug info for all recipes. (#168454)Florian Hahn
Use the recently refactored VPRecipeBase::print to print debug location for all recipes. PR: https://github.com/llvm/llvm-project/pull/168454
2025-11-19[TTI] Use MemIntrinsicCostAttributes for getMaskedMemoryOpCost (#168029)Shih-Po Hung
- Split from #165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);
+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.
2025-11-18[VPlan] VPIRFlags kind for FCmp with predicate + fast-math flags (NFCI).Florian Hahn
FCmp instructions have both a predicate and fast-math flags. Introduce a new FCmp kind, that combines both to model this correctly in the current system. This should be NFC modulo VPlan printing which now includes the correct fast-math flags.
2025-11-18[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450)Florian Hahn
Update VPlan to populate VPIRFlags during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRFlags from the underlying IR instruction each time. The VPRecipeWithIRFlags constructor taking an underlying instruction and setting the flags based on it has been removed. This centralizes initial VPIRFlags creation and ensures flags are consistently available throughout VPlan transformations. It also makes sure we don't accidentally re-add flags from the underlying instruction that were already dropped during transformations. Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did the same for VPIRMetadata. Should be NFC w.r.t. the generated IR. PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-17[VPlan] Populate and use VPIRMetadata from VPInstructions (NFC) (#167253)Florian Hahn
Update VPlan to populate VPIRMetadata during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRMetadata from the underlying IR instruction each time. This centralizes VPIRMetadata in VPInstructions and ensures metadata is consistently available throughout VPlan transformations. PR: https://github.com/llvm/llvm-project/pull/167253
2025-11-17Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)Ramkumar Ramachandra
Changes: The previous patch had to be reverted due to a mismatching-OpType assert in CSE. A reduced test corresponding to an RVV pointer induction has now been added, and the pointer-induction case has been updated to use createOverflowingBinaryOp. While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17[VPlan] Add printRecipe, prepare printing metadata in ::print (NFC) (#166244)Florian Hahn
Add a new printRecipe which handles printing the recipe without common info like debug info or metadata. Prepares to print them once, in ::print(), after/in combination with https://github.com/llvm/llvm-project/pull/165825. PR: https://github.com/llvm/llvm-project/pull/166244
2025-11-16[VPlan] Delegate to other VPInstruction constructors. (NFCI)Florian Hahn
Update VPInstruction constructor to delegate to constructor with more comprehensive checking and validation. This required updating some unit tests, to make sure the constructed VPInstructions are valid.
2025-11-14Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)Alex Bradbury
Reverts llvm/llvm-project#163538 This is causing build failures on the two-stage RVV buildbots. e.g. https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a reproducer and more information at https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822 This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)Ramkumar Ramachandra
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14[LV] Explicitly disable in-loop reductions for AnyOf and FindIV. nfc (#163541)Mel Chen
Currently, in-loop reductions for AnyOf and FindIV are not supported. They were implicitly blocked. This happened because RecurrenceDescriptor::getReductionOpChain could not detect their recurrence chain. The reason is that RecurrenceDescriptor::getOpcode was set to Instruction::Or, but the recurrence chains of AnyOf and FindIV do not actually contain an Instruction::Or. This patch explicitly disables in-loop reductions for AnyOf and FindIV instead of relying on getReductionOpChain to implicitly prevent them.
2025-11-13Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"Florian Hahn
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13[VPlan] Simplify ExplicitVectorLength(%AVL) -> %AVL when AVL <= VF (#167647)Luke Lau
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399) has the property that if the AVL (%cnt) is less than or equal to VF (%max_lanes) then the return value is just AVL. This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds `ExplicitVectorLength` to `VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once dead.
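The property above can be sketched on plain integers. This is a minimal illustration, not the code from the patch: `foldExplicitVectorLength` is a hypothetical helper name, and the real transform proves `AVL <= VF` with SCEV rather than comparing known constants.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not an LLVM API. Models the simplification on plain
// integers: when AVL <= VF the intrinsic is known to return AVL, so the call
// can be replaced by AVL itself. Otherwise this sketch reports that no
// simplification applies.
std::optional<std::uint64_t> foldExplicitVectorLength(std::uint64_t AVL,
                                                      std::uint64_t VF) {
  assert(VF > 0 && "VF must be positive");
  if (AVL <= VF)
    return AVL;
  return std::nullopt;
}
```

For example, with AVL = 3 and VF = 4 the sketch returns 3, while AVL = 5 and VF = 4 yields no simplification.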
2025-11-12[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)Florian Hahn
Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042
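The lowering sequence can be modelled on a boolean mask. The sketch below is an illustration under stated assumptions, not the VPlan implementation: the function names are hypothetical, the mask is assumed to be a tail-folding style prefix mask with lane 0 active, and `firstActiveLane` is assumed to return the mask length when no lane is set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of Not(Mask): flip every lane.
std::vector<bool> notMask(const std::vector<bool> &Mask) {
  std::vector<bool> Result(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Result[I] = !Mask[I];
  return Result;
}

// Hypothetical model of FirstActiveLane: index of the first set lane.
// Assumption: returns the number of lanes if no lane is set.
std::size_t firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// LastActiveLane = FirstActiveLane(Not(Mask)) - 1.
// Only meaningful for prefix masks with at least one active lane.
std::size_t lastActiveLane(const std::vector<bool> &Mask) {
  assert(!Mask.empty() && Mask[0] && "expects lane 0 to be active");
  return firstActiveLane(notMask(Mask)) - 1;
}
```

With the mask {1, 1, 0, 0} the negated mask is {0, 0, 1, 1}, its first active lane is 2, and subtracting 1 gives lane 1 as the last active lane.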
2025-11-11[VPlan] Remove unneeded getDefiningRecipe with isa/cast/dyn_cast. (NFC)Florian Hahn
Classof for most recipes directly supports VPValue, so there is no need to call getDefiningRecipe when using isa/cast/dyn_cast.
2025-11-11Revert "[VPlan] Handle WidenGEP in narrowToSingleScalars" (#167509)Ramkumar Ramachandra
This reverts commit fdd52f5fe130fb8b98f4aed3d15aa0789cce6b40, as it causes buildbot failures. This will give us time to investigate the failure. https://lab.llvm.org/buildbot/#/builders/210/builds/5160
2025-11-11[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740)Ramkumar Ramachandra
This allows us to strip a special case in VPWidenGEP::execute.
2025-11-11[LV] Move condition to VPPartialReductionRecipe::execute (#166136)Sander de Smalen
This means that VPExpressions will now be constructed for VPPartialReductionRecipe's when the loop has tail-folding predication. Note that control-flow (if/else) predication is not yet handled for partial reductions, because of the way partial reductions are recognised and built up.
2025-11-10[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449)Luke Lau
Split off from #158690. Currently if an instruction needs to be predicated due to tail folding, it will also have a predication discount applied to it in multiple places. This is likely inaccurate because we can expect a tail folded instruction to be executed on every iteration bar the last. This fixes it by checking if the instruction/block was originally predicated, and in doing so prevents vectorization with tail folding where we would have had to scalarize the memory op anyway. On llvm-test-suite this causes 4 loops in total to no longer be vectorized with -O3 on arm64-apple-darwin, and there's no observable performance impact.
2025-11-06[VPlan] Rename onlyFirst(Lane|Part)Used (NFC) (#166562)Ramkumar Ramachandra
Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line with usesScalars, for clarity.
2025-11-05[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506)Florian Hahn
Generalize VPWidenSelectRecipe codegen to consider single-scalar conditions instead of just loop-invariant ones. If the condition is a single-scalar, we can simply use a scalar condition. PR: https://github.com/llvm/llvm-project/pull/165506
2025-11-02[VPlan] Mark BranchOnCount and BranchOnCond as having side effects (NFC)Florian Hahn
BranchOnCount and BranchOnCond do not read memory, but cannot be moved. Mark them as having side-effects, but not reading/writing memory, which models this more accurately. This allows removing some special checking for branches both in the current code and future patches.
2025-11-01[VPlan] Add VPIRMetadata parameter to VPInstruction constructor. (NFC)Florian Hahn
Update VPInstruction constructor to accept VPIRMetadata between the Flags and DebugLoc parameters. This allows metadata to be passed during construction rather than assigned afterward.
2025-10-31[VPlan] Add VPRegionBlock::getCanonicalIVType (NFC). (#164127)Florian Hahn
Split off from https://github.com/llvm/llvm-project/pull/156262. Similar to VPRegionBlock::getCanonicalIV, add helper to get the type of the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe. PR: https://github.com/llvm/llvm-project/pull/164127
2025-10-30[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276)Florian Hahn
Update getSCEVExprForVPValue to handle more complex expressions, to use it in VPReplicateRecipe::computeCost. In particular, it supports constructing SCEV expressions for GetElementPtr VPReplicateRecipes, with operands that are VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we hit a sub-expression we don't support yet, we return SCEVCouldNotCompute. Note that the SCEV expression is valid for VF = 1: we only support constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec starting at 0 and stepping by 1. The returned SCEV expressions could be converted to a VF specific one, by rewriting the AddRecs to ones with the appropriate step. Note that the logic for constructing SCEVs for GetElementPtr was directly ported from ScalarEvolution.cpp. Another thing to note is that we construct SCEV expressions purely by looking at the operation of the recipe and its translated operands, w/o accessing the underlying IR (the exception being getting the source element type for GEPs). PR: https://github.com/llvm/llvm-project/pull/161276
2025-10-28[VPlan] Introduce cannotHoistOrSinkRecipe, fix miscompile (#162674)Ramkumar Ramachandra
Factor out common code to determine legality of hoisting and sinking. The patch has the side-effect of fixing an underlying bug, where a load/store pair is reordered.
2025-10-28[VPlan] Store memory alignment in VPWidenMemoryRecipe. nfc (#165255)Mel Chen
Add a member Alignment to VPWidenMemoryRecipe to store memory alignment directly in the recipe. Update constructors, clone(), and relevant methods to use this stored alignment instead of querying the IR instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be constructed without relying on the original IR instruction in the future.
2025-10-28[VPlan] Use VPlan type inference to get address space for recipes. (NFC)Florian Hahn
Instead of accessing the address space from the IR reference, retrieve it via type inference.
2025-10-23[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)Sam Tebbs
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-20[VPlan] Match legacy behavior w.r.t. using pointer phis as scalar addrs.Florian Hahn
When the legacy cost model scalarizes loads that are used as addresses for other loads and stores, it looks to phi nodes, if they are direct address operands of loads/stores. Match this behavior in isUsedByLoadStoreAddress, to fix a divergence between legacy and VPlan-based cost model.
2025-10-19[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)Florian Hahn
Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalar values from vectors. Test changes are movements of the extracts: they are now generated together and also directly after the producer. Depends on https://github.com/llvm/llvm-project/pull/155102 (included in PR) PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-18[VPlan] Add VPRecipeBase::getRegion helper (NFC).Florian Hahn
Multiple places retrieve the region for a recipe. Add a helper to make the code more compact and clearer.
2025-10-16[VPlan] Improve code around canConstantBeExtended (NFC) (#161652)Ramkumar Ramachandra
Follow up on 7c4f188 ([LV] Support multiplies by constants when forming scaled reductions), introducing m_APInt, and improving code around canConstantBeExtended: we change canConstantBeExtended to take an APInt.
2025-10-15[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)Florian Hahn
When narrowing stores of a single-scalar, we currently use ExtractLastElement, which extracts the last element across all parts. This is not correct if the store's address is not uniform across all parts. If it is only uniform-per-part, the last lane per part must be extracted. Add a new ExtractLastLanePerPart opcode to handle this correctly. Most transforms apply to both ExtractLastElement and ExtractLastLanePerPart, with the only difference being their treatment during unrolling. Fixes https://github.com/llvm/llvm-project/issues/162498. PR: https://github.com/llvm/llvm-project/pull/163056
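The difference between the two opcodes can be illustrated on unrolled parts modelled as plain vectors. This is a sketch under assumptions, not the VPlan code: `Part`, `extractLastElement` and `extractLastLanePerPart` are hypothetical names chosen to mirror the opcodes.

```cpp
#include <cassert>
#include <vector>

// One unrolled part, modelled as a vector of lane values (assumption).
using Part = std::vector<int>;

// Models ExtractLastElement: the last lane of the last part only.
int extractLastElement(const std::vector<Part> &Parts) {
  assert(!Parts.empty() && !Parts.back().empty());
  return Parts.back().back();
}

// Models ExtractLastLanePerPart: the last lane of every part, which is
// what is needed when a value is only uniform within each part.
std::vector<int> extractLastLanePerPart(const std::vector<Part> &Parts) {
  std::vector<int> Result;
  for (const Part &P : Parts) {
    assert(!P.empty());
    Result.push_back(P.back());
  }
  return Result;
}
```

For two parts {10, 11, 12, 13} and {20, 21, 22, 23}, the first helper yields 23 while the second yields {13, 23}, so a store whose address differs per part keeps one value per part.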
2025-10-15[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)Florian Hahn
The canonical IV is tied to region blocks; move getCanonicalIV there and update all users. PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-12[llvm] Use [[fallthrough]] instead of LLVM_FALLTHROUGH (NFC) (#163086)Kazu Hirata
[[fallthrough]] is now part of C++17, so we don't need to use LLVM_FALLTHROUGH.
2025-10-11[VPlan] Return invalid for scalable VF in VPReplicateRecipe::computeCostFlorian Hahn
Replication is currently not supported for scalable VFs. Make sure VPReplicateRecipe::computeCost returns an invalid cost early, for scalable VFs if the recipe is not a single-scalar. Note that this moves the existing invalid-costs.ll out of the AArch64 subdirectory, as it does not use a target triple. Fixes https://github.com/llvm/llvm-project/issues/160792.
2025-10-08[VPlan] Skip VPBlendRecipe in isUsedByLoadStoreAddress.Florian Hahn
VPBlendRecipes are introduced as part of if-conversion, potentially adding a def-use chain from a load used in a compare to another load/store. In the scalar IR, there is no connection via def-use chains, so the legacy cost model won't consider the load as used by a memory operation. Skipping blends brings the VPlan-based cost-computation in line with the legacy cost model after https://github.com/llvm/llvm-project/pull/162157.
2025-10-08[VPlan] Mark ActiveLaneMask as not having mem effects (#162330)Ramkumar Ramachandra
VPInstruction::ActiveLaneMask does not read or write memory. This allows us to clean up some dead recipes.
2025-10-06[VPlan] Process ExpressionRecipes in reverse order in constructor.Florian Hahn
Currently there's a crash when trying to construct VPExpressionRecipes for a mul (ext, ext), if the multiply has outside users; the mul will be cloned to serve its external users, but the extends won't get cloned and will stay connected to users outside the loop (the cloned multiply). To fix this, process recipes in reverse order. This ensures that we visit bundled users before their operands, properly ensuring that the extends for the external user are cloned as well.
2025-10-06Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#162157)Florian Hahn
This reverts commit f80c0baf058dbdc5 and 94eade61a02ae5. Recommit with a small fix for targets using prefersVectorizedAddressing.

Original message: Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model:
1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as follow-up.
2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-05Revert "Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)"Alexey Bataev
This reverts commit 8f2466bc72a5ab163621cb1bf4bf53a27f1cefe7 to fix crashes reported in commits
2025-10-05Revert "[VPlan] Match legacy CM in ::computeCost if load is used by load/store."Alexey Bataev
This reverts commit 1d65d9ce06fef890389e61990d9c748162334e55 to fix crashes, reported in the commits
2025-10-03[VPlan] Match legacy CM in ::computeCost if load is used by load/store.Florian Hahn
If a load is scalarized because it is used by a load/store address, the legacy cost model does not pass ScalarEvolution to getAddressComputationCost. Match the behavior in VPReplicateRecipe::computeCost.
2025-10-02Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)Florian Hahn
This reverts commit f61be4352592639a0903e67a9b5d3ec664ad4d23. Recommit with a small fix handling scalarization overhead consistently with the legacy cost model if a load is used directly as operand of another memory operation, which fixes https://github.com/llvm/llvm-project/issues/161404.

Original message: Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model:
1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as follow-up.
2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-02[LV] Support multiplies by constants when forming scaled reductions. (#161092)Florian Hahn
We can create partial reductions for multiplies with constants, if the constant is small enough to be extended from source to destination type w/o changing the value. This only handles constant on the right side of a multiply, relying on other passes to canonicalize the input. Alive2 Proofs: https://alive2.llvm.org/ce/z/iWRMr6 PR: https://github.com/llvm/llvm-project/pull/161092
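The "small enough to be extended" condition can be sketched with plain integers. This is an illustration only: `fitsSignExtended` and `fitsZeroExtended` are hypothetical helpers, not LLVM's canConstantBeExtended, and the constant is assumed to be given as a 64-bit value in the destination type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: can C be produced by sign-extending a SrcBits-wide
// value? True when C lies in [-2^(SrcBits-1), 2^(SrcBits-1) - 1].
bool fitsSignExtended(std::int64_t C, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 62 && "sketch supports 1..62 bits");
  const std::int64_t Min = -(std::int64_t{1} << (SrcBits - 1));
  const std::int64_t Max = (std::int64_t{1} << (SrcBits - 1)) - 1;
  return C >= Min && C <= Max;
}

// Hypothetical check: can C be produced by zero-extending a SrcBits-wide
// value? True when C lies in [0, 2^SrcBits - 1].
bool fitsZeroExtended(std::int64_t C, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 62 && "sketch supports 1..62 bits");
  return C >= 0 && C < (std::int64_t{1} << SrcBits);
}
```

Under these assumptions, a multiply by 127 qualifies when the other operand is sign-extended from i8, while a multiply by 128 only qualifies for a zero-extended i8 operand.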
2025-10-01[LV] Keep duplicate recipes in VPExpressionRecipe (#156976)Sam Tebbs
The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set. When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set. Fixes #156464
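The index shift described above can be reproduced with a small model. The sketch below is an assumption-laden illustration, not the VPExpressionRecipe code: recipes are modelled as strings, and `bundleAsSet` is a hypothetical helper that keeps only the first occurrence of each element, as an insertion-ordered set would.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of an insertion-ordered set: repeated elements are
// dropped, so every element after a duplicate moves one index forward.
std::vector<std::string> bundleAsSet(const std::vector<std::string> &Recipes) {
  std::vector<std::string> Unique;
  for (const std::string &R : Recipes)
    if (std::find(Unique.begin(), Unique.end(), R) == Unique.end())
      Unique.push_back(R);
  return Unique;
}
```

For a bundle {ext, ext, mul, reduce.add} where both extends are the same recipe, the plain sequence has mul at index 2, while the deduplicated one has reduce.add there, which is the mismatch that led to the cast error.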
2025-09-30Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"Florian Hahn
This reverts commit b4be7ecaf06bfcb4aa8d47c4fda1eed9bbe4ae77. See https://github.com/llvm/llvm-project/issues/161404 for a crash exposed by the change. Revert while I investigate.