path: root/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
Age  Commit message  Author
2025-11-18  [VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450)  Florian Hahn
Update VPlan to populate VPIRFlags during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRFlags from the underlying IR instruction each time. The VPRecipeWithIRFlags constructor taking an underlying instruction and setting the flags based on it has been removed. This centralizes initial VPIRFlags creation and ensures flags are consistently available throughout VPlan transformations. It also makes sure we don't accidentally re-add flags from the underlying instruction that already got dropped during transformations. Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did the same for VPIRMetadata. Should be NFC w.r.t. the generated IR. PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-17  [VPlan] Populate and use VPIRMetadata from VPInstructions (NFC) (#167253)  Florian Hahn
Update VPlan to populate VPIRMetadata during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRMetadata from the underlying IR instruction each time. This centralizes VPIRMetadata in VPInstructions and ensures metadata is consistently available throughout VPlan transformations. PR: https://github.com/llvm/llvm-project/pull/167253
2025-11-17  Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)  Ramkumar Ramachandra
Changes: The previous patch had to be reverted due to a mismatching-OpType assert in CSE. A reduced test corresponding to an RVV pointer-induction has now been added, and the pointer-induction case has been updated to use createOverflowingBinaryOp. While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-01  [VPlan] Add VPIRMetadata parameter to VPInstruction constructor. (NFC)  Florian Hahn
Update VPInstruction constructor to accept VPIRMetadata between the Flags and DebugLoc parameters. This allows metadata to be passed during construction rather than assigned afterward.
2025-11-01  [VPlan] Add getConstantInt helpers for constant int creation (NFC).  Florian Hahn
Add getConstantInt helper methods to VPlan to simplify the common pattern of creating constant integer live-ins. Suggested as follow-up in https://github.com/llvm/llvm-project/pull/164127.
2025-10-20  [VPlan] Set flags when constructing zexts using VPWidenCastRecipe (#164198)  Luke Lau
createWidenCast doesn't set the flag type, so when we simplify trunc (zext nneg x) -> zext x we would hit an assertion in CSE that the flag types don't match with other VPWidenCastRecipes that weren't simplified. This fixes it the same way trunc flags are handled too. As an aside I think it should be correct to preserve the nneg flag in this case since the input operand is still non-negative after the transform. But that's left to another PR. Fixes https://github.com/llvm/llvm-project/issues/164171
2025-10-12  [VPlan] Set flags when constructing truncs using VPWidenCastRecipe.  Florian Hahn
VPWidenCastRecipes with Trunc opcodes were missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistently. Fixes https://github.com/llvm/llvm-project/issues/162374.
2025-09-18  [VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)  Florian Hahn
After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-04  [VPlan] Consolidate logic to update loop metadata and profile info.  Florian Hahn
This patch consolidates updating loop metadata and profile info for both the remainder and vector loops in a single place. This is NFC, modulo consistently applying vectorization specific metadata also in the experimental VPlan-native path. Split off from https://github.com/llvm/llvm-project/pull/154510.
2025-09-04  [LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546)  Hassnaa Hamdi
In case of equal costs, prefer an epilogue with a fixed-width VF over one with a scalable VF. This helps in cases like post-LTO vectorization, where an epilogue with a fixed-width VF can be removed once we eventually know that the trip count is less than the epilogue iterations.
2025-08-26  [VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)  Florian Hahn
This patch adds a new VPlan-based addMinimumIterationCheck, which replaces the ILV version for the non-epilogue case. The VPlan-based version constructs a SCEV expression to compute the minimum iterations and uses it to determine whether the check is known true or false. Otherwise it creates a VPExpandSCEV recipe and emits a compare-and-branch. When using epilogue vectorization, we still need to create the minimum trip-count-check during the legacy skeleton creation. The patch moves the definitions out of ILV. PR: https://github.com/llvm/llvm-project/pull/153643
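The three outcomes described in this commit message (check known true, check known false, or a runtime compare-and-branch) can be modelled in a few lines of standalone C++. This is only a sketch under stated assumptions: `MinIterCheck`, `classifyMinIterCheck`, and the `std::optional` trip count are hypothetical stand-ins for the SCEV-based reasoning, and the plain `<` comparison is assumed for illustration rather than taken from the patch.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result of trying to fold the minimum-iteration check.
enum class MinIterCheck { KnownTrue, KnownFalse, NeedsRuntimeCheck };

// KnownTripCount models what can be proven at compile time: it is empty when
// the trip count is only known at run time. MinIters is the number of
// iterations needed to enter the vector loop.
MinIterCheck classifyMinIterCheck(std::optional<std::uint64_t> KnownTripCount,
                                  std::uint64_t MinIters) {
  if (!KnownTripCount)
    return MinIterCheck::NeedsRuntimeCheck; // Emit compare-and-branch.
  // "True" here means the trip count is too small, so the vector loop
  // is bypassed.
  return *KnownTripCount < MinIters ? MinIterCheck::KnownTrue
                                    : MinIterCheck::KnownFalse;
}
```

Only the `NeedsRuntimeCheck` case corresponds to expanding the expression and emitting a branch; the other two cases let the branch be folded away.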
2025-08-18  [VPlan] Preserve nusw in createInBoundsPtrAdd (#151549)  Ramkumar Ramachandra
Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as well as inbounds at the callsite.
2025-08-12  [VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)  Florian Hahn
Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-05  [VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)  Luke Lau
This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. A new VPInstruction, WidePtrAdd, is added to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it has since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.
2025-08-05  [VPlan] Compute interleave count for VPlan. (#149702)  Florian Hahn
Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some information directly from VPlan. Register pressure was already computed for a VPlan, and with this patch we now also check for reductions directly on VPlan, as well as checking how many load and store operations remain in the loop. This should be mostly NFC, but we may compute slightly different interleave counts in some edge cases, e.g. where dead loads have been removed. This shouldn't happen in practice, and the patch doesn't cause changes across a large test corpus on AArch64. Computing the interleave count based on VPlan allows for making better decisions in presence of VPlan optimizations, for example when operations on interleave groups are narrowed. Note that there are a few test changes for tests that were still checking the legacy cost-model output when it was computed in selectInterleaveCount. PR: https://github.com/llvm/llvm-project/pull/149702
2025-07-31  [VPlan] Make VPBuilder APIs uniformly take ArrayRef (NFC) (#151484)  Ramkumar Ramachandra
2025-07-18  [LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)  Florian Hahn
Update LV to vectorize maxnum/minnum reductions without fast-math flags, by adding an extra check in the loop if any inputs to maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed-zeros are already handled consistently by maxnum/minnum. If any input is NaN,
* exit the vector loop,
* compute the reduction result up to the vector iteration that contained NaN inputs and
* resume in the scalar loop
New recurrence kinds are added for reductions using maxnum/minnum without fast-math flags. PR: https://github.com/llvm/llvm-project/pull/148239
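The exit-and-resume control flow described in this commit message can be illustrated with a scalar C++ model. This is a sketch under assumptions, not the vectorizer's code: `maxReduce` and its `VF` parameter are hypothetical, the chunked loop stands in for the vector loop, and `std::fmax` stands in for maxnum. The model does not reproduce signaling-NaN behavior; it only shows where the two loops hand over.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model: process VF elements per "vector iteration". If a chunk
// contains a NaN, leave the chunked loop and finish element by element.
float maxReduce(const std::vector<float> &Data, float Start,
                std::size_t VF = 4) {
  float Acc = Start;
  std::size_t I = 0;
  // "Vector loop": only consumes whole chunks that are free of NaNs.
  for (; I + VF <= Data.size(); I += VF) {
    bool AnyNaN = false;
    for (std::size_t L = 0; L < VF; ++L)
      AnyNaN |= std::isnan(Data[I + L]);
    if (AnyNaN)
      break; // Acc holds the result up to, but excluding, this chunk.
    for (std::size_t L = 0; L < VF; ++L)
      Acc = std::fmax(Acc, Data[I + L]);
  }
  // "Scalar loop": resumes at the chunk that contained a NaN, or handles
  // the remainder when no NaN was seen.
  for (; I < Data.size(); ++I)
    Acc = std::fmax(Acc, Data[I]);
  return Acc;
}
```

In this model the partial result is carried into the scalar loop through `Acc`, so the scalar loop continues the reduction from the iteration where the vector loop exited.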
2025-07-09  [VPlan] Connect (MemRuntime|SCEV)Check blocks as VPlan transform (NFC). (#143879)  Florian Hahn
Connect SCEV and memory runtime check block directly in VPlan as VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks. The new logic is currently split across LoopVectorizationPlanner::addRuntimeChecks, which collects a list of {Condition, CheckBlock} pairs, performs some checks, and emits remarks if needed. The list of checks is then added to VPlan in VPlanTransforms::connectCheckBlocks. PR: https://github.com/llvm/llvm-project/pull/143879
2025-07-08  [LV] Strip redundant fn in VPBuilder (NFC) (#147499)  Ramkumar Ramachandra
2025-06-20  [VPlan] Refine return types in VPBuilder (NFC) (#108858)  Ramkumar Ramachandra
2025-06-19  [LV] Introduce and use VPBuilder::createScalarZExtOrTrunc [nfc] (#144946)  Philip Reames
Reduce redundant code, make the flow slightly easier to read.
2025-06-11  [DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (#136192)  Stephen Tozer
Following the work in PR #107279, this patch applies the annotative DebugLocs, which indicate that a particular instruction is intentionally missing a location for a given reason, to existing sites in the compiler where their conditions apply. This is NFC in ordinary LLVM builds (each function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the instruction in coverage-tracking builds so that it will be ignored by Debugify, allowing only real errors to be reported. From a developer standpoint, it also communicates the intentionality and reason for a missing DebugLoc. Some notes for reviewers:
- The difference between `I->dropLocation()` and `I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide to keep some debug info alive, while the latter will always be empty; in this patch, I always used the latter (even if the former could technically be correct), because the former could result in some (barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with the exception of the vectorizers - in summary, they are a huge cause of dropped source locations, and I don't have the time or the domain knowledge currently to solve that, so I've plastered it all over them as a form of "fixme".
2025-05-26  [VPlan] Construct initial once and pass clones to tryToBuildVPlan (NFC). (#141363)  Florian Hahn
Update to only build an initial, plain-CFG VPlan once, and then transform & optimize clones. This requires changes to ::clone() for VPInstruction and VPWidenPHIRecipe to allow for proper cloning of the recipes in the initial VPlan. PR: https://github.com/llvm/llvm-project/pull/141363
2025-05-25  [VPlan] Separate out logic to manage IR flags to VPIRFlags (NFC). (#140621)  Florian Hahn
This patch moves the logic to manage IR flags to a separate VPIRFlags class. For now, VPRecipeWithIRFlags is the only class that inherits VPIRFlags. The new class allows for simpler passing of flags when constructing recipes, simplifying the constructors for various recipes (VPInstruction in particular, which now just has 2 constructors, one taking an extra VPIRFlags argument). This mirrors the approach taken for VPIRMetadata and makes it easier to extend in the future. The patch also adds a unified flagsValidForOpcode to check if the flags in a VPIRFlags match the provided opcode. PR: https://github.com/llvm/llvm-project/pull/140621
2025-05-10  [VPlan] Add VPPhi subclass for VPInstruction with PHI opcodes. (NFC) (#139151)  Florian Hahn
Similarly to VPInstructionWithType and VPIRPhi, add VPPhi as a subclass for VPInstruction. This allows implementing the VPPhiAccessors trait, making available helpers for generic printing of incoming values / blocks and accessors for incoming blocks and values. It will also allow properly verifying def-uses for values used by VPInstructions with PHI opcodes via https://github.com/llvm/llvm-project/pull/124838. PR: https://github.com/llvm/llvm-project/pull/139151
2025-05-09  [VPlan] Manage noalias/alias_scope metadata in VPlan. (#136450)  Florian Hahn
Use VPIRMetadata added in https://github.com/llvm/llvm-project/pull/135272 to also manage no-alias metadata added by versioning. Note that this means we have to build the no-alias metadata up-front once. If it is not used, it will be discarded automatically. This also fixes a case where incorrect metadata was added to wide loads/stores that got converted from an interleave group. Compile-time impact is neutral: https://llvm-compile-time-tracker.com/compare.php?from=38bf1af41c5425a552a53feb13c71d82873f1c18&to=2fd7844cfdf5ec0f1c2ce0b9b3ae0763245b6922&stat=instructions:u
2025-05-07  [VPlan] Create PHI VPInstruction using VPBuilder (NFC).  Florian Hahn
Use builder to create scalar PHI VPInstructions.
2025-04-29  [VPlan] Use correct non-FMF constructor in VPInstructionWithType createNaryOp (#137632)  Luke Lau
Currently, if we try to create a VPInstructionWithType without FMFs via VPBuilder::createNaryOp, we will use the constructor that asserts `assert(isFPMathOp() && "this op can't take fast-math flags");`. This fixes it by checking if FMFs have a value, similar to the other createNaryOp overloads. This is needed by #129508.
2025-04-14  [VPlan] Add opcode to create step for wide inductions. (#119284)  Florian Hahn
This patch adds a WideIVStep opcode that can be used to create a vector with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step
The opcode is later converted into a sequence of recipes that convert the scale and step to the target type, if needed, and then multiply vector step by scale. This simplifies code that needs to materialize step vectors, e.g. replacing wide IVs as follow up to https://github.com/llvm/llvm-project/pull/108378 with an increment of the wide IV step. PR: https://github.com/llvm/llvm-project/pull/119284
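The convert-then-multiply lowering described in this commit message can be sketched on plain integers. This is a hypothetical model only: `wideIVStep`, the lane-by-lane `std::vector` representation, and the choice of 32-bit inputs with a 64-bit target type are assumptions made for illustration, not the recipes the patch emits.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the expansion: convert both operands to the target
// type (here 32-bit to 64-bit), then multiply each lane of the vector step
// by the scale.
std::vector<std::int64_t> wideIVStep(const std::vector<std::int32_t> &VectorStep,
                                     std::int32_t Scale) {
  std::vector<std::int64_t> Result;
  Result.reserve(VectorStep.size());
  const std::int64_t WideScale = static_cast<std::int64_t>(Scale);
  for (std::int32_t Lane : VectorStep)
    Result.push_back(static_cast<std::int64_t>(Lane) * WideScale);
  return Result;
}
```

Converting before multiplying keeps the product in the target type, which is the order the message describes.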
2025-04-11  [VPlan] Add hasScalarTail, use instead of !CM.foldTailByMasking() (NFC). (#134674)  Florian Hahn
Now that VPlan is able to fold away redundant branches to the scalar preheader, we can directly check in VPlan if the scalar tail may execute. hasScalarTail returns true if the tail may execute. We know that the scalar tail won't execute if the scalar preheader doesn't have any predecessors, i.e. is not reachable. This removes some late uses of the legacy cost model. PR: https://github.com/llvm/llvm-project/pull/134674
2025-04-10  [VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast (NFC) (#129706)  Florian Hahn
There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes needing to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType; a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284
PR: https://github.com/llvm/llvm-project/pull/129706
2025-04-04  [VPlan] Set and use debug location for VPScalarIVStepsRecipe.  Florian Hahn
This adds the missing debug location for VPScalarIVStepsRecipe. The location of the corresponding phi is used.
2025-03-28  [VPlan] Add VF as operand to VPScalarIVStepsRecipe.  Florian Hahn
Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.
2025-03-25  [LV] Audit and fix nits in cl::opts (NFC) (#130601)  Ramkumar Ramachandra
Non-static cl::opts should be under the llvm namespace.
2025-03-22  [LV] Move IV bypass value creation out of ILV (NFC)  Florian Hahn
createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.
2025-03-19  [VPlan] Bail out on non-intrinsic calls in VPlanNativePath.  Florian Hahn
Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-02-03  [LV] Add VPBuilder::insert, use to insert created vector pointer (NFC).  Florian Hahn
Split off from https://github.com/llvm/llvm-project/pull/124432 as suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.
2025-02-02  [VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104)  Florian Hahn
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (https://github.com/llvm/llvm-project/pull/117138). PR: https://github.com/llvm/llvm-project/pull/124104
2025-01-05  [VPlan] Add and use debug location for VPScalarCastRecipe.  Florian Hahn
Update the recipe to always take a debug location and set it.
2024-12-29  [VPlan] Compute induction end values in VPlan. (#112145)  Florian Hahn
Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145
2024-12-17  [VPlan] Propagate all GEP flags (#119899)  Nikita Popov
Store GEPNoWrapFlags instead of only InBounds and propagate them.
2024-11-24  [VPlan] Allow setting IR name for VPDerivedIVRecipe (NFCI).  Florian Hahn
Allow setting the name to use for the generated IR value of the derived IV in preparation for https://github.com/llvm/llvm-project/pull/112145. This is analogous to VPInstruction::Name.
2024-11-17  [LV] Vectorize Epilogues for loops with small VF but high IC (#108190)  Julian Nagele
- Consider MainLoopVF * IC when determining whether Epilogue Vectorization is profitable
- Allow the same VF for the Epilogue as for the main loop
- Use an upper bound for the trip count of the Epilogue when choosing the Epilogue VF
PR: https://github.com/llvm/llvm-project/pull/108190
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
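How the three points in this commit message might interact can be sketched in standalone C++. Everything here is an assumption for illustration: `pickEpilogueVF`, the `MinWidth` threshold, and the rule of taking the largest fitting candidate are hypothetical and are not the cost-model logic from the patch.

```cpp
#include <vector>

// Hypothetical model. MinWidth is an assumed profitability threshold.
// Returns the chosen epilogue VF, or 0 if the epilogue is not vectorized.
unsigned pickEpilogueVF(unsigned MainLoopVF, unsigned IC, unsigned MinWidth,
                        const std::vector<unsigned> &CandidateVFs) {
  // Profitability considers the full width of one main-loop iteration,
  // MainLoopVF * IC, rather than MainLoopVF alone.
  const unsigned MainWidth = MainLoopVF * IC;
  if (MainWidth == 0 || MainWidth < MinWidth)
    return 0;
  // The epilogue runs fewer than MainWidth iterations, which gives an upper
  // bound on its trip count.
  const unsigned MaxEpilogueTC = MainWidth - 1;
  unsigned Best = 0;
  for (unsigned VF : CandidateVFs)
    // VF == MainLoopVF is allowed as long as it fits within the bound.
    if (VF <= MainLoopVF && VF <= MaxEpilogueTC && VF > Best)
      Best = VF;
  return Best;
}
```

With `MainLoopVF = 4` and `IC = 4`, the model accepts an epilogue VF of 4, the same as the main loop, because up to 15 iterations may remain. With `IC = 1` the same candidate is rejected, since at most 3 iterations remain.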
2024-10-31  [VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)  Florian Hahn
Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-25  [LV] Pass flag indicating epilogue is vectorized to executePlan (NFC)  Florian Hahn
This clarifies the flag, which is now only passed if the epilogue loop is being vectorized.
2024-10-06  [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)  Florian Hahn
Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.
2024-09-29  [LV] Retrieve reduction resume values directly for epilogue vec. (NFC)  Florian Hahn
Use the reduction resume values from the phis in the scalar header, instead of collecting them in a map. This removes some complexity from the general executePlan code paths and pushes it to only the epilogue vectorization part.
2024-09-24  [VPlan] Add createPtrAdd helper (NFC).  Florian Hahn
Preparation for https://github.com/llvm/llvm-project/pull/106431.
2024-09-21  [VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)  Florian Hahn
This patch implements explicit unrolling by UF as VPlan transform. In follow-up patches this will allow simplifying VPTransformState (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in each recipe). It also allows for more general optimizations (e.g. avoid generating code for recipes that are uniform across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post-processing for header-phi recipes previously). In the initial implementation, a number of recipes still take the unrolled part as an additional, optional argument, if their execution depends on the unrolled part. The computation for start/step values for scalable inductions changed slightly. Previously the step would be computed as scalar and then splatted, now vscale gets splatted and multiplied by the step in a vector mul. This has been split off https://github.com/llvm/llvm-project/pull/94339 which also includes changes to simplify VPTransformState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransformState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-13  [VPlan] Use VPBuilder to create scalar IV steps and derived IV (NFCI).  Florian Hahn
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe and VPScalarIVStepsRecipe. Use them to simplify the code to create scalar IV steps slightly.