path: root/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
Age | Commit message | Author
2025-11-13 | Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" | Florian Hahn
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-12 | [LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042) | Florian Hahn
Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042
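The lowering sequence named in this commit message can be illustrated with a small scalar model. This is only a sketch: `firstActiveLane` and `lastActiveLane` are hypothetical standalone helpers, not VPlan or LLVM APIs. It assumes the mask is a prefix mask (as a tail-folding header mask would be) and that a first-active-lane query on an all-false mask yields the number of lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of FirstActiveLane: index of the first true lane.
// Assumption: returns the number of lanes when no lane is active.
std::size_t firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// Hypothetical model of LastActiveLane, following the three steps
// Not(Mask) -> FirstActiveLane(NotMask) -> Sub(result, 1).
std::size_t lastActiveLane(const std::vector<bool> &Mask) {
  // Only meaningful for a prefix mask with at least one active lane.
  assert(!Mask.empty() && Mask[0] && "expected an active first lane");
  std::vector<bool> NotMask(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    NotMask[I] = !Mask[I];
  return firstActiveLane(NotMask) - 1;
}
```

With a fully active mask the inverted mask has no active lane, so the model returns the lane count minus one, which is the last lane.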
2025-11-04 | [VPlan] Verify incoming values of VPIRPhi matches before checking (NFC) | Florian Hahn
Update the verifier to first check if the number of incoming values matches the number of predecessors, before using incoming_values_and_blocks. We unfortunately also need to check here, as this may be called before verifyPhiRecipes runs. Also update the verifier unit tests, to actually fail for the expected recipes.
2025-10-15 | [VPlan] Use switch over opcodes in verifier (NFC). | Florian Hahn
Preparation to make it easier to extend to verify additional opcodes.
2025-10-13 | [VPlan] Strip VPDT's default constructor (NFC) (#162692) | Ramkumar Ramachandra
2025-10-13 | [VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721) | Ramkumar Ramachandra
2025-09-27 | [VPlan] Allow multiple users of (broadcast %evl). | Florian Hahn
CSE may replace multiple redundant broadcasts of EVL with a single broadcast which may have more than 1 user. Adjust the verifier to allow this. Fixes a crash when building llvm-test-suite with EVL: https://lab.llvm.org/buildbot/#/builders/210/builds/3303
2025-09-18 | [VPlanPatternMatch] Introduce match functor (NFC) (#159521) | Ramkumar Ramachandra
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to introduce the VPlanPatternMatch version of the match functor to shorten some idioms. Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-01 | [LV][EVL] Support interleaved access with tail folding by EVL (#152070) | Mel Chen
The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201
2025-09-01 | [VPlan] Add VPBlockBase::hasPredecessors (NFC). | Florian Hahn
Split off from https://github.com/llvm/llvm-project/pull/154510/, add helper to check if a block has any predecessors.
2025-08-16 | [VPlan] Run final VPlan simplifications before codegen. | Florian Hahn
Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.
2025-08-14 | [VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472) | Florian Hahn
Add 3 new iterator ranges to VPPhiAccessors:
* incoming_values(): returns a range over the incoming values of a phi
* incoming_blocks(): returns a range over the incoming blocks of a phi
* incoming_values_and_blocks: returns a range over pairs of incoming values and blocks.
Depends on https://github.com/llvm/llvm-project/pull/124838. PR: https://github.com/llvm/llvm-project/pull/138472
2025-08-07 | [VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110) | Luke Lau
Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-01 | [VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481) | Luke Lau
This implements the first half of #151459, by changing the AVL so it's no longer computed as `trip-count - EVL-based IV`, but instead a separate scalar phi that is decremented by EVL each iteration. This shortens the dependency chain for computing the AVL and should eventually allow us to convert the branch condition to `branch-count avl-next, 0`. `simplifyBranchConditionForVFAndUF` had to be updated to prevent a regression because this introduces a VPPhi in the header block.
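The AVL scheme described here, a scalar value that starts at the trip count and is decremented by EVL each iteration, can be sketched as a plain loop. This is a hypothetical model, not vectorizer code: `evlPerIteration` is an illustrative name, and `std::min(AVL, VF)` is a simplified stand-in for the target's vector-length computation, which may legally return other values on the penultimate iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: returns the EVL used on each vector iteration when the
// AVL is a phi that starts at the trip count and counts down to zero.
std::vector<std::uint64_t> evlPerIteration(std::uint64_t TripCount,
                                           std::uint64_t VF) {
  assert(VF > 0 && "a zero VF would never make progress");
  std::vector<std::uint64_t> EVLs;
  std::uint64_t AVL = TripCount; // phi: incoming value TC from the preheader
  while (AVL != 0) {             // models `branch-count avl-next, 0`
    // Assumption: simplified EVL choice, min(AVL, VF).
    std::uint64_t EVL = std::min(AVL, VF);
    EVLs.push_back(EVL);
    AVL -= EVL; // avl-next, fed back into the phi
  }
  return EVLs;
}
```

In this model the AVL depends only on its previous value and the EVL, not on `trip-count - EVL-based IV`, which is the shorter dependency chain the message refers to.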
2025-07-31 | [VPlan] Fix header phi VPInstruction verification. NFC (#151472) | Luke Lau
Noticed this when checking the invariant that all phis in the header block must be header phis. I think there's a missing set of parentheses here, since otherwise it only performs the cast<VPInstruction> when RecipeI isn't a VPInstruction.
2025-07-30 | [VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222) | Shih-Po Hung
Loop regions require fixed-length steps and rounded-up trip counts, but after dissolution creates explicit control flow, EVL loops can leverage variable-length stepping with original trip counts. This patch adds a post-dissolution transform pass to convert EVL loops from fixed-length to variable-length stepping.
2025-07-30 | [VPlan] Fix header masks in EVL tail folding (#150202) | Luke Lau
With EVL tail folding, the EVL may not always be VF on the second-to-last iteration. Recipes that have been converted to VP intrinsics via optimizeMaskToEVL account for this, but recipes that are left behind will still use the old header mask which may end up having a different vector length. This is effectively the same as #95368, and fixes this by converting header masks from icmp ule wide-canonical-iv, backedge-trip-count -> icmp ult step-vector, evl. Without it, recipes that fall through optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and #149981. We really need to split off optimizeMaskToEVL into VPlanTransforms::optimize and move transformRecipestoEVLRecipes into tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for correctness and what is needed to optimize away the mask computations. We should be able to still generate a correct albeit suboptimal VPlan without running optimizeMaskToEVL. I've added a TODO for this, which I think we can do after #148274. Fixes #150197
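The difference between the two header-mask forms can be seen in a lane-by-lane model. The functions below are hypothetical illustrations (the names `headerMaskFromBTC` and `headerMaskFromEVL` are not LLVM APIs), and the EVL value of 3 used in the note that follows is an assumed example of an EVL smaller than VF before the final iteration.

```cpp
#include <cstdint>
#include <vector>

// Old form: icmp ule wide-canonical-iv, backedge-trip-count.
// Lane L is active when the element index IV + L is still in range.
std::vector<bool> headerMaskFromBTC(std::uint64_t IV, std::uint64_t BTC,
                                    unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned L = 0; L < VF; ++L)
    Mask[L] = (IV + L) <= BTC;
  return Mask;
}

// New form: icmp ult step-vector, evl.
// Lane L is active when it is below the explicit vector length.
std::vector<bool> headerMaskFromEVL(unsigned EVL, unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned L = 0; L < VF; ++L)
    Mask[L] = L < EVL;
  return Mask;
}
```

For a trip count of 6 (backedge trip count 5), VF = 4 and an assumed EVL of 3 on the first iteration, `headerMaskFromBTC(0, 5, 4)` enables all four lanes while `headerMaskFromEVL(3, 4)` enables only three. A recipe still using the old mask would then process one lane more than the EVL-based recipes.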
2025-07-01 | [VPlan] Support VPWidenIntOrFpInductionRecipes with EVL tail folding (#144666) | Luke Lau
Following on from #118638, this handles widened induction variables with EVL tail folding by setting the VF operand to be EVL, calculated in the vector body. We need to do this for correctness since with EVL tail folding the number of elements processed in the penultimate iteration may not be VF, but the runtime EVL, and we need to take this into account when updating the backedge value.
- Because the VF may now not be a live-in we need to move the insertion point to just after the VF's definition
- We also need to avoid truncating it when it's the same size as the step type, previously this wasn't a problem for live-ins.
- Also because the VF may be smaller than the IV type, since the EVL is always i32, we may need to zext it.
On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and 42.8% more loops vectorized on SPEC CPU 2017
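The backedge update described above can be modelled by tracking lane 0 of the widened induction variable across iterations. This is a hypothetical sketch: `widenedIVLane0` is an illustrative name, and the per-iteration EVL sequence is supplied by the caller as an assumption rather than computed from a target.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: value of lane 0 of a widened IV at the start of each
// iteration, when the backedge increment is EVL * Step instead of VF * Step.
std::vector<std::int64_t>
widenedIVLane0(const std::vector<std::uint32_t> &EVLs, std::int64_t Start,
               std::int64_t Step) {
  std::vector<std::int64_t> Lane0;
  std::int64_t IV = Start;
  for (std::uint32_t EVL : EVLs) {
    Lane0.push_back(IV);
    // The EVL is a 32-bit value; widen it (zext) to the IV type before
    // scaling by the step.
    IV += static_cast<std::int64_t>(EVL) * Step;
  }
  return Lane0;
}
```

With VF = 4, Step = 2 and an assumed EVL sequence of 4, 3, 3, the third iteration starts at 14. Incrementing by VF * Step on every iteration would instead give 16 and skip an element.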
2025-06-30 | [VPlan] Replace all uses of VF when EVL tail folding. NFCI (#146339) | Luke Lau
With EVL tail folding, any use of the VF live-in should be replaced by the EVL. Otherwise, it should likely be directly emitted as a constant via VPTransformState::VF. This strengthens the EVL transformation by replacing all uses of VF with EVL and asserting that the only users are VPVectorEndPointerRecipe and VPScalarIVStepsRecipe, the latter of which is new. This should be NFC because even though we didn't previously replace the EVL of VPScalarIVStepsRecipe, it's only used when unrolling which we don't allow with EVL tail folding yet.
2025-06-11 | [VPlan] Always verify VPCanonicalIVPHIRecipe placement (NFC). | Florian Hahn
Loop regions are dissolved since dcef154b5caf6556e69bb1, remove the check for VerifyLate and corresponding TODO.
2025-05-30 | [VPlan] Remove ResumePhi opcode, use regular PHI instead (NFC). (#140405) | Florian Hahn
Use regular VPPhi instead of a separate opcode for resume phis. This removes an unneeded specialized opcode and unifies the code (verification, printing, updating when CFG is changed). Depends on https://github.com/llvm/llvm-project/pull/140132. PR: https://github.com/llvm/llvm-project/pull/140405
2025-05-24 | [VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506) | Florian Hahn
Building on top of https://github.com/llvm/llvm-project/pull/114305, replace VPRegionBlocks with explicit CFG before executing. This brings the final VPlan closer to the IR that is generated and helps to simplify codegen. It will also enable further simplifications of phi handling during execution and transformations that do not have to preserve the canonical IV required by loop regions. This for example could include replacing the canonical IV with an EVL based phi while completely removing the original canonical IV. PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-17 | [VPlan] Verify final VPlan, just before execution. (NFC) | Florian Hahn
Add additional verifier call just before execution, to make sure the final VPlan is valid. Note that this currently requires disabling a small number of checks when running late.
2025-05-15 | [VPlan] Add VPTypeAnalysis constructor taking a VPlan (NFC). | Florian Hahn
Add constructor that retrieves the scalar type from the trip count expression, if no canonical IV is available. Used in the verifier, in preparation for late verification, when the canonical IV has been dissolved.
2025-05-15 | [VPlan] Verify dominance for incoming values of phi-like recipes. (#124838) | Florian Hahn
Update the verifier to verify dominance for incoming values for phi-like recipes. The defining recipe must dominate the incoming block for the incoming value. Builds on top of https://github.com/llvm/llvm-project/pull/138472 to retrieve incoming values & corresponding blocks for phi-like recipes. PR: https://github.com/llvm/llvm-project/pull/124838
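The rule stated in this message, that the block defining an incoming value must dominate the corresponding incoming block, can be sketched on a toy CFG. Everything here is hypothetical and independent of VPlan: blocks are integers with block 0 as the entry, `Graph`, `Incoming`, `dominates` and `verifyPhiIncoming` are illustrative names, and dominance is computed by a naive reachability search rather than a dominator tree.

```cpp
#include <vector>

// Toy CFG: G[N] lists the successors of block N; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

// A dominates B if every path from the entry to B passes through A.
// Naive check: B must be unreachable from the entry once A is removed.
bool dominates(const Graph &G, int A, int B) {
  if (A == B || A == 0)
    return true;
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> Worklist{0};
  Seen[0] = true;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    if (N == B)
      return false; // reached B without going through A
    for (int S : G[N])
      if (S != A && !Seen[S]) {
        Seen[S] = true;
        Worklist.push_back(S);
      }
  }
  return true;
}

// One phi operand: the block defining the value and the incoming block.
struct Incoming {
  int DefBlock;
  int IncomingBlock;
};

// The phi is well-formed if each defining block dominates its incoming block.
bool verifyPhiIncoming(const Graph &G, const std::vector<Incoming> &Ins) {
  for (const Incoming &In : Ins)
    if (!dominates(G, In.DefBlock, In.IncomingBlock))
      return false;
  return true;
}
```

On a diamond CFG (0 branches to 1 and 2, both of which jump to 3), a phi in block 3 may take a value defined in block 1 along the edge from block 1, even though block 1 does not dominate block 3 itself. That is why the check is made against the incoming block rather than the phi's own block.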
2025-05-13 | [VPlan] Print use and definition in verifier on violation. | Florian Hahn
Improves the error message when a use comes before the def by including the use and def, when print utilities are available.
2025-05-05 | [VPlan] Verify number preds and operands matches for VPIRPhis. (NFC) | Florian Hahn
Extend the verifier to ensure the number of predecessors and operands match for VPIRPhis.
2025-04-10 | [VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706) | Florian Hahn
There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes needing to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType, a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284
PR: https://github.com/llvm/llvm-project/pull/129706
2025-03-28 | [VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387) | Florian Hahn
Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an overlay, to provide more convenient checking (via directly doing isa/dyn_cast/cast) and specialized execute/print implementations. Both VPIRInstruction and VPIRPhi share the same VPDefID, and are differentiated by the backing IR instruction. This pattern could also be used to provide more specialized interfaces for some VPInstruction opcodes, without introducing new, completely separate recipes. An example would be modeling VPWidenPHIRecipe & VPScalarPHIRecipe using VPInstruction opcodes and providing an interface to retrieve incoming blocks and values through a VPInstruction subclass similar to VPIRPhi. PR: https://github.com/llvm/llvm-project/pull/129387
2025-03-19 | [VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086) | Luke Lau
After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.
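The two ways of performing a reversed access mentioned here differ in which address they start from. The sketch below models both on a plain array. The function names are hypothetical and the code only illustrates the addressing, not how the vectorizer emits loads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Variant 1: consecutive unit-stride load followed by a reverse. It needs a
// pointer to the last element in iteration order, i.e. the lowest address.
std::vector<int> loadReverseViaEndPointer(const std::vector<int> &A,
                                          std::size_t I, std::size_t VF) {
  assert(VF > 0 && I < A.size() && I + 1 >= VF && "access out of bounds");
  const int *EndPtr = A.data() + (I + 1 - VF); // the "vector end pointer"
  std::vector<int> V(EndPtr, EndPtr + VF);     // unit-stride load
  std::reverse(V.begin(), V.end());            // then reverse the lanes
  return V;
}

// Variant 2: strided load with stride -1. It starts at the first element in
// iteration order, A[I], and walks downwards.
std::vector<int> loadViaNegativeStride(const std::vector<int> &A,
                                       std::size_t I, std::size_t VF) {
  assert(VF > 0 && I < A.size() && I + 1 >= VF && "access out of bounds");
  std::vector<int> V;
  for (std::size_t L = 0; L < VF; ++L)
    V.push_back(A[I - L]);
  return V;
}
```

Both variants return the same lanes, but only the first one needs the address of the lowest element, which matches the motivation given for the rename.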
2025-03-13 | [VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767) | Florian Hahn
Now that all phi nodes manage their incoming blocks through the VPlan-predecessors, there should be no need for having a dedicated recipe, it should be sufficient to allow PHI opcodes in VPInstruction. Follow-ups will also migrate VPWidenPHIRecipe and possibly others, building on top of https://github.com/llvm/llvm-project/pull/129388. PR: https://github.com/llvm/llvm-project/pull/129767
2025-03-03 | [LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093) | Mel Chen
This patch converts the llvm.vector.splice intrinsic to llvm.experimental.vp.splice, ensuring that fixed-order recurrences execute correctly when tail folding by EVL is enabled. Due to the non-VFxUF penultimate EVL issue, the EVL from the previous iteration will be preserved and used in llvm.experimental.vp.splice.
2025-02-22 | [VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180) | Luke Lau
This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However we now have the RISCVVLOptimizer pass which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.
2025-01-30 | [LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567) | David Sherwood
This work feeds part of PR https://github.com/llvm/llvm-project/pull/88385, and adds support for vectorising loops with uncountable early exits and outside users of loop-defined variables. When calculating the final value from an uncountable early exit we need to calculate the vector lane that triggered the exit, and hence determine the value at the point we exited. All code for calculating the last value when exiting the loop early now lives in a new vector.early.exit block, which sits between the middle.split block and the original exit block. Doing this required two fixes:
1. The vplan verifier incorrectly assumed that the block containing a definition always dominates the block of the user. That's not true if you can arrive at the use block from multiple incoming blocks. This is possible for early exit loops where both the early exit and the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop. It needs to have the same parent as the actual early exit block from the original loop.
I've added a new ExtractFirstActive VPInstruction that extracts the first active lane of a vector, i.e. the lane of the vector predicate that triggered the exit. NOTE: The IR generated for dealing with live-outs from early exit loops is unoptimised, as opposed to normal loops. This inevitably leads to poor quality code, but this can be fixed up later.
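The ExtractFirstActive operation described above picks the value in the lane where the exit condition first became true. A minimal scalar model follows. It is a hypothetical illustration (the free function `extractFirstActive` is not the VPInstruction itself) and assumes that at least one lane of the predicate is active, as would be the case once the early exit has been taken.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: return the value in the first lane whose exit
// predicate is true. Assumption: at least one lane is active.
int extractFirstActive(const std::vector<int> &Values,
                       const std::vector<bool> &ExitMask) {
  assert(Values.size() == ExitMask.size() && "one predicate bit per lane");
  std::size_t Lane = 0;
  while (Lane < ExitMask.size() && !ExitMask[Lane])
    ++Lane;
  assert(Lane < ExitMask.size() && "expected an active lane");
  return Values[Lane];
}
```

For example, with lane values 7, 8, 9, 10 and an exit predicate that first fires in lane 2, the live-out value is 9, regardless of whether later lanes also satisfy the predicate.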
2025-01-28 | [VPlan] Use cast<VPRecipeBase> in verifier (NFC). | Florian Hahn
All users of VPValue must be a VPRecipeBase, use cast.
2025-01-16 | [VPlan] Verify scalar types in VPlanVerifier. NFCI (#122679) | Luke Lau
VPTypeAnalysis contains some assertions which can be useful for reasoning that the types of various operands match. This patch teaches VPlanVerifier to invoke VPTypeAnalysis to check them, and catches some issues with VPInstruction types that are also fixed here:
* Handles the missing cases for CalculateTripCountMinusVF, CanonicalIVIncrementForPart and AnyOf
* Fixes ICmp and ActiveLaneMask to return i1 (to align with `icmp` and `@llvm.get.active.lane.mask` in the LangRef)
The VPlanVerifier unit tests also need to be fleshed out a bit more to satisfy the stricter assertions.
2024-12-12 | [VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292) | Florian Hahn
As a first step to move towards modeling the full skeleton in VPlan, start by wrapping IR blocks created during legacy skeleton creation in VPIRBasicBlocks and hook them into the VPlan. This means the skeleton CFG is represented in VPlan, just before execute. This allows moving parts of skeleton creation into recipes in the VPBBs gradually. Note that this allows retiring some manual DT updates, as this will be handled automatically during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/114292
2024-11-23 | [VPlan] Simplify and unify code in verifyEVLRecipe using all_of. (NFCI) | Florian Hahn
Use all_of instead of explicit loop to reduce indentation, also properly check VPScalarCastRecipe operand.
2024-11-22 | [VPlan] Support VPReverseVectorPointer in DataWithEVL vectorization (#113667) | Shih-Po Hung
VPReverseVectorPointer relies on the runtime VF, but in DataWithEVL tail-folding, EVL (which can be less than VF at runtime) should be used instead. This patch updates the logic to check the users of VF and replaces the second operand if the user is VPReverseVectorPointer.
2024-11-03 | [Vectorize] Remove unused includes (NFC) (#114643) | Kazu Hirata
Identified with misc-include-cleaner.
2024-10-31 | [VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975) | Florian Hahn
Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-15 | [VPlan] Use VPWidenIntrinsicRecipe to vp.select. (#110489) | Florian Hahn
Use VPWidenIntrinsicRecipe (https://github.com/llvm/llvm-project/pull/110486) to create vp.select intrinsics. This potentially offers an alternative to duplicating EVL recipes for all existing recipes. There are some recipes that will need duplicates (at least at the moment), due to extra code-gen needs (e.g. widening loads and stores). But in cases the intrinsic can directly be used, creating the widened intrinsic directly would reduce the need to duplicate some recipes. PR: https://github.com/llvm/llvm-project/pull/110489
2024-09-16 | [LV] Added verification of EVL recipes (#107630) | Kolya Panchenko
2024-09-14 | [VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735) | Florian Hahn
Add a new VPIRInstruction recipe to wrap existing IR instructions not to be modified during execution, except for PHIs. For PHIs, a single VPValue operand is allowed, and it is used to add a new incoming value for the single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any operands. Depends on https://github.com/llvm/llvm-project/pull/100658. PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-08 | [Vectorize] Avoid repeated hash lookups (NFC) (#107729) | Kazu Hirata
2024-07-05 | [VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651) | Florian Hahn
This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651
2024-05-30 | [VPlan] Add VPIRBasicBlock, use to model pre-preheader. (#93398) | Florian Hahn
This patch adds a new special type of VPBasicBlock that wraps an existing IR basic block. Recipes of the block get added before the terminator of the wrapped IR basic block. Making it a subclass of VPBasicBlock avoids duplicating various APIs to manage recipes in a block, as well as makes sure the traversals filtering VPBasicBlocks automatically apply as well. Initially VPIRBasicBlock are only used for the pre-preheader (wrapping the original preheader of the scalar loop). As follow-up, this will be used to move more parts of the skeleton inside VPlan, starting with the branch and condition in the middle block. Separated out of https://github.com/llvm/llvm-project/pull/92651 PR: https://github.com/llvm/llvm-project/pull/93398
2024-05-29 | [VPlan] Move verifier to class to reduce need to pass via args. (NFC) | Florian Hahn
Move VPlan verification functions to avoid the need to pass VPDT across multiple calls. This also allows easier extensions in the future.
2024-04-22 | [VPlan] Remove custom checks for EVL placement in verifier (NFCI). | Florian Hahn
After e2a72fa583d9, def-use chains of EVL are modeled explicitly. So there's no need for a custom check of its placement, as regular def-use verification will catch mis-placements.
2024-04-17 | [VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411) | Florian Hahn
This patch introduces a new VPWidenMemoryRecipe base class and distinct sub-classes to model loads and stores. This is a first step in an effort to simplify and modularize code generation for widened loads and stores and enable adding further more specialized memory recipes. PR: https://github.com/llvm/llvm-project/pull/87411