llvm-project.git/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp, branch users/SamTebbs33/expression-recipe-sub

[LV] Bundle sub reductions into VPExpressionRecipe

2025-07-06T21:21:12+00:00

This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

[LV] Add support for partial reductions without a binary op (#133922)

2025-07-02T12:05:51+00:00

Consider IR such as this:

for.body:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  %accum = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %gep.a = getelementptr i8, ptr %a, i64 %iv
  %load.a = load i8, ptr %gep.a, align 1
  %ext.a = zext i8 %load.a to i32
  %add = add i32 %ext.a, %accum
  %iv.next = add i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, 1025
  br i1 %exitcond.not, label %for.exit, label %for.body

Conceptually we can vectorise this using partial reductions too,
although the current loop vectoriser implementation requires the
accumulation of a multiply. For AArch64 this is easily done with
a udot or sdot with an identity operand, i.e. a vector of (i16 1).

In order to do this I had to teach getScaledReductions that the
accumulated value may come from a unary op, hence there is only
one extension to consider. Similarly, I updated the vplan and
AArch64 TTI cost model to understand the possible unary op.

---------

Co-authored-by: Matt Devereau

[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864)

2025-07-02T10:16:02+00:00

A reverse interleave access is essentially composed of multiple
load/store operations with same negative stride, and their addresses are
based on the last lane address of member 0 in the interleaved group.

Currently, we already have VPVectorEndPointerRecipe for computing the
last lane address of consecutive reverse memory accesses. This patch
extends VPVectorEndPointerRecipe to support constant stride and extracts
the reverse interleave group address adjustment from
VPInterleaveRecipe::execute, replacing it with a
VPVectorEndPointerRecipe.

The final goal is to support interleaved accesses with EVL tail folding.
Given that VPInterleaveRecipe is large and tightly coupled — combining
both load and store, and embedding operations like reverse pointer
adjustion (GEP), widen load/store, deinterleave/interleave, and reversal
— breaking it down into smaller, dedicated recipes may allow
VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware
form more effectively.

One foreseeable challenge is that
VPlanTransforms::convertToConcreteRecipes currently runs after
tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will
likely need to happen earlier in the pipeline to be effective.

VPlanRecipes.cpp - fix "'llvm::VPExpressionRecipe::computeCost': not all control paths return a value" MSVC warning. NFC.

2025-07-02T08:59:01+00:00

[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281)

2025-07-01T19:44:50+00:00

This patch adds a new recipe to combine multiple recipes into an
'expression' recipe, which should be considered as single entity for
cost-modeling and transforms. The recipe needs to be 'decomposed', i.e.
replaced by its individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe, but this specialized print could be
replaced with printing the bundled recipes directly.

PR: https://github.com/llvm/llvm-project/pull/144281

[VPlan] Truncate/Extend ComputeReductionResult at construction (NFC). (#141860)

2025-06-30T21:39:17+00:00

Instead of looking up the narrower reduction type via getRecurrenceType
we can generate the needed extend directly at constructiond re-use the
truncated value from the loop.

PR: https://github.com/llvm/llvm-project/pull/141860

[LV] Add support for cmp reductions with decreasing IVs. (#140451)

2025-06-29T10:17:03+00:00

Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing a SMin
 reduction. It uses signed max as sentinel value.

PR: https://github.com/llvm/llvm-project/pull/140451

[VPlan] Handle FirstActiveLane when unrolling. (#145394)

2025-06-27T07:44:57+00:00

Currently FirstActiveLane is not handled correctly during
 unrolling. This is currently causing mis-compiles when
 vectorizing early-exit loops with interleaving forced.

This patch updates handling of FirstActiveLane to be analogous to
computing final reduction results: during unrolling, the created copies
for its original operand are added as additional operands, and
FirstActiveLane will always produce the index of the first active lane
across all unrolled iterations.

Note that some of the generated code is still incorrect, as we also need
to handle ExtractElement with FirstActiveLane operands. I will share
patches for those soon as well.

PR: https://github.com/llvm/llvm-project/pull/145394

[VPlan] Handle AnyOf when unrolling. (#145340)

2025-06-26T13:19:38+00:00

Currently AnyOf is not handled correctly during unrolling. This is
currently causing mis-compiles when vectorizing early-exit loops with
interleaving forced (even though selectInterleaveCount will currently
only pick IC = 1, unless forced by the user).

This patch updates handling of AnyOf to be analogous to computing final
reduction results: during unrolling, the created copies for its original
operand are added as additional operands, and AnyOf will always produce
the reduced value across all unrolled iterations.

Note that the generated code is still incorrect, as we also need to
handle FirstActiveLane and ExtractElement with FirstActiveLane operands.
I will share patches for those soon as well.

PR: https://github.com/llvm/llvm-project/pull/145340

[VPlan] Unroll VPReplicateRecipe by VF. (#142433)

2025-06-26T10:19:09+00:00

Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them by VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined to a vector using a
new BuildVector VPInstruction.

It also adds a few folds to simplify unnecessary extracts/BuildVectors.

It also adds a BuildStructVector opcode for handling of calls that have
struct return types.

VPReplicateRecipe in replicate regions can will be unrolled as follow
up, turing non-single-scalar VPReplicateRecipes into 'abstract', i.e.
not executable.

PR: https://github.com/llvm/llvm-project/pull/142433