llvm-project.git/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, branch users/SamTebbs33/expression-recipe-sub

[LV] Bundle sub reductions into VPExpressionRecipe

2025-07-06T21:21:12+00:00

This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

[LV] Create in-loop sub reductions

2025-07-02T15:40:23+00:00

This PR allows the loop vectorizer to handle sub reductions by forming a
normal add reduction with a negated input.
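
As a scalar illustration of the rewrite (not the vectorizer code itself),
the sketch below compares a direct sub reduction with an add reduction over
negated inputs. The function names are hypothetical, and unsigned
arithmetic is used so that wrap-around is well defined.

```cpp
#include <cstdint>
#include <vector>

// Direct form: Acc -= X on every iteration (a "sub reduction").
uint32_t subReduce(uint32_t Start, const std::vector<uint32_t> &Xs) {
  uint32_t Acc = Start;
  for (uint32_t X : Xs)
    Acc -= X;
  return Acc;
}

// Rewritten form: an ordinary add reduction whose input is negated first.
// Unsigned arithmetic wraps modulo 2^32, so both forms agree for all inputs.
uint32_t addReduceNegated(uint32_t Start, const std::vector<uint32_t> &Xs) {
  uint32_t Acc = Start;
  for (uint32_t X : Xs)
    Acc += (0u - X);
  return Acc;
}
```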

[LV] Add support for partial reductions without a binary op (#133922)

2025-07-02T12:05:51+00:00

Consider IR such as this:

for.body:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  %accum = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %gep.a = getelementptr i8, ptr %a, i64 %iv
  %load.a = load i8, ptr %gep.a, align 1
  %ext.a = zext i8 %load.a to i32
  %add = add i32 %ext.a, %accum
  %iv.next = add i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, 1025
  br i1 %exitcond.not, label %for.exit, label %for.body

Conceptually we can vectorise this using partial reductions too,
although the current loop vectoriser implementation requires the
accumulation of a multiply. For AArch64 this is easily done with
a udot or sdot with an identity operand, i.e. a vector of (i16 1).

In order to do this I had to teach getScaledReductions that the
accumulated value may come from a unary op, hence there is only
one extension to consider. Similarly, I updated the vplan and
AArch64 TTI cost model to understand the possible unary op.
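
The identity-operand idea can be sketched with a scalar model of a 4-way
dot-product accumulate. This is only a model of the instruction's shape:
the function names, lane count and element widths are assumptions made for
the example, not taken from the patch.

```cpp
#include <array>
#include <cstdint>

// Scalar model of a 4-way unsigned dot-product accumulate: each 32-bit
// accumulator lane adds the products of four adjacent 8-bit element pairs.
std::array<uint32_t, 4> dotAccumulate(std::array<uint32_t, 4> Acc,
                                      const std::array<uint8_t, 16> &A,
                                      const std::array<uint8_t, 16> &B) {
  for (int Lane = 0; Lane < 4; ++Lane)
    for (int K = 0; K < 4; ++K)
      Acc[Lane] += uint32_t(A[4 * Lane + K]) * uint32_t(B[4 * Lane + K]);
  return Acc;
}

// With an all-ones second operand the multiply is the identity, so the
// result is a plain partial sum of the zero-extended inputs.
std::array<uint32_t, 4> extendAndAccumulate(std::array<uint32_t, 4> Acc,
                                            const std::array<uint8_t, 16> &A) {
  std::array<uint8_t, 16> Ones;
  Ones.fill(1);
  return dotAccumulate(Acc, A, Ones);
}
```

Summing the four accumulator lanes after the loop gives the same value as
the scalar add-of-zext reduction.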

---------

Co-authored-by: Matt Devereau

[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864)

2025-07-02T10:16:02+00:00

A reverse interleave access is essentially composed of multiple
load/store operations with the same negative stride, and their addresses
are based on the last lane address of member 0 in the interleaved group.

Currently, we already have VPVectorEndPointerRecipe for computing the
last lane address of consecutive reverse memory accesses. This patch
extends VPVectorEndPointerRecipe to support constant stride and extracts
the reverse interleave group address adjustment from
VPInterleaveRecipe::execute, replacing it with a
VPVectorEndPointerRecipe.
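
A rough sketch of the address arithmetic described above, in units of
elements rather than bytes. The helper name is hypothetical, and the
formula is my reading of "last lane address of member 0", not code from
the patch. A factor of 1 corresponds to the existing consecutive reverse
case.

```cpp
#include <cassert>
#include <cstdint>

// Element offset, relative to member 0 of the first lane, at which the wide
// access for a reverse interleave group starts. Lanes step down by Factor
// elements, so the last lane sits (VF - 1) * Factor elements below the first.
int64_t reverseGroupStartOffset(int64_t VF, int64_t Factor) {
  assert(VF > 0 && Factor > 0 && "expected positive VF and factor");
  return -(VF - 1) * Factor;
}
```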

The final goal is to support interleaved accesses with EVL tail folding.
VPInterleaveRecipe is large and tightly coupled: it combines both load
and store, and embeds operations like reverse pointer adjustment (GEP),
widen load/store, deinterleave/interleave, and reversal. Breaking it down
into smaller, dedicated recipes may allow
VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware
form more effectively.

One foreseeable challenge is that
VPlanTransforms::convertToConcreteRecipes currently runs after
tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will
likely need to happen earlier in the pipeline to be effective.

[LV] Use vscale for tuning to improve branch weight estimates (#144733)

2025-07-01T12:23:38+00:00

In addBranchWeightToMiddleTerminator we attempt to add branch weights to
the middle block terminator. We pessimistically assume vscale=1, whereas
we can improve the estimate by using the value of vscale used for
tuning.
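
To show why the vscale value matters, here is a hypothetical helper. It
assumes the weights are derived from the number of scalar iterations
covered by one vector step, with the remainder uniformly distributed. That
weighting scheme and all names are assumptions for illustration and are
not confirmed by this message.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical model: one vector step covers MinVF * UF * VScale scalar
// iterations. Under a uniform-remainder assumption, the branch that skips
// the scalar remainder gets weight 1 and the other gets Step - 1.
std::pair<uint32_t, uint32_t> middleBlockWeights(uint32_t MinVF, uint32_t UF,
                                                 uint32_t VScaleForTuning) {
  uint32_t Step = MinVF * UF * VScaleForTuning;
  return {1, Step - 1};
}
```

Under these assumptions, assuming vscale=1 for a target tuned for vscale=2
would halve the estimated step and skew the weights accordingly.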

[VPlan] Truncate/Extend ComputeReductionResult at construction (NFC). (#141860)

2025-06-30T21:39:17+00:00

Instead of looking up the narrower reduction type via getRecurrenceType
we can generate the needed extend directly at construction and re-use the
truncated value from the loop.

PR: https://github.com/llvm/llvm-project/pull/141860

[LV] Add support for cmp reductions with decreasing IVs. (#140451)

2025-06-29T10:17:03+00:00

Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing an SMin
reduction. It uses signed max as the sentinel value.
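
A scalar sketch of the equivalence this relies on, with hypothetical
function names. The first function is the source-level pattern, and the
second computes the same result as a signed-min reduction seeded with a
signed-max sentinel. It assumes the induction never takes the sentinel
value itself.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Select reduction over a decreasing IV: the last write wins, so the result
// is the smallest matching index, or Start if nothing matches.
int64_t findScalar(const std::vector<int> &A, int Threshold, int64_t Start) {
  int64_t Res = Start;
  for (int64_t I = int64_t(A.size()) - 1; I >= 0; --I)
    Res = A[I] > Threshold ? I : Res;
  return Res;
}

// Same result via a signed-min reduction. Non-matching iterations
// contribute the sentinel, which is swapped for Start after the loop.
int64_t findViaSMin(const std::vector<int> &A, int Threshold, int64_t Start) {
  const int64_t Sentinel = std::numeric_limits<int64_t>::max();
  int64_t Min = Sentinel;
  for (int64_t I = int64_t(A.size()) - 1; I >= 0; --I)
    Min = std::min(Min, A[I] > Threshold ? I : Sentinel);
  return Min == Sentinel ? Start : Min;
}
```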

PR: https://github.com/llvm/llvm-project/pull/140451

[LV] Improve code using [[maybe_unused]] (NFC) (#137138)

2025-06-27T09:58:17+00:00

[LV] Enable auto-vectorisation of loops with uncountable exits (#133099)

2025-06-27T09:39:33+00:00

Until now, vectorisation of some early exit loops with uncountable
exits was controlled by a flag that was off by default. Now that we
have efficient code generation for vectorising such loops (see PR
#130766) and there is still some time before the next LLVM release,
it seems like a good point to enable the feature by default. If any
issues arise post-commit it can be easily reverted.
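
For reference, a loop of the general shape in question might look like the
sketch below (the function name is made up for the example). It has one
countable exit and one data-dependent early exit. Whether a particular
loop is actually vectorised still depends on legality and cost checks not
described here.

```cpp
#include <cstddef>

// Returns the first index where A and B differ, or N if they match.
// The loop has a countable exit (I == N) and an uncountable early exit
// (the data-dependent return).
std::size_t firstMismatch(const unsigned char *A, const unsigned char *B,
                          std::size_t N) {
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] != B[I])
      return I;
  return N;
}
```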

Using this patch I built and ran the LLVM test suite successfully,
which on neoverse-v1 led to the vectorisation of 114 additional
early exit loops. I also built and ran SPEC2017 successfully for
both neoverse-v1 and neoverse-v2.

[LV] Disable interleaving via hints for uncountable early exit loops (#145877)

2025-06-27T08:09:55+00:00

Currently, if the user enables interleaving during vectorisation of
uncountable early exit loops via the interleave_count pragma and the
enable-early-exit-vectorization option, the loop will be miscompiled.
There is ongoing work to fix this, but for now it seems safer to ignore
the hint until it is supported.
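
An example of source that would carry such a hint, with a made-up function
name. The pragma is guarded so the snippet also builds with compilers
other than clang. With this patch the interleave hint is expected to be
ignored for a loop like this rather than honoured.

```cpp
#include <cstddef>

// Early exit search loop annotated with an interleave hint.
std::size_t findFirst(const int *A, std::size_t N, int Key) {
  std::size_t I = 0;
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
  for (I = 0; I < N; ++I)
    if (A[I] == Key)
      break; // Uncountable early exit.
  return I;
}
```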

---------

Co-authored-by: Paul Walker