llvm-project.git/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp, branch main

Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"

2025-11-13T22:34:55+00:00

This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISC-V:
https://lab.llvm.org/buildbot/#/builders/210/builds/5221

[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)

2025-11-12T15:11:00+00:00

Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.
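
The lowering described above can be modelled on plain Python lists. This is
only a sketch, not LLVM code: the helper names are hypothetical, the mask is
assumed to be a prefix mask with at least one active lane (as a tail-folded
header mask would be), and first_active_lane is assumed to return the number
of lanes when no lane is active.

```python
def first_active_lane(mask):
    """Index of the first True lane, or len(mask) if none (assumed convention)."""
    for lane, active in enumerate(mask):
        if active:
            return lane
    return len(mask)


def last_active_lane(mask):
    """Model of Not(Mask) -> FirstActiveLane(NotMask) -> Sub(result, 1)."""
    not_mask = [not active for active in mask]
    return first_active_lane(not_mask) - 1


def extract_lane(lane, values):
    """Model of ExtractLane(lane, V)."""
    return values[lane]


# Final tail-folded iteration: only the first three of four lanes are active.
mask = [True, True, True, False]
values = [10, 20, 30, 40]
live_out = extract_lane(last_active_lane(mask), values)
```

With a fully active mask the negated mask has no active lane, so under the
assumed convention the result is the last lane of the vector.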

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042

[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)

2025-10-19T18:49:05+00:00

Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670

[VPlan] Add VPRecipeBase::getRegion helper (NFC).

2025-10-18T20:25:34+00:00

Multiple places retrieve the region for a recipe. Add a helper to make
the code more compact and clearer.

[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)

2025-10-15T12:46:09+00:00

When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.
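
The difference between the two opcodes after unrolling can be illustrated
with a small model. This is a sketch under assumptions: an unrolled value is
represented as a list of parts, each part a list of lanes, and the function
names are hypothetical stand-ins for the opcodes rather than LLVM APIs.

```python
def extract_last_element(parts):
    """Last lane of the last unrolled part (one value overall)."""
    return parts[-1][-1]


def extract_last_lane_per_part(parts):
    """Last lane of every unrolled part (one value per part)."""
    return [part[-1] for part in parts]


# Store address with UF=2, VF=4: uniform within each part,
# but different between the two parts.
addresses = [[100, 100, 100, 100], [200, 200, 200, 200]]

across_all_parts = extract_last_element(addresses)
per_part = extract_last_lane_per_part(addresses)
```

In this model, using only the value from extract_last_element would drop the
store to address 100, which is why a per-part extract is needed when the
address is only uniform-per-part.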

Fixes https://github.com/llvm/llvm-project/issues/162498.

PR: https://github.com/llvm/llvm-project/pull/163056

[LV] Don't create partial reductions if factor doesn't match accumulator (#158603)

2025-09-24T11:21:03+00:00

Check if the scale-factor of the accumulator is the same as the requested
ScaleFactor in tryToCreatePartialReductions.

This prevents creating partial reductions if not all instructions in the
reduction chain form partial reductions, e.g. because we do not form a
partial reduction for the loop exit instruction.

Currently code-gen works fine, because the scale factor of
VPPartialReduction is not used during ::execute, but it means we compute
incorrect cost/register pressure, because the partial reduction won't
reduce to the specified scaling factor.

PR: https://github.com/llvm/llvm-project/pull/158603

[VPlan] Track VPValues instead of VPRecipes in calculateRegisterUsage. (#155301)

2025-09-15T19:55:11+00:00

Update calculateRegisterUsageForPlan to track live-ness of VPValues
instead of recipes. This gives slightly more accurate results for
recipes that define multiple values (i.e. VPInterleaveRecipe).

When tracking the live-ness of recipes, all VPValues defined by a
VPInterleaveRecipe are considered alive until the last use of any of
them. When tracking the live-ness of individual VPValues, we can
accurately track the individual values until their last use.
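
The effect of the two tracking granularities can be sketched with a toy
liveness model. Everything here is an assumption made for illustration: the
value names, the position indices, and the rule that a value is live from its
definition through its last use. It does not reflect the actual
implementation of calculateRegisterUsageForPlan.

```python
def max_live(defs, last_use, per_recipe):
    """Peak number of simultaneously live values.

    defs:     value -> (defining recipe, position of the definition)
    last_use: value -> position of the value's last use
    """
    end = dict(last_use)
    if per_recipe:
        # Recipe-level tracking: every value of a recipe stays live
        # until the last use of any value that recipe defines.
        recipe_end = {}
        for value, (recipe, _) in defs.items():
            recipe_end[recipe] = max(recipe_end.get(recipe, 0), last_use[value])
        end = {value: recipe_end[recipe] for value, (recipe, _) in defs.items()}
    peak = 0
    for point in range(max(end.values()) + 1):
        live = sum(1 for value, (_, start) in defs.items()
                   if start <= point <= end[value])
        peak = max(peak, live)
    return peak


# One interleave recipe defines a and b; a is dead early, b is used late.
defs = {"a": ("interleave", 0), "b": ("interleave", 0), "c": ("mul", 2)}
last_use = {"a": 1, "b": 4, "c": 3}

by_recipe = max_live(defs, last_use, per_recipe=True)
by_value = max_live(defs, last_use, per_recipe=False)
```

In this example, recipe-level tracking keeps a alive alongside b and c and
reports a peak of three values, while value-level tracking reports two.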

Note the changes in large-loop-rdx.ll and pr47437.ll. This patch
restores the original behavior before introducing VPlan-based liveness
tracking.

PR: https://github.com/llvm/llvm-project/pull/155301

[LV][EVL] Support interleaved access with tail folding by EVL (#152070)

2025-09-01T13:20:06+00:00

The InterleavedAccess pass already supports transforming
vector-predicated (vp) load/store intrinsics. With this patch, we start
enabling interleaved access under tail folding by EVL.

This patch introduces a new base class, VPInterleaveBase, and a concrete
class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and
the new VPInterleaveEVLRecipe inherit from and implement
VPInterleaveBase.

Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL
operand to emit vp.load/vp.store intrinsics.

Currently, tail folding by EVL is only supported for scalable
vectorization. Therefore, VPInterleaveEVLRecipe will only emit
interleave/deinterleave intrinsics. Reverse accesses are not yet
implemented, as masked reverse interleaved access under tail folding is
not yet supported.
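
A rough model of an EVL-predicated interleaved load is sketched below. It is
not the recipe's implementation: the function names are hypothetical, inactive
lanes are modelled as None, and scaling the EVL of the wide load by the
interleave factor is an assumption made for this sketch.

```python
def vp_load(memory, base, evl, width):
    """Model of a vector-predicated load: only the first `evl` lanes are read."""
    return [memory[base + i] if i < evl else None for i in range(width)]


def deinterleave(wide, factor):
    """Split a wide vector into `factor` member vectors."""
    return [wide[member::factor] for member in range(factor)]


def load_interleave_group(memory, base, evl, vf, factor):
    # Assumption: the wide load covers evl * factor elements.
    wide = vp_load(memory, base, evl * factor, vf * factor)
    return deinterleave(wide, factor)


# Two interleaved fields (x, y); three of four lanes are active.
memory = [0, 100, 1, 101, 2, 102, 3, 103]
members = load_interleave_group(memory, base=0, evl=3, vf=4, factor=2)
```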

Fixed #123201

[VPlan] Store LoopRegion in variable in calculateRegisterUsage... (NFC)

2025-08-23T16:43:25+00:00

[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482)

2025-08-20T23:39:01+00:00

`VPEVLBasedIVPHIRecipe` is lowered to a scalar-phi VPInstruction and
generates a scalar phi. The recipe therefore only occupies a scalar
register, just like other phi recipes.

This patch fixes the register usage for `VPEVLBasedIVPHIRecipe` from
vector to scalar, which is closer to the generated vector IR.

https://godbolt.org/z/6Mzd6W6ha shows that there are no register spills
when choosing ``.

Note that this test is basically copied from AArch64.