| Age | Commit message | Author |
|
Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction.
PR: https://github.com/llvm/llvm-project/pull/166099
|
|
After truncating an integer-induction, neither nuw nor nsw hold.
Fixes #168902.
Co-authored-by: Florian Hahn <flo@fhahn.com>
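
As a rough illustration of why the flags must be dropped, the sketch below models an i32 induction truncated to i8. The helper names are hypothetical and not part of LLVM. The wide add of 255 + 1 or 127 + 1 does not wrap in 32 bits, but the same add on the truncated values wraps in 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret the low 8 bits of X as a signed i8 value.
inline int toSigned8(std::uint32_t X) {
  int V = static_cast<int>(X & 0xFFu);
  return V >= 128 ? V - 256 : V;
}

// True if adding the truncated (i8) operands wraps as unsigned,
// i.e. a nuw flag on the narrow add would be wrong.
inline bool narrowAddWrapsUnsigned(std::uint32_t IV, std::uint32_t Step) {
  std::uint32_t A = IV & 0xFFu, B = Step & 0xFFu;
  return A + B > 0xFFu;
}

// True if adding the truncated (i8) operands overflows as signed,
// i.e. an nsw flag on the narrow add would be wrong.
inline bool narrowAddWrapsSigned(std::uint32_t IV, std::uint32_t Step) {
  int Sum = toSigned8(IV) + toSigned8(Step);
  return Sum < -128 || Sum > 127;
}
```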
|
|
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.
This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.
Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did
the same for VPIRMetadata.
Should be NFC w.r.t. the generated IR.
PR: https://github.com/llvm/llvm-project/pull/168450
|
|
This patch implements a transform to hoist single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.
This enables hoisting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.
PR: https://github.com/llvm/llvm-project/pull/166247
|
|
Update VPlan to populate VPIRMetadata during VPInstruction construction
and use it when creating widened recipes, instead of constructing
VPIRMetadata from the underlying IR instruction each time.
This centralizes VPIRMetadata in VPInstructions and ensures metadata is
consistently available throughout VPlan transformations.
PR: https://github.com/llvm/llvm-project/pull/167253
|
|
Replace addMetadata with setMetadata, which sets metadata, updating
existing entries or adding a new entry otherwise.
This isn't strictly needed at the moment, but will be needed for
follow-up patches.
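
The update-or-add behaviour can be sketched with a stand-alone model. The MetadataList type and its integer payload below are assumptions for illustration only, not the actual VPIRMetadata interface.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a metadata container: (kind ID, node) pairs.
struct MetadataList {
  std::vector<std::pair<unsigned, int>> Entries;

  // Overwrite the entry for Kind if one exists, otherwise append a new one.
  void setMetadata(unsigned Kind, int Node) {
    for (auto &Entry : Entries) {
      if (Entry.first == Kind) {
        Entry.second = Node;
        return;
      }
    }
    Entries.emplace_back(Kind, Node);
  }
};
```

Unlike an append-only addMetadata, calling setMetadata twice with the same kind leaves a single entry holding the latest value.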
|
|
Changes: The previous patch had to be reverted due to a mismatching-OpType
assert in CSE. A reduced test corresponding to an RVV pointer-induction
has now been added, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.
While at it, record VPIRFlags in VPWidenInductionRecipe.
|
|
|
|
|
|
|
|
Reverts llvm/llvm-project#163538
This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822
This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
|
|
While at it, record VPIRFlags in VPWidenInductionRecipe.
|
|
(#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.
This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
|
|
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399)
has the property that if the AVL (%cnt) is less than or equal to VF
(%max_lanes) then the return value is just AVL.
This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds
`ExplicitVectorLength` to
`VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once
dead.
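
A minimal model of the property, under assumptions: modelGetVectorLength returns min(AVL, MaxLanes), which is only one result consistent with the description above (the value for AVL > VF is target-dependent), and canReplaceWithAVL is a hypothetical stand-in for the SCEV-based check, taking an upper bound on AVL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed model of the intrinsic: returns AVL whenever AVL <= MaxLanes.
inline std::uint64_t modelGetVectorLength(std::uint64_t AVL,
                                          std::uint64_t MaxLanes) {
  return std::min(AVL, MaxLanes);
}

// Hypothetical check: if an upper bound on AVL is known to be <= VF,
// the call can be replaced by AVL itself.
inline bool canReplaceWithAVL(std::uint64_t MaxAVL, std::uint64_t MaxLanes) {
  return MaxAVL <= MaxLanes;
}
```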
|
|
Follow up on c2d4c7c18b96 ([VPlan] Permit more users in
narrowToSingleScalars) to fix an assert related to WidenStore users of
the recipe being narrowed in narrowToSingleScalars.
|
|
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).
When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.
See also https://github.com/llvm/llvm-project/issues/148603
PR: https://github.com/llvm/llvm-project/pull/149042
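
The lowering can be sketched on plain boolean vectors. This is a model, not the VPlan code: it assumes FirstActiveLane returns the lane count when no lane is active, and that the mask is a prefix mask with at least one active lane, as a tail-folding header mask would be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed FirstActiveLane model: index of the first true lane,
// or the number of lanes if none is set.
inline std::size_t firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// LastActiveLane as Not(Mask) -> FirstActiveLane(NotMask) -> Sub(result, 1).
// Precondition (assumption): Mask is a prefix mask with an active lane 0.
inline std::size_t lastActiveLane(const std::vector<bool> &Mask) {
  std::vector<bool> NotMask(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    NotMask[I] = !Mask[I];
  return firstActiveLane(NotMask) - 1;
}
```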
|
|
On RISC-V narrowInterleaveGroups doesn't kick in because the wrong
VectorRegWidth is passed to isConsecutiveInterleaveGroup.
narrowInterleaveGroups is always passed the RGK_FixedWidthVector
register size, but on RISC-V the RGK_ScalableVector size is twice as
large because we want to use LMUL 2. This causes the `GroupSize ==
VectorRegWidth` check to fail.
This fixes it by using the scalable register size whenever the VF is
scalable and plumbing it through as a potentially scalable TypeSize.
Note that this only makes a difference when tail folding is disabled, as
narrowInterleaveGroups can't handle EVL based IVs yet.
|
|
Fold
or (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
or (fcmp uno %A, %B), ...
This pattern is generated to check if any vector lane is NaN, and
combining multiple compares is beneficial on architectures that have
dedicated instructions.
Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM
Combine suggested as part of #161735
PR: https://github.com/llvm/llvm-project/pull/167251
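
The scalar equivalence behind the fold can be checked with std::isunordered, which is true when either argument is NaN. The two helper names below are made up for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// or (fcmp uno A, A), (fcmp uno B, B): two self-compares.
inline bool unoSeparate(double A, double B) {
  return std::isunordered(A, A) || std::isunordered(B, B);
}

// fcmp uno A, B: a single compare that is true if either operand is NaN.
inline bool unoCombined(double A, double B) {
  return std::isunordered(A, B);
}
```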
|
|
(#154072)
Since div/rem operations don’t support a mask operand, the lanes of the
divisor that are masked out are currently replaced with 1 using
VPInstruction::Select before the predicated div/rem operation.
This patch replaces
```
VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)
```
with
```
vp.merge(conditional_mask, LHS, RHS, EVL)
```
so that the header mask can be replaced by EVL in this usage scenario
when tail folding with EVL.
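
A lane-wise model of the merge, assuming LHS is the original divisor and RHS is the constant 1: lanes at or beyond EVL, and lanes whose condition is false, get the safe divisor. safeDivisor is a hypothetical name for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of vp.merge(Cond, Divisor, 1, EVL): keep the divisor only in lanes
// that are below EVL and have a true condition; use 1 everywhere else so the
// following div/rem cannot trap on a masked-out lane.
inline std::vector<int> safeDivisor(const std::vector<bool> &Cond,
                                    const std::vector<int> &Divisor,
                                    std::size_t EVL) {
  std::vector<int> Out(Divisor.size());
  for (std::size_t I = 0; I < Divisor.size(); ++I)
    Out[I] = (I < EVL && Cond[I]) ? Divisor[I] : 1;
  return Out;
}
```

Here the `I < EVL` test plays the role of the header mask, which is why the explicit logical_and is no longer needed.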
|
|
Classof for most recipes directly supports VPValue, so there is no need
to call getDefiningRecipe when using isa/cast/dyn_cast.
|
|
Add helper to make it easier to retrieve the single user of a VPUser.
|
|
This reverts commit fdd52f5fe130fb8b98f4aed3d15aa0789cce6b40, as it
causes buildbot failures. This will give us time to investigate the
failure.
https://lab.llvm.org/buildbot/#/builders/210/builds/5160
|
|
Currently the only way to enable the use of wide active lane masks is to pass
-enable-wide-lane-mask and force both interleaving & tail-folding with additional
flags. This patch changes selectInterleaveCount to consider interleaving if wide
lane masks were requested, although the feature remains off by default.
|
|
This allows us to strip a special case in VPWidenGEP::execute.
|
|
This hardens the code to check based on WideMember0's operands. This
ensures each call will go through the same check. Should be NFC
currently but needed when generalizing in follow-up patches.
|
|
narrowToSingleScalarRecipes can permit users that are WidenStore, or a
VPInstruction that has a suitable opcode. This is a generalization and
extension of the existing code.
|
|
Call getVectorTripCount first, and call getTripCount failing that, in
simplifyBranchConditionForVFAndUF, to simplify missed cases. While at
it, strip the dead check for a zero TC.
|
|
Follow-up to post-commit suggestion in
https://github.com/llvm/llvm-project/pull/165506.
C must be a single-scalar, so turn the check into an assert.
|
|
Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line
with usesScalars, for clarity.
|
|
This patch removes the explicit Alignment parameter from
VPWidenLoadRecipe and VPWidenStoreRecipe constructors. Instead, these
recipes now directly retrieve the alignment from their
LoadInst/StoreInst.
|
|
Move and combine the code to narrow ops feeding interleave groups to a
single unified static helper. NFC, as legalization logic has not
changed.
|
|
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.
If the condition is a single-scalar, we can simply use a scalar
condition.
PR: https://github.com/llvm/llvm-project/pull/165506
|
|
Use cannotHoistOrSinkRecipe to forbid sinking allocas.
|
|
To follow up on a post-commit review.
|
|
To follow up on a post-commit review.
|
|
nfc (#165568)
The function simplifyRecipe now takes a VPSingleDefRecipe pointer since
it only simplifies single-def recipes for now.
|
|
Currently in optimizeMaskToEVL we convert every widened load, store or
reduction to a VP predicated recipe with EVL, regardless of whether or
not it uses the header mask.
So currently we have to be careful when working on other parts of VPlan to
make sure that the EVL transform doesn't break or transform something
incorrectly, because it's not a semantics-preserving transform.
Forgetting to do so has caused miscompiles before, like the case that
was fixed in #113667
This PR rewrites it to work in terms of pattern matching, so it now only
converts a recipe to a VP predicated recipe if it is exactly masked with
the header mask.
After this the transform should be a true optimisation and not change
any semantics, so it shouldn't miscompile things if other parts of VPlan
change.
This fixes #152541, and allows us to move addExplicitVectorLength into
tryToBuildVPlanWithVPRecipes in #153144
It also splits out the load/store transforms into separate patterns for
reversed and non-reversed, which should make #146525 easier to implement
and reason about.
|
|
Rewrite sinkScalarOperands in VPlanTransforms for clarity, in
preparation for follow-up work to extend it to handle more recipes.
|
|
BranchOnCount and BranchOnCond do not read memory, but cannot be moved.
Mark them as having side-effects, but not reading/writing memory, which
models the above more accurately. This allows removing some special
checking for branches both in the current code and future patches.
|
|
Fold BuildVector where all operands are equal to Broadcast of the first
operand. This will subsequently make it easier to remove additional
buildvectors/broadcasts, e.g. via
https://github.com/llvm/llvm-project/pull/165506.
PR: https://github.com/llvm/llvm-project/pull/165826
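
The condition for the fold can be sketched as follows, with operands modelled as integer value IDs. This is an assumption for illustration; the real transform matches VPValue operands.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// True if a BuildVector with these operands could become a Broadcast of the
// first operand, i.e. all operands are the same value.
inline bool foldsToBroadcast(const std::vector<int> &Operands) {
  return !Operands.empty() &&
         std::all_of(Operands.begin(), Operands.end(),
                     [&](int Op) { return Op == Operands.front(); });
}
```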
|
|
Add getConstantInt helper methods to VPlan to simplify the common
pattern of creating constant integer live-ins.
Suggested as follow-up in
https://github.com/llvm/llvm-project/pull/164127.
|
|
Split off from https://github.com/llvm/llvm-project/pull/156262.
Similar to VPRegionBlock::getCanonicalIV, add helper to get the type of
the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe.
PR: https://github.com/llvm/llvm-project/pull/164127
|
|
Directly remove RepOrWidenR after replacing all uses. Removing the dead
user early unlocks additional opportunities for further narrowing.
|
|
This follows similar reasoning as 45ce88758d24
(https://github.com/llvm/llvm-project/pull/159556):
LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.
LV creates SCEV and memory runtime checks early on and then disconnects
the blocks temporarily. The patch fixes a mis-compile, where previously
LCSSA construction during SCEV expand may replace uses in currently
unreachable SCEV/memory check blocks.
Fixes https://github.com/llvm/llvm-project/issues/162512
PR: https://github.com/llvm/llvm-project/pull/165505
|
|
A reduction (including partial reductions) with a multiply of a constant
value can be bundled by first converting it from `reduce.add(mul(ext,
const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to
extend the constant.
This PR adds such bundling by first truncating the constant to the
source type of the other extend, then extending it to the destination
type of the extend. The first truncate is necessary so that the types of
each extend's operand are then the same, and the call to
canConstantBeExtended proves that the extend following a truncate is
safe to do. The truncate is removed by optimisations.
This is a stacked PR, 1a and 1b can be merged in any order:
1a. https://github.com/llvm/llvm-project/pull/147302
1b. https://github.com/llvm/llvm-project/pull/163175
2. -> https://github.com/llvm/llvm-project/pull/162503
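
To illustrate the safety condition, the sketch below checks whether an i32 constant survives a truncate to i8 followed by a sign or zero extend. The helpers are hypothetical and fixed to i8/i32; they are not the canConstantBeExtended implementation.

```cpp
#include <cassert>
#include <cstdint>

// sext(trunc(C)) == C holds exactly when C fits in a signed 8-bit value.
inline bool fitsAfterSignExtend(std::int32_t C) {
  return C >= -128 && C <= 127;
}

// zext(trunc(C)) == C holds exactly when C fits in an unsigned 8-bit value.
inline bool fitsAfterZeroExtend(std::int32_t C) {
  return C >= 0 && C <= 255;
}
```

For example, a constant of 200 could be paired with a zero-extended operand but not with a sign-extended one.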
|
|
Factor out common code to determine legality of hoisting and sinking.
The patch has the side-effect of fixing an underlying bug, where a
load/store pair is reordered.
|
|
Add a member Alignment to VPWidenMemoryRecipe to store memory alignment
directly in the recipe. Update constructors, clone(), and relevant
methods to use this stored alignment instead of querying the IR
instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be
constructed without relying on the original IR instruction in the
future.
|
|
InstSimplifyFolder can fold binary intrinsics, so take the opportunity
to unify code with getOpcodeOrIntrinsicID, and handle the case. The
additional handling of WidenGEP is non-functional, as the GEP is
simplified before it is widened, as the included test shows.
|
|
Currently only regions with a single block are supported by the legality
checks.
|
|
This PR bundles partial reductions inside the VPExpressionRecipe class.
Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
|