llvm-project.git/llvm/lib/Transforms/Vectorize/VectorCombine.cpp, branch users/chapuni/cov/single/unify

[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068)

2025-01-08T20:48:40+00:00

This lets getInstructionCost route the query through its more precise
getShuffleCost calls, giving a more accurate cost for the shuffle. It
helps fix some of the regressions from earlier vector combine changes,
now that we have better subvector extract costs.

[CostModel][X86] getVectorInstrCost - correctly cost v4f32 insertelement into index 0

2025-01-07T12:23:45+00:00

This is just the MOVSS instruction (SSE41 INSERTPS is still necessary for index != 0).

This exposed an issue in VectorCombine::foldInsExtFNeg - we need to use the more general SK_PermuteTwoSrc shuffle kind to allow getShuffleCost to match other shuffle kinds (not just SK_Select).

[VectorCombine] Remove superfluous whitespace from debug log comment. NFC.

2025-01-06T15:37:15+00:00

[VectorCombine] foldInsExtVectorToShuffle - ignore shuffle costs for 'identity' insertion masks

2025-01-05T13:02:31+00:00

'inplace' single src shuffles can be treated as free identity shuffles, so ignore any shuffle cost (similar to what we already do in other folds like foldShuffleOfShuffles). Eventually getShuffleCost should just return TCC_Free in these cases, but in a lot of the targets' shuffle cost logic this currently ends up treated as a generic SK_PermuteSingleSrc.

We still want to generate the shuffle as it will help further shuffle folds with the additional PoisonMaskElem 'undemanded' elements.
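
The 'inplace' check described above can be sketched outside of LLVM as follows. This is a minimal standalone model, not the code from the commit: kPoisonLane stands in for PoisonMaskElem, and isInPlaceMask and effectiveShuffleCost are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for LLVM's PoisonMaskElem: a lane whose value is not demanded.
constexpr int kPoisonLane = -1;

// Returns true if every lane of a single-source mask is either poison or
// already in its own position, i.e. the shuffle moves no data.
// (Hypothetical helper, not an LLVM API.)
bool isInPlaceMask(const std::vector<int> &Mask, std::size_t NumSrcElts) {
  if (Mask.size() != NumSrcElts)
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    assert(Mask[I] >= kPoisonLane && "mask lanes must be >= -1");
    if (Mask[I] != kPoisonLane && Mask[I] != static_cast<int>(I))
      return false;
  }
  return true;
}

// Hypothetical cost adjustment: treat an in-place shuffle as free and
// otherwise keep whatever the target cost model reported.
int effectiveShuffleCost(const std::vector<int> &Mask, std::size_t NumSrcElts,
                         int TargetCost) {
  return isInPlaceMask(Mask, NumSrcElts) ? 0 : TargetCost;
}
```

For example, the mask {0, -1, 2, -1} on a 4-element source is in place and would be costed as free in this model, while {1, 0, 2, 3} keeps the target's reported cost.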

[VectorCombine] foldShuffleOfBinops - fold shuffle(binop(shuffle(x),shuffle(z)),binop(shuffle(y),shuffle(w))) -> binop(shuffle(x,z),shuffle(y,w)) (#120984)

2025-01-03T10:29:07+00:00

Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp.

Even where the individual folds aren't worth it, merging the (oneuse) shuffles can notably reduce the net instruction count and cost.

One of the final steps towards addressing #34072.
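
The lane-wise identity that makes this kind of fold legal can be checked with a small standalone model. This is only a sketch under stated assumptions: a, b, c and d are this example's own names for the four binop operands (in the commit these operands are themselves shuffles, which can then merge with the new outer shuffles), the binop is fixed to integer add, and shuffle, add, shuffleOfBinops and binopOfShuffles are hypothetical helpers rather than LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Two-source shuffle: mask values in [0, N) read from A, values in
// [N, 2N) read from B, where N is the source vector length.
Vec shuffle(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  assert(A.size() == B.size() && "sources must have the same length");
  const int N = static_cast<int>(A.size());
  Vec R;
  R.reserve(Mask.size());
  for (int M : Mask) {
    assert(M >= 0 && M < 2 * N && "mask lane out of range");
    R.push_back(M < N ? A[M] : B[M - N]);
  }
  return R;
}

// Element-wise binop; integer add is used purely as an example.
Vec add(const Vec &L, const Vec &R) {
  assert(L.size() == R.size() && "operands must have the same length");
  Vec Out(L.size());
  for (std::size_t I = 0; I < L.size(); ++I)
    Out[I] = L[I] + R[I];
  return Out;
}

// Before: two binops feeding one outer shuffle.
Vec shuffleOfBinops(const Vec &a, const Vec &b, const Vec &c, const Vec &d,
                    const std::vector<int> &Mask) {
  return shuffle(add(a, b), add(c, d), Mask);
}

// After: shuffle the matching operands first, then apply a single binop.
Vec binopOfShuffles(const Vec &a, const Vec &b, const Vec &c, const Vec &d,
                    const std::vector<int> &Mask) {
  return add(shuffle(a, c, Mask), shuffle(b, d, Mask));
}
```

Both forms compute the same lanes for any in-range mask because the binop is applied element-wise; whether the rewritten form is actually cheaper is a separate question for the cost model.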

[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands (REAPPLIED)

2025-01-02T18:19:02+00:00

As we're reducing the use count of the operands, it's more likely that they will now fold, as they were previously being prevented by an m_OneUse check, or the cost of retaining the extra instruction had been too high.

This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free.

Reapplied with a fix for foldSingleElementStore/scalarizeLoadExtract, which were replacing/removing memory operations. We need to ensure that the worklist is populated in the correct order so that all users of the old memory operations are erased first, leaving no remaining users of the loads when it's time to remove them as well.

Pulled out of #120984
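
The requeue-on-erase idea from this commit can be modelled roughly as below. This is a toy sketch, not the VectorCombine implementation: ToyInst, addUse, pushUnique and eraseAndRequeue are hypothetical names, and a std::deque stands in for the pass's instruction worklist.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy instruction with explicit operand and user lists (hypothetical type).
struct ToyInst {
  std::string Name;
  std::vector<ToyInst *> Operands;
  std::vector<ToyInst *> Users;
};

// Record that User reads Op.
void addUse(ToyInst &User, ToyInst &Op) {
  User.Operands.push_back(&Op);
  Op.Users.push_back(&User);
}

// Append I to the worklist unless it is already queued.
void pushUnique(std::deque<ToyInst *> &Worklist, ToyInst *I) {
  if (std::find(Worklist.begin(), Worklist.end(), I) == Worklist.end())
    Worklist.push_back(I);
}

// "Erase" I: drop its uses of each operand, then requeue the operand's
// remaining users and the operand itself, since the lower use count may
// let a fold that previously failed a one-use check succeed.
void eraseAndRequeue(ToyInst &I, std::deque<ToyInst *> &Worklist) {
  assert(I.Users.empty() && "cannot erase an instruction that still has users");
  for (ToyInst *Op : I.Operands) {
    std::vector<ToyInst *> &U = Op->Users;
    U.erase(std::remove(U.begin(), U.end(), &I), U.end());
    for (ToyInst *Other : U)
      pushUnique(Worklist, Other);
    pushUnique(Worklist, Op);
  }
  I.Operands.clear();
}
```

The assertion at the top of eraseAndRequeue mirrors the ordering constraint mentioned in the reapplied fix: an instruction can only be removed once all of its own users are gone.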

[VectorCombine] replaceValue - add "VC: Replacing" debug message to help the log show replacement for old/new.

2025-01-02T17:23:13+00:00

[VectorCombine] scalarizeLoadExtract - consistently use LoadInst and ExtractElementInst specific operand getters. NFC

2024-12-31T14:42:39+00:00

Noticed while investigating the hung builds reported after af83093933ca73bc82c33130f8bda9f1ae54aae2

Revert af83093933ca73bc82c33130f8bda9f1ae54aae2 "[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands"

2024-12-30T21:20:56+00:00

Reports of hung builds, but I don't have time to investigate at the moment.

[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands

2024-12-30T17:52:42+00:00

As we're reducing the use count of the operands, it's more likely that they will now fold, as they were previously being prevented by an m_OneUse check, or the cost of retaining the extra instruction had been too high.

This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free.

Pulled out of #120984