path: root/llvm/lib/CodeGen/InterleavedAccessPass.cpp
Age | Commit message | Author
2025-11-06 | Revert "[InterleavedAccess] Construct interleaved access store with shuffles" | Martin Storsjö
This reverts commit 78d649199b47370b72848c1ca8d9bd3323b050ac. That commit caused failed asserts, see https://github.com/llvm/llvm-project/pull/164000 for details.
2025-11-05 | [InterleavedAccess] Construct interleaved access store with shuffles | Ramkrishnan
Interleaved stores with factors 8 and 16 are cheaper on AArch64 when built with additional interleave instructions.
2025-10-20 | [IR] Replace alignment argument with attribute on masked intrinsics (#163802) | Nikita Popov
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).
2025-09-17 | [PatternMatch] Introduce match functor (NFC) (#159386) | Ramkumar Ramachandra
A common idiom is the usage of the PatternMatch match function within a functional algorithm like all_of. Introduce a match functor to shorten this idiom. Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-03 | [CodeGen] Fix failing assert in interleaved access pass (#156457) | David Sherwood
In the InterleavedAccessPass the function getMask assumes that shufflevector operations are always fixed width, which isn't true because we use them for splats of scalable vectors. This patch fixes the code by bailing out for scalable vectors.
2025-08-26 | [IA][RISCV] Recognize interleaving stores that could lower to strided segmented stores (#154647) | Min-Yih Hsu
This is a sibling patch to #151612: passing gap masks to the renewed TLI hooks for lowering interleaved stores that use shufflevector to do the interleaving.
2025-08-15 | [IA][RISCV] Detecting gap mask from a mask assembled by interleaveN intrinsics (#153510) | Min-Yih Hsu
If the mask of a (fixed-vector) deinterleaved load is assembled by a `vector.interleaveN` intrinsic, any intrinsic arguments that are all zeros are regarded as gaps.
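The gap rule described in this entry can be modeled on plain boolean vectors. This is a minimal sketch, assuming each `vector.interleaveN` operand is represented as a `std::vector<bool>` lane mask; `gapMaskFromInterleaveOperands` is a hypothetical name, not a function from the pass.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: FieldMasks[f] is the lane mask passed as operand f of
// a vector.interleaveN call. An operand that is all zeros marks field f as a
// gap. Returns one bit per field: true = field accessed, false = gap.
std::vector<bool>
gapMaskFromInterleaveOperands(const std::vector<std::vector<bool>> &FieldMasks) {
  std::vector<bool> GapMask;
  GapMask.reserve(FieldMasks.size());
  for (const std::vector<bool> &Field : FieldMasks) {
    // All interleaveN operands share one vector type, hence one length.
    assert(Field.size() == FieldMasks.front().size());
    bool AnySet = false;
    for (bool Bit : Field)
      AnySet = AnySet || Bit;
    GapMask.push_back(AnySet);
  }
  return GapMask;
}
```

For operands `<1,0,1,1>`, `<1,1,1,1>` and `<0,0,0,0>` the result is `{true, true, false}`: only the third field is a gap.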
2025-08-14 | [IA][RISCV] Recognizing gap masks assembled from bitwise AND (#153324) | Min-Yih Hsu
For a deinterleaved masked.load / vp.load, if its mask, `%c`, is synthesized by the following snippet:
```
%m = shufflevector %s, poison, <0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3>
%g = <1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0>
%c = and %m, %g
```
then we know that `%g` is the gap mask and `%s` is the mask for each field / component. This patch teaches the InterleaveAccess pass to recognize such patterns.
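A sketch of the two checks this pattern implies, assuming masks are given as plain index and boolean vectors rather than LLVM IR: the shuffle must replicate each element of `%s` Factor times, and the constant `%g` must repeat a single Factor-bit pattern. Both helper names are hypothetical and this is not the pass's actual matching code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: true if ShuffleMask repeats each source lane Factor
// times, e.g. <0,0,0,1,1,1,2,2,2,3,3,3> for Factor = 3.
bool isReplicationMaskOfFactor(const std::vector<int> &ShuffleMask,
                               unsigned Factor) {
  if (Factor == 0 || ShuffleMask.empty() || ShuffleMask.size() % Factor != 0)
    return false;
  for (std::size_t I = 0; I < ShuffleMask.size(); ++I)
    if (ShuffleMask[I] != static_cast<int>(I / Factor))
      return false;
  return true;
}

// Hypothetical helper: if the constant AND operand repeats one Factor-bit
// pattern in every segment, return that pattern as the gap mask
// (true = field kept, false = gap); otherwise return std::nullopt.
std::optional<std::vector<bool>> periodicGapMask(const std::vector<bool> &Const,
                                                 unsigned Factor) {
  if (Factor == 0 || Const.size() < Factor || Const.size() % Factor != 0)
    return std::nullopt;
  std::vector<bool> Pattern(Const.begin(), Const.begin() + Factor);
  for (std::size_t I = 0; I < Const.size(); ++I)
    if (Const[I] != Pattern[I % Factor])
      return std::nullopt;
  return Pattern;
}
```

With Factor = 3, the shuffle mask and `%g` from the snippet in this entry pass both checks, and the recovered gap mask is `{1, 1, 0}`.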
2025-08-12 | [IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612) | Min-Yih Hsu
Turn the following deinterleaved load patterns
```
%l = masked.load(%ptr, /*mask=*/110110110110, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
%f2 = shufflevector %l, [2, 5, 8, 11]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1111)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
%f2 = poison
```
The mask `110110110110` is regarded as a 'gap mask' since it effectively skips the entire third field / component. Similarly, turn the following snippet
```
%l = masked.load(%ptr, /*mask=*/110000110000, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1010)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
```
Right now this patch only tries to detect a gap mask from a constant mask supplied to a masked.load/vp.load.
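One way to split such a constant mask into a per-segment mask and a per-field gap mask, sketched under an assumption the message does not confirm: every active segment must use the same field pattern, and inactive segments must be all zeros. `splitGapMask` and `GapMaskResult` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical result type: FieldMask has one bit per field (false = gap),
// LaneMask has one bit per segment (false = whole segment masked off).
struct GapMaskResult {
  std::vector<bool> FieldMask;
  std::vector<bool> LaneMask;
};

// Hypothetical helper: Mask is laid out as seg0.f0 .. seg0.f(Factor-1),
// seg1.f0, ... Rejects masks whose active segments disagree.
std::optional<GapMaskResult> splitGapMask(const std::vector<bool> &Mask,
                                          unsigned Factor) {
  if (Factor == 0 || Mask.size() % Factor != 0)
    return std::nullopt;
  const std::size_t NumSegs = Mask.size() / Factor;
  GapMaskResult R;
  R.FieldMask.assign(Factor, false);
  R.LaneMask.assign(NumSegs, false);
  bool SeenActive = false;
  for (std::size_t S = 0; S < NumSegs; ++S) {
    std::vector<bool> Seg(Mask.begin() + S * Factor,
                          Mask.begin() + (S + 1) * Factor);
    bool AnySet = false;
    for (bool B : Seg)
      AnySet = AnySet || B;
    if (!AnySet)
      continue; // inactive segment: leave its lane bit cleared
    if (!SeenActive) {
      R.FieldMask = Seg; // first active segment defines the field pattern
      SeenActive = true;
    } else if (Seg != R.FieldMask) {
      return std::nullopt; // segments disagree: not a simple gap mask
    }
    R.LaneMask[S] = true;
  }
  return R;
}
```

Applied to the two masks in this entry with Factor = 3, it yields the field pattern `110` with segment masks `1111` and `1010`, matching the `vlsseg2` masks in the rewritten snippets.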
2025-07-26 | [IA] Fix a bug introduced by a recent refactoring | Philip Reames
I had dropped the check for which intrinsics were supported. This is a quick fix to get the tree back into an unbroken state; a cleaner change may follow.
2025-07-24 | [IA] Recognize repeated masks which come from shuffle vectors (#150285) | Philip Reames
This extends the fixed vector lowering to support the case where the mask is formed via a shufflevector idiom. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-07-23 | [IA] Add masked.load/store support for shuffle (de)interleave load/store (#150241) | Philip Reames
This completes the basic support for masked.load and masked.store in InterleaveAccess. The backend support was already added via the intrinsic lowering path and the common code structure (in RISCV at least). Note that this isn't enough to enable this in LV yet. We still need support for recognizing an interleaved mask via a shufflevector in getMask.
2025-07-22 | [IA] Support vp.store in lowerinterleavedStore (#149605) | Philip Reames
Follow up to 28417e64, and the whole line of work started with 4b81dc7. This change merges the handling for VPStore - currently in lowerInterleavedVPStore - into the existing dedicated routine used in the shuffle lowering path. This removes the last use of the dedicated lowerInterleavedVPStore and thus we can remove it. This contains two functional changes. First, like in 28417e64, merging support for vp.store exposes the strided store optimization for code using vp.store. Second, it seems the strided store case had a significant missed optimization. We were performing the strided store at the full unit strided store type width (i.e. LMUL) rather than reducing it to match the input width. This became obvious when I tried to use the mask created by the helper routine as it caused a type incompatibility. Normally, I'd try not to include an optimization in an API rework, but structuring the code to both be correct for vp.store and not optimize the existing case turned out to be more involved than seemed worthwhile. I could pull this part out as a pre-change, but it's a bit awkward on its own as it turns out to be somewhat of a half step on the possible optimization; the full optimization is complex with the old code structure. --------- Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-07-22 | [IA] Remove restriction on constant masks for shuffle lowering (#150098) | Philip Reames
The point of this change is simply to show that the constant check was not required for correctness. The mixed intrinsic and shuffle tests are added purely to exercise the code. An upcoming change will add support for shuffle matching in getMask to support non-constant fixed vector cases.
2025-07-21 | [RISCV][IA] Support masked.store of deinterleaveN intrinsic (#149893) | Philip Reames
This is the masked.store side to the masked.load support added in 881b3fd. With this change, we support masked.load and masked.store via the intrinsic lowering path used primarily with scalable vectors. An upcoming change will extend the fixed vector (i.e. shufflevector) paths in the same manner.
2025-07-21 | [IA] Naming and style cleanup [nfc] | Philip Reames
1) Rename argument II to something slightly more descriptive since we have more than one IntrinsicInst flowing through.
2) Perform a checked dyn_cast early to eliminate two casts later in each routine.
2025-07-21 | [RISCV][IA] Support masked.load for deinterleaveN matching (#149556) | Philip Reames
This builds on the whole series of recent API reworks to implement support for deinterleaveN of masked.load. The goal is to be able to enable masked interleave groups in the vectorizer once all the codegen and costing pieces are in place. I considered including the shuffle path support in this review as well (since the RISCV target specific stuff should be common), but decided to separate it into its own review just to focus attention on one thing at a time.
2025-07-17 | [IA] Support vp.load in lowerInterleavedLoad [nfc-ish] (#149174) | Philip Reames
This continues in the direction started by commit 4b81dc7. This essentially merges the handling for VPLoad - currently in lowerInterleavedVPLoad - into the existing dedicated routine. This removes the last use of the dedicated lowerInterleavedVPLoad and thus we can remove it. This isn't quite NFC as the main callback has support for the strided load optimization whereas the VPLoad specific version didn't. So this adds the ability to form a strided load for a vp.load deinterleave with one shuffle used.
2025-07-16 | [IA] Use a single callback for lowerInterleaveIntrinsic [nfc] (#148978) (#149168) | Philip Reames
This continues in the direction started by commit 4b81dc7. This essentially merges the handling for VPStore - currently in lowerInterleavedVPStore which is shared between shuffle and intrinsic based interleaves - into the existing dedicated routine.
2025-07-16 | [IA] Relax the requirement of having ExtractValue users on deinterleave intrinsic (#148716) | Min-Yih Hsu
There are cases where InstCombine / InstSimplify might sink extractvalue instructions that use a deinterleave intrinsic into successor blocks, which prevents InterleavedAccess from kicking in because the current pattern requires the deinterleave intrinsic to be used by extractvalue. However, this requirement is a bit too strict: we can simply replace the users of the deinterleave intrinsic with whatever the target TLI hooks generate.
2025-07-15 | [IA] Use a single callback for lowerDeinterleaveIntrinsic [nfc] (#148978) | Philip Reames
This essentially merges the handling for VPLoad - currently in lowerInterleavedVPLoad which is shared between shuffle and intrinsic based interleaves - into the existing dedicated routine. My plan, if we like this factoring, is to do the same for the intrinsic store paths, and then remove the excess generality from the shuffle paths since we don't need to support both modes in the shared VPLoad/Store callbacks. We can probably even fold the VP versions into the non-VP shuffle variants in the analogous way.
2025-07-14 | [IA][NFC] Factoring out helper functions that extract (de)interleaving factors (#148689) | Min-Yih Hsu
Factoring out and combining `isInterleaveIntrinsic`, `isDeinterleaveIntrinsic`, and `getIntrinsicFactor` into `getInterleaveIntrinsicFactor` and `getDeinterleaveIntrinsicFactor` inside VectorUtils. NFC.
2025-07-09 | [IA] Partially revert interface change from 4a66ba | Philip Reames
As noted in post commit review, the API change here was not required. I'd apparently confused myself when teasing apart patches from my development branch.
2025-07-09 | [IA] Support deinterleave intrinsics w/ fewer than N extracts (#147572) | Philip Reames
For the fixed vector cases, we already support this, but the deinterleave intrinsic cases (primarily used by scalable vectors) didn't. Supporting it requires plumbing through the Factor separately from the extracts, as there can now be fewer extracts than the Factor. Note that the fixed vector path handles this slightly differently - it uses the shuffle and indices scheme to achieve the same thing.
2025-07-08 | [InterleavedAccessPass] Add skipFunction check for opt-bisect-limit (#147629) | Craig Topper
2025-06-25 | [IA] Remove recursive [de]interleaving support (#143875) | Luke Lau
Now that the loop vectorizer emits just a single llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the need to recognise recursively [de]interleaved intrinsics. No in-tree target currently has instructions to emit an interleaved access with a factor > 8, and I'm not aware of any other passes that will emit recursive interleave patterns, so this code is effectively dead. Some tests have been converted from the recursive form to a single intrinsic, and some others were deleted that are no longer needed, e.g. to do with the recursive tree. This closes off the work started in #139893.
2025-05-28 | [IA] Add support for [de]interleave{4,6,8} (#141512) | Luke Lau
This teaches the interleaved access pass to lower the intrinsics for factors 4, 6 and 8 added in #139893 to target intrinsics. Because factors 4 and 8 could either have been recursively [de]interleaved or have just been a single intrinsic, we need to check that it's the former before reshuffling the values around via interleaveLeafValues. After this patch, we can teach the loop vectorizer to emit a single interleave intrinsic for factors 2 through 8, and then we can remove the recursive interleaving matching in the interleaved access pass.
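To see why recursively interleaved leaves need reshuffling, here is a toy model on integer vectors; `interleave2` and `interleaveN` are hypothetical stand-ins for the intrinsics, not LLVM code. A factor-4 interleave of fields A, B, C, D equals a tree of interleave2 calls whose leaves appear in the order A, C, B, D.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Models vector.interleave2 on plain vectors: X0 Y0 X1 Y1 ...
Vec interleave2(const Vec &X, const Vec &Y) {
  assert(X.size() == Y.size());
  Vec R;
  R.reserve(2 * X.size());
  for (std::size_t I = 0; I < X.size(); ++I) {
    R.push_back(X[I]);
    R.push_back(Y[I]);
  }
  return R;
}

// Models a single interleave of N equally sized fields:
// F0[0] F1[0] ... F(N-1)[0] F0[1] F1[1] ...
Vec interleaveN(const std::vector<Vec> &Fields) {
  Vec R;
  if (Fields.empty())
    return R;
  for (std::size_t I = 0; I < Fields[0].size(); ++I)
    for (const Vec &F : Fields) {
      assert(F.size() == Fields[0].size());
      R.push_back(F[I]);
    }
  return R;
}
```

For a two-level tree the leaf order is the bit-reversed field order (A, C, B, D for indices 0, 2, 1, 3); this reordering is presumably what interleaveLeafValues handles.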
2025-05-22 | [IA] Add support for [de]interleave{3,5,7} (#139373) | Luke Lau
This adds support for lowering deinterleave and interleave intrinsics for factors 3, 5 and 7 into target specific memory intrinsics. Notably this doesn't add support for handling higher factors constructed from interleaving interleave intrinsics, e.g. factor 6 from interleave3 + interleave2. I initially tried this but it became very complex very quickly. For example, because there are now multiple factors involved, interleaveLeafValues is no longer symmetric between interleaving and deinterleaving. There are then also two ways of representing a factor 6 deinterleave: it can be done either as 1 deinterleave3 and 3 deinterleave2s OR as 1 deinterleave2 and 2 deinterleave3s. I'm not sure the complexity of supporting arbitrary factors is warranted given how we only need to support a small number of factors currently: SVE only needs factors 2,3,4 whilst RVV only needs 2,3,4,5,6,7,8. My preference would be to just add an interleave6 and deinterleave6 intrinsic to avoid all this ambiguity, but I'll defer this discussion to a later patch.
2025-05-07 | [IA][RISCV] Add support for vp.load/vp.store with shufflevector (#135445) | Min-Yih Hsu
Teach InterleavedAccessPass to recognize vp.load + shufflevector and shufflevector + vp.store, though this patch only adds RISC-V support for actually lowering these patterns. The vp.load/vp.store in these patterns require a constant mask.
2025-04-24 | [IA] Remove unused argument. NFC | Luke Lau
2025-03-20 | [llvm] Use *Set::insert_range (NFC) (#132325) | Kazu Hirata
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.
2025-02-04 | [IA][RISCV] Support VP loads/stores in InterleavedAccessPass (#120490) | Min-Yih Hsu
Teach InterleavedAccessPass to recognize the following patterns:
- vp.store of an interleaved scalable vector
- Deinterleaving a scalable vector loaded from vp.load
Upon recognizing these patterns, IA will collect the interleaved / deinterleaved operands and delegate them over to their respective newly-added TLI hooks. For RISC-V, these patterns are lowered into segmented loads/stores. Right now we only recognize power-of-two (de)interleave cases, in which (de)interleave4/8 are synthesized from a tree of (de)interleave2. --------- Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>
2025-01-23 | [IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863) | Min-Yih Hsu
Previously, AArch64 used pattern matching to support llvm.vector.(de)interleave of 2 and 4; RISC-V only supported (de)interleave of 2. This patch consolidates the logic in these two targets by factoring out the common factor calculations into the InterleaveAccess Pass.
2025-01-14 | [InterleavedAccessPass]: Ensure that dead nodes get erased only once (#122643) | Hassnaa Hamdi
Use SmallSetVector instead of SmallVector to avoid duplication, so that dead nodes get erased/deleted only once.
2024-11-12 | [CodeGen] Remove unused includes (NFC) (#115996) | Kazu Hirata
Identified with misc-include-cleaner.
2024-08-28 | [InterleavedAccess] Use SmallVectorImpl references. NFC | Craig Topper
Instead of repeating SmallVector size in multiple places.
2024-08-12 | [IA][AArch64]: Construct (de)interleave4 out of (de)interleave2 (#89276) | Hassnaa Hamdi
- [AArch64]: TargetLowering is updated to spot load/store (de)interleave4 like sequences using PatternMatch, and emit equivalent sve.ld4 and sve.st4 intrinsics.
2024-04-29 | Move several vector intrinsics out of experimental namespace (#88748) | Maciej Gabka
This patch moves the following intrinsics out of the experimental namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
All these intrinsics have existed in LLVM for more than a year now and are widely used, so they should not be considered experimental.
2024-04-21 | [AArch64] Add costs for LD3/LD4 shuffles. | David Green
Similar to #87934, this adds costs to the shuffles in a canonical LD3/LD4 pattern, which are represented in LLVM as deinterleaving-shuffle(load). This likely has less effect at the moment than the ST3/ST4 costs as instcombine will perform certain transforms without considering the cost.
2024-03-19 | [NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736) | Jeremy Morse
These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>
2023-12-10 | [CodeGen] Port `InterleavedAccess` to new pass manager (#74904) | paperchalice
2023-11-07 | [InterleavedAccessPass] Avoid optimizing load instructions if they have dead binop users (#71339) | Skwoogey
If a load instruction qualifies to be optimized by the InterleavedAccess pass but also has a dead binop user, this leads to a crash. The binop instruction will not be deleted, because normally it would be deleted through its users, but it has none. Later on, deleting the load instruction fails because it still has uses.
2023-06-26 | [AArch64][CodeGen] Lower (de)interleave2 intrinsics to ld2/st2 | Graham Hunter
The InterleavedAccess pass currently matches (de)interleaving shufflevector instructions with loads or stores, and calls into target lowering to generate ldN or stN instructions. Since we can't use shufflevector for scalable vectors (besides a splat with zeroinitializer), we have interleave2 and deinterleave2 intrinsics. This patch extends InterleavedAccess to recognize those intrinsics and if possible replace them with ld2/st2. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D146218
2023-04-21 | Fix uninitialized scalar members in CodeGen | Akshay Khadse
This change fixes some static code analysis warnings. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148811
2023-03-31 | [InterleaveAccess] Check that binop shuffles have an undef second operand | David Green
It is expected that shuffles that we hoist through binops only have a single vector operand, the other being undef/poison. The checks for isDeInterleaveMaskOfFactor check that all the elements come from inside the first vector, but with non-canonical shuffles the second operand could still have a value. Add a quick check to make sure it is UndefValue as expected, to make sure we don't run into problems with BinOpShuffles not using BinOps. Fixes #61749 Differential Revision: https://reviews.llvm.org/D147306
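The message distinguishes two checks: the mask only indexing elements of the first operand, and (added by this commit) the second operand actually being undef. Below is a sketch of the mask-only check, assuming masks are plain `int` vectors with -1 for undef lanes; `isDeinterleaveMaskOfFactor` is a hypothetical standalone function, not the pass's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: does Mask pick every Factor-th element starting at
// Index, i.e. <Index, Index+Factor, Index+2*Factor, ...>? -1 models an undef
// lane. Each defined index must also be < NumSrcElts, i.e. it must come
// from the first shuffle operand.
bool isDeinterleaveMaskOfFactor(const std::vector<int> &Mask, unsigned Factor,
                                unsigned Index, std::size_t NumSrcElts) {
  if (Factor < 2 || Index >= Factor)
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue; // undef lane: matches any position
    const std::size_t Elt = static_cast<std::size_t>(Mask[I]);
    if (Elt != Index + I * Factor || Elt >= NumSrcElts)
      return false;
  }
  return true;
}
```

Note that a non-canonical shuffle can pass this mask check while still having a non-undef second operand, which is why the separate operand check is needed.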
2023-03-14 | [RISCV][NFC] Share interleave mask checking logic | Luke Lau
This adds two new methods to ShuffleVectorInst, isInterleave and isInterleaveMask, so that the logic to check if a shuffle mask is an interleave can be shared across the TTI, codegen and the interleaved access pass. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D145971
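For intuition about what an interleave mask looks like, here is a simplified sketch, not `ShuffleVectorInst::isInterleaveMask` itself: it ignores undef lanes and bounds checks, which the real method may handle. A mask interleaves Factor fields if, for each field J, the positions J, J+Factor, J+2*Factor, ... hold consecutive source indices starting at some `Starts[J]`. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified check. On success, Starts[J] holds the first
// source index of field J; e.g. <0,4,1,5,2,6,3,7> with Factor = 2 gives
// Starts = {0, 4}. Undef (-1) lanes are not supported in this sketch.
bool isInterleaveMaskOfFactor(const std::vector<int> &Mask, unsigned Factor,
                              std::vector<int> &Starts) {
  if (Factor < 2 || Mask.empty() || Mask.size() % Factor != 0)
    return false;
  const std::size_t LaneLen = Mask.size() / Factor;
  Starts.assign(Factor, 0);
  for (unsigned J = 0; J < Factor; ++J) {
    Starts[J] = Mask[J];
    for (std::size_t I = 0; I < LaneLen; ++I)
      if (Mask[I * Factor + J] != Starts[J] + static_cast<int>(I))
        return false;
  }
  return true;
}
```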
2022-07-17 | [CodeGen] Qualify auto variables in for loops (NFC) | Kazu Hirata
2022-07-10 | [InterleaveAccessPass] Handle multi-use binop shuffles | David Green
D89489 added some logic to the interleaved access pass to attempt to undo the folding of shuffles into binops that instcombine performs. If early-cse is run too, the binops may be commoned into a single operation with multiple shuffle uses. It is still profitable to reverse the transform, though, as long as all the uses are shuffles. Differential Revision: https://reviews.llvm.org/D129419
2022-03-16 | Cleanup codegen includes | serge-sans-paille
This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
2022-03-12 | Cleanup includes: DebugInfo & CodeGen | serge-sans-paille
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332