summaryrefslogtreecommitdiff
path: root/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
AgeCommit message (Collapse)Author
2025-11-20[mlir][SCF] Add `scf::tileAndFuseConsumer` that tiles a consumer into a ↵MaheshRavishankar
given tiled loop nest. (#167634) The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and loops they are part of), tries to find the consumer of these slices (all slices are expected to be the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to just start from the consumer, look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer you can find the slices of the operands that are accessed within the loop which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest. The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. The use of the consumer directly avoids all those pain points. In time the `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. There is a lot of tech-debt that has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleanedup. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained. The test for `scf::tileAndFuseConsumerUsingSlices` is copied to `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. All the tests that were there in this file are now using the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-10-06[mlir] Simplify unreachable type switch cases. NFC. (#162032)Jakub Kuderski
Use `DefaultUnreachable` from https://github.com/llvm/llvm-project/pull/161970.
2025-09-30[MLIR][SCF] Add loops as parameter to LoopTerminator callback when using ↵sebvince
CustomOp. (#161386) This PR adds to the generateLoopTerminatorFn callback the loops generated by GenerateLoopHeaderFn. This is needed to correctly set the insertion point with scf.forall ops.
2025-09-22[mlir][SCF] Allow using a custom operation to generate loops with ↵MaheshRavishankar
`mlir::tileUsingSCF`. (#159660) This change adds an option to use a custom operation to generate the inter-tile loops during tiling. When the loop type is set to scf::SCFTilingOptions::LoopType::CustomOp, the method mlir::tileUsingSCF provides two callback functions First one to generate the header of the loop. Second one to generate the terminator of the loop. These methods receive the information needed to generate the loops/terminator and expect to return information needed to generate the code for the intra-tile computation. See comments for more details. Presently this is adds support only for tiling. Subsequent commits will update this to add support for fusion as well. The PR is split into two commits. The first commit is an NFC that just refactors the code (and cleans up some naming) to make it easier to add the support for custom loop operations. The second commit adds the support for using a custom loop operation, as well as a test to exercise this path. Note that this is duplicate of https://github.com/llvm/llvm-project/pull/159506 that was accidently committed and was reverted in https://github.com/llvm/llvm-project/pull/159598 to wait for reviews. Signed-off-by: MaheshRavishankar [mahesh.ravishankar@gmail.com](mailto:mahesh.ravishankar@gmail.com) --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-09-23[mlir][scf] Modify the return logic of generateLoopNestUsingForOp (NFC) ↵lonely eagle
(#159394) When loops is empty, avoid executing yieldTiledValuesFn and Add a test which all tile sizes are set to zero.
2025-09-18Revert "[mlir][SCF] Allow using a custom operation to generate loops with ↵MaheshRavishankar
`mlir::tileUsingSCF`." (#159598) Reverts llvm/llvm-project#159506 It was committed by accident. Reverting it for reviews.
2025-09-18[mlir][SCF] Allow using a custom operation to generate loops with ↵MaheshRavishankar
`mlir::tileUsingSCF`. (#159506) This change adds an option to use a custom operation to generate the inter-tile loops during tiling. When the loop type is set to `scf::SCFTilingOptions::LoopType::CustomOp`, the method `mlir::tileUsingSCF` provides two callback functions 1. First one to generate the header of the loop. 2. Second one to generate the terminator of the loop. These methods receive the information needed to generate the loops/terminator and expect to return information needed to generate the code for the intra-tile computation. See comments for more details. Presently this is adds support only for tiling. Subsequent commits will update this to add support for fusion as well. The PR is split into two commits. 1) The first commit is an NFC that just refactors the code (and cleans up some naming) to make it easier to add the support for custom loop operations. 2) The second commit adds the support for using a custom loop operation, as well as a test to exercise this path. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com> --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-08-26[mlir][scf] Expose isPerfectlyNestedForLoops (#152115)Shay Kleiman
The function `isPerfectlyNestedForLoops` is useful on its own and so I'm exposing it for downstream use.
2025-08-15[mlir][SCF] `scf.for`: Add support for unsigned integer comparison (#153379)Matthias Springer
Add a new unit attribute to allow for unsigned integer comparison. Example: ```mlir scf.for unsigned %iv_32 = %lb_32 to %ub_32 step %step_32 : i32 { // body } ``` Discussion: https://discourse.llvm.org/t/scf-should-scf-for-support-unsigned-comparison/84655
2025-07-24[mlir][NFC] update `mlir/Dialect` create APIs (20/n) (#149927)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-23[mlir] Remove unused includes (NFC) (#150266)Kazu Hirata
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-07-09[mlir][TilingInterface] Allow tile and fuse to work with ↵MaheshRavishankar
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy`. (#147593) Since `scf::tileUsingSCF` is the core method used for tiling the root operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the latter can fuse into any tiled loop generated using `scf::tileUsingSCF`. This patch adds a test for tiling a root operation using `ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and fusing producers with it. Since this strategy generates a rank-reducing extract slice `tensor::replaceExtractSliceWithTiledProducer` which is the core method used for the fusion was extended to handle the rank-reducing slices. Also fix a small bug in the computation of the reduction induction variable (which needs to use `floorDiv` instead of `ceilDiv`) Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-25[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193)MaheshRavishankar
For consumer fusion cases of this form ``` %0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) { tensor.parallel_insert_slice ... into %arg0 tensor.parallel_insert_slice ... into %arg1 } %1 = linalg.generic ... ins(%0#0, %0#1) ``` the current consumer fusion that handles one slice at a time cannot fuse the consumer into the loop, since fusing along one slice will create and SSA violation on the other use from the `scf.forall`. The solution is to allow consumer fusion to allow considering multiple slices at once. This PR changes the `TilingInterface` methods related to consumer fusion, i.e. - `getTiledImplementationFromOperandTile` - `getIterationDomainFromOperandTile` to allow fusion while considering multiple operands. It is upto the `TilingInterface` implementation to return an error if a list of tiles of the operands cannot result in a consistent implementation of the tiled operation. The Linalg operation implementation of `TilingInterface` has been modified to account for these changes and allow cases where operand tiles that can result in a consistent tiling implementation are handled. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-23[mlir][PartialReductionTilingInterface] Add support for ↵MaheshRavishankar
`ReductionTilingStrategy::PartialReductionOuterParallel` in `tileUsingSCF`. (#143988) Following up from https://github.com/llvm/llvm-project/pull/143467, this PR adds support for `ReductionTilingStrategy::PartialReductionOuterParallel` to `tileUsingSCF`. The implementation of `PartialReductionTilingInterface` for `Linalg` ops has been updated to support this strategy as well. This makes the `tileUsingSCF` come on par with `linalg::tileReductionUsingForall` which will be deprecated subsequently. Changes summary - `PartialReductionTilingInterface` changes : - `tileToPartialReduction` method needed to get the induction variables of the generated tile loops. This was needed to keep the generated code similar to `linalg::tileReductionUsingForall`, specifically to create a simplified access for slicing the intermediate partial results tensor when tiled in `num_threads` mode. - `getPartialResultTilePosition` methods needs the induction varialbes for the generated tile loops for the same reason above, and also needs the `tilingStrategy` to be passed in to generate correct code. The tests in `transform-tile-reduction.mlir` testing the `linalg::tileReductionUsingForall` have been moved over to test `scf::tileUsingSCF` with `ReductionTilingStrategy::PartialReductionOuterParallel` strategy. Some of the test that were doing further cyclic distribution of the transformed code from tiling are removed. Those seem like two separate transformation that were merged into one. Ideally that would need to happen when resolving the `scf.forall` rather than during tiling. Please review only the top commit. Depends on https://github.com/llvm/llvm-project/pull/143467 Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-23[mlir][PartialReductionTilingInterface] Generalize implementation of ↵MaheshRavishankar
`tileUsingSCF` for `ReductionTilingStrategy::PartialOuterReduction`. (#143467) This is a precursor to generalizing the `tileUsingSCF` to handle `ReductionTilingStrategy::PartialOuterParallel` strategy. This change itself is generalizing/refactoring the current implementation that supports only `ReductionTilingStrategy::PartialOuterReduction`. Changes in this PR - Move the `ReductionTilingStrategy` enum out of `scf::SCFTilingOptions` and make them visible to `TilingInterface`. - `PartialTilingInterface` changes - Pass the `tilingStrategy` used for partial reduction to `tileToPartialReduction`. - Pass the reduction dimension along as `const llvm::SetVector<unsigned> &`. - Allow `scf::SCFTilingOptions` to set the reduction dimensions that are to be tiled. - Change `structured.tiled_reduction_using_for` to allow specification of the reduction dimensions to be partially tiled. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-10[mlir] Fix a warningKazu Hirata
This patch fixes: mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp:1066:62: error: missing field 'mergeOps' initializer [-Werror,-Wmissing-field-initializers] mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp:1076:75: error: missing field 'mergeOps' initializer [-Werror,-Wmissing-field-initializers]
2025-06-10[mlir][scf] Return `replacements` explicitly in `SCFTilingResult`. (#143217)MaheshRavishankar
In #120115 the replacements for the tiled operations were wrapped within the `MergeResult` object. That is a bit of an obfuscation and not immediately obvious where to get the replacements post tiling. This changes the `SCFTilingResult` to have `replacements` explicit (as it was before that change). `mergeOps` is added as a separate field of `SCFTilingResult`, which is empty when the reduction type is `FullReduction`. This is a API breaking change. All uses of `mergeResult.replacements` should be replaced with `replacements`. There was also an implicit assumption that `PartialReductionTilingInterface` is derived from `TilingInterface`, so all ops that implemented the `PartialReductionTilingInterface` were expected to implement the `TilingInterface` as well. This pre-dated the existence of derived inheritances. Make `PartialReductionTilingInterface` derive from `TilingInterface`. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-05-22[mlir] Fix unused-variable warningsKazu Hirata
This patch fixes warnings of the form: mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp:320:19: error: unused variable 'result' [-Werror,-Wunused-variable]
2025-05-22[MLIR] Change getBackwardSlice to return a logicalresult rather than crash ↵William Moses
(#140961) The current implementation of getBackwardSlice will crash if an operation in the dependency chain is defined by an operation with multiple regions or blocks. Crashing is bad (and forbids many analyses from using getBackwardSlice, as well as causing existing users of getBackwardSlice to fail for IR with this property). This PR instead causes the analysis to return a failure, rather than crash in the cases it cannot compute the full slice --------- Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>
2025-05-20[mlir][NFC] Simplify constant checks with isOneInteger and renamed ↵Han-Chung Wang
isZeroInteger. (#139340) The revision adds isOneInteger helper, and simplifies the existing code with the two methods. It removes some lambda, which makes code cleaner. For downstream users, you can update the code with the below script. ```bash sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp ``` --------- Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-05-05[mlir] Remove unused local variables (NFC) (#138481)Kazu Hirata
2025-04-24[mlir] add a fluent API to GreedyRewriterConfig (#137122)Oleksandr "Alex" Zinenko
This is similar to other configuration objects used across MLIR. Rename some fields to better reflect that they are no longer booleans. Reland 04d261101b4f229189463136a794e3e362a793af / #132253.
2025-04-18Revert "[mlir] add a fluent API to GreedyRewriterConfig (#132253)"Kazu Hirata
This reverts commit 63b8f1c9482ed0a964980df4aed89bef922b8078. Buildbot failure: https://lab.llvm.org/buildbot/#/builders/172/builds/12083/steps/5/logs/stdio I've reproduced the error with a release build (-DCMAKE_BUILD_TYPE=Release).
2025-04-18[mlir] add a fluent API to GreedyRewriterConfig (#132253)Oleksandr "Alex" Zinenko
This is similar to other configuration objects used across MLIR.
2025-03-29[mlir] Use SetVector::insert_range (NFC) (#133595)Kazu Hirata
2025-03-24[mlir][TilingInterface] Make `tileAndFuseConsumerOfSlice` take surrounding ↵MaheshRavishankar
loops as an argument. (#132082) This gets the consumer fusion method in sync with the corresponding producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as input required use of complicated analysis to retrieve the surrounding loops which are very fragile. Just like the producer fusion method, the loops need to be taken in as an argument, with typically the loops being created by the tiling methods. Some utilities are added to check that the loops passed in are perfectly nested (in the case of an `scf.for` loop nest. This is change 1 of N to simplify the implementation of tile and fuse consumers. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-02[mlir][scf]-Fix reverse iterator overflow in loop traversal (#128421)Amir Bishara
Fix a bug in method `getUntiledProducerFromSliceSource` where address sanitizer fails compilation on heap buffer overflow for accessing value out of the iteration range. This PR fixes the issue and adds a lit test to reproduce it.
2024-12-27[mlir][scf] Add getPartialResultTilePosition to PartialReductionOpInterface ↵Kunwar Grover
(#120465) This PR adds a new interface method to PartialReductionOpInterface which allows it to query the result tile position for the partial result. Previously, tiling the reduction dimension with SplitReductionOuterReduction when the result has transposed parallel dimensions would produce wrong results. Other fixes that were needed to make this PR work: - Instead of ad-hoc logic to decide where to place the new reduction dimensions in the partial result based on the iteration space, the reduction dimensions are always appended to the partial result tensor. - Remove usage of PartialReductionOpInterface in Mesh dialect. The implementation was trying to just get a neutral element, but ended up trying to use PartialReductionOpInterface for it, which is not right. It was also passing the wrong sizes to it.
2024-12-24[mlir][scf] Track replacements using a listener in TileAndFuse (#120999)Kunwar Grover
This PR makes TileAndFuse explicitly track replacements using a listener instead of assuming that the results always come from the outer most tiling loop. scf::tileUsingInterface can introduce merge operations whose results are the actual replacements to use, instead of the outer most loop results.
2024-12-20[mlir] Enable decoupling two kinds of greedy behavior. (#104649)Jacques Pienaar
The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety) as the API arguments hasn't changed between the two.
2024-12-18[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation ↵Kunwar Grover
(#120115) This patch unifies the tiling implementation for tileUsingFor and tileReductionUsingFor. This is done by passing an addition option to SCFTilingOptions, allowing it to set how reduction dimensions should be tiled. Currently, there are 3 different options for reduction tiling: FullReduction (old tileUsingFor), PartialReductionOuterReduction (old tileReductionUsingFor) and PartialReductionOuterParallel (linalg::tileReductionUsingForall, this isn't implemented in this patch). The patch makes tileReductionUsingFor use the tileUsingFor implementation with the new reduction tiling options. There are no test changes because the implementation was doing almost the exactly same thing. This was also tested in IREE (which uses both these APIs heavily) and there were no test changes.
2024-11-13[mlir] Clamp UnPackOp tiling sizes from operand tile (#112429)Max191
The `getIterationDomainTileFromOperandTile` implementation for tensor.unpack did not clamp sizes when the unpack op had extract_slice semantics. This PR fixes the bug. The PR also makes a minor change to `tileAndFuseConsumerOfSlice`. When replacing DPS inits, the iteration domain is needed, and it is computed from the tiled version of the operation after the initial tiling transformation. This can result in some extra indexing computation, so the PR changes it to use the original full sized cloned consumer op. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2024-11-11[mlir][SCF] Fix condition for fusability in consumer fusion API (#115768)Quinn Dawkins
It was previously allowing either a tilable or dps op to be fused. Both are required for consumer fusion.
2024-11-06[mlir][scf] Extend consumer fusion to multiple tilable users (#111955)Yun-Fly
Before, consumer fusion expects single usage(or others are terminator op). This patch supports multiple tilable consumers fusion. E.g. ``` %0 = scf.for { ... %p = tiledProducer ... } %1 = tilableConsumer1 ins(%0 : ...) %2 = tilableConsumer2 ins(%0 : ...) ``` ===> ``` %0:3 = scf.for { ... %p = tiledProducer %1 = tiledConsumer1 ins(%p : ...) %2 = tiledConsumer2 ins(%p : ...) ... } ``` The key process is ensuring that the first user of loop should not dominate any define of consumer operand(s).
2024-10-04[mlir] Add option for a cleanup pattern set to SCF tiling helper (#109554)Quinn Dawkins
The SCF helper for tiling an operation implementing the TilingInterface and greedily fusing consumers requires an uninterrupted chain of operations implementing the tiling interface to succeed. There can be cases with intermediate ops that don't implement the interface but have producers that could be fused if various canonicalization/simplification patterns could run in between fusion steps. This adds an option to SCFTileAndFuseOptions for a pattern set to run between fusion steps to the ops that result from fusion/tiling. Removed and newly inserted slices are tracked for continued fusion applications. See this RFC for more discussion: https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155
2024-09-30[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer ↵Abhishek Varma
shared by terminator ops (#110105) -- This commit extends consumer fusion to take place even if the producer has multiple uses. -- The multiple uses of the producer essentially means that besides the consumer op in concern, the only other uses of the producer are allowed in :- 1. scf.yield 2. tensor.parallel_insert_slice Signed-off-by: Abhishek Varma <abhvarma@amd.com>
2024-09-11[mlir][TilingInterface] Avoid looking at operands for getting slices to ↵MaheshRavishankar
continue tile + fuse. (#107882) Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF` looks at operands of tiled/tiled+fused operations to see if they are produced by `extract_slice` operations to populate the worklist used to continue fusion. This implicit assumption does not always work. Instead make the implementations of `getTiledImplementation` return the slices to use to continue fusion. This is a breaking change - To continue to get the same behavior of `scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree implementation of `TilingInterface::getTiledImplementation` to return the slices to continue fusion on. All in-tree implementations have been adapted to this. - This change touches parts that required a simplification to the `ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a `std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that should be `std::nullopt` if fusion is not to be performed. Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
2024-09-12[mlir][scf] Extend consumer fuse to single nested `scf.for` (#108318)Yun-Fly
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g. ``` %0 = scf.for() { %1 = scf.for() { tiledProducer } yield %1 } %2 = consumer ins(%0) ``` Compared with #94190, this PR fix build failure by making C++17 happy.
2024-09-11Revert "[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190)"Kazu Hirata
This reverts commit 2d4bdfba96d4cf88b12226b2b511bf55ee5e6559. A build breakage is reported at: https://lab.llvm.org/buildbot/#/builders/138/builds/3524
2024-09-12[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190)Yun-Fly
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g. ``` %0 = scf.for() { %1 = scf.for() { tiledProducer } yield %1 } %2 = consumer ins(%0) ```
2024-07-31[mlir][Linalg] Deprecate `linalg::tileToForallOp` and ↵MaheshRavishankar
`linalg::tileToForallOpUsingTileSizes` (#91878) The implementation of these methods are legacy and they are removed in favor of using the `scf::tileUsingSCF` methods as replacements. To get the latter on par with requirements of the deprecated methods, the tiling allows one to specify the maximum number of tiles to use instead of specifying the tile sizes. When tiling to `scf.forall` this specification is used to generate the `num_threads` version of the operation. A slight deviation from previous implementation is that the deprecated method always generated the `num_threads` variant of the `scf.forall` operation. Instead now this is driven by the tiling options specified. This reduces the indexing math generated when the tile sizes are specified. **Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`** ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> numThreads; ArrayAttr mapping; FailureOr<ForallTilingResult> result =linalg::tileToForallOp(b, op, numThreads, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setNumThreads(numThreads); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */ FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` This generates the `numThreads` version of the `scf.forall` for the inter-tile loops, i.e. ``` ... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...) ``` **Moving from `linalg::tileToForallOpUsingTileSizes` to `scf::tileUsingSCF`** ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> tileSizes; ArrayAttr mapping; FailureOr<ForallTilingResult> result =linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setTileSizes(tileSizes); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */ FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` Also note that `linalg::tileToForallOpUsingTileSizes` would effectively call the `linalg::tileToForallOp` by computing the `numThreads` from the `op` and `tileSizes` and generate the `numThreads` version of the `scf.forall`. That is not the case anymore. Instead this will directly generate the `tileSizes` version of the `scf.forall` op ``` ... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...) ``` If you actually want to use the `numThreads` version, it is upto the caller to compute the `numThreads` and set `options.setNumThreads` instead of `options.setTileSizes`. Note that there is a slight difference in the num threads version and tile size version. The former requires an additional `affine.max` on the tile size to ensure non-negative tile sizes. When lowering to `numThreads` version this `affine.max` is not needed since by construction the tile sizes are non-negative. In previous implementations, the `numThreads` version generated when using the `linalg::tileToForallOpUsingTileSizes` method would avoid generating the `affine.max` operation. To get the same state, downstream users will have to additionally normalize the `scf.forall` operation. **Changes to `transform.structured.tile_using_forall`** The transform dialect op that called into `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` have been modified to call `scf::tileUsingSCF`. The transform dialect op always generates the `numThreads` version of the `scf.forall` op. So when `tile_sizes` are specified for the transform dialect op, first the `tile_sizes` version of the `scf.forall` is generated by the `scf::tileUsingSCF` method which is then further normalized to get back to the same state. So there is no functional change to `transform.structured.tile_using_forall`. It always generates the `numThreads` version of the `scf.forall` op (as it did before this change). --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-06-28[mlir][scf] Extend option to yield replacement for multiple results case ↵Yun-Fly
(#93144) This patch extends the functionality of yielding replacement for multiple results case and adds another optional argument called `yieldResultNumber` indicating which result(s) need yield. If not given, all of results will be yield by default.
2024-06-18[mlir][TilingInterface] Update `PartialReductionOpInterface` to get it more ↵MaheshRavishankar
in line with `TilingInterface`. (#95460) The `TilingInterface` methods have return values that allow the interface implementation to return multiple operations, and also return tiled values explicitly. This is to avoid the assumption that the interface needs to return a single operation and this operations result are the expected tiled values. Make the `PartialReductionOpInterface::tileToPartialReduction` return `TilingResult` as well for the same reason. Similarly make the `PartialReductionOpInterface::mergeReductions` also return a list of generated operations and values to use as replacements. This is just a refactoring to allow for deprecation of `linalg::tileReductionUsingForall` with `scf::tileReductionUsingSCF` method.
2024-06-01[mlir][bazel] Add bazel build support for ↵MaheshRavishankar
https://github.com/llvm/llvm-project/commit/2b2ce50fe843b5b550806a0ab15b06cd5c405d48 (#94126) Also drop errant header include from `Linalg` dialect into `Dialect/SCF/Transforms/TileUsingInterface.cpp`
2024-06-01[MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (#88712)Abhishek Varma
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within scf.for/scf.forall loop. To support this two new methods are added to the `TilingInterface` - `getIterationDomainTileFromOperandTile` - `getTiledImplementationFromOperandTile`. Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer. Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time. Signed-off-by: Abhishek Varma <abhvarma@amd.com> --------- Signed-off-by: Abhishek Varma <abhvarma@amd.com> Signed-off-by: Abhishek Varma <avarma094@gmail.com> Co-authored-by: cxy <chenxunyu1993@gmail.com>
2024-05-22[mlir][TilingInterface] Allow multiple results in ↵Kunwar Grover
PartialReductionOpInterface (#92624) This patch adds support for reducing operations with multiple results using PartialReductionOpInterface. Also adds an implementation of PartialReductionOpInterface for multiple results for linalg.generic.
2024-01-25[mlir][TilingInterface] Use `LoopLikeOpInterface` in tiling using SCF to ↵MaheshRavishankar
unify tiling with `scf.for` and `scf.forall`. (#77874) Using `LoopLikeOpInterface` as the basis for the implementation unifies all the tiling logic for both `scf.for` and `scf.forall`. The only difference is the actual loop generation. This is a follow up to https://github.com/llvm/llvm-project/pull/72178 Instead of many entry points for each loop type, the loop type is now passed as part of the options passed to the tiling method. This is a breaking change with the following changes 1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF` 2) The `scf::tileUsingSCFForallOp` is deprecated. The same functionality is obtained by using `scf::tileUsingSCF` and setting the loop type in `scf::SCFTilingOptions` passed into this method to `scf::SCFTilingOptions::LoopType::ForallOp` (using the `setLoopType` method). 3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing any strategy with the default callback implemeting the greedy fusion. 4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use `SmallVector<LoopLikeOpInterface>`. 5) To make `scf::ForallOp` implement the parts of `LoopLikeOpInterface` needed, the `getOutputBlockArguments()` method is replaced with `getRegionIterArgs()` These changes now bring the tiling and fusion capabilities using `scf.forall` on par with what was already supported by `scf.for`
2024-01-17[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)Matthias Springer
This commit renames 4 pattern rewriter API functions: * `updateRootInPlace` -> `modifyOpInPlace` * `startRootUpdate` -> `startOpModification` * `finalizeRootUpdate` -> `finalizeOpModification` * `cancelRootUpdate` -> `cancelOpModification` The term "root" is a misnomer. The root is the op that a rewrite pattern matches against (https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional). A rewriter must be notified of all in-place op modifications, not just in-place modifications of the root (https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old function names were confusing and have contributed to various broken rewrite patterns. Note: The new function names use the term "modify" instead of "update" for consistency with the `RewriterBase::Listener` terminology (`notifyOperationModified`).
2024-01-11[mlir][TilingInterface] Move TilingInterface tests to use transform dialect ↵MaheshRavishankar
ops. (#77204) In the process a couple of test transform dialect ops are added just for testing. These operations are not intended to use as full flushed out of transformation ops, but are rather operations added for testing. A separate operation is added to `LinalgTransformOps.td` to convert a `TilingInterface` operation to loops using the `generateScalarImplementation` method implemented by the operation. Eventually this and other operations related to tiling using the `TilingInterface` need to move to a better place (i.e. out of `Linalg` dialect)
2024-01-08[mlir][TilingInterface] Allow controlling what fusion is done within tile ↵MaheshRavishankar
and fuse (#76871) Currently the `tileConsumerAndFuseProducerGreedilyUsingSCFFor` method greedily fuses through all slices that are generated during the tile and fuse flow. That is not the normal use case. Ideally the caller would like to control which slices get fused and which dont. This patch introduces a new field to the `SCFTileAndFuseOptions` to specify this control. The contol function also allows the caller to specify if the replacement for the fused producer needs to be yielded from within the tiled computation. This allows replacing the fused producers in case they have other uses. Without this the original producers still survive negating the utility of the fusion. The change here also means that the name of the function `tileConsumerAndFuseProducerGreedily...` can be updated. Defering that to a later stage to reduce the churn of API changes.