llvm-project.git

Age	Commit message	Author
2025-11-20	[mlir][SCF] Add `scf::tileAndFuseConsumer` that tiles a consumer into a ↵	MaheshRavishankar
	given tiled loop nest. (#167634) The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and loops they are part of), tries to find the consumer of these slices (all slices are expected to have the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to just start from the consumer, look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer you can find the slices of the operands that are accessed within the loop which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest. The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. The use of the consumer directly avoids all those pain points. In time the `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. There is a lot of tech-debt that has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleaned up. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained. The test for `scf::tileAndFuseConsumerOfSlices` is copied from `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. 
All the tests that were there in this file are now using the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
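The consumer-driven lookup described in #167634 can be pictured with a toy model. The sketch below is not MLIR code: `ValueId`, `ToyLoopNest`, `ToyConsumer` and `operandsProducedByLoop` are hypothetical names invented for illustration. It only shows the first step, starting from the consumer and finding which of its operands come from the loop nest, so that several operands are handled at once.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-ins for SSA values and operations; not MLIR types.
using ValueId = int;

struct ToyLoopNest {
  std::set<ValueId> results; // values produced by the outermost loop
};

struct ToyConsumer {
  std::vector<ValueId> operands;
};

// Return the operand positions of `consumer` that are fed by `loops`.
// An empty result means there is nothing to fuse.
std::vector<std::size_t> operandsProducedByLoop(const ToyConsumer &consumer,
                                                const ToyLoopNest &loops) {
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < consumer.operands.size(); ++i)
    if (loops.results.count(consumer.operands[i]) != 0)
      positions.push_back(i);
  return positions;
}
```

A consumer with operands `{10, 7, 11}` and a loop nest producing `{10, 11}` yields positions 0 and 2, which is the multi-operand case that is awkward to express when starting from a single slice.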
2025-09-30	[MLIR][SCF] Add loops as parameter to LoopTerminator callback when using ↵	sebvince
	CustomOp. (#161386) This PR adds to the generateLoopTerminatorFn callback the loops generated by GenerateLoopHeaderFn. This is needed to correctly set the insertion point with scf.forall ops.
2025-09-22	[mlir][SCF] Allow using a custom operation to generate loops with ↵	MaheshRavishankar
	`mlir::tileUsingSCF`. (#159660) This change adds an option to use a custom operation to generate the inter-tile loops during tiling. When the loop type is set to scf::SCFTilingOptions::LoopType::CustomOp, the method mlir::tileUsingSCF provides two callback functions: the first one to generate the header of the loop, and the second one to generate the terminator of the loop. These methods receive the information needed to generate the loops/terminator and are expected to return the information needed to generate the code for the intra-tile computation. See comments for more details. Presently this adds support only for tiling. Subsequent commits will update this to add support for fusion as well. The PR is split into two commits. The first commit is an NFC that just refactors the code (and cleans up some naming) to make it easier to add the support for custom loop operations. The second commit adds the support for using a custom loop operation, as well as a test to exercise this path. Note that this is a duplicate of https://github.com/llvm/llvm-project/pull/159506 that was accidentally committed and was reverted in https://github.com/llvm/llvm-project/pull/159598 to wait for reviews. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
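The header/terminator split can be sketched as a driver that calls back into user code twice. This is a simplified model under stated assumptions, not the MLIR API: `ToyLoopInfo`, `HeaderFn`, `TerminatorFn` and `toyTile` are hypothetical names, and the real callbacks take builders, ranges and tile sizes. The terminator callback here also receives the loops created by the header callback, as #161386 describes.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a tiling driver that delegates loop
// construction to the caller. None of these names are MLIR APIs.
struct ToyLoopInfo {
  std::vector<std::string> loops; // loops created by the header callback
  std::vector<int> tileOffsets;   // what the intra-tile code needs
};

using HeaderFn = std::function<ToyLoopInfo(const std::vector<int> &tileSizes)>;
using TerminatorFn = std::function<void(const std::vector<std::string> &loops,
                                        std::vector<std::string> &log)>;

// Runs the three phases in order and records what happened in a log.
std::vector<std::string> toyTile(const std::vector<int> &tileSizes,
                                 const HeaderFn &header,
                                 const TerminatorFn &terminator) {
  std::vector<std::string> log;
  // 1. The caller builds the loop header and reports what it created.
  ToyLoopInfo info = header(tileSizes);
  // 2. The driver emits the intra-tile computation from that information.
  log.push_back("body for " + std::to_string(info.tileOffsets.size()) +
                " tiled dims");
  // 3. The caller closes the loop; it is handed the loops from step 1.
  terminator(info.loops, log);
  return log;
}
```

Keeping the driver in charge of step 2 is what lets the same intra-tile code generation be reused for any loop construct the callbacks choose to build.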
2025-09-18	Revert "[mlir][SCF] Allow using a custom operation to generate loops with ↵	MaheshRavishankar
	`mlir::tileUsingSCF`." (#159598) Reverts llvm/llvm-project#159506 It was committed by accident. Reverting it for reviews.
2025-09-18	[mlir][SCF] Allow using a custom operation to generate loops with ↵	MaheshRavishankar
	`mlir::tileUsingSCF`. (#159506) This change adds an option to use a custom operation to generate the inter-tile loops during tiling. When the loop type is set to `scf::SCFTilingOptions::LoopType::CustomOp`, the method `mlir::tileUsingSCF` provides two callback functions: 1. First one to generate the header of the loop. 2. Second one to generate the terminator of the loop. These methods receive the information needed to generate the loops/terminator and are expected to return the information needed to generate the code for the intra-tile computation. See comments for more details. Presently this adds support only for tiling. Subsequent commits will update this to add support for fusion as well. The PR is split into two commits. 1) The first commit is an NFC that just refactors the code (and cleans up some naming) to make it easier to add the support for custom loop operations. 2) The second commit adds the support for using a custom loop operation, as well as a test to exercise this path. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-07-09	[mlir][TilingInterface] Allow tile and fuse to work with ↵	MaheshRavishankar
	`ReductionTilingStrategy::PartialReductionOuterParallelStrategy`. (#147593) Since `scf::tileUsingSCF` is the core method used for tiling the root operation within the `scf::tileConsumersAndFuseProducersUsingSCF`, the latter can fuse into any tiled loop generated using `scf::tileUsingSCF`. This patch adds a test for tiling a root operation using `ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and fusing producers with it. Since this strategy generates a rank-reducing extract slice, `tensor::replaceExtractSliceWithTiledProducer`, which is the core method used for the fusion, was extended to handle rank-reducing slices. Also fix a small bug in the computation of the reduction induction variable (which needs to use `floorDiv` instead of `ceilDiv`). Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
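Why the choice between `floorDiv` and `ceilDiv` matters can be shown with plain integer arithmetic. The sketch below does not reproduce the expression fixed in #147593; it assumes a generic case, recovering the index of the tile that contains a given offset, and the helper names are written for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Floor division for a positive divisor (rounds toward negative infinity).
std::int64_t floorDiv(std::int64_t a, std::int64_t b) {
  assert(b > 0 && "sketch only handles positive divisors");
  const std::int64_t q = a / b; // C++ division truncates toward zero
  return (a % b != 0 && a < 0) ? q - 1 : q;
}

// Ceiling division for a positive divisor (rounds toward positive infinity).
std::int64_t ceilDiv(std::int64_t a, std::int64_t b) {
  assert(b > 0 && "sketch only handles positive divisors");
  const std::int64_t q = a / b;
  return (a % b != 0 && a > 0) ? q + 1 : q;
}
```

With a tile size of 4, offset 5 lies in tile `floorDiv(5, 4) == 1`, whereas `ceilDiv(5, 4) == 2` points one tile too far. The two agree only when the offset is an exact multiple of the tile size, which is why such a bug can stay hidden in tests that use evenly divisible shapes.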
2025-06-25	[mlir][TilingInterface] Handle multi operand consumer fusion. (#145193)	MaheshRavishankar
	For consumer fusion cases of this form ``` %0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg1 = ...) { tensor.parallel_insert_slice ... into %arg0 tensor.parallel_insert_slice ... into %arg1 } %1 = linalg.generic ... ins(%0#0, %0#1) ``` the current consumer fusion that handles one slice at a time cannot fuse the consumer into the loop, since fusing along one slice will create an SSA violation on the other use from the `scf.forall`. The solution is to allow consumer fusion to consider multiple slices at once. This PR changes the `TilingInterface` methods related to consumer fusion, i.e. - `getTiledImplementationFromOperandTile` - `getIterationDomainFromOperandTile` to allow fusion while considering multiple operands. It is up to the `TilingInterface` implementation to return an error if a list of tiles of the operands cannot result in a consistent implementation of the tiled operation. The Linalg operation implementation of `TilingInterface` has been modified to account for these changes and to handle cases where the operand tiles can result in a consistent tiling implementation. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
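The consistency requirement on operand tiles can be sketched as follows. This is a toy model, not the `TilingInterface` implementation: it assumes each operand tile has already been mapped to the iteration-domain tile it implies, and `ToyTile`, `sameTile` and `commonIterationTile` are hypothetical names. Fusion across several operands is only well defined when all of them imply the same tile.

```cpp
#include <optional>
#include <vector>

// Hypothetical one-dimensional iteration-domain tile.
struct ToyTile {
  long offset;
  long size;
};

bool sameTile(const ToyTile &a, const ToyTile &b) {
  return a.offset == b.offset && a.size == b.size;
}

// `implied` holds, per fused operand, the iteration-domain tile that the
// operand's tile maps back to. Returns the common tile, or std::nullopt
// when the operands disagree (the case where an error would be reported).
std::optional<ToyTile>
commonIterationTile(const std::vector<ToyTile> &implied) {
  if (implied.empty())
    return std::nullopt;
  for (const ToyTile &tile : implied)
    if (!sameTile(tile, implied.front()))
      return std::nullopt;
  return implied.front();
}
```

Two operands that both imply `{offset 0, size 4}` produce that tile, while `{0, 4}` and `{4, 4}` produce `std::nullopt`, mirroring the case where the interface implementation is expected to fail.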
2025-06-10	[mlir][scf] Return `replacements` explicitly in `SCFTilingResult`. (#143217)	MaheshRavishankar
	In #120115 the replacements for the tiled operations were wrapped within the `MergeResult` object. That is a bit of an obfuscation and makes it not immediately obvious where to get the replacements post tiling. This changes the `SCFTilingResult` to have `replacements` explicit (as it was before that change). `mergeOps` is added as a separate field of `SCFTilingResult`, which is empty when the reduction type is `FullReduction`. This is an API breaking change. All uses of `mergeResult.replacements` should be replaced with `replacements`. There was also an implicit assumption that `PartialReductionTilingInterface` is derived from `TilingInterface`, so all ops that implemented the `PartialReductionTilingInterface` were expected to implement the `TilingInterface` as well. This pre-dated the existence of derived inheritances. Make `PartialReductionTilingInterface` derive from `TilingInterface`. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-24	[mlir][TilingInterface] Make `tileAndFuseConsumerOfSlice` take surrounding ↵	MaheshRavishankar
	loops as an argument. (#132082) This gets the consumer fusion method in sync with the corresponding producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as input required use of complicated analysis to retrieve the surrounding loops, which is very fragile. Just like the producer fusion method, the loops need to be taken in as an argument, with typically the loops being created by the tiling methods. Some utilities are added to check that the loops passed in are perfectly nested (in the case of an `scf.for` loop nest). This is change 1 of N to simplify the implementation of tile and fuse consumers. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-12-18	[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation ↵	Kunwar Grover
	(#120115) This patch unifies the tiling implementation for tileUsingFor and tileReductionUsingFor. This is done by passing an additional option to SCFTilingOptions, allowing it to set how reduction dimensions should be tiled. Currently, there are 3 different options for reduction tiling: FullReduction (old tileUsingFor), PartialReductionOuterReduction (old tileReductionUsingFor) and PartialReductionOuterParallel (linalg::tileReductionUsingForall, this isn't implemented in this patch). The patch makes tileReductionUsingFor use the tileUsingFor implementation with the new reduction tiling options. There are no test changes because the implementation was doing almost exactly the same thing. This was also tested in IREE (which uses both these APIs heavily) and there were no test changes.
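The difference between a full reduction and a partial reduction followed by a merge can be illustrated on a plain sum. This is a scalar sketch under stated assumptions, not the tiling code: `fullReduction` and `partialReduction` are hypothetical helpers, and the partial variant models the outer-reduction shape by keeping `tileSize` accumulators that a sequential outer loop updates tile by tile before a final merge.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Full reduction: a single accumulator is updated by every element.
long fullReduction(const std::vector<long> &data) {
  return std::accumulate(data.begin(), data.end(), 0L);
}

// Partial reduction: `tileSize` independent accumulators are filled while
// stepping over the data one tile at a time, then merged at the end.
long partialReduction(const std::vector<long> &data, std::size_t tileSize) {
  assert(tileSize > 0 && "tile size must be positive");
  std::vector<long> partial(tileSize, 0L);
  for (std::size_t i = 0; i < data.size(); ++i)
    partial[i % tileSize] += data[i]; // lane within the current tile
  // Merge step: combine the partial results into the final value.
  return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Both functions return the same value for an associative, commutative reduction such as integer addition. The partial form exposes `tileSize` independent accumulations, at the cost of the extra merge.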
2024-11-06	[mlir][scf] Extend consumer fusion to multiple tilable users (#111955)	Yun-Fly
	Previously, consumer fusion expected a single use (or all other uses to be terminator ops). This patch supports fusing multiple tilable consumers. E.g. ``` %0 = scf.for { ... %p = tiledProducer ... } %1 = tilableConsumer1 ins(%0 : ...) %2 = tilableConsumer2 ins(%0 : ...) ``` ===> ``` %0:3 = scf.for { ... %p = tiledProducer %1 = tiledConsumer1 ins(%p : ...) %2 = tiledConsumer2 ins(%p : ...) ... } ``` The key step is ensuring that the first user of the loop does not dominate any definition of the consumer operand(s).
2024-09-11	[mlir][TilingInterface] Avoid looking at operands for getting slices to ↵	MaheshRavishankar
	continue tile + fuse. (#107882) The current implementation of `scf::tileConsumerAndFuseProducerUsingSCF` looks at operands of tiled/tiled+fused operations to see if they are produced by `extract_slice` operations to populate the worklist used to continue fusion. This implicit assumption does not always work. Instead make the implementations of `getTiledImplementation` return the slices to use to continue fusion. This is a breaking change - To continue to get the same behavior of `scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree implementations of `TilingInterface::getTiledImplementation` to return the slices to continue fusion on. All in-tree implementations have been adapted to this. - This change touches parts that required a simplification to the `ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a `std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that should be `std::nullopt` if fusion is not to be performed. Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
2024-08-06	[mlir] Support DialectRegistry extension comparison (#101119)	Nikhil Kalra
	`PassManager::run` loads the dependent dialects for each pass into the current context prior to invoking the individual passes. If the dependent dialect is already loaded into the context, this should be a no-op. However, if there are extensions registered in the `DialectRegistry`, the dependent dialects are unconditionally registered into the context. This poses a problem for dynamic pass pipelines, however, because they will likely be executing while the context is in an immutable state (because of the parent pass pipeline being run). To solve this, we'll update the extension registration API on `DialectRegistry` to require a type ID for each extension that is registered. Then, instead of unconditionally registering dialects into a context if extensions are present, we'll check against the extension type IDs already present in the context's internal `DialectRegistry`. The context will only be marked as dirty if there are net-new extension types present in the `DialectRegistry` populated by `PassManager::getDependentDialects`. Note: this PR removes the `addExtension` overload that utilizes `std::function` as the parameter. This is because `std::function` is copyable and potentially allocates memory for the contained function so we can't use the function pointer as the unique type ID for the extension. Downstream changes required: - Existing `DialectExtension` subclasses will need a type ID to be registered for each subclass. More details on how to register a type ID can be found here: https://github.com/llvm/llvm-project/blob/8b68e06731e0033ed3f8d6fe6292ae671611cfa1/mlir/include/mlir/Support/TypeID.h#L30 - Existing uses of the `std::function` overload of `addExtension` will need to be refactored into dedicated `DialectExtension` classes with associated type IDs. The attached `std::function` can either be inlined into or called directly from `DialectExtension::apply`. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
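The idea of keying extensions by a per-class type ID and checking for net-new types can be sketched in standalone C++. This is not the MLIR `TypeID` or `DialectRegistry` implementation: `ToyTypeId`, `toyTypeId`, `ToyExtension`, `ToyRegistry` and the two example extension classes are hypothetical, and the address-of-a-static trick is just one simple way to obtain a unique ID per type.

```cpp
#include <memory>
#include <unordered_map>

// Hypothetical sketch; not the MLIR TypeID or DialectRegistry classes.
using ToyTypeId = const void *;

// Each instantiation owns a distinct static object, so its address can
// serve as a unique identifier for the type T.
template <typename T> ToyTypeId toyTypeId() {
  static const char anchor = 0;
  return &anchor;
}

struct ToyExtension {
  virtual ~ToyExtension() = default;
};
struct ToyExtA : ToyExtension {};
struct ToyExtB : ToyExtension {};

class ToyRegistry {
public:
  // Returns true only if an extension of a not-yet-seen type was added.
  template <typename ExtT> bool addExtension() {
    return extensions
        .emplace(toyTypeId<ExtT>(), std::make_unique<ExtT>())
        .second;
  }

  // True if every extension type in `other` is already registered here,
  // i.e. merging `other` would add nothing new.
  bool containsAllOf(const ToyRegistry &other) const {
    for (const auto &entry : other.extensions)
      if (extensions.count(entry.first) == 0)
        return false;
    return true;
  }

private:
  std::unordered_map<ToyTypeId, std::unique_ptr<ToyExtension>> extensions;
};
```

In this model, a context-side registry would only need to be marked dirty when `containsAllOf` returns false for the registry built from the pass dependencies. A `std::function` cannot play the role of the key because two copies of the same callable have no stable shared identity, which matches the reason the commit gives for removing that overload.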
2024-07-31	[mlir][Linalg] Deprecate `linalg::tileToForallOp` and ↵	MaheshRavishankar
	`linalg::tileToForallOpUsingTileSizes` (#91878) The implementations of these methods are legacy and they are removed in favor of using the `scf::tileUsingSCF` methods as replacements. To get the latter on par with requirements of the deprecated methods, the tiling allows one to specify the maximum number of tiles to use instead of specifying the tile sizes. When tiling to `scf.forall` this specification is used to generate the `num_threads` version of the operation. A slight deviation from the previous implementation is that the deprecated method always generated the `num_threads` variant of the `scf.forall` operation. Instead now this is driven by the tiling options specified. This reduces the indexing math generated when the tile sizes are specified. Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF` ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> numThreads; ArrayAttr mapping; FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setNumThreads(numThreads); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /* note the difference that setMapping takes an ArrayRef<Attribute> */ FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` This generates the `numThreads` version of the `scf.forall` for the inter-tile loops, i.e. ``` ... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...) 
``` Moving from `linalg::tileToForallOpUsingTileSizes` to `scf::tileUsingSCF` ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> tileSizes; ArrayAttr mapping; FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setTileSizes(tileSizes); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /* note the difference that setMapping takes an ArrayRef<Attribute> */ FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` Also note that `linalg::tileToForallOpUsingTileSizes` would effectively call the `linalg::tileToForallOp` by computing the `numThreads` from the `op` and `tileSizes` and generate the `numThreads` version of the `scf.forall`. That is not the case anymore. Instead this will directly generate the `tileSizes` version of the `scf.forall` op ``` ... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...) ``` If you actually want to use the `numThreads` version, it is up to the caller to compute the `numThreads` and set `options.setNumThreads` instead of `options.setTileSizes`. Note that there is a slight difference in the num threads version and tile size version. The former requires an additional `affine.max` on the tile size to ensure non-negative tile sizes. When lowering to `numThreads` version this `affine.max` is not needed since by construction the tile sizes are non-negative. In previous implementations, the `numThreads` version generated when using the `linalg::tileToForallOpUsingTileSizes` method would avoid generating the `affine.max` operation. To get the same state, downstream users will have to additionally normalize the `scf.forall` operation. 
Changes to `transform.structured.tile_using_forall` The transform dialect op that called into `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` has been modified to call `scf::tileUsingSCF`. The transform dialect op always generates the `numThreads` version of the `scf.forall` op. So when `tile_sizes` are specified for the transform dialect op, first the `tile_sizes` version of the `scf.forall` is generated by the `scf::tileUsingSCF` method which is then further normalized to get back to the same state. So there is no functional change to `transform.structured.tile_using_forall`. It always generates the `numThreads` version of the `scf.forall` op (as it did before this change). --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
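For callers that want the `numThreads` form but only have tile sizes, #91878 leaves the conversion to the caller. The sketch below shows the arithmetic on static values under stated assumptions: it assumes the usual ceiling division of the loop range by the tile size, uses plain integers instead of `OpFoldResult`, and `numThreadsFromTileSizes` is a hypothetical helper, not an MLIR function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed relation for static bounds:
//   numThreads[i] = ceil((ub[i] - lb[i]) / tileSize[i])
std::vector<std::int64_t>
numThreadsFromTileSizes(const std::vector<std::int64_t> &lbs,
                        const std::vector<std::int64_t> &ubs,
                        const std::vector<std::int64_t> &tileSizes) {
  assert(lbs.size() == ubs.size() && ubs.size() == tileSizes.size());
  std::vector<std::int64_t> numThreads;
  numThreads.reserve(lbs.size());
  for (std::size_t i = 0; i < lbs.size(); ++i) {
    const std::int64_t range = ubs[i] - lbs[i];
    assert(range >= 0 && tileSizes[i] > 0);
    // Ceiling division for a non-negative range and a positive tile size.
    numThreads.push_back((range + tileSizes[i] - 1) / tileSizes[i]);
  }
  return numThreads;
}
```

For a 10 x 16 iteration space with tile sizes 4 x 8 this gives 3 x 2 threads. The last tile along the first dimension is partial (2 elements instead of 4), which is the kind of boundary case that bounded tile-size computations have to handle.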
2024-06-19	[mlir][side effect] refactor(*): Include more precise side effects (#94213)	donald chen
	This patch adds more precise side effects to the current ops with memory effects, allowing us to determine which OpOperand/OpResult/BlockArgument the operation reads or writes, rather than just recording the reading and writing of values. This allows for convenient use of precise side effects to achieve analysis and optimization. Related discussions: https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
2024-06-01	[MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (#88712)	Abhishek Varma
	This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse a consumer to a producer within an scf.for/scf.forall loop. To support this, two new methods are added to the `TilingInterface` - `getIterationDomainTileFromOperandTile` - `getTiledImplementationFromOperandTile`. Consumer operations that implement these methods can be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer. Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation is also conservative about when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time. Signed-off-by: Abhishek Varma <abhvarma@amd.com> --------- Signed-off-by: Abhishek Varma <abhvarma@amd.com> Signed-off-by: Abhishek Varma <avarma094@gmail.com> Co-authored-by: cxy <chenxunyu1993@gmail.com>
2024-03-20	[mlir] split transform interfaces into a separate library (#85221)	Oleksandr "Alex" Zinenko
	Transform interfaces are implemented, directly or via extensions, in libraries belonging to multiple other dialects. Those dialects don't need to depend on the non-interface part of the transform dialect, which includes the growing number of ops and transitive dependency footprint. Split out the interfaces into a separate library. This in turn requires flipping the dependency from the interface on the dialect that has crept in because both co-existed in one library. The interface shouldn't depend on the transform dialect either. As a consequence of splitting, the capability of the interpreter to automatically walk the payload IR to identify payload ops of a certain kind based on the type used for the entry point symbol argument is disabled. This is a good move by itself as it simplifies the interpreter logic. This functionality can be trivially replaced by a `transform.structured.match` operation.
2024-01-25	[mlir][TilingInterface] Use `LoopLikeOpInterface` in tiling using SCF to ↵	MaheshRavishankar
	unify tiling with `scf.for` and `scf.forall`. (#77874) Using `LoopLikeOpInterface` as the basis for the implementation unifies all the tiling logic for both `scf.for` and `scf.forall`. The only difference is the actual loop generation. This is a follow up to https://github.com/llvm/llvm-project/pull/72178. Instead of many entry points for each loop type, the loop type is now passed as part of the options passed to the tiling method. This is a breaking change with the following changes 1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF` 2) The `scf::tileUsingSCFForallOp` is deprecated. The same functionality is obtained by using `scf::tileUsingSCF` and setting the loop type in `scf::SCFTilingOptions` passed into this method to `scf::SCFTilingOptions::LoopType::ForallOp` (using the `setLoopType` method). 3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing any strategy with the default callback implementing the greedy fusion. 4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use `SmallVector<LoopLikeOpInterface>`. 5) To make `scf::ForallOp` implement the parts of `LoopLikeOpInterface` needed, the `getOutputBlockArguments()` method is replaced with `getRegionIterArgs()` These changes now bring the tiling and fusion capabilities using `scf.forall` on par with what was already supported by `scf.for`
2024-01-11	[mlir][TilingInterface] Move TilingInterface tests to use transform dialect ↵	MaheshRavishankar
	ops. (#77204) In the process a couple of test transform dialect ops are added just for testing. These operations are not intended to be used as fully fleshed out transformation ops, but are rather operations added for testing. A separate operation is added to `LinalgTransformOps.td` to convert a `TilingInterface` operation to loops using the `generateScalarImplementation` method implemented by the operation. Eventually this and other operations related to tiling using the `TilingInterface` need to move to a better place (i.e. out of the `Linalg` dialect).