path: root/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
Age, Commit message, Author
2025-11-19[mlir][vector] Missing indices on vectorization of 1-d reduction to 1-ranked ↵Simone Pellegrini
memref (#166959) Vectorization of a 1-d reduction where the output variable is a 1-ranked memref can generate an invalid `vector.transfer_write` with no indices for the memref, e.g.: "vector.transfer_write"(%vec, %buff) <{...}> : (vector<f32>, memref<1xf32>) -> () This patch solves the problem by providing the expected number of indices (i.e. matching the rank of the memref).
2025-11-11[mlir][vector] Simplify createReadOrMaskedRead (#163736)Andrzej Warzyński
Simplify `createReadOrMaskedRead` to only require _one_ argument to specify the vector type to read (passed as `VectorType`) instead of passing vector-sizes and scalable-flags independently (i.e. _two_ arguments). A simple overload is provided for users that wouldn't re-use the corresponding `VectorType` (and hence have no reason to create one). While there are no users upstream for this overload, it's been helpful downstream.
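The relationship between the single-argument form and the convenience overload can be sketched in plain C++. This is only an illustration: `VecTypeModel`, `ReadModel`, and `createReadModel` are hypothetical stand-ins and do not reproduce the upstream signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a vector type: the shape and the per-dimension
// scalable flags travel together as one value.
struct VecTypeModel {
  std::vector<int64_t> shape;
  std::vector<bool> scalableDims;
};

// Hypothetical stand-in for the value produced by the read helper.
struct ReadModel {
  VecTypeModel type;
};

// Primary form: one argument fully describes the vector to read.
ReadModel createReadModel(const VecTypeModel &vecType) {
  assert(vecType.shape.size() == vecType.scalableDims.size() &&
         "expected one scalable flag per dimension");
  return ReadModel{vecType};
}

// Convenience overload for callers that have no vector type at hand:
// it builds the type and forwards to the primary form.
ReadModel createReadModel(std::vector<int64_t> sizes,
                          std::vector<bool> scalableFlags) {
  return createReadModel(
      VecTypeModel{std::move(sizes), std::move(scalableFlags)});
}
```

Both forms produce the same result; the overload only saves the caller from constructing a type object that it would not otherwise need.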
2025-11-06[mlir][linalg] Update vectorization of linalg.pack (#163539)Andrzej Warzyński
This patch changes `vectorizeAsTensorPackOp` to require users to specify **all** write-side vector sizes for `linalg.pack` (not just the outer dimensions). This makes `linalg.pack` vectorization consistent with `linalg.unpack` (see https://github.com/llvm/llvm-project/pull/149293 for a similar change). Conceptually, `linalg.pack` consists of these high-level steps: * **Read** from the source tensor using `vector.transfer_read`. * **Re-associate** dimensions of the read value, as specified by the op (via `vector.shape_cast`) * **Transpose** the re-associated value according to the permutation in the `linalg.pack` op (via `vector.transpose`). * **Write** the result into the destination tensor via `vector.transfer_write`. Previously, the vector sizes provided by the user were interpreted as write-vector-sizes for PackOp **_outer_** dims (i.e. the final step above). These were used to: * Infer read-vector-sizes using the `inner_tiles` attribute of PackOp. * Deduce vector sizes for the transpose and shape cast operations. * Ultimately determine the vector shape for the read. However, this logic breaks when one or more tile sizes are dynamic (*). In such cases, `vectorizePackOpPrecondition` would currently fail (see `@pack_with_dynamic_dims_and_dynamic_inner_tile` added in this PR - without this change it will crash). This patch updates the contract: users now directly specify _all_ the "write-vector-sizes", which inherently encode all inner tile sizes - including dynamic ones. It becomes the user's responsibility to provide valid sizes. In practice, since `linalg.pack` is typically constructed, tiled, and vectorized by the same transformation pipeline, the necessary "write-vector-sizes" should be recoverable. Notes for reviewers: * See test updates for user-facing impact. * Review `vectorizeAsTensorPackOp` as a new implementation rather than a diff. * Comments and variable names were updated to align with `vectorizeAsTensorUnPackOp`. 
(*) As a concrete example, "scalable" tile sizes are represented as dynamic values. Note, support for "scalable" vectorisation will be added in a separate PR.
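To illustrate why specifying all write-vector-sizes is sufficient, the sketch below derives the read shape from them for the static case. It is a simplified model rather than the upstream implementation: `inferPackReadShape` is a hypothetical name, and scalable or dynamic sizes are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given all write-side vector sizes of a pack
// (outer dims first, then inner tiles), recover the read-side shape.
//  * outerDimsPerm[i] is the source dim that write outer dim i comes from
//    (empty means identity).
//  * innerDimsPos[j] is the source dim tiled by inner tile j.
std::vector<int64_t>
inferPackReadShape(const std::vector<int64_t> &writeSizes,
                   const std::vector<size_t> &outerDimsPerm,
                   const std::vector<size_t> &innerDimsPos) {
  assert(writeSizes.size() >= innerDimsPos.size());
  const size_t srcRank = writeSizes.size() - innerDimsPos.size();
  assert(outerDimsPerm.empty() || outerDimsPerm.size() == srcRank);

  // Undo the outer-dim permutation: number of tiles per source dim.
  std::vector<int64_t> readShape(srcRank, 1);
  for (size_t i = 0; i < srcRank; ++i) {
    const size_t srcDim = outerDimsPerm.empty() ? i : outerDimsPerm[i];
    readShape[srcDim] = writeSizes[i];
  }
  // Each tiled source dim spans (number of tiles) * (tile size) elements.
  for (size_t j = 0; j < innerDimsPos.size(); ++j)
    readShape[innerDimsPos[j]] *= writeSizes[srcRank + j];
  return readShape;
}
```

For write sizes `[2, 4, 16, 2]` with `outer_dims_perm = [1, 0]` and `inner_dims_pos = [0, 1]`, this yields a read shape of `[64, 4]`.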
2025-10-30[mlir] Simplify Default cases in type switches. NFC. (#165767)Jakub Kuderski
Use default values instead of lambdas when possible. `std::nullopt` and `nullptr` can be used now because of https://github.com/llvm/llvm-project/pull/165724.
2025-10-29[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in Vectorization.cpp (NFC)Mehdi Amini
2025-10-09[mlir][linalg] set inbounds on `xfer_read/writes` for ↵Ege Beysel
`assumeDynamicDimsMatchVecSizes` (#160839) The idea from #146531 was to introduce the flag `assumeDynamicDimsMatchVecSizes`, to signal to the vectorizer that the access should not be masked and is in-bounds. Though the masking part is handled, `xfer_read/write` ops are created without explicitly setting the inbounds attribute, which defaults to all-false. In the presence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for `xfer_read/write` ops if the `assumeDynamicDimsMatchVecSizes` flag is set. --------- Signed-off-by: Ege Beysel <beyselege@gmail.com>
2025-09-30[MLIR] Apply clang-tidy fixes for readability-container-size-empty in ↵Mehdi Amini
Vectorization.cpp (NFC)
2025-09-23[mlir][linalg] Use ub.poison when vectorizing pack+unpack Ops (#159536)Andrzej Warzyński
This patch makes sure that in the absence of an explicit pad value in `linalg.pack`, the vectorizer will use `ub.poison` for the corresponding Xfer Op pad value (as opposed to e.g. `arith.constant 0`). Also, in the case of `linalg.unpack`, use `ub.poison` for the Xfer read operation. In this case, there is no mechanism for a user to specify the pad/pass-thru value.
2025-09-18[mlir][linalg] Update vectorization logic for linalg.pack (#149156) (#158926)Andrzej Warzyński
NOTE: See #149156 for a similar change for `linalg.unpack`.

This PR makes sure that we don't generate unnecessary `tensor.empty` when vectorizing `linalg.pack`. To better visualize the changes implemented here, consider this IR:

```mlir
func.func @example(
    %src: tensor<64x4xf32>,
    %dest: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
  %pack = linalg.pack %src
    outer_dims_perm = [1, 0]
    inner_dims_pos = [0, 1]
    inner_tiles = [16, 2]
    into %dest : tensor<64x4xf32> -> tensor<2x4x16x2xf32>
  return %pack : tensor<2x4x16x2xf32>
}
```

Below is the output after vectorization, BEFORE and AFTER this PR.

BEFORE (note `tensor.empty` and the fact that `%arg1` is not used):

```mlir
func.func @example(%arg0: tensor<64x4xf32>, %arg1: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<64x4xf32>, vector<64x4xf32>
  %1 = vector.shape_cast %0 : vector<64x4xf32> to vector<4x16x2x2xf32>
  %2 = vector.transpose %1, [2, 0, 1, 3] : vector<4x16x2x2xf32> to vector<2x4x16x2xf32>
  %3 = tensor.empty() : tensor<2x4x16x2xf32>
  %c0_0 = arith.constant 0 : index
  %4 = vector.transfer_write %2, %3[%c0_0, %c0_0, %c0_0, %c0_0] {in_bounds = [true, true, true, true]} : vector<2x4x16x2xf32>, tensor<2x4x16x2xf32>
  return %4 : tensor<2x4x16x2xf32>
}
```

AFTER (note that `%arg1` is correctly used):

```mlir
func.func @example(%arg0: tensor<64x4xf32>, %arg1: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<64x4xf32>, vector<64x4xf32>
  %1 = vector.shape_cast %0 : vector<64x4xf32> to vector<4x16x2x2xf32>
  %2 = vector.transpose %1, [2, 0, 1, 3] : vector<4x16x2x2xf32> to vector<2x4x16x2xf32>
  %c0_0 = arith.constant 0 : index
  %3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0, %c0_0, %c0_0] {in_bounds = [true, true, true, true]} : vector<2x4x16x2xf32>, tensor<2x4x16x2xf32>
  return %3 : tensor<2x4x16x2xf32>
}
```

ADDITIONAL CHANGES:
* Adds missing `CHECK-LABEL` in tests.
* Capitalizes LIT test variable names.
2025-08-14[mlir][linalg] Add support for scalable vectorization of ↵Ege Beysel
`linalg.batch_mmt4d` (#152984) This PR builds upon the previous #146531 and enables scalable vectorization for `batch_mmt4d` as well. --------- Signed-off-by: Ege Beysel <beyselege@gmail.com>
2025-08-08[MLIR][Linalg] Remove matmul_transpose variants (#147961)Renato Golin
Removes the `(batch_)matmul_transpose_{a|b}` variants from OpDSL and replaces them with `matmul affine_maps [...]` whenever appropriate. This is in line with the [plan](https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863), and can be done since #104783 merged. See: https://discourse.llvm.org/t/deprecate-batch-matmul-transpose-a-b-linalg-operations/87245 Issues investigated: * pad transform tests that could use `matmul` instead, so change to that. * ArmSME test using transpose actually needed it, so changed to `matmul` + affine maps. Arm tests validated by @banach-space (thanks!!).
2025-08-06[mlir][linalg] Enable scalable vectorization of linalg.unpack (#149293)Andrzej Warzyński
This patch updates `vectorizeAsTensorUnpackOp` to support scalable vectorization by requiring user-specified vector sizes for the _read_ operation (rather than the _write_ operation) in `linalg.unpack`. Conceptually, `linalg.unpack` consists of these high-level steps: * **Read** from the source tensor using `vector.transfer_read`. * **Transpose** the read value according to the permutation in the `linalg.unpack` op (via `vector.transpose`). * **Re-associate** dimensions of the transposed value, as specified by the op (via `vector.shape_cast`) * **Write** the result into the destination tensor via `vector.transfer_write`. Previously, the vector sizes provided by the user were interpreted as write-vector sizes. These were used to: * Infer read-vector sizes using the `inner_tiles` attribute of the unpack op. * Deduce vector sizes for the transpose and shape cast operations. * Ultimately determine the vector shape for the write. However, this logic breaks when one or more tile sizes are dynamic. In such cases, `vectorizeUnPackOpPrecondition` fails, and vectorization is rejected. This patch switches the contract: users now directly specify the "read-vector-sizes", which inherently encode all inner tile sizes - including dynamic ones. It becomes the user's responsibility to provide valid sizes. In practice, since `linalg.unpack` is typically constructed, tiled, and vectorized by the same transformation pipeline, the necessary "read-vector-sizes" should be recoverable.
2025-08-01[mlir][linalg] Add getCollapsedVecType and update vectorization of ↵Andrzej Warzyński
linalg.unpack (#151503) This patch introduces a new helper, `getCollapsedVecType`, and updates `vectorizeAsTensorUnpackOp` to use it. The motivation stems from improving how `vector.shape_cast` operations are generated when vectorizing `linalg.unpack`. Previously, the vectorizer relied on * `tensor::CollapseShapeOp::inferCollapsedType` to compute the collapsed vector type. This approach is suboptimal because: * `inferCollapsedType` lacks awareness of scalable vector flags. * Linalg vectorization should not depend on Tensor dialect utilities. Instead of relocating `inferCollapsedType`, we introduce `getCollapsedVecType` — a lightweight, specialized hook that: * Assumes no dynamic sizes. * Handles scalable flags alongside shape dimensions. This change also reduces temporary variables in `vectorizeAsTensorUnpackOp` and paves the way for a cleaner update in #149293.
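A collapsed vector type that tracks scalable flags alongside sizes can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the upstream helper: `VecTypeModel` and `getCollapsedVecTypeModel` are hypothetical names, reassociation is expressed as consecutive group sizes, and a collapsed dimension is treated as scalable when any dimension in its group is scalable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vector type with per-dimension scalable flags.
struct VecTypeModel {
  std::vector<int64_t> shape;
  std::vector<bool> scalableDims;
};

// Collapse consecutive groups of dimensions. groupSizes[i] is the number of
// source dims folded into result dim i. All sizes are assumed static.
VecTypeModel getCollapsedVecTypeModel(const VecTypeModel &type,
                                      const std::vector<size_t> &groupSizes) {
  assert(type.shape.size() == type.scalableDims.size());
  VecTypeModel result;
  size_t dim = 0;
  for (size_t groupSize : groupSizes) {
    int64_t size = 1;
    bool scalable = false;
    for (size_t d = 0; d < groupSize; ++d, ++dim) {
      assert(dim < type.shape.size() && "groups exceed the source rank");
      size *= type.shape[dim];
      // Assumption: one scalable dim makes the whole collapsed dim scalable.
      scalable = scalable || type.scalableDims[dim];
    }
    result.shape.push_back(size);
    result.scalableDims.push_back(scalable);
  }
  assert(dim == type.shape.size() && "groups must cover every source dim");
  return result;
}
```

Collapsing `4x16x8x16` with groups of sizes 2 and 2 gives `64x128`, which matches the `vector.shape_cast` in the `linalg.unpack` example from #149156.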
2025-07-30[mlir][linalg][nfc] Clean-up leftover code post #149156 (#151334)Andrzej Warzyński
In https://github.com/llvm/llvm-project/pull/149156, I ensured that we no longer generate spurious `tensor.empty` ops when vectorizing `linalg.unpack`. This follow-up removes leftover code that is now redundant but was missed in the original PR and in #150602 that was also meant to clean-up left-over code. Note, this is removing code to compute "write-vector-sizes". Instead, these are fully inferred from previous Ops.
2025-07-28[mlir][linalg][nfc] Clean-up leftover code post #149156 (#150602)Andrzej Warzyński
In https://github.com/llvm/llvm-project/pull/149156, I ensured that we no longer generate spurious `tensor.empty` ops when vectorizing `linalg.unpack`. This follow-up removes leftover code that is now redundant but was missed in the original PR.
2025-07-25[mlir][NFC] update `mlir/Dialect` create APIs (32/n) (#150657)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25[mlir][NFC] update `mlir/Dialect` create APIs (27/n) (#150638)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25[mlir] Switch to new LDBG macro (#150616)Jacques Pienaar
Change local variants to use new central one.
2025-07-24[mlir][NFC] update `mlir/Dialect` create APIs (17/n) (#149924)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-22[mlir][linalg] Vectorize directly to a named contraction (#147296)Adam Siemieniuk
Extends linalg vectorizer with a path to lower contraction ops directly into `vector.contract`. The direct rewriting preserves high-level op semantics and provides more progressive lowering compared to reconstructing contraction back from multi dimensional reduction. The added lowering focuses on named linalg ops and leverages their well defined semantics to avoid complex precondition verification. The new path is optional and disabled by default to avoid changing the default vectorizer behavior.
2025-07-17[mlir][linalg] Add support for scalable vectorization of linalg.mmt4d (#146531)Andrzej Warzyński
This patch adds support for scalable vectorization of linalg.mmt4d. The key design change is the introduction of a new vectorizer state variable: * `assumeDynamicDimsMatchVecSizes` ...along with the corresponding Transform dialect attribute: * `assume_dynamic_dims_match_vec_sizes`. This flag instructs the vectorizer to assume that dynamic memref/tensor dimensions match the corresponding vector sizes (fixed or scalable). With this assumption, masking becomes unnecessary, which simplifies the lowering pipeline significantly. While this assumption is not universally valid, it typically holds for `linalg.mmt4d`. Inputs and outputs are explicitly packed using `linalg.pack`, and this packing includes padding, ensuring that dimension sizes align with vector sizes (*). * Related discussion: https://github.com/llvm/llvm-project/issues/143920 An upcoming patch will include an end-to-end test that leverages scalable vectorization of linalg.mmt4d to demonstrate the newly enabled functionality. This would not be feasible without the changes introduced here, as it would otherwise require additional logic to handle complex - but ultimately redundant - masks. (*) This holds provided that the tile sizes used for packing match the vector sizes used during vectorization. It is the user’s responsibility to enforce this.
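The effect of the flag on the masking decision can be sketched as follows. This is a simplified model under stated assumptions: `needsMask` and the sentinel `kDynamicModel` are hypothetical, scalable vector sizes are not modelled, and the real vectorizer logic is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for a dynamic dimension in this model only.
constexpr int64_t kDynamicModel = -1;

// Returns true when a masked access would be required.
//  * A static dim that differs from the vector size needs a mask.
//  * A dynamic dim needs a mask unless the caller promises that it matches
//    the vector size (the assumption described in the commit message).
bool needsMask(const std::vector<int64_t> &dimSizes,
               const std::vector<int64_t> &vecSizes,
               bool assumeDynamicDimsMatchVecSizes) {
  assert(dimSizes.size() == vecSizes.size());
  for (size_t i = 0; i < dimSizes.size(); ++i) {
    if (dimSizes[i] == kDynamicModel) {
      if (!assumeDynamicDimsMatchVecSizes)
        return true;
      continue;
    }
    if (dimSizes[i] != vecSizes[i])
      return true;
  }
  return false;
}
```

With the flag set, a shape such as `?x8` vectorized with sizes `4x8` no longer requires a mask, while a static mismatch such as `3x8` still does.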
2025-07-17[mlir][linalg] Update vectorization logic for linalg.unpack (#149156)Andrzej Warzyński
This PR makes sure that we don't generate unnecessary `tensor.empty` when vectorizing `linalg.unpack`. To better visualize the changes implemented here, consider this IR:

```mlir
func.func @example(
    %source: tensor<8x4x16x16xf32>,
    %dest: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %res = linalg.unpack %source
    outer_dims_perm = [1, 0]
    inner_dims_pos = [0, 1]
    inner_tiles = [16, 16]
    into %dest : tensor<8x4x16x16xf32> -> tensor<64x127xf32>
  return %res : tensor<64x127xf32>
}
```

Below is the output after vectorization, BEFORE and AFTER this PR.

BEFORE (note `tensor.empty` and the fact that `%arg1` is not used):

```mlir
func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  %0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
  %1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
  %2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
  %3 = tensor.empty() : tensor<64x127xf32>
  %c0_0 = arith.constant 0 : index
  %4 = vector.transfer_write %2, %3[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
  return %4 : tensor<64x127xf32>
}
```

AFTER (note that `%arg1` is correctly used):

```mlir
func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  %0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
  %1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
  %2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
  %c0_0 = arith.constant 0 : index
  %3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
  return %3 : tensor<64x127xf32>
}
```
2025-07-07[mlir] Add `isStatic`* size check for `ShapedType`s. NFCI. (#147085)Jakub Kuderski
The motivation is to avoid having to negate `isDynamic*` checks, avoid double negations, and allow for `ShapedType::isStaticDim` to be used in ADT functions without having to wrap it in a lambda performing the negation. Also add the new functions to C and Python bindings.
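The motivation can be illustrated with standard algorithms. In this sketch, `kDynamicModel`, `isDynamicDim`, and `isStaticDim` are free-function stand-ins written for the example (they are not the `ShapedType` members themselves), and `std::all_of` plays the role of the ADT helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinel marking a dynamic dimension in this model.
constexpr int64_t kDynamicModel = std::numeric_limits<int64_t>::min();

bool isDynamicDim(int64_t dim) { return dim == kDynamicModel; }
bool isStaticDim(int64_t dim) { return !isDynamicDim(dim); }

// Without a positive predicate, the negation needs a wrapping lambda.
bool hasStaticShapeWithLambda(const std::vector<int64_t> &shape) {
  return std::all_of(shape.begin(), shape.end(),
                     [](int64_t dim) { return !isDynamicDim(dim); });
}

// With isStaticDim, the predicate can be passed directly.
bool hasStaticShape(const std::vector<int64_t> &shape) {
  return std::all_of(shape.begin(), shape.end(), isStaticDim);
}
```

Both functions compute the same result; the second avoids the lambda and the negation.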
2025-07-06[mlir] Remove unused includes (NFC) (#147206)Kazu Hirata
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-07-07[mlir] Fix Wparentheses warning (#146893)Longsheng Mou
warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
  265 |   isa<VectorType>(operandType) &&
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  266 |   "Unexpected non-vector ShapedType");
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-07-02[mlir][linalg] Use `ub.poison` in linalg vectorizer instead of `0` for some ↵Fabian Mora
transfer ops (#146544) This patch is a follow up to https://github.com/llvm/llvm-project/pull/146088 and changes the padding value in the linalg vectorizer from `0` to `ub.poison` in `vector.transfer_read`s created for extracting slices or when vectorizing a generic. Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
2025-06-30[mlir][vector] Avoid setting padding by default to `0` in ↵Fabian Mora
`vector.transfer_read` prefer `ub.poison` (#146088) Context: `vector.transfer_read` always requires a padding value. Most of its builders take no `padding` value and assume the safe value of `0`. However, this should be a conscious choice by the API user, as it makes it easy to introduce bugs. For example, I found several occasions while making this patch that the padding value was not getting propagated (`vector.transfer_read` was transformed into another `vector.transfer_read`). These bugs were always caused by constructors that don't require specifying padding. Additionally, using `ub.poison` as a possible default value is better, as it indicates the user "doesn't care" about the actual padding value, forcing users to specify the actual padding semantics they want. With that in mind, this patch changes the builders in `vector.transfer_read` to always take a `std::optional<Value> padding` argument. This argument is never optional, but for convenience users can pass `std::nullopt`, padding the transfer read with `ub.poison`. --------- Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
2025-06-26[mlir] Use llvm::is_contained instead of llvm::all_of (NFC) (#145845)Kazu Hirata
llvm::is_contained is shorter than llvm::all_of plus a lambda.
2025-06-24[mlir] Return vectorized values instead of replacing (#144158)Max191
Updates the `linalg::vectorize` function to return a `FailureOr<VectorizationResult>` containing the values to replace the original operation, instead of directly replacing the original operation. This aligns better with the style of transforms used with the TilingInterface, and gives more control to users over the lowering, since it allows for additional transformation of the IR before replacement. There was already a `VectorizationResult` defined, which was used for the internal vectorize implementation using `CustomVectorizationHook`s, so the old struct is renamed to `VectorizationHookResult`. Note for integration: The replacement of the original operation is now the responsibility of the caller, so wherever `linalg::vectorize` is used, the caller must also do `rewriter.replaceOp(op, vectorizeResults->replacements)`. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2025-06-24[mlir][Interface] Factor out common IndexingMapOpInterface behavior in a new ↵Nicolas Vasilache
generic interface (#145313) Refactor the verifiers to make use of the common bits and make `vector.contract` also use this interface. In the process, the confusingly named getStaticShape has disappeared. Note: the verifier for IndexingMapOpInterface is currently called manually from other verifiers as it was unclear how to avoid it taking precedence over more meaningful error messages
2025-06-08[mlir][linalg] Simplify `createWriteOrMaskedWrite` (NFC) (#141567)Andrzej Warzyński
This patch removes `inputVecSizesForLeadingDims` from the parameter list of `createWriteOrMaskedWrite`. That argument is unnecessary - vector sizes can be obtained from the `vecToStore` parameter. Since this doesn't change behavior or test results, it's marked as NFC.

Additional cleanups:
* Renamed `vectorToStore` to `vecToStore` for consistency and brevity.
* Rewrote a conditional at the end of the function to use early exit, improving readability:

```cpp
// BEFORE:
if (maskingRequired) {
  Value maskForWrite = ...;
  write = maskOperation(write, maskForWrite);
}
return write;

// AFTER:
if (!maskingRequired)
  return write;

Value maskForWrite = ...;
return vector::maskOperation(builder, write, maskForWrite);
```
2025-06-08[mlir] Strip away lambdas (NFC) (#143280)Kazu Hirata
We don't need lambdas here.
2025-06-07[mlir][linalg] Refactor vectorization hooks to improve code reuse (#141244)Andrzej Warzyński
This patch refactors two vectorization hooks in Vectorization.cpp: * `createWriteOrMaskedWrite` gains a new parameter for write indices, aligning it with its counterpart `createReadOrMaskedRead`. * `vectorizeAsInsertSliceOp` is updated to reuse both of the above hooks, rather than re-implementing similar logic. CONTEXT ------- This is effectively a refactoring of the logic for vectorizing `tensor.insert_slice`. Recent updates added masking support: * https://github.com/llvm/llvm-project/pull/122927 * https://github.com/llvm/llvm-project/pull/123031 At the time, reuse of the shared `create*` hooks wasn't feasible due to missing parameters and overly rigid assumptions. This patch resolves that and moves us closer to a more maintainable structure. CHANGES IN `createWriteOrMaskedWrite` ------------------------------------- * Introduces a clear distinction between the destination tensor and the vector to store, via named variables like `destType`/`vecToStoreType`, `destShape`/`vecToStoreShape`, etc. * Ensures the correct rank and shape are used for attributes like `in_bounds`. For example, the size of the `in_bounds` attr now matches the source vector rank, not the tensor rank. * Drops the assumption that `vecToStoreRank == destRank` - this doesn't hold in many real examples. * Deduces mask dimensions from `vecToStoreShape` (vector) instead of `destShape` (tensor). (Eventually we should not require `inputVecSizesForLeadingDims` at all - mask shape should be inferred.) NEW HELPER: `isMaskTriviallyFoldable` ------------------------------------- Adds a utility to detect when masking is unnecessary. This avoids inserting redundant masks and reduces the burden on canonicalization to clean them up later. 
Example where masking is provably unnecessary:

```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more matching.

VECTORIZATION BEHAVIOUR
-----------------------
This patch preserves the current behaviour around masking and the use of the `in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are available.
* The vectorizer continues to infer vector sizes where needed.

Note: the computation of the `in_bounds` attribute is not always correct. That issue is tracked here:
* https://github.com/llvm/llvm-project/issues/142107

This will be addressed separately.

TEST CHANGES
------------
Only affects vectorization of:
* `tensor.insert_slice` (now refactored to use shared hooks)

Test diffs involve additional `arith.constant` Ops due to increased reuse of shared helpers (which generate their own constants). This will be cleaned up via constant caching (see #138265).

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review `createWriteOrMaskedWrite` as a new method rather than diffing line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and `createReadOrMaskedRead`:
* Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in VectorUtils.cpp)
* Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
* Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the updated test in transform-vector.mlir for an example that would benefit from this.
* Address #142107

(*) This method will eventually be moved out of Vectorization.cpp, which isn't the right long-term home for it.
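The idea behind detecting a trivially foldable mask can be sketched for the fully static case. This is a simplified model and not the upstream `isMaskTriviallyFoldable`: the function name, the use of negative values for dynamic dims, and the exact condition are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model: a write mask is "all true" (and thus redundant) when,
// for every vector dim, the written range [idx, idx + vecSize) lies inside
// the matching trailing destination dim. Negative destination sizes stand
// for dynamic dims here, for which nothing can be proven statically.
bool isMaskTriviallyFoldableModel(const std::vector<int64_t> &destShape,
                                  const std::vector<int64_t> &vecShape,
                                  const std::vector<int64_t> &writeIdxs) {
  assert(writeIdxs.size() == destShape.size());
  assert(vecShape.size() <= destShape.size());
  // The vector maps onto the trailing dims of the destination.
  const size_t rankDiff = destShape.size() - vecShape.size();
  for (size_t i = 0; i < vecShape.size(); ++i) {
    const int64_t destDim = destShape[rankDiff + i];
    if (destDim < 0)
      return false; // Dynamic dim: keep the mask.
    if (writeIdxs[rankDiff + i] + vecShape[i] > destDim)
      return false; // The write could run past the end: keep the mask.
  }
  return true;
}
```

For the example in the commit message (a `vector<1x2x3xf32>` written at zero offsets into `tensor<9x8x7x1x2x3xf32>`), the model reports that the mask is foldable.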
2025-05-14[mlir] Fix a warningKazu Hirata
This patch fixes: mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1544:14: error: unused variable 'destType' [-Werror,-Wunused-variable]
2025-05-14[mlir][vector] Refactor `createWriteOrMaskedWrite` (#138137)Andrzej Warzyński
This patch updates `createWriteOrMaskedWrite` to make it consistent with `createReadOrMaskedRead`. Before diving into the details: note that these utilities are currently implemented in different files - "VectorUtils.cpp" (Vector) and "Vectorization.cpp" (Linalg). In a subsequent patch, I plan to move `createWriteOrMaskedWrite` into "VectorUtils.cpp".

SUMMARY OF CHANGES:
The main change is to remove the logic that creates the destination tensor, which previously looked like:

```cpp
Value dest = builder.create<tensor::EmptyOp>(loc, destSizes, inputType.getElementType());
```

With this patch, `createWriteOrMaskedWrite` now simply generates:

```mlir
%res = vector.transfer_write %vectorToStore into %dest
```

This replaces the previous form:

```mlir
%dest = tensor.empty(%destSizes)
%res = vector.transfer_write %vectorToStore into %dest
```

In other words, the destination value `%dest` is now passed as an input parameter. This makes `createWriteOrMaskedWrite` re-usable in contexts where the destination tensor is already known - for example, in `vectorizeAsInsertSliceOp`, which I will update in a follow-up patch.

OTHER CHANGES:
* Added comments and clarified TODOs.
* Updated tests: since destination sizes are now computed independently inside `createWriteOrMaskedWrite`, some additional `tensor.dim` ops appear. These will be cleaned up by CSE + canonicalization.
2025-05-12[mlir][vector] Standardize `base` Naming Across Vector Ops (NFC) (#137859)Andrzej Warzyński
This change standardizes the naming convention for the argument representing the value to read from or write to in Vector ops that interface with Tensors or MemRefs. Specifically, it ensures that all such ops use the name `base` (i.e., the base address or location to which offsets are applied). Updated operations: * `vector.transfer_read`, * `vector.transfer_write`. For reference, these ops already use `base`: * `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`, `vector.expandload`, `vector.compressstore`, `vector.maskedstore`, `vector.maskedload`. This is a non-functional change (NFC) and does not alter the semantics of these operations. However, it does require users of the XFer ops to switch from `op.getSource()` to `op.getBase()`. To ease the transition, this PR temporarily adds a `getSource()` interface method for compatibility. This is intended for downstream use only and should not be relied on upstream. The method will be removed prior to the LLVM 21 release. Implements #131602
2025-04-15[mlir][linalg][vector] Refine create{Read|Write}OrMasked{Read|Write} (nfc) ↵Andrzej Warzyński
(#135350) The semantics of `createReadOrMaskedRead` and `createWriteOrMaskedWrite` are currently a bit inconsistent and not fully documented: * The input vector sizes are passed as `readShape` and `inputVectorSizes`, respectively — inconsistent naming. * Currently, the input vector sizes in `createWriteOrMaskedWrite` are not required to be complete: any missing trailing sizes are inferred from the destination tensor. This only works when the destination tensor is statically shaped. * Unlike `createReadOrMaskedRead`, the documentation for `createWriteOrMaskedWrite` does not specify that write offsets are hard-coded to 0. This PR only updates the documentation and unifies the naming. As such, it is NFC. A follow-up PR will generalize and unify the implementation to support, for example, dynamically shaped destination tensors — a requirement for enabling scalable vectorization of `linalg.pack` and `linalg.unpack`.
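The "missing trailing sizes are inferred from the destination tensor" behaviour can be sketched as below. The helper name `completeVecSizes` is hypothetical and negative values stand in for dynamic dims; the sketch only illustrates why a statically shaped destination is required.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: extend a possibly incomplete list of leading vector
// sizes with the trailing sizes of the destination shape. Negative values
// model dynamic dims, from which no vector size can be inferred.
std::vector<int64_t> completeVecSizes(const std::vector<int64_t> &inputVecSizes,
                                      const std::vector<int64_t> &destShape) {
  assert(inputVecSizes.size() <= destShape.size());
  std::vector<int64_t> sizes(inputVecSizes);
  for (size_t i = inputVecSizes.size(); i < destShape.size(); ++i) {
    assert(destShape[i] >= 0 && "trailing destination dims must be static");
    sizes.push_back(destShape[i]);
  }
  return sizes;
}
```

Given leading sizes `[4, 8]` and a destination of shape `3x7x16x2`, the completed sizes are `[4, 8, 16, 2]`.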
2025-04-07[mlir][vector] Standardise `valueToStore` Naming Across Vector Ops (NFC) ↵Andrzej Warzyński
(#134206) This change standardises the naming convention for the argument representing the value to store in various vector operations. Specifically, it ensures that all vector ops storing a value—whether into memory, a tensor, or another vector — use `valueToStore` for the corresponding argument name. Updated operations: * `vector.transfer_write`, `vector.insert`, `vector.scalable_insert`, `vector.insert_strided_slice`. For reference, here are operations that currently use `valueToStore`: * `vector.store` `vector.scatter`, `vector.compressstore`, `vector.maskedstore`. This change is non-functional (NFC) and does not affect the functionality of these operations. Implements #131602
2025-04-02[mlir] Vectorize tensor.pad with low padding for unit dims (#133808)Nirvedh Meshram
We currently do not have masked vectorization support for `tensor.pad` with low padding. However, we can allow this in the special case where the result dimension after padding is a unit dim. The reason is that when we actually have a low pad on a unit dim, the input size of that dimension will be (or should be for correct IR) dynamically zero and hence we will create a zero mask, which is correct. If the low pad is dynamically zero then the lowering is correct as well. --------- Signed-off-by: Nirvedh <nirvedh@gmail.com>
2025-03-17[MLIR] Refactor to create vectorization convOp precondition check (#130181)Zhuoran Yin
In corner cases, the vectorization pass may be asked to lower a conv2d op and then assert in a completely unrelated location in the `vectorizeConvolution()` subroutine. ~~This PR rejects the conv2d op early and makes the asserting routine return failure as a defensive workaround.~~ To address this, the PR moves all condition checks out of `Conv1dGenerator` and into the `convOpPreconditionCheck()` function. Unsupported ops such as conv2d are now rejected early, which leaves a cleaner `Conv1dGenerator` constructor.
2025-02-17[mlir][tensor][linalg] Move Pack/UnPack Ops to Linalg (#123902)Andrzej Warzyński
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change was discussed in the following RFC: * https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg This change involves significant churn but only relocates existing code - no new functionality is added. **Note for Downstream Users** Downstream users must update references to `PackOp` and `UnPackOp` as follows: * Code: `s/tensor::(Un)PackOp/linalg::(Un)PackOp/g` * Tests: `s/tensor.(un)pack/linalg.(un)pack/g` No other modifications should be required.
2025-02-07[mlir][linalg] Add support for masked vectorization of `tensor.insert_slice` ↵Andrzej Warzyński
(2/N) (#123031) For context, recall that `tensor.insert_slice` is vectorised using the `vector.transfer_read` + `vector.transfer_write` pair. An unmasked example is shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice %slice into %dest[0, %c2] [5, 1] [1, 1]
  : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad
  : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
  : vector<8x1xi32>, tensor<5x3xi32>
```

This PR extends `vectorizeAsInsertSliceOp` to add masking support for the `vector.transfer_write` operation. This complements the changes in #122927, which introduced masking for the `vector.transfer_read`.
2025-02-02[mlir][linalg] Add support for masked vectorization of `tensor.insert_slice` ↵Andrzej Warzyński
(1/N) (#122927) For context, `tensor.insert_slice` is vectorized using a `vector.transfer_read` + `vector.transfer_write` pair. An unmasked example is shown below: ```mlir // BEFORE VECTORIZATION %res = tensor.insert_slice %slice into %dest[0, %c2] [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32> // AFTER VECTORIZATION %read = vector.transfer_read %source[%c0, %c0], %pad : tensor<5x1xi32>, vector<8x1xi32> %res = vector.transfer_write %read, %dest[%c0, %c2] : vector<8x1xi32>, tensor<5x3xi32> ``` This PR refactors `InsertSliceVectorizePattern` (which is used to vectorize `tensor.insert_slice`) to enable masked vectorization. At the moment, only `vector.transfer_read` is masked. If `vector.transfer_write` also requires masking, the vectorizer will bail out. This will be addressed in a subsequent PR. Summary of changes: * Added an argument to specify vector sizes (behavior remains unchanged if vector sizes are not specified). * Renamed `InsertSliceVectorizePattern` to `vectorizeAsInsertSliceOp` and integrated it (alongside other vectorization hooks) into `linalg::vectorize`. * Removed `populateInsertSliceVectorizationPatterns`, as `InsertSliceVectorizePattern` was its only pattern. * Updated `vectorizeAsInsertSliceOp` to support masking for the "read" operation. * Updated `@pad_and_insert_slice_dest` in "vectorization-pad-patterns.mlir" to reflect the removal of `populateInsertSliceVectorizationPatterns` from `ApplyPadVectorizationPatternsOps`.
2025-01-21[mlir][NFC] Avoid using braced initializer lists to call a constructor. ↵Han-Chung Wang
(#123714) The LLVM style guide prefers not using braced initializer lists to call a constructor. It also prefers an equals sign before the opening curly brace when a braced initializer list is used to initialize a variable. See https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor for more details. The style guide does not explain the reason well. There is an article from abseil, which mentions a few benefits. E.g., we can avoid the most vexing parse, etc. See https://abseil.io/tips/88 for more details. Signed-off-by: hanhanW <hanhan0912@gmail.com>
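One way to see why the distinction matters is `std::vector`, where braces and parentheses select different constructors. The sketch below is a standalone illustration using only the standard library; the function names are made up for this example and do not come from the patch:

```cpp
#include <cstddef>
#include <vector>

// Parentheses call the (count, value) constructor: three elements, all 0.
inline std::size_t sizeWithParens() {
  std::vector<int> v(3, 0);
  return v.size();
}

// Braces select the std::initializer_list constructor instead:
// two elements, 3 and 0.
inline std::size_t sizeWithBraces() {
  std::vector<int> v{3, 0};
  return v.size();
}

// Spelling preferred by the style guide when the intent really is a list of
// values: an equals sign before the opening brace.
inline std::vector<int> listOfValues() {
  std::vector<int> v = {3, 0};
  return v;
}
```

Writing constructor calls with parentheses and value lists with `= {...}` keeps the two intents visibly distinct at the call site.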
2024-12-11[mlir][linalg] Enable Vectorization of 0-D tensor.extract (#119079)Andrzej Warzyński
This patch removes an assert in `vectorizeTensorExtract` that was blocking the vectorization of 0-D tensor.extract operations, e.g.: ```mlir %1 = tensor.extract %src[] : tensor<f32> ``` As demonstrated by the included tests, this case is already effectively supported. **Context** The removed assert was introduced in #109580 as a guard, pending proper support and testing for 0-D tensors. This PR addresses that previously undocumented TODO. Apologies for the oversight! **Updates and Tests** * Revised the existing test `@negative_no_loop` to ensure the `vectorize_nd_extract` attribute is included, allowing the vectorizer to process it. The test was renamed and variables updated for clarity. * Added a new test `@extract_scalar_from_0d_into_1d` to cover "mixed" 0-D/1-D tensor extraction, e.g.: ```mlir %res = linalg.generic { indexing_maps = [#map], iterator_types = ["parallel"] } outs(%init : tensor<1xf32>) { ^bb0(%in: f32): %1 = tensor.extract %src[] : tensor<f32> linalg.yield %1 : f32 } -> tensor<1xf32> return %res : tensor<1xf32> ``` **Additional updates** I also took the liberty of improving test coverage for 0-D tensors in the vectorizer tests: * Added a specific test for "0D linalg.generic" in "vectorization-with-patterns.mlir". * Renamed several tests in "vectorization-with-patterns.mlir" to clarify that the 0-D case is now covered.
2024-12-05[mlir][linalg] Fix vectorization of tensor.extract (#118105)Andrzej Warzyński
The example below demonstrates a "scalar read followed by a broadcast" pattern for `tensor.extract`: ```mlir #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)> func.func @scalar_broadcast( %init : tensor<1x1x3xi32>, %src: tensor<1x3x2x4xi32>, %idx :index) -> tensor<1x1x3xi32> { %c0 = arith.constant 0 :index %res = linalg.generic { indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel"]} outs(%init : tensor<1x1x3xi32>) { ^bb0(%out: i32): %val = tensor.extract %src[%idx, %idx, %idx, %idx] : tensor<1x3x2x4xi32> linalg.yield %val : i32 } -> tensor<1x1x3xi32> return %res : tensor<1x1x3xi32> } ``` The default masking path within the Linalg vectorizer, which assumes an identity masking map, is not suitable here. Indeed: * identity != broadcast. This patch ensures masking is handled in the `vectorizeTensorExtract` hook, which has the necessary context for proper handling. Fixes #116197
2024-11-29[mlir][linalg] Relax scalable vectorization restrictions (#117991)Andrzej Warzyński
Currently, the Linalg vectorizer does not allow non-trailing parallel dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (*), for cases like: ```mlir %0 = linalg.fill ins(%arg0 : f32) outs(%A : tensor<?x?xf32>) -> tensor<?x?xf32> ``` This restriction exists to avoid generating "scalable" arrays of aggregates, which LLVM does not support (multi-dim vectors are lowered into arrays of aggregates at the LLVM level). This patch relaxes that restriction when the trailing parallel vector dimension is `1`, e.g., for `vector_sizes [[8], 1]`. Such cases are safe since trailing unit dimensions can be collapsed. This relaxation is necessary to support scalable vectorization for tensor.pack, where the inner tile sizes are `[8]` (scalable) and `1` (scalar). (*) Transform Dialect notation
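The relaxed rule can be sketched as a small standalone predicate. This is a deliberate simplification, not the vectorizer's actual precondition check: it assumes every dimension is parallel, ignores the op kind, and allows at most one scalable dimension. `VectorDim` and `isScalableLayoutAllowed` are hypothetical names introduced for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one entry in `vector_sizes`: `[8]` in Transform
// Dialect notation would be {8, true}, and a plain `1` would be {1, false}.
struct VectorDim {
  std::int64_t size;
  bool scalable;
};

// Simplified rule (assumption: all dimensions are parallel): a scalable
// dimension is accepted if it is the trailing one, or if every dimension
// after it is a fixed-size unit dimension that could be collapsed.
inline bool isScalableLayoutAllowed(const std::vector<VectorDim> &dims) {
  for (std::size_t i = 0; i < dims.size(); ++i) {
    if (!dims[i].scalable)
      continue;
    for (std::size_t j = i + 1; j < dims.size(); ++j)
      if (dims[j].scalable || dims[j].size != 1)
        return false;
  }
  return true;
}
```

Under this model `[[8], 1]` is accepted because the trailing unit dimension can be collapsed, while `[[8], 4]` is still rejected.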
2024-11-26[mlir][linalg] Extract `GeneralizePadOpPattern` into a standalone ↵Andrzej Warzyński
transformation (#117329) Currently, `GeneralizePadOpPattern` is grouped under `populatePadOpVectorizationPatterns`. However, as noted in #111349, this transformation "decomposes" rather than "vectorizes" `tensor.pad`. As such, it functions as: * a vectorization _pre-processing_ transformation, not * a vectorization transformation itself. To clarify its purpose, this PR turns `GeneralizePadOpPattern` into a standalone transformation by: * introducing a dedicated `populateDecomposePadPatterns` method, * adding an `apply_patterns.linalg.decompose_pad` Transform Dialect Op, * removing it from `populatePadOpVectorizationPatterns`. In addition, to better reflect its role, it is renamed as "decomposition" rather than "generalization". This is in line with the recent renaming of similar ops, i.e. tensor.pack/tensor.unpack Ops in #116439.
2024-11-13[mlir][Vector] Remove trivial uses of ↵Kunwar Grover
vector.extractelement/vector.insertelement (1/N) (#116053) This patch removes trivial usages of vector.extractelement/vector.insertelement. These operations can be fully represented by vector.extract/vector.insert. See https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116 for more information. Further patches will remove more usages of these ops.
2024-11-07[MLIR][Linalg] Re-land linalg.matmul move to ODS. + Remove/update failing ↵Md Asghar Ahmad Shahid
obsolete OpDSL tests. (#115319) The earlier PR (https://github.com/llvm/llvm-project/pull/104783), which introduced transpose and broadcast semantics to linalg.matmul, was reverted due to two failing OpDSL tests for linalg.matmul. Since linalg.matmul is now defined using TableGen ODS instead of Python-based OpDSL, these tests started failing and need to be removed or updated. This commit removes or updates the failing obsolete tests in the files listed below. All other files were part of the earlier PR and were simply cherry-picked. "mlir/test/python/integration/dialects/linalg/opsrun.py" "mlir/test/python/integration/dialects/transform.py" --------- Co-authored-by: Renato Golin <rengolin@systemcall.eu>