llvm-project.git/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp, branch main

[mlir][vector] Missing indices on vectorization of 1-d reduction to 1-ranked memref (#166959)

2025-11-19T14:52:27+00:00

Vectorization of a 1-d reduction where the output variable is a 1-ranked
memref can generate an invalid `vector.transfer_write` with no indices
for the memref, e.g.:

"vector.transfer_write"(%vec, %buff) <{...}> : (vector,
memref<1xf32>) -> ()

This patch solves the problem by providing the expected number of
indices (i.e. matching the rank of the memref).
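
As a rough illustration of the invariant being restored, the standalone
sketch below models a write as "destination rank plus index list" and
builds one zero index per destination dimension. `TransferWriteModel`,
`hasExpectedIndexCount` and `makeZeroIndices` are hypothetical names
invented for this sketch; they are not part of MLIR, and the actual fix
in Vectorization.cpp may be structured differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a transfer_write: only the pieces needed here.
struct TransferWriteModel {
  int64_t memrefRank;           // rank of the destination memref
  std::vector<int64_t> indices; // one index per destination dimension
};

// In this model, a write is well-formed only if it carries exactly one
// index per destination dimension.
bool hasExpectedIndexCount(const TransferWriteModel &write) {
  return static_cast<int64_t>(write.indices.size()) == write.memrefRank;
}

// Build an all-zero index list whose length matches the destination rank.
std::vector<int64_t> makeZeroIndices(int64_t memrefRank) {
  return std::vector<int64_t>(static_cast<std::size_t>(memrefRank), 0);
}
```

For a `memref<1xf32>` destination, `makeZeroIndices(1)` yields a single
zero index, whereas an empty index list is rejected by
`hasExpectedIndexCount`.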

[mlir][vector] Simplify createReadOrMaskedRead (#163736)

2025-11-11T17:39:56+00:00

Simplify `createReadOrMaskedRead` to only require _one_ argument to
specify the vector type to read (passed as `VectorType`) instead of
passing vector-sizes and scalable-flags independently (i.e. _two_
arguments).

A simple overload is provided for users who would not otherwise re-use
the corresponding `VectorType` (and so have no reason to create one).
While this overload has no upstream users, it has been helpful
downstream.
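
The shape of this API change can be sketched with standalone types. In
the sketch below, `VecTypeModel`, `ReadPlan` and `planRead` are
hypothetical names made up for illustration; they are not the real
`createReadOrMaskedRead` signature. The primary function takes the
vector type as one argument, and the convenience overload takes sizes
and scalable flags, builds the type, and forwards.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a vector type: shape plus one
// "scalable" flag per dimension.
struct VecTypeModel {
  std::vector<int64_t> shape;
  std::vector<bool> scalableDims;
};

// Hypothetical result type, used only to make the sketch observable.
struct ReadPlan {
  VecTypeModel vecTy;
  bool hasScalableDim;
};

// Primary entry point: the vector type to read is a single argument.
ReadPlan planRead(const VecTypeModel &vecToReadTy) {
  bool anyScalable = false;
  for (bool isScalable : vecToReadTy.scalableDims)
    anyScalable = anyScalable || isScalable;
  return ReadPlan{vecToReadTy, anyScalable};
}

// Convenience overload for callers that only have sizes and flags and
// would not re-use the vector type themselves.
ReadPlan planRead(const std::vector<int64_t> &sizes,
                  const std::vector<bool> &scalableFlags) {
  return planRead(VecTypeModel{sizes, scalableFlags});
}
```

Both call styles produce the same plan, so callers that already hold a
vector type avoid unpacking it into two parallel arrays.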

[mlir][linalg] Update vectorization of linalg.pack (#163539)

2025-11-06T13:53:56+00:00

This patch changes `vectorizeAsTensorPackOp` to require users to specify
**all** write-side vector sizes for `linalg.pack` (not just the outer
dimensions). This makes `linalg.pack` vectorization consistent with
`linalg.unpack` (see https://github.com/llvm/llvm-project/pull/149293
for a similar change).

Conceptually, `linalg.pack` consists of these high-level steps:
  * **Read** from the source tensor using `vector.transfer_read`.
  * **Re-associate** dimensions of the read value, as specified by
    the op (via `vector.shape_cast`)
  * **Transpose** the re-associated value according to the permutation
    in the `linalg.pack` op (via `vector.transpose`).
  * **Write** the result into the destination tensor via
    `vector.transfer_write`.
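
The shape bookkeeping behind these steps can be sketched in standalone
C++. This is a minimal model, assuming static sizes that divide evenly
by their tiles (no padding); `PackVectorShapes` and
`computePackVectorShapes` are hypothetical names for this sketch, not
functions from Vectorization.cpp, and the real implementation may derive
the shapes differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PackVectorShapes {
  std::vector<int64_t> shapeCast;     // shape after vector.shape_cast
  std::vector<int64_t> transposePerm; // permutation for vector.transpose
  std::vector<int64_t> write;         // shape seen by vector.transfer_write
};

PackVectorShapes
computePackVectorShapes(const std::vector<int64_t> &readShape,
                        const std::vector<int64_t> &innerDimsPos,
                        const std::vector<int64_t> &innerTiles,
                        const std::vector<int64_t> &outerDimsPerm) {
  const std::size_t srcRank = readShape.size();
  // tileOfDim[d] is the index of the tile applied to source dim d, or -1.
  std::vector<int64_t> tileOfDim(srcRank, -1);
  for (std::size_t j = 0; j < innerDimsPos.size(); ++j)
    tileOfDim[static_cast<std::size_t>(innerDimsPos[j])] =
        static_cast<int64_t>(j);

  PackVectorShapes result;
  std::vector<int64_t> outerPos(srcRank), tilePos(innerTiles.size());
  // Re-associate: split every tiled source dim into (outer, tile).
  for (std::size_t d = 0; d < srcRank; ++d) {
    const int64_t j = tileOfDim[d];
    outerPos[d] = static_cast<int64_t>(result.shapeCast.size());
    if (j < 0) {
      result.shapeCast.push_back(readShape[d]);
      continue;
    }
    const std::size_t tile = static_cast<std::size_t>(j);
    result.shapeCast.push_back(readShape[d] / innerTiles[tile]);
    tilePos[tile] = static_cast<int64_t>(result.shapeCast.size());
    result.shapeCast.push_back(innerTiles[tile]);
  }
  // Transpose: permuted outer dims first, then tiles in inner_tiles order.
  for (std::size_t i = 0; i < srcRank; ++i) {
    const std::size_t srcDim =
        outerDimsPerm.empty() ? i : static_cast<std::size_t>(outerDimsPerm[i]);
    result.transposePerm.push_back(outerPos[srcDim]);
  }
  for (std::size_t j = 0; j < innerTiles.size(); ++j)
    result.transposePerm.push_back(tilePos[j]);
  // Result dim k of the transpose is source dim transposePerm[k].
  for (int64_t p : result.transposePerm)
    result.write.push_back(result.shapeCast[static_cast<std::size_t>(p)]);
  return result;
}
```

For a `64x4` read with `inner_dims_pos = [0, 1]`, `inner_tiles = [16, 2]`
and `outer_dims_perm = [1, 0]`, this model gives a shape-cast shape of
`4x16x2x2`, a transpose permutation of `[2, 0, 1, 3]`, and a write shape
of `2x4x16x2`.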

Previously, the vector sizes provided by the user were interpreted as
write-vector-sizes for PackOp **_outer_** dims (i.e. the final step
above). These were used to:
  * Infer read-vector-sizes using the `inner_tiles` attribute of PackOp.
  * Deduce vector sizes for the transpose and shape cast operations.
  * Ultimately determine the vector shape for the read.

However, this logic breaks when one or more tile sizes are dynamic (*).
In such cases, `vectorizePackOpPrecondition` fails before this change
(see `@pack_with_dynamic_dims_and_dynamic_inner_tile`, added in this PR;
without this change it crashes).

This patch updates the contract: users now directly specify _all_ the
"write-vector-sizes", which inherently encode all inner tile sizes -
including dynamic ones. It becomes the user's responsibility to provide
valid sizes.
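
To illustrate why the full list of write-vector-sizes is sufficient, the
sketch below recovers the read-side vector sizes from it without
consulting the op's `inner_tiles`. This is a simplified model, not the
code in Vectorization.cpp: `inferReadVectorSizes` is a hypothetical
helper, and it assumes the write sizes are laid out as permuted outer
dims followed by inner tiles.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// writeVectorSizes: [outer dims (already permuted)..., inner tiles...]
// outerDimsPerm:    empty means identity
// innerDimsPos:     source dim tiled by each inner tile
std::vector<int64_t>
inferReadVectorSizes(const std::vector<int64_t> &writeVectorSizes,
                     const std::vector<int64_t> &outerDimsPerm,
                     const std::vector<int64_t> &innerDimsPos) {
  const std::size_t srcRank = writeVectorSizes.size() - innerDimsPos.size();
  std::vector<int64_t> readSizes(srcRank, 1);
  // Undo outer_dims_perm: write-side outer dim i comes from source dim
  // outerDimsPerm[i].
  for (std::size_t i = 0; i < srcRank; ++i) {
    const std::size_t srcDim =
        outerDimsPerm.empty() ? i : static_cast<std::size_t>(outerDimsPerm[i]);
    readSizes[srcDim] = writeVectorSizes[i];
  }
  // Fold each inner tile size back into the source dim it tiles.
  for (std::size_t j = 0; j < innerDimsPos.size(); ++j)
    readSizes[static_cast<std::size_t>(innerDimsPos[j])] *=
        writeVectorSizes[srcRank + j];
  return readSizes;
}
```

With write sizes `[2, 4, 16, 2]`, `outer_dims_perm = [1, 0]` and
`inner_dims_pos = [0, 1]`, the model returns read sizes `[64, 4]`. The
tile sizes `16` and `2` come from the user-provided list, which is why a
dynamic tile size in the op itself is no longer an obstacle.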

In practice, since `linalg.pack` is typically constructed, tiled, and
vectorized by the same transformation pipeline, the necessary
"write-vector-sizes" should be recoverable.

Notes for reviewers:
  * See test updates for user-facing impact.
  * Review `vectorizeAsTensorPackOp` as a new implementation rather than
    a diff.
  * Comments and variable names were updated to align with
    `vectorizeAsTensorUnPackOp`.

(*) As a concrete example, "scalable" tile sizes are represented as
dynamic values. Note, support for "scalable" vectorisation will be added
in a separate PR.

[mlir] Simplify Default cases in type switches. NFC. (#165767)

2025-10-30T19:10:59+00:00

Use default values instead of lambdas when possible. `std::nullopt` and
`nullptr` can be used now because of
https://github.com/llvm/llvm-project/pull/165724.

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in Vectorization.cpp (NFC)

2025-10-30T06:48:05+00:00

[mlir][linalg] set inbounds on `xfer_read/writes` for `assumeDynamicDimsMatchVecSizes` (#160839)

2025-10-09T16:23:18+00:00

The idea from #146531 was to introduce the flag
`assumeDynamicDimsMatchVecSizes` to signal to the vectorizer that the
access should not be masked and is in-bounds. Although the masking part
is handled, `xfer_read/write` ops are created without explicitly setting
the inbounds attribute, which defaults to all-false.

In the presence of scalable tile sizes, subsequent patterns tend to
overwrite the inbounds attribute and introduce masks further down when
lowered to loads and stores. This PR explicitly sets the inbounds
attribute to all-true for `xfer_read/write` ops if the
`assumeDynamicDimsMatchVecSizes` flag is set.

---------

Signed-off-by: Ege Beysel

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in Vectorization.cpp (NFC)

2025-09-30T08:26:40+00:00

[mlir][linalg] Use ub.poison when vectorizing pack+unpack Ops (#159536)

2025-09-23T11:27:48+00:00

This patch makes sure that in the absence of an explicit pad value in
`linalg.pack`, the vectorizer will use `ub.poison` for the corresponding
Xfer Op pad value (as opposed to e.g. `arith.constant 0`).

Also, in the case of `linalg.unpack`, use `ub.poison` for the Xfer read
operation. In this case, there is no mechanism for a user to specify the
pad/pass-thru value.

[mlir][linalg] Update vectorization logic for linalg.pack (#149156) (#158926)

2025-09-18T09:32:46+00:00

NOTE: See #149156 for a similar change for `linalg.unpack`

This PR makes sure that we don't generate unnecessary `tensor.empty`
when vectorizing `linalg.pack`.

To better visualize the changes implemented here, consider this IR:
```mlir
func.func @example(
    %src: tensor<64x4xf32>,
    %dest: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {

  %pack = linalg.pack %src
    outer_dims_perm = [1, 0]
    inner_dims_pos = [0, 1]
    inner_tiles = [16, 2]
    into %dest : tensor<64x4xf32> -> tensor<2x4x16x2xf32>

  return %pack : tensor<2x4x16x2xf32>
}
```

Below is the output after vectorization, BEFORE and AFTER this PR.

BEFORE (note `tensor.empty` and the fact that `%arg1` is not used):
```mlir
  func.func @example(%arg0: tensor<64x4xf32>, %arg1: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %c0 = arith.constant 0 : index
    %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<64x4xf32>, vector<64x4xf32>
    %1 = vector.shape_cast %0 : vector<64x4xf32> to vector<4x16x2x2xf32>
    %2 = vector.transpose %1, [2, 0, 1, 3] : vector<4x16x2x2xf32> to vector<2x4x16x2xf32>
    %3 = tensor.empty() : tensor<2x4x16x2xf32>
    %c0_0 = arith.constant 0 : index
    %4 = vector.transfer_write %2, %3[%c0_0, %c0_0, %c0_0, %c0_0] {in_bounds = [true, true, true, true]} : vector<2x4x16x2xf32>, tensor<2x4x16x2xf32>
    return %4 : tensor<2x4x16x2xf32>
  }
```

AFTER (note that `%arg1` is correctly used):
```mlir
func.func @example(%arg0: tensor<64x4xf32>, %arg1: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<64x4xf32>, vector<64x4xf32>
  %1 = vector.shape_cast %0 : vector<64x4xf32> to vector<4x16x2x2xf32>
  %2 = vector.transpose %1, [2, 0, 1, 3] : vector<4x16x2x2xf32> to vector<2x4x16x2xf32>
  %c0_0 = arith.constant 0 : index
  %3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0, %c0_0, %c0_0] {in_bounds = [true, true, true, true]} : vector<2x4x16x2xf32>, tensor<2x4x16x2xf32>
  return %3 : tensor<2x4x16x2xf32>
}
```

ADDITIONAL CHANGES:
  * Adds missing `CHECK-LABEL` in tests.
  * Capitalizes LIT test variable names.

[mlir][linalg] Add support for scalable vectorization of `linalg.batch_mmt4d` (#152984)

2025-08-14T09:47:51+00:00

This PR builds upon the previous #146531 and enables scalable
vectorization for `batch_mmt4d` as well.

---------

Signed-off-by: Ege Beysel