| Age | Commit message (Collapse) | Author |
|
memref (#166959)
Vectorization of a 1-d reduction where the output variable is a 1-ranked
memref can generate an invalid `vector.transfer_write` with no indices
for the memref, e.g.:
vector.transfer_write"(%vec, %buff) <{...}> : (vector<f32>,
memref<1xf32>) -> ()
This patch solves the problem by providing the expected amount of
indices (i.e. matching the rank of the memref).
|
|
Simplify `createReadOrMaskedRead` to only require _one_ argument to
specify the vector type to read (passed as `VectorType`) instead of
passing vector-sizes and scalable-flags independently (i.e. _two_
arguments).
A simple overload is provided for users that wouldn't re-use the
corresponding `VectorType` (and hence there's no point for them
to create). While there are no users upstream for this overload,
it's been helpful downstream.
|
|
This patch changes `vectorizeAsTensorPackOp` to require users to specify
**all** write-side vector sizes for `linalg.pack` (not just the outer
dimensions). This makes `linalg.pack` vectorization consistent with
`linalg.unpack` (see https://github.com/llvm/llvm-project/pull/149293
for a similar change).
Conceptually, `linalg.pack` consists of these high-level steps:
* **Read** from the source tensor using `vector.transfer_read`.
* **Re-associate** dimensions of the read value, as specified by
the op (via `vector.shape_cast`)
* **Transpose** the re-associated value according to the permutation
in the `linalg.pack` op (via `vector.transpose`).
* **Write** the result into the destination tensor via
`vector.transfer_write`.
Previously, the vector sizes provided by the user were interpreted as
write-vector-sizes for PackOp **_outer_** dims (i.e. the final step
above). These were used to:
* Infer read-vector-sizes using the `inner_tiles` attribute of PackOp.
* Deduce vector sizes for the transpose and shape cast operations.
* Ultimately determine the vector shape for the read.
However, this logic breaks when one or more tile sizes are dynamic (*).
In such cases, `vectorizePackOpPrecondition` would currently fail (see
`@pack_with_dynamic_dims_and_dynamic_inner_tile` added in this PR -
without this change it will crash).
This patch updates the contract: users now directly specify _all_ the
"write-vector-sizes", which inherently encode all inner tile sizes -
including dynamic ones. It becomes the user's responsibility to provide
valid sizes.
In practice, since `linalg.pack` is typically constructed, tiled, and
vectorized by the same transformation pipeline, the necessary
"write-vector-sizes" should be recoverable.
Notes for reviewers:
* See test updates for user-facing impact.
* Review `vectorizeAsTensorPackOp` as a new implementation rather than
a diff.
* Comments and variable names were updated to align with
`vectorizeAsTensorUnPackOp`.
(*) As a concrete example, "scalable" tile sizes are represent as
dynamic values. Note, support for "scalable" vectorisation will be added
in a separate PR.
|
|
Use default values instead of lambdas when possible. `std::nullopt` and
`nullptr` can be used now because of
https://github.com/llvm/llvm-project/pull/165724.
|
|
|
|
`assumeDynamicDimsMatchVecSizes ` (#160839)
The idea from #146531 was to introduce the flag
`assumeDynamicDimsMatchVecSizes`, to signal the vectorizer that the
access should not be masked and is in-bounds. Though the masking part is
handled, `xfer_read/write` ops are created without explicitly setting
the inbounds attribute, which defaults to all-false.
In the existence of scalable tile sizes, subsequent patterns tend to
overwrite the inbounds attribute and introduce masks further down when
lowered to loads and stores. This PR explicitly sets the inbounds
attribute to all-true for `xfer_read/write` ops if the
`assumeDynamicDimsMatchVecSizes` flag is set.
---------
Signed-off-by: Ege Beysel <beyselege@gmail.com>
|
|
Vectorization.cpp (NFC)
|
|
This patch makes sure that in the absence of an explicit pad value in
`linalg.pack`, the vectorizer will use `ub.poison` for the corresponding
Xfer Op pad value (as opposed to e.g. `arith.constant 0`).
Also, in the case of `linalg.unpack`, use `ub.poison` for the Xfer read
operation. In this case, there is no mechanism for a user to specify the
pad/pass-thru value.
|
|
NOTE: See #149156 for a smilar change for `linalg.unpack`
This PR makes sure that we don't generate unnecessary `tensor.empty`
when vectorizing `linalg.pack`.
To better visualize the changes implemented here, consider this IR:
```mlir
func.func @example(
%src: tensor<64x4xf32>,
%dest: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
%pack = linalg.pack %src
outer_dims_perm = [1, 0]
inner_dims_pos = [0, 1]
inner_tiles = [16, 2]
into %dest : tensor<64x4xf32> -> tensor<2x4x16x2xf32>
return %pack : tensor<2x4x16x2xf32>
}
```
Below is the output after vectorization, BEFORE and AFTER this PR.
BEFORE (note `tensor.empty` and the fact that `%arg1` is not used):
```mlir
func.func @example(%arg0: tensor<64x4xf32>, %arg1: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
%cst = arith.constant 0.000000e+00 : f32
%c0 = arith.constant 0 : index
%0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<64x4xf32>, vector<64x4xf32>
%1 = vector.shape_cast %0 : vector<64x4xf32> to vector<4x16x2x2xf32>
%2 = vector.transpose %1, [2, 0, 1, 3] : vector<4x16x2x2xf32> to vector<2x4x16x2xf32>
%3 = tensor.empty() : tensor<2x4x16x2xf32>
%c0_0 = arith.constant 0 : index
%4 = vector.transfer_write %2, %3[%c0_0, %c0_0, %c0_0, %c0_0] {in_bounds = [true, true, true, true]} : vector<2x4x16x2xf32>, tensor<2x4x16x2xf32>
return %4 : tensor<2x4x16x2xf32>
}
```
AFTER (note that `%arg1` is correctly used):
```mlir
func.func @example(%arg0: tensor<64x4xf32>, %arg1: tensor<2x4x16x2xf32>) -> tensor<2x4x16x2xf32> {
%cst = arith.constant 0.000000e+00 : f32
%c0 = arith.constant 0 : index
%0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<64x4xf32>, vector<64x4xf32>
%1 = vector.shape_cast %0 : vector<64x4xf32> to vector<4x16x2x2xf32>
%2 = vector.transpose %1, [2, 0, 1, 3] : vector<4x16x2x2xf32> to vector<2x4x16x2xf32>
%c0_0 = arith.constant 0 : index
%3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0, %c0_0, %c0_0] {in_bounds = [true, true, true, true]} : vector<2x4x16x2xf32>, tensor<2x4x16x2xf32>
return %3 : tensor<2x4x16x2xf32>
}
```
ADDITIONAL CHANGES:
* Adds missing `CHECK-LABEL` in tests.
* Capitalize LIT test variables names.
|
|
`linalg.batch_mmt4d` (#152984)
This PR builds upon the previous #146531 and enables scalable
vectorization for `batch_mmt4d` as well.
---------
Signed-off-by: Ege Beysel <beyselege@gmail.com>
|
|
Removes the `(batch_)matmul_transpose_{a|b}` variants from OpDSL and
replace it with `matmul affine_maps [...]` whenever appropriate. This is
in line with the
[plan](https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863),
and can be done since #104783 merged.
See:
https://discourse.llvm.org/t/deprecate-batch-matmul-transpose-a-b-linalg-operations/87245
Issues investigated:
* pad transform tests that could use `matmul` instead, so change to
that.
* ArmSME test using transpose actually needed it, so changed to `matmul`
+ affine maps.
Arm tests validated by @banach-space (thanks!!).
|
|
This patch updates `vectorizeAsTensorUnpackOp` to support scalable
vectorization by requiring user-specified vector sizes for the _read_ operation
(rather than the _write_ operation) in `linalg.unpack`.
Conceptually, `linalg.unpack` consists of these high-level steps:
* **Read** from the source tensor using `vector.transfer_read`.
* **Transpose** the read value according to the permutation in the
`linalg.unpack` op (via `vector.transpose`).
* **Re-associate** dimensions of the transposed value, as specified by the op
(via `vector.shape_cast`)
* **Write** the result into the destination tensor via
`vector.transfer_write`.
Previously, the vector sizes provided by the user were interpreted as
write-vector sizes. These were used to:
* Infer read-vector sizes using the `inner_tiles` attribute of the unpack op.
* Deduce vector sizes for the transpose and shape cast operations.
* Ultimately determine the vector shape for the write.
However, this logic breaks when one or more tile sizes are dynamic. In such
cases, `vectorizeUnPackOpPrecondition` fails, and vectorization is rejected.
This patch switches the contract: users now directly specify the
"read-vector-sizes", which inherently encode all inner tile sizes - including
dynamic ones. It becomes the user's responsibility to provide valid sizes.
In practice, since `linalg.unpack` is typically constructed, tiled, and
vectorized by the same transformation pipeline, the necessary
"read-vector-sizes" should be recoverable.
|
|
linalg.unpack (#151503)
This patch introduces a new helper, `getCollapsedVecType`, and updates
`vectorizeAsTensorUnpackOp` to use it. The motivation stems from improving how
`vector.shape_cast` operations are generated when vectorizing `linalg.unpack`.
Previously, the vectorizer relied on
* `tensor::CollapseShapeOp::inferCollapsedType`
to compute the collapsed vector type. This approach is suboptimal
because:
* `inferCollapsedType` lacks awareness of scalable vector flags.
* Linalg vectorization should not depend on Tensor dialect utilities.
Instead of relocating `inferCollapsedType`, we introduce
`getCollapsedVecType` — a lightweight, specialized hook that:
* Assumes no dynamic sizes.
* Handles scalable flags alongside shape dimensions.
This change also reduces temporary variables in
`vectorizeAsTensorUnpackOp` and paves the way for a cleaner update in
#149293.
|
|
In https://github.com/llvm/llvm-project/pull/149156, I ensured that we
no longer generate spurious `tensor.empty` ops when vectorizing
`linalg.unpack`.
This follow-up removes leftover code that is now redundant but was
missed in the original PR and in #150602 that was also meant to clean-up
left-over code.
Note, this is removing code to compute "write-vector-sizes". Instead,
these are fully inferred from previous Ops.
|
|
In https://github.com/llvm/llvm-project/pull/149156, I ensured that we
no longer generate spurious `tensor.empty` ops when vectorizing
`linalg.unpack`.
This follow-up removes leftover code that is now redundant but was
missed in the original PR.
|
|
See https://github.com/llvm/llvm-project/pull/147168 for more info.
|
|
See https://github.com/llvm/llvm-project/pull/147168 for more info.
|
|
Change local variants to use new central one.
|
|
See https://github.com/llvm/llvm-project/pull/147168 for more info.
|
|
Extends linalg vectorizer with a path to lower contraction ops directly
into `vector.contract`.
The direct rewriting preserves high-level op semantics and provides more
progressive lowering compared to reconstructing contraction back from
multi dimensional reduction.
The added lowering focuses on named linalg ops and leverages their well
defined semantics to avoid complex precondition verification.
The new path is optional and disabled by default to avoid changing the
default vectorizer behavior.
|
|
This patch adds support for scalable vectorization of linalg.mmt4d. The
key design change is the introduction of a new vectorizer state variable:
* `assumeDynamicDimsMatchVecSizes`
...along with the corresponding Transform dialect attribute:
* `assume_dynamic_dims_match_vec_sizes`.
This flag instructs the vectorizer to assume that dynamic memref/tensor
dimensions match the corresponding vector sizes (fixed or scalable). With this
assumption, masking becomes unnecessary, which simplifies the lowering pipeline
significantly.
While this assumption is not universally valid, it typically holds for
`linalg.mmt4d`. Inputs and outputs are explicitly packed using `linalg.pack`,
and this packing includes padding, ensuring that dimension sizes align with
vector sizes (*).
* Related discussion: https://github.com/llvm/llvm-project/issues/143920
An upcoming patch will include an end-to-end test that leverages scalable
vectorization of linalg.mmt4d to demonstrate the newly enabled functionality.
This would not be feasible without the changes introduced here, as it would
otherwise require additional logic to handle complex - but ultimately redundant
- masks.
(*) This holds provided that the tile sizes used for packing match the vector
sizes used during vectorization. It is the user’s responsibility to enforce
this.
|
|
This PR makes sure that we don't generate unnecessary `tensor.empty`
when vectorizing `linalg.unpack`.
To better visualize the changes implemented here, consider this IR:
```mlir
func.func @example(
%source: tensor<8x4x16x16xf32>,
%dest: tensor<64x127xf32>) -> tensor<64x127xf32> {
%res = linalg.unpack %source
outer_dims_perm = [1, 0]
inner_dims_pos = [0, 1]
inner_tiles = [16, 16]
into %dest : tensor<8x4x16x16xf32> -> tensor<64x127xf32>
return %res : tensor<64x127xf32>
}
```
Below is the output after vectorization, BEFORE and AFTER this PR.
BEFORE (note `tensor.empty` and the fact that `%arg1` is not used):
```mlir
func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
%cst = arith.constant 0.000000e+00 : f32
%c0 = arith.constant 0 : index
%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
%1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
%2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
%3 = tensor.empty() : tensor<64x127xf32>
%c0_0 = arith.constant 0 : index
%4 = vector.transfer_write %2, %3[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
return %4 : tensor<64x127xf32>
}
```
AFTER (note that `%arg1` is correctly used):
```mlir
func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
%cst = arith.constant 0.000000e+00 : f32
%c0 = arith.constant 0 : index
%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
%1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
%2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
%c0_0 = arith.constant 0 : index
%3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
return %3 : tensor<64x127xf32>
}
```
|
|
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow for `ShapedType::isStaticDim` to be used in
ADT functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to C and Python bindings.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
265 | isa<VectorType>(operandType) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
266 | "Unexpected non-vector ShapedType");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
transfer ops (#146544)
This patch is a follow up to https://github.com/llvm/llvm-project/pull/146088 and changes the padding value in the linalg vectorizer from `0` to `ub.poison` in `vector.transfer_read`s created for extracting slices or when vectorizing a generic.
Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
|
|
`vector.transfer_read` prefer `ub.poison` (#146088)
Context:
`vector.transfer_read` always requires a padding value. Most of its
builders take no `padding` value and assume the safe value of `0`.
However, this should be a conscious choice by the API user, as it makes
it easy to introduce bugs.
For example, I found several occasions while making this patch that the
padding value was not getting propagated (`vector.transfer_read` was
transformed into another `vector.transfer_read`). These bugs, were
always caused because of constructors that don't require specifying
padding.
Additionally, using `ub.poison` as a possible default value is better,
as it indicates the user "doesn't care" about the actual padding value,
forcing users to specify the actual padding semantics they want.
With that in mind, this patch changes the builders in
`vector.transfer_read` to always having a `std::optional<Value> padding`
argument. This argument is never optional, but for convenience users can
pass `std::nullopt`, padding the transfer read with `ub.poison`.
---------
Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
|
|
llvm::is_contained is shorter than llvm::all_of plus a lambda.
|
|
Updates the linalg::vectorize function to return a
`FailureOr<VectorizationResult>` containing the values to replace the
original operation, instead of directly replacing the original
operation. This aligns better with the style of transforms used with the
TilingInterface, and gives more control to users over the lowering,
since it allows for additional transformation of the IR before
replacement.
There was already a `VectorizationResult` defined, which was used for
the internal vectorize implementation using `CustomVectorizationHook`s,
so the old struct is renamed to `VectorizationHookResult`.
Note for integration: The replacement of the original operation is now
the responsibility of the caller, so wherever `linalg::vectorize` is
used, the caller must also do
`rewriter.replaceOp(vectorizeResults->replacements)`.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
|
|
generic interface (#145313)
Refactor the verifiers to make use of the common bits and make
`vector.contract` also use this interface.
In the process, the confusingly named getStaticShape has disappeared.
Note: the verifier for IndexingMapOpInterface is currently called
manually from other verifiers as it was unclear how to avoid it taking
precedence over more meaningful error messages
|
|
This patch removes `inputVecSizesForLeadingDims` from the parameter list
of `createWriteOrMaskedWrite`. That argument is unnecessary - vector
sizes can be obtained from the `vecToStore` parameter. Since this doesn't
change behavior or test results, it's marked as NFC.
Additional cleanups:
* Renamed `vectorToStore` to `vecToStore` for consistency and brevity.
* Rewrote a conditional at the end of the function to use early exit,
improving readability:
```cpp
// BEFORE:
if (maskingRequried) {
Value maskForWrite = ...;
write = maskOperation(write, maskForWrite);
}
return write;
// AFTER
if (!maskingRequried)
return write;
Value maskFroWrite = ...;
return vector::maskOperation(builder, write, maskForWrite);
```
|
|
We don't need lambdas here.
|
|
This patch refactors two vectorization hooks in Vectorization.cpp:
* `createWriteOrMaskedWrite` gains a new parameter for write indices,
aligning it with its counterpart `createReadOrMaskedRead`.
* `vectorizeAsInsertSliceOp` is updated to reuse both of the above
hooks, rather than re-implementing similar logic.
CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
* https://github.com/llvm/llvm-project/pull/122927
* https://github.com/llvm/llvm-project/pull/123031
At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.
CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
vector to store, via named variables like `destType`/`vecToStoreType`,
`destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
`in_bounds`. For example, the size of the `in_bounds` attr now matches
the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't
hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of
`destShape` (tensor). (Eventually we should not require
`inputVecSizesForLeadingDims` at all - mask shape should be inferred.)
NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.
Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
{in_bounds = [true, true, true]}
: vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```
Also, without this hook, tests are more complicated and require more
matching.
VECTORIZATION BEHAVIOUR
-----------------------
This patch preserves the current behaviour around masking and the use
of`in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are
available.
* The vectorizer continues to infer vector sizes where needed.
Note: the computation of the `in_bounds` attribute is not always
correct. That
issue is tracked here:
* https://github.com/llvm/llvm-project/issues/142107
This will be addressed separately.
TEST CHANGES
-----------
Only affects vectorization of:
* `tensor.insert_slice` (now refactored to use shared hooks)
Test diffs involve additional `arith.constant` Ops due to increased
reuse of
shared helpers (which generate their own constants). This will be
cleaned up
via constant caching (see #138265).
NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.
TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
* Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
VectorUtils.cpp)
* Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
* Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
updated test in transform-vector.mlir for an example that would
benefit from this.
* Address #142107
(*) This method will eventually be moved out of Vectorization.cpp, which
isn't the right long-term home for it.
|
|
This patch fixes:
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1544:14: error:
unused variable 'destType' [-Werror,-Wunused-variable]
|
|
This patch updates `createWriteOrMaskedWrite` to make it consistent with
`createReadOrMaskedRead`.
Before diving into the details: note that these utilities are currently
implemented in different files — "VectorUtils.cpp" (Vector) and
"Vectorization.cpp" (Linalg). In a subsequent patch, I plan to move
`createWriteOrMaskedWrite` into "VectorUtils.cpp".
SUMMARY OF CHANGES:
The main change is to remove the logic that creates the destination
tensor, which previously looked like:
```cpp
Value dest = builder.create<tensor::EmptyOp>(loc, destSizes,
inputType.getElementType());
```
With this patch, createWriteOrMaskedWrite now simply generates:
```mlir
%res = vector.transfer_write %vectorToStore into %dest
```
This replaces the previous form:
```mlir
%dest = tensor.empty(%destSizes)
%res = vector.transfer_write %vectorToStore into %dest
```
In other words, the destination value `%dest` is now passed as an input
parameter. This makes `createWriteOrMaskedWrite` re-usable in contexts
where the destination tensor is already known — for example, in
`vectorizeAsInsertSliceOp`, which I will update in a follow-up patch.
OTHER CHANGES:
* Added comments and clarified TODOs.
* Updated tests: since destination sizes are now computed independently
inside `createWriteOrMaskedWrite`, some additional `tensor.dim` ops
appear. These will be cleaned up by CSE + canonicalization.
|
|
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)
This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).
Updated operations:
* `vector.transfer_read`,
* `vector.transfer_write`.
For reference, these ops already use `base`:
* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
`vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
`vector.maskedload`.
This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.
To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.
Implements #131602
|
|
(#135350)
The semantics of `createReadOrMaskedRead` and `createWriteOrMaskedWrite`
are currently a bit inconsistent and not fully documented:
* The input vector sizes are passed as `readShape` and
`inputVectorSizes`, respectively — inconsistent naming.
* Currently, the input vector sizes in `createWriteOrMaskedWrite` are
not required to be complete: any missing trailing sizes are inferred
from the destination tensor. This only works when the destination
tensor is statically shaped.
* Unlike `createReadOrMaskedRead`, the documentation for
`createWriteOrMaskedWrite` does not specify that write offsets are
hard-coded to 0.
This PR only updates the documentation and unifies the naming. As such,
it is NFC.
A follow-up PR will generalize and unify the implementation to support,
for example, dynamically shaped destination tensors — a requirement for
enabling scalable vectorization of `linalg.pack` and `linalg.unpack`.
|
|
(#134206)
This change standardises the naming convention for the argument
representing the value to store in various vector operations.
Specifically, it ensures that all vector ops storing a value—whether
into memory, a tensor, or another vector — use `valueToStore` for the
corresponding argument name.
Updated operations:
* `vector.transfer_write`, `vector.insert`, `vector.scalable_insert`,
`vector.insert_strided_slice`.
For reference, here are operations that currently use `valueToStore`:
* `vector.store` `vector.scatter`, `vector.compressstore`,
`vector.maskedstore`.
This change is non-functional (NFC) and does not affect the
functionality of these operations.
Implements #131602
|
|
We currently do not have masked vectorization support for tenor.pad with
low padding. However, we can allow this in the special case where the
result dimension after padding is a unit dim. The reason is when we
actually have a low pad on a unit dim, the input size of that dimension
will be (or should be for correct IR) dynamically zero and hence we will
create a zero mask which is correct. If the low pad is dynamically zero
then the lowering is correct as well.
---------
Signed-off-by: Nirvedh <nirvedh@gmail.com>
|
|
In corner situations, the vectorization pass may face to lower a conv2d
op and assert in a completely irrelevant location in
vectorizeConvolution() subroutine.
~~This PR rejects the conv2d op early and make the asserted routine to
return failure as a defensive workaround.~~
In addressing this, the PR moved all condition check away from the
`Conv1dGenerator` into the `convOpPreconditionCheck()` function. This
makes the unsupported ops such as conv2d to be rejected early and leave
a cleaner `Conv1dGenerator` constructor.
|
|
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This change involves significant churn but only relocates existing code - no new
functionality is added.
**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`
No other modifications should be required.
|
|
(2/N) (#123031)
For context, recall that `tensor.insert_slice` is vectorised using the
`vector.transfer_read` + `vector.transfer_write` pair. An unmasked
example is shown below:
```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
%slice into %dest[0, %c2]
[5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>
// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad
: tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
: vector<8x1xi32>, tensor<5x3xi32>
```
This PR extends `vectorizeAsInsertSliceOp` to add masking support for
the `vector.transfer_write` operation. This complements the changes
in #122927, which introduced masking for the `vector.transfer_read`.
|
|
(1/N) (#122927)
For context, `tensor.insert_slice` is vectorized using a
`vector.transfer_read` + `vector.transfer_write` pair.
An unmasked example is shown below:
```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
%slice into %dest[0, %c2]
[5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>
// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad
: tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
: vector<8x1xi32>, tensor<5x3xi32>
```
This PR refactors `InsertSliceVectorizePattern` (which is used to
vectorize `tensor.extract_slice`) to enable masked vectorization. ATM,
only `vector.transfer_read` is masked. If `vector.transfer_write` also
requires masking, the vectorizer will bail out. This will be addressed
in a sub-sequent PR.
Summary of changes:
* Added an argument to specify vector sizes (behavior remains
unchanged if vector sizes are not specified).
* Renamed `InsertSliceVectorizePattern` to `vectorizeAsInsertSliceOp`
and integrated into (alongside other hooks for vectorization) in
`linalg::vectorize`.
* Removed `populateInsertSliceVectorizationPatterns`, as
`InsertSliceVectorizePattern` was its only pattern.
* Updated `vectorizeAsInsertSliceOp` to support masking for the
"read" operation.
* Updated `@pad_and_insert_slice_dest` in
"vectorization-pad-patterns.mlir" to reflect the removal of
`populateInsertSliceVectorizationPatterns` from
`ApplyPadVectorizationPatternsOps`.
|
|
(#123714)
In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using an equal before the open curly
brace if we use a braced initializer list when initializing a variable.
See
https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
for more details.
The style guide does not explain the reason well. There is an article
from abseil, which mentions few benefits. E.g., we can avoid the most
vexing parse, etc. See https://abseil.io/tips/88 for more details.
Signed-off-by: hanhanW <hanhan0912@gmail.com>
|
|
This patch removes an assert in `vectorizeTensorExtract` that was
blocking
the vectorization of 0-D tensor.extract operations, e.g.:
```mlir
%1 = tensor.extract %src[] : tensor<f32>
```
As demonstrated by the included tests, this case is already effectively
supported.
**Context**
The removed assert was introduced in #109580 as a guard, pending proper
support
and testing for 0-D tensors. This PR addresses that previously
undocumented
TODO. Apologies for the oversight!
**Updates and Tests**
* Revised the existing test `@negative_no_loop` to ensure the
`vectorize_nd_extract` attribute is included, allowing the vectorizer
to process it. The test was renamed and variables updated for clarity.
* Added a new test `@extract_scalar_from_0d_into_1d` to cover "mixed"
0-D/1-D tensor extraction, e.g.:
```mlir
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel"]
} outs(%init : tensor<1xf32>) {
^bb0(%in: f32):
%1 = tensor.extract %src[] : tensor<f32>
linalg.yield %1 : f32
} -> tensor<1xf32>
return %res : tensor<1xf32>
```
**Additional updates**
I also took the liberty and improved test coverage for 0-D tensor in the
vectorizer tests:
* Added a specific test for "0D linalg.generic" in
"vectorization-with-patterns.mlir".
* Renamed several tests in "vectorization-with-patterns.mlir" to clarify
that the 0-D case is now covered.
|
|
The example below demonstrates a "scalar read followed by a broadcast"
pattern for `tensor.extract`:
```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
func.func @scalar_broadcast(
%init : tensor<1x1x3xi32>,
%src: tensor<1x3x2x4xi32>,
%idx :index) -> tensor<1x1x3xi32> {
%c0 = arith.constant 0 :index
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel", "parallel"]}
outs(%init : tensor<1x1x3xi32>) {
^bb0(%out: i32):
%val = tensor.extract %src[%idx, %idx, %idx, %idx] : tensor<1x3x2x4xi32>
linalg.yield %val : i32
} -> tensor<1x1x3xi32>
return %res : tensor<1x1x3xi32>
}
```
The default masking path within the Linalg vectorizer, which assumes an
identity masking map, is not suitable here. Indeed:
* identity != broadcast.
This patch ensures masking is handled in the `vectorizeTensorExtract`
hook, which has the necessary context for proper handling.
Fixes #116197
|
|
Currently, the Linalg vectorizer disallows non-trailing parallel
dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (*), for cases
like:
```mlir
%0 = linalg.fill ins(%arg0 : f32) outs(%A : tensor<?x?xf32>) -> tensor<?x?xf32>
```
This restriction exists to avoid generating "scalable" arrays of
aggregates, which LLVM does not support (multi-dim vectors are lowered
into arrays of aggregates at the LLVM level).
This patch relaxes that restriction when the trailing parallel vector
dimension is `1`, e.g., for `vector_sizes [[8], 1]`. Such cases are safe
since trailing unit dimensions can be collapsed. This relaxation is
necessary to support scalable vectorization for tensor.pack, where inner
tile sizes are `[8]` (scalable) and `1` (scalar).
(*) Transform Dialect notation
|
|
transformation (#117329)
Currently, `GeneralizePadOpPattern` is grouped under
`populatePadOpVectorizationPatterns`. However, as noted in #111349, this
transformation "decomposes" rather than "vectorizes" `tensor.pad`. As
such, it functions as:
* a vectorization _pre-processing_ transformation, not
* a vectorization transformation itself.
To clarify its purpose, this PR turns `GeneralizePadOpPattern` into a
standalone transformation by:
* introducing a dedicated `populateDecomposePadPatterns` method,
* adding a `apply_patterns.linalg.decompose_pad` Transform Dialect Op,
* removing it from `populatePadOpVectorizationPatterns`.
In addition, to better reflect its role, it is renamed as "decomposition"
rather then "generalization". This is in line with the recent renaming
of similar ops, i.e. tensor.pack/tensor.unpack Ops in #116439.
|
|
vector.extractelement/vector.insertelement (1/N) (#116053)
This patch removes trivial usages of
vector.extractelement/vector.insertelement. These operations can be
fully represented by vector.extract/vector.insert. See
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116
for more information.
Further patches will remove more usages of these ops.
|
|
obsolete OpDSL tests. (#115319)
The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which
introduces
transpose and broadcast semantic to linalg.matmul was reverted due to
two failing
OpDSL test for linalg.matmul.
Since linalg.matmul is now defined using TableGen ODS instead of
Python-based OpDSL,
these test started failing and needs to be removed/updated.
This commit removes/updates the failing obsolete tests from below files.
All other files
were part of earlier PR and just cherry picked.
"mlir/test/python/integration/dialects/linalg/opsrun.py"
"mlir/test/python/integration/dialects/transform.py"
---------
Co-authored-by: Renato Golin <rengolin@systemcall.eu>
|