| Age | Commit message (Collapse) | Author |
|
See https://github.com/llvm/llvm-project/pull/147168 for more info.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
isZeroInteger. (#139340)
The revision adds isOneInteger helper, and simplifies the existing code
with the two methods. It removes some lambda, which makes code cleaner.
For downstream users, you can update the code with the below script.
```bash
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp
```
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
|
|
In cases where there is no padding on a dim, we do not need to compute
new offsets, lengths and padding, for example the new test case added
can just be lowered to
```
%extracted_slice = tensor.extract_slice %arg0[%arg2, 1, 2] [%arg2, 2, 1] [1, 1, 1] : tensor<3x4x5xf32> to tensor<?x2x1xf32>
```
without this PR we will have affine maps like
```
#map = affine_map<()[s0] -> (3, s0)>
#map1 = affine_map<()[s0, s1] -> (-s0 + 3, s1)>
%0 = affine.min #map()[%arg2]
%1 = affine.min #map1()[%0, %arg2]
%extracted_slice = tensor.extract_slice %arg0[%0, 1, 2] [%1, 2, 1] [1, 1, 1] : tensor<3x4x5xf32> to tensor<?x2x1xf32>
```
which are unnecessary
Signed-off-by: Nirvedh <nirvedh@gmail.com>
|
|
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg
This change involves significant churn but only relocates existing code - no new
functionality is added.
**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`
No other modifications should be required.
|
|
|
|
|
|
Missed commiting clang-fomrat in
[#19903](https://github.com/llvm/llvm-project/pull/119039)
|
|
The current calculations calculate ending location of the new length and
then subtract the new offset from that location. It is possible to
directly calculate new length. Along with requiring less operations
(which can matter in dynamic case) this also has the advantage that the
values are upper bounded by length rather than source size which is more
friendly for range analysis. I believe the change is already being
tested by
`test/Dialect/Linalg/subtensor-of-padtensor.mlir` and
`test/Dialect/Linalg/tile-and-fuse-tensors.mlir`
---------
Signed-off-by: Nirvedh <nirvedh@gmail.com>
|
|
The `getIterationDomainTileFromOperandTile` implementation for
tensor.unpack did not clamp sizes when the unpack op had extract_slice
semantics. This PR fixes the bug.
The PR also makes a minor change to `tileAndFuseConsumerOfSlice`. When
replacing DPS inits, the iteration domain is needed, and it is computed
from the tiled version of the operation after the initial tiling
transformation. This can result in some extra indexing computation, so
the PR changes it to use the original full sized cloned consumer op.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
|
|
This fixes a bug in the tiling implementation of tensor.unpack that was
causing an infinite loop when certain unpack ops get tiled and fused as
a producer. The tiled implementation of tensor.unpack sometimes needs to
create an additional tensor.extract_slice on the result of the tiled
unpack op, but this slice was getting added to the `generatedSlices` of
the tiling result. The `generatedSlices` are used to find the next
producers to fuse, so it caused an infinite loop of fusing the same
unpack op after it was already in the loop. This fixes the bug by adding
the slice of the source instead of the result.
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
|
|
continue tile + fuse. (#107882)
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always work. Instead
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.
This is a breaking change
- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementation of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.
Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
|
|
`outer_dims_perm` attribute (#106687)
|
|
This adds implementations for the two TilingInterface methods required
for fusion to `tensor.pad`: `getIterationDomainTileFromResultTile` and
`generateResultTileValue`, allowing fusion of pad with a tiled consumer.
|
|
Add missing `getIterationDomainTileFromOperandTile` and `getTiledImplementationFromOperandTile` to `tensor.pack` and enable fusing it as a consumer. NOTE that, it only expects perfect tiling scenario without padding semantic currently.
|
|
|
|
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within
scf.for/scf.forall loop.
To support this two new methods are added to the `TilingInterface`
- `getIterationDomainTileFromOperandTile`
- `getTiledImplementationFromOperandTile`.
Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer.
Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
---------
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: cxy <chenxunyu1993@gmail.com>
|
|
Reverts llvm/llvm-project#85528. This was committed without tests,
despite reviewers requesting tests to be added. The post-commit
discussion leans towards revert, which would be consistent with the
policy.
|
|
This patch adds support for consumer fusion to the tiling interface, and
implements fuse consumers on FuseIntoContainingOp.
- Add interface method 'getIterDomainTilePositionFromOperandPosition' to
tiling interface which get iteration domain position from operand
position.
- Add interface method 'getTiledImplementationFromOperandPosition' to
tiling interface which generate tiled implementation according to
operand position.
- Implemented the above two methods and supported consumer fusion for
FuseIntoContainingOp.
Signed-off-by: Donald Chen
|
|
This commit generalizes and cleans up the `ValueBoundsConstraintSet`
API. The API used to provide function overloads for comparing/computing
bounds of:
- index-typed SSA value
- dimension of shaped value
- affine map + operands
This commit removes all overloads. There is now a single entry point for
each `compare` variant and each `computeBound` variant. These functions
now take a `Variable`, which is internally represented as an affine map
and map operands.
This commit also adds support for computing bounds for an affine map +
operands. There was previously no public API for that.
|
|
We are able to fuse the pack op only if inner tiles are not tiled or
they are fully used. Otherwise, it could generate a sequence of
non-trivial ops.
Differential Revision: https://reviews.llvm.org/D157932
|
|
* Remove duplicate functions. `tensor::getMixedSize` and `tensor::getMixedSizes` should be used.
* Use `tensor::getMixedSize` instead of `createOrFold<tensor::DimOp>`. This is more efficient. `createOrFold` will create an op an immediately try to fold it. In case of a static dimension size, an attribute can be used directly.
Differential Revision: https://reviews.llvm.org/D153332
|
|
Minor cleanup to take full advantage of OpFoldResult.
Differential Revision: https://reviews.llvm.org/D153341
|
|
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D149120
|
|
This cleanup aligns the affine dialect with all the other dialects.
Differential Revision: https://reviews.llvm.org/D148687
|
|
Also, properly propagate the nofold attribute.
Differential Revision: https://reviews.llvm.org/D148114
|
|
Use `reifyValueBound` instead, which is more general not hard-coded to a specific list supported ops.
Also add a `closedUB` parameter to the ValueBoundsOpInterface API.
Differential Revision: https://reviews.llvm.org/D146356
|
|
the created `pad` operation.
Pad tiling implementation only needs to return the tiled pad
operation. The rest of the generated code is related to handling
boundary conditions.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D146439
|
|
the state of the transformed IR.
Currently the `getTiledImplementation` and `generateResultTileValue`
return just `SmallVector<Operation *>` and `FailureOr<Value>`.
- For `getTiledImplementation` returning empty implies tiling wasnt
done. There is also an implicit assumption that the tiled operation
results correspond to the tiled values of the result of the original
operation. This cannot handle cases where the tiled implementation
might use multiple operations to compute the tiled value for the
results of the untiled operation. Sometimes, the tiled operation
might not directly give the tiled values, and might require casts,
etc to get a replacement.
- For `generateResultTileValue`, it is assumed that the op defining
the returned `Value` is the operation that represents the tiled
computation. Again presence of casts, etc violate this.
Instead make these methods return
```
struct TilingResult {
SmallVector<Operation *> tiledOps;
SmallVector<Value> tiledValues;
};
```
The `tiledOps` represent the operations generated that are relevant
for subsequent transformations. The `tiledValues` represent the tiled
values for the results of the original operation. This better
transmits the state of the transformed IR.
As a consequence the following methods also return `FailureOr<TilingResult>`
- `tensor::replaceExtractSliceWithTiledProducer`
- `tensor::bubbleUpPadSlice`
Differential Revision: https://reviews.llvm.org/D145133
|
|
/data/llvm-project/mlir/lib/Dialect/Tensor/Utils/Utils.cpp:62:11: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
if (dim >= shape.size())
~~~ ^ ~~~~~~~~~~~~
1 error generated.
/data/llvm-project/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp:484:8: error: unused variable 'appendIndex' [-Werror,-Wunused-variable]
auto appendIndex = [&](Value val, SmallVector<Value> &dynIndices,
^
1 error generated.
|
|
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D145135
|
|
This change adds a new helper function `mlir::reifyResultShapes` that calls the corresponding interface method and also checks the result produced by the implementation when running in debug mode. Bugs due to incorrect interface implementations can be difficult to debug.
This helper function also reduces the amount of code needed at call sites: the cast to `ReifyRankedShapedTypeOpInterface` is done in the helper function.
Differential Revision: https://reviews.llvm.org/D145777
|
|
`reifyResultShapes` now returns `OpFoldResult`s instead of `Value`s. This is often more efficient because many transformations immediately attempt to extract a constant from the reified values.
Differential Revision: https://reviews.llvm.org/D145250
|
|
The sizes of input operands need being clampled only when there are
incomplete tiles, i.e., the padding value is set. The shape input slice
can be folded into constants when they are static shapes and tiling
sizes.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D144604
|
|
Do not create an Affine ops for expanded size because the affine op is
too complicated which would hit an assertion in affine ops
simplification.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D144151
|
|
The inner tiling sizes could be dynamic (which are Values). In this
context, they should be added to tiledOperand when cloning the op.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D143978
|
|
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D141977
|
|
Instead, use the builder and infer the return type based on the inner `yield` ops.
Also, fix uses that do not create the terminator as required for the callback builders.
Differential Revision: https://reviews.llvm.org/D142056
|
|
An outer dim permutation requires adjusting the offsets and sizes of the
`tensor.extract_slice` operations generated during tiling. Originally
this was done by computing an inverse permutation of the outer
permutation for both `tensor.pack` and `tensor.unpack`. For packing, the
tiling is applied on interchanged dimensions; thus, it is correct to
compute the inverse. For unpacking, on the other hand, tiling involves
the output tensor that does not have interchanged dimensions, and no
inverse is required.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D141688
|
|
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:
```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->front().front();
IRMapping map
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```
Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D139665
|
|
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D141151
|
|
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
|
|
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
|
|
The main issue of tiling unpack op is about incomplete tile. Since all
the dimensions are orthogonal, discussing 1-d unpack case is enough. The
core idea is to make the input slice have complete tiles. In this case,
a larger unpacked tile will be created. We'll need an extract_slice op
to shift and truncate the output.
Take Nn_to_N as an example. Say that N=32, n=8, and tiling_size=15. The
coordinates of second tile (i.e., result[15..31]) are [(1, 7), (2, 0,),
(2, 1) ... (3, 6), (3, 7)]. The first row and the last row are
incomplete in terms of inputs. It's impossible to represent an unpack op
using the coordinates. Because the input has higher rank and the math
computation of coordinate is using mod and ceilDiv. That's very tricky.
To represent the unpack op, we have to complete the rows. I.e., the
input coordinates would start with (1, 0); end with (3, 7). In this
context, the tiled unpack produces a (3 * n) elements because there are
3 rows in total. Follow by a tensor.extract_slice op, we can get the
actual result.
If the tiling sizes are multiple of inner tile sizes, it is a perfect
tiling case. In this context, the larger input and output is not needed.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D139362
|
|
Post clean-up after merger of kDynamicSize and kDynamicStrideOrOffset.
Differential Revision: https://reviews.llvm.org/D139929
|
|
We can compute the offsets and sizes for the slice of input because the
iteration domain is defined over outer loops. If the dimension is tiled,
the i-th index is the product of offset_i and inner_tile_i.
Different from tiling a pad op, we do not have to deal with reading zero
data from input. Because the tiling sizes are indicated to packed outer
dimensions. We will read either the entire tile or partial tile for each
packed tile. The scf.if and tensor.generate ops are not needed in this
context.
Co-authored-by: Lorenzo Chelini <l.chelini@icloud.com>
Reviewed By: rengolin, mravishankar
Differential Revision: https://reviews.llvm.org/D138631
|
|
resolve conflicts
Differential Revision: https://reviews.llvm.org/D138282
|
|
`getDestinationOperands` was almost a duplicate of `DestinationStyleOpInterface::getOutputOperands`. Now that the interface has been moved to mlir/Interfaces, it is no longer needed.
Differential Revision: https://reviews.llvm.org/D136240
|
|
tensor.empty/linalg.init_tensor produces an uninititalized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).
This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.
RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101
Differential Revision: https://reviews.llvm.org/D135129
|
|
Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22.
Tested with:
`ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples`
and `bazel build --config=generic_clang @llvm-project//mlir:all`.
Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini
Differential Revision: https://reviews.llvm.org/D134762
|