| Age | Commit message (Collapse) | Author |
|
As discussed before, this PR adds the basic infrastructure/boiler plate
for loop ordering strategies to be implemented.
If this looks ok, I wanted to also mention some of the heuristics that I
would implement next, if they sound reasonable to you guys:
- Parallel first : prioritize parallel loops over reduction loops
- Dense outer : prioritize the most dense loops first
- Sparse outer : the opposite, potentially useful in some cases?
There is another that I am considering, stride/memory aware, which would
prioritize loops with better stride patterns (like sequential or
linear). Not sure how well this carries over to Sparse Tensor though.
Are there any ideas/heuristics that I should definitely try to
implement?
As we discussed, I will try to incrementally add heuristics. Sorry for
the delay on my end, and thank you so much for the feedback!
---------
Co-authored-by: Aart Bik <ajcbik@google.com>
|
|
conversion (#121054)
Use the regular dialect conversion driver instead of the 1:N dialect
conversion driver. The 1:N dialect conversion driver will be removed
soon.
|
|
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
|
|
**DO NOT MERGE** until https://github.com/llvm/llvm-project/pull/89003
|
|
|
|
|
|
In order to support various external frameworks (JAX vs PyTorch) we need
a bit more flexibility in [dis]assembling external buffers to and from
sparse tensors in MLIR land. This PR adds a direct-out option that
avoids the rigid pre-allocated for copy-out semantics.
Note that over time, we expect the [dis]assemble operations to converge
into something that supports all sorts of external frameworks. Until
then, this option helps in experimenting with different options.
|
|
Operations must be created with the supplied builder. Otherwise, the
dialect conversion / greedy pattern rewrite driver can break.
This commit fixes a crash in the dialect conversion:
```
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: error: failed to legalize operation 'tosa.add'
%0 = tosa.add %1, %arg2 : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
^
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: note: see current operation: %9 = "tosa.add"(%8, %arg2) : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
mlir-opt: llvm-project/mlir/include/mlir/IR/UseDefLists.h:198: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
```
This commit is the proper fix for #87297 (which was reverted).
|
|
|
|
Similar to the emit_c_interface, this pull request adds a pass that
converts public entry methods that use sparse tensors as input
parameters and/or output return values into wrapper functions that
[dis]assemble the individual tensors that constitute the actual storage
used externally into MLIR sparse tensors. This pass can be used to
prepare the public entry methods of a program that is compiled by the
MLIR sparsifier to interface with an external runtime, e.g., when
passing sparse tensors as numpy arrays from and to Python. Note that
eventual bufferization decisions (e.g. who [de]allocates the underlying
memory) should be resolved in agreement with the external runtime
(Python, PyTorch, JAX, etc.)
|
|
Previous change no longer properly used the GPU libgen pass (even though
most tests still passed falling back to CPU). This revision puts the
proper pass order into place. Also bit of a cleanup of CPU codegen vs.
libgen setup.
|
|
(#71840)
…ffine subscript expressions.
|
|
(#71880)
Note that the (dis)assemble operations still make some simplfying
assumptions (e.g. trailing 2-D COO in AoS format) but now at least both
the direct IR and support library path behave exactly the same.
Generalizing the ops is still TBD.
|
|
The flag seems to be doing practically the same thing for zero cost and
pinned dma. In addition, the register host is not truly the right zero
cost mechanism according to Thomas. So we are simplifying the setup for
now, until we have a better definition for what to implement and test.
https://github.com/llvm/llvm-project/issues/64316
|
|
|
|
|
|
The interesting stuff is of course still coming ;-)
|
|
As a side effect of the change, it also unifies the convertOp
implementation between lib/codegen path.
|
|
(#68436)
…to simple steps
|
|
Rationale: the operation does not always sort COO tensors (also used for
sparse_tensor.compress for example).
|
|
The use cases of the two operations are largely overlapped, let's
simplify it and only use one of them.
|
|
Many frontends canonicalize scalar into 0-ranked tensor, it change will
hopefully make the operation easier to use for those cases.
|
|
strategies (regular dma default)
Differential Revision: https://reviews.llvm.org/D155352
|
|
These two headers both contained a strange mix of definitions related to
both patterns and non-pattern transforms. Put patterns and "populate"
functions into Patterns.h and standalone transforms into Transforms.h.
Depends On: D155223
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155454
|
|
The sparse compiler now has two prototype strategies for GPU acceleration:
* CUDA codegen: this converts sparsified code to CUDA threads
* CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls
This revision introduces the first steps required for the second approach.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150170
|
|
This implements a proof-of-concept GPU code generator
to the sparse compiler pipeline, currently only capable
of generating CUDA threads for outermost parallel loops.
The objective, obviously, is to grow this concept
to a full blown GPU code generator, capable of the
right combinaton of code generation as well as exploiting
idiomatic kernels or vector specific libraries (think cuSparse).
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D147483
|
|
create-deallocs in BufferizationOption.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D147010
|
|
reduction-based codegen
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D142927
|
|
Fixes https://github.com/llvm/llvm-project/issues/59970
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D142290
|
|
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D140122
|
|
Reviewed By: aartbik, wrengr
Differential Revision: https://reviews.llvm.org/D139591
|
|
This brings back previous SIMD functionality, but in a separate pass.
The idea is to improve this new pass incrementally, going beyond for-loops
to while-loops for co-iteration as welll (masking), while introducing new
abstractions to make the lowering more progressive. The separation of
sparsification and vectorization is a very good first step on this journey.
Also brings back ArmSVE support
Still to be fine-tuned:
+ use of "index" in SIMD loop (viz. a[i] = i)
+ check that all ops really have SIMD support
+ check all forms of reductions
+ chain reduction SIMD values
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D138236
|
|
PostSparsificationRewrite.
Reviewed By: aartbik, wrengr
Differential Revision: https://reviews.llvm.org/D138153
|
|
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D138054
|
|
Refactor the rewriting of sparse_tensor.sort to support the implementation of
sparse_tensor.sort_coo.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137522
|
|
sparse-tensor-codegen pass.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137733
|
|
memory buffers for sparse tensors to support debugging.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137592
|
|
Sparse compiler used to generate vectorized code for sparse tensors computation, but it should really be delegated to other vectorization passes for better progressive lowering.
https://discourse.llvm.org/t/rfc-structured-codegen-beyond-rectangular-arrays/64707
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D136183
|
|
rules for operators foreach and convert.
This is to help simplify FileCheck tests for sparse-tensor-rewrite.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D136093
|
|
Makes individual testing and debugging easier.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D135319
|
|
buffer rewriter.
Reviewed By: aartbik, Peiming
Differential Revision: https://reviews.llvm.org/D134777
|
|
Add sparse-buffer-rewrite pass to rewrite sparse primitives on buffers to MLIR
implementation.
Add sparse rewrite rule for the sort operator.
Add FileCheck test and integration test.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D134627
|
|
Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22.
Tested with:
`ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples`
and `bazel build --config=generic_clang @llvm-project//mlir:all`.
Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini
Differential Revision: https://reviews.llvm.org/D134762
|
|
This revision also adds convenience methods to test the
dim level type/property (with the codegen being first client)
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D134776
|
|
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D134090
|
|
pipeline
Add new option (enable-runtime-library) to sparse compiler pipeline, it allows us to decide whether we need to rewrite operations (e.g., concatenate, reshape) within sparsification (when using codegen) or convert them after sparsification (when using runtime library).
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D133597
|
|
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D133454
|
|
sparse tensors.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D133390
|
|
To address comment in https://reviews.llvm.org/D133241
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D133363
|
|
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D133241
|