summaryrefslogtreecommitdiff
path: root/mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
AgeCommit message (Collapse)Author
2025-10-06[mlir] Simplify unreachable type switch cases. NFC. (#162032)Jakub Kuderski
Use `DefaultUnreachable` from https://github.com/llvm/llvm-project/pull/161970.
2025-09-14[mlir][LLVM] Add LLVMAddrSpaceAttrInterface and NVVMMemorySpaceAttr (#157339)Fabian Mora
This patch introduces the `LLVMAddrSpaceAttrInterface` for defining compatible LLVM address space attributes To test this interface, this patch also adds: - Adds NVVMMemorySpaceAttr implementing both LLVMAddrSpaceAttrInterface and MemorySpaceAttrInterface - Converts NVVM memory space constants from enum to MLIR enums - Updates all NVVM memory space references to use new attribute system - Adds support for NVVM memory spaces in ptr dialect translation Example: ```mlir llvm.func @nvvm_ptr_address_space( !ptr.ptr<#nvvm.memory_space<global>>, !ptr.ptr<#nvvm.memory_space<shared>>, !ptr.ptr<#nvvm.memory_space<constant>>, !ptr.ptr<#nvvm.memory_space<local>>, !ptr.ptr<#nvvm.memory_space<tensor>>, !ptr.ptr<#nvvm.memory_space<shared_cluster>> ) -> !ptr.ptr<#nvvm.memory_space<generic>> ``` Translating the above code to LLVM produces: ```llvm declare ptr @nvvm_ptr_address_space(ptr addrspace(1), ptr addrspace(3), ptr addrspace(4), ptr addrspace(5), ptr addrspace(6), ptr addrspace(7)) ``` To convert the memory space enum to the new enum class use: ```bash grep -r . -e "NVVMMemorySpace::kGenericMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kGenericMemorySpace/NVVMMemorySpace::Generic/g" grep -r . -e "NVVMMemorySpace::kGlobalMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kGlobalMemorySpace/NVVMMemorySpace::Global/g" grep -r . -e "NVVMMemorySpace::kSharedMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kSharedMemorySpace/NVVMMemorySpace::Shared/g" grep -r . -e "NVVMMemorySpace::kConstantMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kConstantMemorySpace/NVVMMemorySpace::Constant/g" grep -r . -e "NVVMMemorySpace::kLocalMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kLocalMemorySpace/NVVMMemorySpace::Local/g" grep -r . -e "NVVMMemorySpace::kTensorMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kTensorMemorySpace/NVVMMemorySpace::Tensor/g" grep -r . -e "NVVMMemorySpace::kSharedClusterMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kSharedClusterMemorySpace/NVVMMemorySpace::SharedCluster/g" ``` NOTE: A future patch will add support for ROCDL, it wasn't added here to keep the patch small.
2025-08-24[mlir][amdgpu] Promote gpu.shuffle to amdgpu.permlane_swap (#154933)Tim Gymnich
- promote `gpu.shuffle %src xor {16,32} 64` to `amdgpu.permlane_swap %src {16,32}`
2025-07-25[mlir] Switch to new LDBG macro (#150616)Jacques Pienaar
Change local variants to use new central one.
2025-07-21[mlir][NFC] update `mlir/Dialect` create APIs (16/n) (#149922)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-07[mlir][GPU][transform] Add gpu_to_rocdl conversion pattern (#146962)Nicolas Vasilache
Co-authored-by: Son Tuan Vu <vuson@google.com>
2025-07-07[mlir][SCF][GPU] Add DeviceMaskingAttrInterface (#146943)Nicolas Vasilache
This revision adds DeviceMaskingAttrInterface and extends DeviceMappingArrayAttr to accept a union of DeviceMappingAttrInterface and DeviceMaskingAttrInterface. Support is added to GPUTransformOps to take advantage of this information and lower to block/warpgroup/warp/thread specialization when mapped to linear ids. The revision also connects to scf::ForallOp and uses the new attribute to implement warp specialization. The implementation is in the form of a GPUMappingMaskAttr, which can be additionally passed to the scf.forall.mapping attribute to specify a mask on compute resources that should be active. In the first implementation the masking is a bitfield that specifies for each processing unit whether it is active or not. In the future, we may want to implement this as a symbol to refer to dynamically defined values. Extending op semantics with an operand is deemed too intrusive at this time. --------- Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>
2025-07-07[mlir] NFC - refactor id builder and avoid leaking impl details (#146922)Nicolas Vasilache
2025-07-07[mlir][gpu][transforms] Add support for mapping to lanes (#146912)Nicolas Vasilache
This revision adds a new attribute for mapping `scf.forall` to linear lane ids. Example: ``` // %arg2 and %arg3 map to lanes [0, 6) and are turned into epxressions // involving threadIdx.x/y by the map_nested_forall_to_threads // transformation. This results in a if (linear_thread_id < 6) conditional. scf.forall (%arg2, %arg3) in (2, 3) { ... } {mapping = [#gpu.lane<linear_dim_0>, #gpu.lane<linear_dim_1>]} ``` --------- Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>
2025-05-13[mlir][gpu] Pattern to promote `gpu.shuffle` to specialized AMDGPU ops (#137109)Ivan Butygin
Only swizzle promotion for now, may add DPP ops support later.
2025-04-18[mlir][gpu] Clean up prints in GPU dialect. NFC. (#136250)Jakub Kuderski
Clean up printing code by switching to `llvm::interleaved` from https://github.com/llvm/llvm-project/pull/135517. Also make some minor readability & performance fixes.
2025-03-20[mlir] Use *Set::insert_range (NFC) (#132326)Kazu Hirata
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.
2025-02-24[mlir][GPUToNVVM] Add `benefit` to `populate` functions (#128484)Matthias Springer
Certain GPU->NVVM patterns compete with Arith->LLVM patterns. (The ones that lower to libdevice.) Add an optional `benefit` parameter to all `populate` functions so that users can give preference to GPU->NVVM patterns.
2024-08-06[mlir] Support DialectRegistry extension comparison (#101119)Nikhil Kalra
`PassManager::run` loads the dependent dialects for each pass into the current context prior to invoking the individual passes. If the dependent dialect is already loaded into the context, this should be a no-op. However, if there are extensions registered in the `DialectRegistry`, the dependent dialects are unconditionally registered into the context. This poses a problem for dynamic pass pipelines, however, because they will likely be executing while the context is in an immutable state (because of the parent pass pipeline being run). To solve this, we'll update the extension registration API on `DialectRegistry` to require a type ID for each extension that is registered. Then, instead of unconditionally registered dialects into a context if extensions are present, we'll check against the extension type IDs already present in the context's internal `DialectRegistry`. The context will only be marked as dirty if there are net-new extension types present in the `DialectRegistry` populated by `PassManager::getDependentDialects`. Note: this PR removes the `addExtension` overload that utilizes `std::function` as the parameter. This is because `std::function` is copyable and potentially allocates memory for the contained function so we can't use the function pointer as the unique type ID for the extension. Downstream changes required: - Existing `DialectExtension` subclasses will need a type ID to be registered for each subclass. More details on how to register a type ID can be found here: https://github.com/llvm/llvm-project/blob/8b68e06731e0033ed3f8d6fe6292ae671611cfa1/mlir/include/mlir/Support/TypeID.h#L30 - Existing uses of the `std::function` overload of `addExtension` will need to be refactored into dedicated `DialectExtension` classes with associated type IDs. The attached `std::function` can either be inlined into or called directly from `DialectExtension::apply`. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-04-19Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356)Christian Sigg
This change cleans up call sites. Next step is to mark the member functions deprecated. See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-01[mlir][NFC] Simplify type checks with isa predicates (#87183)Jakub Kuderski
For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.
2024-03-20[mlir] split transform interfaces into a separate library (#85221)Oleksandr "Alex" Zinenko
Transform interfaces are implemented, direction or via extensions, in libraries belonging to multiple other dialects. Those dialects don't need to depend on the non-interface part of the transform dialect, which includes the growing number of ops and transitive dependency footprint. Split out the interfaces into a separate library. This in turn requires flipping the dependency from the interface on the dialect that has crept in because both co-existed in one library. The interface shouldn't depend on the transform dialect either. As a consequence of splitting, the capability of the interpreter to automatically walk the payload IR to identify payload ops of a certain kind based on the type used for the entry point symbol argument is disabled. This is a good move by itself as it simplifies the interpreter logic. This functionality can be trivially replaced by a `transform.structured.match` operation.
2024-03-10Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678)Justin Lebar
For some reason this was missing from STLExtras.
2023-11-14[mlir][affine][nfc] cleanup deprecated T.cast style functions (#71269)long.chen
detail see the docment: https://mlir.llvm.org/deprecation/ Not all changes are made manually, most of them are made through a clang tool I wrote https://github.com/lipracer/cpp-refactor.
2023-11-10[mlir][gpu] Separate the barrier elimination code from transform ops (#71762)spaceotter
Allows the barrier elimination code to be run from C++ as well. The code from transforms dialect is copied as-is, the pass and populate functions have beed added at the end. Co-authored-by: Eric Eaton <eric@nod-labs.com>
2023-09-05[MLIR][NFC] Mark barrier elimination helper static (#65303)Lukas Sommer
Make local helper functions static to avoid symbol name collision.
2023-09-04[mlir][gpu][transform] Provide better error messages and avoid crashing in ↵Nicolas Vasilache
MapForallToBlocks. This revision addresses issues surfaced in https://reviews.llvm.org/D159093
2023-08-10[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TDNicolas Vasilache
This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass in a declarative fashion that provides a much better understanding and control. Differential Revision: https://reviews.llvm.org/D157617
2023-07-26[mlir][transforms] Revamp the implementation of mapping loops to GPUsNicolas Vasilache
This revision significantly simplifies the specification and implementation of mapping loops to GPU ids. Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes: 1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication may occur on a per-dimension 3-D sub-rectangle basis. 2. a n-D linearized mode, on which predication may only occur on a linear basis. In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages. The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference. Differential Revision: https://reviews.llvm.org/D155941
2023-07-25[mlir][GPU] Add op for unrolling contractions to a native sizeQuinn Dawkins
Adds `apply_patterns.gpu.unroll_vectors_subgroup_mma` which allows specifying a native MMA shape of `m`, `n`, and `k` to unroll to, greedily unrolling the inner most dimension of contractions and other vector operations based on expected usage. Differential Revision: https://reviews.llvm.org/D156079
2023-07-25[mlir][linalg] NFC - Move some utils in preparation for revamping mapping of ↵Nicolas Vasilache
scf.forall
2023-07-07[mlir] add a simple gpu barrier elimination mechanismAlex Zinenko
GPU code generation, and specifically the shared memory copy insertion may introduce spurious barriers guarding read-after-read dependencies or read-after-write on non-aliasing data, which degrades performance due to unnecessary synchronization. Add a pattern and transform op that removes such barriers by analyzing memory effects that the barrier actually guards that are not also guarded by other barriers. The code is adapted from the Polygeist incubator project. Co-authored-by: William Moses <gh@wsmoses.com> Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp> Reviewed By: nicolasvasilache, wsmoses Differential Revision: https://reviews.llvm.org/D154720
2023-06-22[mlir][affine] More efficient `makeComposedFolded...` helpersMatthias Springer
The old code used to materialize constants as ops, immediately folded them into the resulting affine map and then deleted the constant ops again. Instead, directly fold the attributes into the affine map. Furthermore, all helpers accept `OpFoldResult` instead of `Value` now. This makes the code at call sites more efficient, because it is no longer necessary to materialize a `Value`, just to be able to use these helper functions. Note: The API has changed (accepts OpFoldResult instead of Value), otherwise this change is NFC. Differential Revision: https://reviews.llvm.org/D153324
2023-06-20[mlir][transform] Add TransformRewriterMatthias Springer
All `apply` functions now have a `TransformRewriter &` parameter. This rewriter should be used to modify the IR. It has a `TrackingListener` attached and updates the internal handle-payload mappings based on rewrites. Implementations no longer need to create their own `TrackingListener` and `IRRewriter`. Error checking is integrated into `applyTransform`. Tracking listener errors are reported only for ops with the `ReportTrackingListenerFailuresOpTrait` trait attached, allowing for a gradual migration. Furthermore, errors can be silenced with an op attribute. Additional API will be added to `TransformRewriter` in subsequent revisions. This revision just adds an "empty" `TransformRewriter` class and updates all `apply` implementations. Differential Revision: https://reviews.llvm.org/D152427
2023-05-17[mlir] don't hardcode PDL_Operation in Transform dialect extensionsAlex Zinenko
Update operations in Transform dialect extensions defined in the Affine, GPU, MemRef and Tensor dialects to use the more generic `TransformHandleTypeInterface` type constraint instead of hardcoding `PDL_Operation`. See https://discourse.llvm.org/t/rfc-type-system-for-the-transform-dialect/65702 for motivation. Remove the dependency on PDLDialect from these extensions. Update tests to use `!transform.any_op` instead of `!pdl.operation`. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D150781
2023-05-12[mlir] Move casting calls from methods to function callsTres Popp
The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Caveats include: - This clang-tidy script probably has more problems. - This only touches C++ code, so nothing that is being generated. Context: - https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" - Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This first patch was created with the following steps. The intention is to only do automated changes at first, so I waste less time if it's reverted, and so the first mass change is more clear as an example to other teams that will need to follow similar steps. Steps are described per line, as comments are removed by git: 0. Retrieve the change from the following to build clang-tidy with an additional check: https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check 1. Build clang-tidy 2. Run clang-tidy over your entire codebase while disabling all checks and enabling the one relevant one. Run on all header files also. 3. Delete .inc files that were also modified, so the next build rebuilds them to a pure state. 4. Some changes have been deleted for the following reasons: - Some files had a variable also named cast - Some files had not included a header file that defines the cast functions - Some files are definitions of the classes that have the casting methods, so the code still refers to the method instead of the function without adding a prefix or removing the method declaration at the same time. ``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\ -header-filter=mlir/ mlir/* -fix rm -rf $BUILD_DIR/tools/mlir/**/*.inc git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\ mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\ mlir/lib/**/IR/\ mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\ mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\ mlir/test/lib/Dialect/Test/TestTypes.cpp\ mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\ mlir/test/lib/Dialect/Test/TestAttributes.cpp\ mlir/unittests/TableGen/EnumsGenTest.cpp\ mlir/test/python/lib/PythonTestCAPI.cpp\ mlir/include/mlir/IR/ ``` Differential Revision: https://reviews.llvm.org/D150123
2023-04-20[mlir] Prevent implicit downcasting to interfacesRahul Kayaith
Currently conversions to interfaces may happen implicitly (e.g. `Attribute -> TypedAttr`), failing a runtime assert if the interface isn't actually implemented. This change marks the `Interface(ValueT)` constructor as explicit so that a cast is required. Where it was straightforward to I adjusted code to not require casts, otherwise I just made them explicit. Depends on D148491, D148492 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D148493
2023-04-20[mlir][Affine][NFC] Wrap dialect in "affine" namespaceMatthias Springer
This cleanup aligns the affine dialect with all the other dialects. Differential Revision: https://reviews.llvm.org/D148687
2023-03-20Revert "[mlir][Linalg][Transform] Avoid FunctionalStyleTransformOpTrait ↵Nicolas Vasilache
where unnecesseary to improve usability" This reverts commit 31aa8ea252c0b6acdcb362c1d0f01cc4b810d6d0. This is currently not in a good state as we have some footguns due to missing listeners.
2023-03-20[mlir][Transform] Fix support for mapping to GPU warps and to linear idsNicolas Vasilache
c59465e1203dd78d06e15f7ddf62141807dbd5a7 introduced mapping to warps and linear GPU ids. In the implementation, the delinearization basis is reversed from [x, y, z] to [z, y x] order to properly compute the strides and allow delinearization. Prior to this commit, we forgot to reverse it back to [x, y, z] order before materializing the indices. Fix this oversight.
2023-03-20[mlir][Linalg][Transform] Avoid FunctionalStyleTransformOpTrait where ↵Nicolas Vasilache
unnecesseary to improve usability Differential Revision: https://reviews.llvm.org/D146305
2023-03-20[mlir][Transform] Add support for mapping to GPU warps and to linear idsNicolas Vasilache
This revisions refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified. `warp_dims` is currently specified along with `block_dims` as a transform attribute. Linear ids on th other hand use the flattened block_dims to predicate on the first (linearized) k threads. An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme. Various implementation and transform op semantics cleanups are also applied. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D146130
2023-03-15Use *{Map,Set}::contains (NFC)Kazu Hirata
Differential Revision: https://reviews.llvm.org/D146104
2023-03-15[mlir][Transform] NFC - Refactor forall mapping to threads and blocks into ↵Nicolas Vasilache
one thing Differential Revision: https://reviews.llvm.org/D146095
2023-03-14[mlir][GPUTransforms] NFC - Refactor GPUTransforms.cpp in preparation for ↵Nicolas Vasilache
improvements. Depends on: D145977 Differential Revision: https://reviews.llvm.org/D145980
2023-03-14[mlir][Transform] NFC - Various API cleanups and use RewriterBase in lieu of ↵Nicolas Vasilache
PatternRewriter Depends on: D145685 Differential Revision: https://reviews.llvm.org/D145977
2023-03-13[mlir] Use llvm::is_contained (NFC)Kazu Hirata
2023-02-17[mlir][scf] Rename ForeachThreadOp->ForallOp, ↵Alexander Belyaev
PerformConcurrentlyOp->InParallelOp. Differential Revision: https://reviews.llvm.org/D144242
2023-02-17[mlir] Add loop bounds to scf.foreach_thread.Alexander Belyaev
https://discourse.llvm.org/t/rfc-parallel-loops-on-tensors-in-mlir/68332 Differential Revision: https://reviews.llvm.org/D144072
2023-02-17[mlir][gpu] NFC let user pick the threadID values when distributing ↵Thomas Raoux
foreach_thread Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D144219
2023-02-14[mlir][NFC] Remove unused variable 'indexType' in GPUTransformOps.cppJie Fu
/data/jiefu/llvm-project/mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp:430:13: error: unused variable 'indexType' [-Werror,-Wunused-variable] IndexType indexType = rewriter.getIndexType(); ^ 1 error generated.
2023-02-14[mlir][gpu] NFC change to pass threadID ops to ↵Thomas Raoux
rewriteOneForeachThreadToGpuThreads This allows user to give both the thread ids and dimension of the threads we want to distribute on. This means we can use it to distribute on warps as well. Reviewed By: harsh Differential Revision: https://reviews.llvm.org/D143950
2023-02-04[mlir][gpu] Allow distributing to different level of IDs without failingThomas Raoux
Change map_nested_foreach_to_threads to ignore foreach_thread not mapping to threads, this will allow us to call mapNestedForeachToThreadsImpl with different set of ids to lower multiple levels. Also adds warpIds attributes. Differential Revision: https://reviews.llvm.org/D143298
2023-01-19[mlir] simpler transform dialect silenceable failuresAlex Zinenko
Simplify the handling of silenceable failures in the transform dialect. Previously, the logic of `TransformEachOpTrait` required that `applyToEach` returned a list of null pointers when a silenceable failure was emitted. This was not done consistently and also crept into ops without this trait although they did not require it. Handle this case earlier in the interpreter and homogeneously associated preivously unset transform dialect values (both handles and parameters) with empty lists of the matching kind. Ignore the results of `applyToEach` for the targets for which it produced a silenceable failure. As a result, one never needs to set results to lists containing nulls. Furthermore, the objects associated with transform dialect values must never be null. Depends On D140980 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D141305
2023-01-17[mlir][gpu] Improve foreach_thread distributionThomas Raoux
Replace Ids with 0 when block dim is 1 when distributing foreach_thread. Differential Revision: https://reviews.llvm.org/D141718