path: root/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
Age Commit message Author
2025-09-18[mlir] Move vector.{to_elements,from_elements} unrolling to `VectorUnroll.cpp` (#159118)Erick Ochoa Lopez
This PR moves the patterns that unroll vector.to_elements and vector.from_elements into the file with the other vector unrolling patterns. It also adds these unrolling patterns to `populateVectorUnrollPatterns` and renames `populateVectorToElementsLoweringPatterns` and `populateVectorFromElementsLoweringPatterns` to `populateVectorToElementsUnrollPatterns` and `populateVectorFromElementsUnrollPatterns`, respectively.
2025-08-21[mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774)Yang Bai
### Problem
PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests:
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate multi-dimensional `vector.from_elements` operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM.
### Fix
This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:
- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-07-25[mlir][NFC] update `mlir/lib` create APIs (35/n) (#150708)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-22[mlir][NFC] update `Conversion` create APIs (5/n) (#149887)Maksim Levental
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-04[mlir] Remove unused includes (NFC) (#147101)Kazu Hirata
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-03-03Fix bug in gpu.memcpy lowering for dynamically shaped operands. (#128820)Arnab Dutta
Compute the number of elements to be copied by multiplying dim sizes along all the dimensions.
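The computation described above can be sketched in plain C++ as follows. This is only an illustration of the arithmetic, not the code from the patch: the helper names `numElements` and `copySizeInBytes` are hypothetical, and the real lowering emits LLVM dialect operations on runtime values rather than calling host functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the MLIR sources): returns the number of
// elements in a buffer whose runtime shape is `dimSizes`.
std::int64_t numElements(const std::vector<std::int64_t> &dimSizes) {
  std::int64_t count = 1; // A rank-0 shape holds exactly one element.
  for (std::int64_t size : dimSizes) {
    assert(size >= 0 && "dimension sizes must be non-negative");
    count *= size; // Multiply along every dimension, not just the first.
  }
  return count;
}

// Hypothetical helper: number of bytes to copy for a given element width.
std::int64_t copySizeInBytes(const std::vector<std::int64_t> &dimSizes,
                             std::int64_t elementSizeInBytes) {
  return numElements(dimSizes) * elementSizeInBytes;
}
```

For a dynamically shaped operand the individual sizes are only known at run time, so the product has to be formed from all of them before scaling by the element width.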
2025-01-21[mlir][spirv] Add GpuToLLVM cconv suited to Vulkan, migrate last tests (#123384)Andrea Faulds
This commit is a follow-up to 99a562b3cb17e89273ba0fe77129f2fb17a19381, which migrated some of the mlir-vulkan-runner tests to mlir-cpu-runner using a new pipeline and set of wrappers. That commit could not migrate all the tests, because the existing calling conventions/ABIs for kernel arguments generated by GPUToLLVMConversionPass were not a good fit for the Vulkan runtime. This commit fixes this and migrates the remaining tests. With this commit, mlir-vulkan-runner and many related components are now unused, and they will be removed in a later commit (see #73457). The old calling conventions require both the caller (host LLVM code) and callee (device code) to have compile-time knowledge of the precise argument types. This works for CUDA, ROCm and SYCL, where there is a C-like calling convention agreed between the host and device code, and the runtime passes through arguments as raw data without comprehension. For Vulkan, however, the interface declared by the shader/kernel is in a more abstract form, so the device code has indirect access to the argument data, and the runtime must process the arguments to set up and bind appropriately-sized buffer descriptors. This commit introduces a new calling convention option to meet the Vulkan runtime's needs. It lowers memref arguments to {void*, size_t} pairs, which can be trivially interpreted by the runtime without it needing to know the original argument types. Unlike the stopgap measure in the previous commit, this system can support memrefs of various ranks and element types, which unblocked migrating the remaining tests.
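As a rough illustration of the `{void*, size_t}` pairs mentioned above, the following C++ sketch shows how a runtime could handle such arguments without knowing element types or ranks. The names `BufferArg`, `makeBufferArg` and `totalBytes` are hypothetical and do not come from the commit, and the sketch assumes contiguous host buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the {void*, size_t} pair: an untyped pointer plus
// the size of the buffer in bytes.
struct BufferArg {
  void *data;
  std::size_t sizeInBytes;
};

// Hypothetical caller-side helper: erase the element type of a contiguous
// buffer, keeping only what the runtime needs to bind a descriptor.
template <typename T>
BufferArg makeBufferArg(std::vector<T> &buffer) {
  return BufferArg{buffer.data(), buffer.size() * sizeof(T)};
}

// Hypothetical runtime-side helper: sums the byte sizes of all arguments
// without knowing their original element types or ranks.
std::size_t totalBytes(const std::vector<BufferArg> &args) {
  std::size_t total = 0;
  for (const BufferArg &arg : args) {
    assert((arg.data != nullptr || arg.sizeInBytes == 0) &&
           "a non-empty buffer needs a valid pointer");
    total += arg.sizeInBytes;
  }
  return total;
}
```

Because every argument has the same untyped shape, the runtime side needs no compile-time knowledge of the kernel signature, which is the property the commit message says the Vulkan runtime requires.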
2025-01-16[mlir][spirv] Add mgpu* wrappers for Vulkan runtime, migrate some tests (#123114)Andrea Faulds
This commit adds new wrappers around the MLIR Vulkan runtime which implement the mgpu* APIs (as generated by GPUToLLVMConversionPass), adds an optional LLVM lowering to the Vulkan runner mlir-opt pipeline based on GPUToLLVMConversionPass, and migrates several of the mlir-vulkan-runner tests to use mlir-cpu-runner instead, together with the new pipeline and wrappers. This is a further incremental step towards eliminating mlir-vulkan-runner and its associated pipeline, passes and wrappers (#73457). This commit does not migrate all of the tests to the new system, because changes to the mgpuLaunchKernel ABI will be necessary to support the tests that use multi-dimensional memref arguments.
2024-12-20[mlir] Fix a warningKazu Hirata
This patch fixes: mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp:535:13: error: 'applyPatternsAndFoldGreedily' is deprecated: Use applyPatternsGreedily() instead [-Werror,-Wdeprecated-declarations]
2024-12-17[mlir][Vector] Clean up `populateVectorToLLVMConversionPatterns` (#119975)Matthias Springer
Clean up `populateVectorToLLVMConversionPatterns` so that it populates only conversion patterns. All rewrite patterns that do not lower to LLVM should be populated into a separate greedy pattern rewrite. The current combination of rewrite patterns and conversion patterns triggered an edge case when merging the 1:1 and 1:N dialect conversions. Depends on #119973.
2024-11-24[mlir][LLVM] Add the `ConvertToLLVMAttrInterface` and `ConvertToLLVMOpInterface` interfaces (#99566)Fabian Mora
This patch adds the `ConvertToLLVMAttrInterface` and `ConvertToLLVMOpInterface` interfaces. It also modifies the `convert-to-llvm` pass to use these interfaces when available. The `ConvertToLLVMAttrInterface` interface allows attributes to configure conversion to LLVM, including the conversion target, the LLVM type converter, and the conversion patterns to populate. See the `NVVMTargetAttr` implementation of this interface for an example of how it can be used to configure conversion to LLVM. The `ConvertToLLVMOpInterface` interface collects all convert-to-LLVM attributes stored in an operation. Finally, the `convert-to-llvm` pass was modified to use these interfaces when available. This allows applying `convert-to-llvm` to GPU modules and letting the `NVVMTargetAttr` decide which patterns to populate.
2024-06-10[mlir][gpu] Update LaunchFuncOp lowering in GPU to LLVM (#94991)Fabian Mora
This patch updates the lowering of `LaunchFuncOp` in GPU to LLVM to only legalize the operation with the converted operands, effectively removing the lowering used by the old serialization pipeline. It also removes all remaining uses of the old gpu serialization infrastructure in `gpu-to-llvm`. See [Compilation overview | 'gpu' Dialect - MLIR docs](https://mlir.llvm.org/docs/Dialects/GPU/#compilation-overview) for additional information on the target attributes compilation pipeline that replaced the old serialization pipeline.
2024-05-08[mlir] Use StringRef::operator== instead of StringRef::equals (NFC) (#91560)Kazu Hirata
I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 10 under mlir/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
2024-04-19Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356)Christian Sigg
This change cleans up call sites. Next step is to mark the member functions deprecated. See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-01-26[mlir][LLVM] Use int32_t to indirectly construct GEPArg (#79562)Andrei Golubev
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing conversion warnings on MSVC. Some recent examples of such are:
```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398: Element '1': conversion from 'size_t' to 'T' requires a narrowing conversion
    with [ T=mlir::LLVM::GEPArg ]
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398: Element '1': conversion from 'unsigned int' to 'T' requires a narrowing conversion
    with [ T=mlir::LLVM::GEPArg ]
```
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
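The explicit-cast approach can be sketched with standard C++ only. `IndexArg` below is a hypothetical stand-in that models just the constant-index case of `GEPArg`; `toInt32` and `makeIndices` are likewise illustrative names and are not part of MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the constant-index case of GEPArg: it is
// constructible from int32_t only.
struct IndexArg {
  std::int32_t value;
  IndexArg(std::int32_t v) : value(v) {}
};

// Checked, explicit conversion from size_t to int32_t. The narrowing is
// spelled out at the call site instead of happening implicitly.
inline std::int32_t toInt32(std::size_t v) {
  assert(v <= static_cast<std::size_t>(
                  std::numeric_limits<std::int32_t>::max()) &&
         "index does not fit in int32_t");
  return static_cast<std::int32_t>(v);
}

// Builds a two-element index list. Passing `fieldIndex` directly inside the
// braces would rely on an implicit size_t -> int32_t conversion, which is
// the situation the C2398 diagnostics quoted above complain about.
std::vector<IndexArg> makeIndices(std::size_t fieldIndex) {
  return {IndexArg(0), IndexArg(toInt32(fieldIndex))};
}
```

The assertion in `toInt32` is an addition of this sketch; the commit message itself only mentions the explicit cast.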
2024-01-17[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)Matthias Springer
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern matches against (https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional). A rewriter must be notified of all in-place op modifications, not just in-place modifications of the root (https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old function names were confusing and have contributed to various broken rewrite patterns. Note: The new function names use the term "modify" instead of "update" for consistency with the `RewriterBase::Listener` terminology (`notifyOperationModified`).
2023-12-20Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756)Paul C Fuqua
2023-12-07Apply clang-tidy fixes for readability-identifier-naming in GPUToLLVMConversion.cpp (NFC)Mehdi Amini
2023-11-29[mlir] Adopt `ConvertToLLVMPatternInterface` GpuToLLVMConversionPass to align with `convert-to-llvm` (#73761)Mehdi Amini
This is a follow-up to the introduction of `convert-to-llvm`: it is supposed to be a unifying pass through the `ConvertToLLVMPatternInterface`, but some specific conversions (like the GPU target) aren't vanilla LLVM targets. Instead they need extra customizations that are specific to LLVM-on-GPUs and our custom runtime wrappers. This change makes the GpuToLLVMConversionPass just as pluggable as `convert-to-llvm` by using the same mechanism.
2023-11-27[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)Guray Ozen
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below:
```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```
The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.
2023-11-20Apply clang-tidy fixes for llvm-else-after-return in GPUToLLVMConversion.cpp (NFC)Mehdi Amini
2023-10-31[MLIR][GPUCommon] Remove typed pointer support (#70735)Christian Ulmann
This commit removes the GPUCommon's lowering support for typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-10-25[MLIR] Modify lowering of gpu.alloc op to llvm (#69969)Nishant Patel
If gpu.alloc has no async dependency (as in the case where gpu.alloc has a hostShared allocation), create a new stream and synchronize. This PR is a follow-up to #66401.
2023-09-27[mlir][sparse][gpu] add CSC and BSR format to cuSparse GPU ops (#67509)Aart Bik
This adds two cuSparse formats to the GPU dialect, together with proper lowering and runtime CUDA support. Also fixes a few minor omissions.
2023-09-26[MLIR] Pass hostShared flag in gpu.alloc op to runtime wrappers (#66401)Nishant Patel
This PR is a breakdown of the big PR https://github.com/llvm/llvm-project/pull/65539, which enables Intel GPU integration. In this PR we pass the hostShared flag to the runtime wrappers (required by SyclRuntimeWrappers, which will come in a subsequent PR) to indicate whether the allocation is done in host-shared GPU memory or device-only memory.
2023-09-26[MLIR] Pass count of parameters & gpu binary size to runtime wrappers (#66154)Nishant Patel
This PR is a breakdown of the big PR #65539, which enables Intel GPU integration. In this PR we pass the count of parameters and the size of the GPU binary to the runtime wrappers, since SyclRuntimeWrappers (which will come in a subsequent PR) requires the SPIR-V size for compilation and the number of parameters to iterate over the params.
2023-09-25[mlir][llvm] Replace NullOp by ZeroOp (#67183)Tobias Gysi
This revision replaces the LLVM dialect NullOp by the recently introduced ZeroOp. The ZeroOp is more generic in the sense that it represents zero values of any LLVM type rather than null pointers only. This is a follow-up to https://github.com/llvm/llvm-project/pull/65508
2023-08-23[mlir] Apply ClangTidy fixes (NFC)Adrian Kuegel
Prefer to use .empty() instead of checking size().
2023-08-18[mlir][GPU][NFC] Remove type converter hackMatthias Springer
Remove `dangerousSetOptions` and call `promoteOperands` with the correct arguments directly. Differential Revision: https://reviews.llvm.org/D158175
2023-08-14[mlir][sparse][gpu] minor code cleanup for sparse gpu opsAart Bik
Consistent order of ops and related methods. Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp since this is a general utility for sparse matrices, not specific to GEMM ops only. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D157922
2023-08-14[mlir][Conversion] Store const type converter in ConversionPatternMatthias Springer
ConversionPatterns do not (and should not) modify the type converter that they are using. * Make `ConversionPattern::typeConverter` const. * Make member functions of the `LLVMTypeConverter` const. * Conversion patterns take a const type converter. * Various helper functions (that are called from patterns) now also take a const type converter. Differential Revision: https://reviews.llvm.org/D157601
2023-08-12[mlir][gpu] Add GPU target support to `gpu-to-llvm`.Fabian Mora
**For an explanation of these patches see D154153.** This patch modifies the lowering of `gpu.module` & `gpu.launch_func` in the `gpu-to-llvm` pass, allowing the usage of the new GPU compilation mechanism in the patch series ending in D154153. Instead of removing Modules, this patch preserves the module if it has target attributes so that the `gpu-module-to-binary` pass can later serialize them. Instead of lowering the kernel calls to the LLVM dialect, this patch primarily updates the operation's arguments, leaving the job of converting the operation into LLVM instructions to the translation stage. The reason for not lowering the operation to LLVM at this stage is that kernel launches do not have a single one-to-one representation in LLVM. For example, a kernel launch can be represented by a call to a kernel stub, like in CUDA or HIP. Kernel launches are also intrinsically linked to the binary associated with the call, and the binaries are converted during translation. Depends on D154149 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154152
2023-08-10[mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugsAart Bik
Rationale: Since we only support default algorithm for SpGEMM, we can remove the estimate op (for now at least). This also introduces the set csr pointers op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown. This revision paves the way for actual recognition of SpGEMM in the sparsifier. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157645
2023-08-09[mlir][sparse][gpu] only support default algorithm for SpGEMMAart Bik
Rationale: This is the approach taken for all the others too (SpMV, SpMM, SDDMM), so it is more consistent to follow the same path (until we have a need for more algorithms). Also, in a follow up revision, this will allow us to remove some unused GEMM ops. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157542
2023-08-08[mlir][sparse][gpu] reduce boilerplate class declarationsAart Bik
Macro is used to avoid repeating same pattern many times. Also fixed the ordering of ops to be consistent. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157419
2023-08-08[mlir][sparse][gpu] add spgemm operatorKun Wu
Differential Revision: https://reviews.llvm.org/D152981
2023-08-04Revert "Foo"Alex Zinenko
This reverts commit 3c9aa10c57cf0833ff108ecf9ffbb512bd96cc89. No proper description of the commit.
2023-08-04FooNicolas Vasilache
2023-08-02[mlir][LLVM] NFC - Remove createIndexConstant methodNicolas Vasilache
This revision removes the createIndexConstant method, which implicitly creates constants of the getIndexType type and updates all uses to the more explicit createIndexAttrConstant which requires an explicit Type parameter. This is an NFC step towards entangling index type conversion in LLVM lowering. The selection of which index type to use requires finer granularity than the existing implementations which all rely on pass level flags and end up in mismatches, especially on GPUs with multiple address spaces of different capacities. This revision also includes an NFC fix to MemRefToLLVM.cpp that prevents a crash in cases where an integer memory space cannot be derived for a MemRef. Differential Revision: https://reviews.llvm.org/D156854
2023-08-01[mlir][sparse][gpu] add 2:4 spmm prune_and_check flagKun Wu
Differential Revision: https://reviews.llvm.org/D155909
2023-07-21[mlir][nvgpu] Add `tma.create.descriptor` to create tensor map descriptorGuray Ozen
The Op creates a tensor map descriptor object representing a tiled memory region. The descriptor is used by Tensor Memory Access (TMA). The `tensor` is the source tensor to be tiled. The `boxDimensions` is the size of the tiled memory region in each dimension. The pattern here lowers `tma.create.descriptor` to a runtime function call that eventually calls the CUDA Driver's `cuTensorMapEncodeTiled`. For more information see below: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html Depends on D155453 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155680
2023-07-13[mlir][sparse][gpu] force 16-byte alignment on data structs for cuSparseLtAart Bik
Also makes some minor consistency edits in the cuSparseLt wrapper lib. Reviewed By: Peiming, K-Wu Differential Revision: https://reviews.llvm.org/D155139
2023-06-30[mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetimeKun Wu
Differential Revision: https://reviews.llvm.org/D153173
2023-06-19[mlir][sparse][gpu] remove tuple as one of the spmm_buffer_size output typeKun Wu
Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D153188
2023-06-17[MLIR] Add support for bare pointer calling convention in gpu-to-llvmUday Bondhugula
Add support for the bare pointer calling convention in the gpu-to-llvm pass. This wasn't being exposed and is needed when GPU-compiled MLIR is to be called with this convention. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D152477
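For readers unfamiliar with the two conventions, the difference can be sketched in standard C++. By default MLIR passes a memref as a descriptor holding an allocated pointer, an aligned pointer, an offset, and per-dimension sizes and strides; the bare pointer convention passes only a pointer for statically shaped memrefs with the default layout. The struct and helper names below are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ mirror of a ranked memref descriptor.
template <typename T, int Rank>
struct MemRefDescriptor {
  T *allocated;               // Pointer returned by the allocator.
  T *aligned;                 // Aligned pointer used for element access.
  std::int64_t offset;        // Offset of the first element, in elements.
  std::int64_t sizes[Rank];   // Extent of each dimension.
  std::int64_t strides[Rank]; // Stride of each dimension, in elements.
};

// Returns true if the descriptor describes a contiguous row-major buffer
// with zero offset, i.e. the layout a single pointer can stand in for.
template <typename T, int Rank>
bool isContiguousRowMajor(const MemRefDescriptor<T, Rank> &d) {
  std::int64_t expected = 1;
  for (int i = Rank - 1; i >= 0; --i) {
    if (d.strides[i] != expected)
      return false;
    expected *= d.sizes[i];
  }
  return d.offset == 0;
}

// Hypothetical helper: what a caller would pass under the bare pointer
// convention. Only valid when the callee already knows the shape.
template <typename T, int Rank>
T *toBarePointer(const MemRefDescriptor<T, Rank> &d) {
  assert(isContiguousRowMajor(d) && "bare pointers need the default layout");
  return d.aligned;
}
```

The bare form drops the size and stride information, so it only works when both sides agree on the shape at compile time.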
2023-06-12[mlir][sparse][gpu]fix various cusparseLt bugsKun Wu
Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D152489
2023-06-09[mlir][sparse][gpu] unify dnmat and dnvec handle and opsKun Wu
Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D152465
2023-06-08[MLIR][GPU] Add 16-bit version of cudaMemset in cudaRuntimeWrappersNavdeep Katel
Add 16-bit version of cudaMemset in cudaRuntimeWrappers and update the GPU to LLVM lowering. Reviewed By: bondhugula Differential Revision: https://reviews.llvm.org/D151642
2023-06-06[mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity supportKun Wu
Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151775
2023-06-06[mlir][sparse][gpu] add AoS COO support to cuSPARSEAart Bik
Even though this feature was deprecated in release 11.2, any library before this version still supports the feature, which is why we are making it available under a macro. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D152290