llvm-project.git/mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp, branch main

[mlir][gpu] Support outlining nested `gpu.launch` (#152696)

2025-08-13T03:42:52+00:00

This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes #149318.

[mlir][gpu] Update attribute definitions in `gpu::LaunchOp` (#152106)

2025-08-08T03:43:21+00:00

`gpu::LaunchOp` is updated in the following ways:
- Change the attribute type of kernel function and module from
`SymbolRefAttr` to `FlatSymbolRefAttr` to avoid nested symbol
references.
- Rename variables from camel case (kernelFunc, kernelModule) to lower
case (function, module) and update the syntax.
- `LaunchOp::build` supports passing `module` and `function` attributes.

[mlir][NFC] update `mlir/Dialect` create APIs (16/n) (#149922)

2025-07-21T23:57:30+00:00

See https://github.com/llvm/llvm-project/pull/147168 for more info.

[mlir] Remove unused includes (NFC) (#147455)

2025-07-08T06:40:44+00:00

These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.

[mlir] Use range constructors of *Set (NFC) (#137563)

2025-04-28T00:52:41+00:00

[mlir] Use *Set::insert_range (NFC) (#132326)

2025-03-21T05:24:17+00:00

DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
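
As a rough illustration of the rewrite described above, the sketch below
builds one set with the iterator-pair form and one with the range form and
checks that they agree. `SketchSet` and `bothFormsAgree` are hypothetical
names invented for this example; `SketchSet` is a stand-in backed by
`std::set`, not one of the LLVM containers, and only mimics the two member
functions involved.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for a set type with both insertion forms.
// This is NOT llvm::DenseSet or any other LLVM class.
template <typename T>
class SketchSet {
public:
  // Old style: insert everything in the iterator range [first, last).
  template <typename It>
  void insert(It first, It last) {
    data.insert(first, last);
  }

  // New style: insert every element of a whole range.
  template <typename Range>
  void insert_range(const Range &range) {
    data.insert(std::begin(range), std::end(range));
  }

  bool contains(const T &value) const { return data.count(value) != 0; }
  std::size_t size() const { return data.size(); }
  bool operator==(const SketchSet &other) const { return data == other.data; }

private:
  std::set<T> data;
};

// Fills one set with each form and reports whether the contents match.
inline bool bothFormsAgree(const std::vector<int> &src) {
  SketchSet<int> before, after;
  before.insert(src.begin(), src.end()); // form being replaced
  after.insert_range(src);               // replacement form
  assert(before.size() == after.size()); // sanity check on the sketch itself
  return before == after;
}
```

The range form removes the repeated mention of `Src`, so the begin and end
iterators cannot accidentally come from two different containers.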

[MLIR][NFC] Retire let constructor for GPU (#129849)

2025-03-06T10:48:24+00:00

`let constructor` is legacy (do not use in tree!) since the TableGen
backend emits most of the glue logic to build a pass.

[MLIR] Create GPU utils library & move distribution utils (#119264)

2024-12-13T09:26:57+00:00

Continue the move of `warp_execute_on_lane_0` op to the gpu dialect
(#116994). This patch creates a utils library in GPU and moves generic
helper functions there.

[mlir][gpu] Add optional attributes of kernelModule and kernelFunc for outlining kernels. (#118861)

2024-12-06T20:33:34+00:00

Add optional attributes so that the names of the generated kernel
functions and kernel modules can be specified.

[mlir][GPU] Improve handling of GPU bounds (#95166)

2024-06-18T04:47:38+00:00

This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_size` to see if they
can be constant-folded before checking their `upper_bound`).
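
The lookup order described above can be sketched roughly as follows. This is
not the code from the patch: `BoundSources`, `resolveIdBound`, and
`resolveDimBound` are hypothetical names, and the three sources are modeled
as plain optional integers rather than MLIR attributes. The relative order of
the inherent and discardable attributes in the dimension case is an
assumption carried over from the ID case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the three places a bound can come from.
struct BoundSources {
  std::optional<int64_t> upperBound;  // `upper_bound` on the index op itself
  std::optional<int64_t> inherent;    // inherent attribute on the `gpu.func`
  std::optional<int64_t> discardable; // `gpu.known_*_size` on the enclosing function
};

// ID-style ops: the most specific source wins
// (`upper_bound` > inherent attribute > discardable attribute).
inline std::optional<int64_t> resolveIdBound(const BoundSources &s) {
  if (s.upperBound)
    return s.upperBound;
  if (s.inherent)
    return s.inherent;
  return s.discardable; // may be empty: no bound is known
}

// Dimension-style ops: a known size is consulted first, since it would allow
// constant folding; `upper_bound` is only the fallback.
inline std::optional<int64_t> resolveDimBound(const BoundSources &s) {
  if (s.inherent)
    return s.inherent;
  if (s.discardable)
    return s.discardable;
  return s.upperBound;
}
```

Returning an empty optional when no source is set leaves the caller free to
fall back to whatever platform-wide maximum applies.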

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.

---------

Co-authored-by: Mehdi Amini