llvm-project.git/mlir/test/Conversion/SCFToGPU, branch main

[MLIR][SCFToGPU] Guard operands before AffineApplyOp::create to avoid crash (#167959)

2025-11-20T14:20:34+00:00

This fixes a crash in SCF→GPU when building the per‑dim index for mapped
scf.parallel.

**Change**:
- Map step/lb through cloningMap, then run ensureLaunchIndependent.
- If either is still unavailable at launch scope, emit a match‑failure;
otherwise build the affine.apply.

**Why this is correct:**
- Matches how the pass already handles launch bounds; avoids creating an
op with invalid operands and replaces a segfault with a clear
diagnostic.

**Tests**:
- Added two small regressions that lower to gpu.launch and exercise the
affine.apply path.

Fixes :  #167654

Signed-off-by: Shashi Shankar

[mlir][gpu] Loose the condition to convert scf.parallel to gpu.launch (#164978)

2025-10-30T13:43:49+00:00

Use LocalAliasAnalysis to improve handling of side effects in nested
scf.parallel. If the written memory outside nested scf.parallel is not
alias to the memory accessed inside the nested loop, we can convert it
to gpu.launch.

[SCFToGPU] Convert scf.parallel+scf.reduce to gpu.all_reduce (#122782)

2025-01-23T12:47:36+00:00

Support reductions in SCFToGPU: `scf.parallel` and `scf.reduce` op
combination is now converted to a `gpu.all_reduce` op.

[mlir][gpu] Add Support for Cluster of Thread Blocks in `gpu.launch` (#76924)

2024-01-06T10:17:01+00:00

[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314)

2023-12-20T02:06:27+00:00

This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)

[mlir] fix a crash when lower parallel loop to gpu (#75811) (#75946)

2023-12-20T01:13:15+00:00

[mlir][affine][gpu] Replace DivSIOp to CeilDivSIOp when lowering to GPU launch (#73328)

2023-11-27T08:05:54+00:00

When converting affine.for to GPU launch operator, we have to calculate
the block dimension and thread dimension for the launch operator.

The formula of the dimension size is

(upper_bound - lower_bound) / step_size

When the difference is indivisible by step_size, we use rounding-to-zero
as the division result. However, the block dimension and thread
dimension is right-open range, i.e., [0, block_dim) and [0, thread_dim).
So, we will get the wrong result if we use DivSIOp. In this patch, we
replace it with CeilDivSIOp to get the correct block and thread
dimension values.

[mlir][Pass] Include anchor op in -pass-pipeline

2022-11-03T15:36:12+00:00

In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900

[mlir] use strided layout in structured codegen-related tests

2022-09-17T06:11:28+00:00

All relevant operations have been switched to primarily use the strided
layout, but still support the affine map layout. Update the relevant
tests to use the strided format instead for compatibility with how ops
now print by default.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134045

[MLIR][GPU] Detect bounds with `arith.minsi ` in loops-to-gpu

2022-08-22T09:14:04+00:00

Previously, `arith.constant`, `arith.muli` and `affine.min` were supported when deriving upper loop bounds when converting parallel loops to GPU.

Reviewed By: akuegel

Differential Revision: https://reviews.llvm.org/D132354