<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/mlir/test/Conversion/SCFToGPU, branch main</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[MLIR][SCFToGPU] Guard operands before AffineApplyOp::create to avoid crash (#167959)</title>
<updated>2025-11-20T14:20:34+00:00</updated>
<author>
<name>Shashi Shankar</name>
<email>shashishankar1687@gmail.com</email>
</author>
<published>2025-11-20T14:20:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5d0bfd1bf8ac6b1ceb37c7f30058d0f62e636036'/>
<id>5d0bfd1bf8ac6b1ceb37c7f30058d0f62e636036</id>
<content type='text'>
This fixes a crash in SCF→GPU when building the per‑dim index for mapped
scf.parallel.

**Change**:
- Map step/lb through cloningMap, then run ensureLaunchIndependent.
- If either is still unavailable at launch scope, emit a match‑failure;
otherwise build the affine.apply.

**Why this is correct:**
- Matches how the pass already handles launch bounds; avoids creating an
op with invalid operands and replaces a segfault with a clear
diagnostic.

**Tests**:
- Added two small regressions that lower to gpu.launch and exercise the
affine.apply path.

Fixes: #167654

Signed-off-by: Shashi Shankar &lt;shashishankar1687@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes a crash in SCF→GPU when building the per‑dim index for mapped
scf.parallel.

**Change**:
- Map step/lb through cloningMap, then run ensureLaunchIndependent.
- If either is still unavailable at launch scope, emit a match‑failure;
otherwise build the affine.apply.

**Why this is correct:**
- Matches how the pass already handles launch bounds; avoids creating an
op with invalid operands and replaces a segfault with a clear
diagnostic.

**Tests**:
- Added two small regressions that lower to gpu.launch and exercise the
affine.apply path.

Fixes: #167654

Signed-off-by: Shashi Shankar &lt;shashishankar1687@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][gpu] Loosen the condition to convert scf.parallel to gpu.launch (#164978)</title>
<updated>2025-10-30T13:43:49+00:00</updated>
<author>
<name>Hsiangkai Wang</name>
<email>hsiangkai.wang@arm.com</email>
</author>
<published>2025-10-30T13:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9d5c35408e7a38b3062667bbebb3c0953fa2fae4'/>
<id>9d5c35408e7a38b3062667bbebb3c0953fa2fae4</id>
<content type='text'>
Use LocalAliasAnalysis to improve handling of side effects in nested
scf.parallel. If the memory written outside the nested scf.parallel does
not alias the memory accessed inside the nested loop, we can convert it
to gpu.launch.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use LocalAliasAnalysis to improve handling of side effects in nested
scf.parallel. If the memory written outside the nested scf.parallel does
not alias the memory accessed inside the nested loop, we can convert it
to gpu.launch.</pre>
</div>
</content>
</entry>
<entry>
<title>[SCFToGPU] Convert scf.parallel+scf.reduce to gpu.all_reduce (#122782)</title>
<updated>2025-01-23T12:47:36+00:00</updated>
<author>
<name>Tuomas Kärnä</name>
<email>tuomas.karna@intel.com</email>
</author>
<published>2025-01-23T12:47:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0e944a30954e666cba2bf17497fafe835e4b3519'/>
<id>0e944a30954e666cba2bf17497fafe835e4b3519</id>
<content type='text'>
Support reductions in SCFToGPU: the combination of `scf.parallel` and
`scf.reduce` ops is now converted to a `gpu.all_reduce` op.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support reductions in SCFToGPU: the combination of `scf.parallel` and
`scf.reduce` ops is now converted to a `gpu.all_reduce` op.</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][gpu] Add Support for Cluster of Thread Blocks in `gpu.launch` (#76924)</title>
<updated>2024-01-06T10:17:01+00:00</updated>
<author>
<name>Guray Ozen</name>
<email>guray.ozen@gmail.com</email>
</author>
<published>2024-01-06T10:17:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5b33cff39753c790ecc6847435664592abe40415'/>
<id>5b33cff39753c790ecc6847435664592abe40415</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314)</title>
<updated>2023-12-20T02:06:27+00:00</updated>
<author>
<name>Matthias Springer</name>
<email>me@m-sp.org</email>
</author>
<published>2023-12-20T02:06:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=10056c821a56a19cef732129e4e0c5883ae1ee49'/>
<id>10056c821a56a19cef732129e4e0c5883ae1ee49</id>
<content type='text'>
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -&gt; f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref&lt;100xf32&gt;
  %elem_to_reduce2 = load %buffer2[%iv] : memref&lt;100xf32&gt;
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -&gt; f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref&lt;100xf32&gt;
  %elem_to_reduce2 = load %buffer2[%iv] : memref&lt;100xf32&gt;
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir] fix a crash when lowering a parallel loop to gpu (#75811) (#75946)</title>
<updated>2023-12-20T01:13:15+00:00</updated>
<author>
<name>long.chen</name>
<email>lipracer@gmail.com</email>
</author>
<published>2023-12-20T01:13:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=227bfa1fb14ac6023499b4740401e5e980bfd426'/>
<id>227bfa1fb14ac6023499b4740401e5e980bfd426</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][affine][gpu] Replace DivSIOp with CeilDivSIOp when lowering to GPU launch (#73328)</title>
<updated>2023-11-27T08:05:54+00:00</updated>
<author>
<name>Hsiangkai Wang</name>
<email>hsiangkai.wang@arm.com</email>
</author>
<published>2023-11-27T08:05:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=477c0b67a3ab30e74f3563b3f0b9d4d53caba465'/>
<id>477c0b67a3ab30e74f3563b3f0b9d4d53caba465</id>
<content type='text'>
When converting affine.for to a GPU launch operator, we have to calculate
the block dimension and thread dimension for the launch operator.

The formula for the dimension size is

(upper_bound - lower_bound) / step_size

When the difference is not divisible by step_size, DivSIOp rounds the
quotient toward zero. However, the block and thread dimensions are
right-open ranges, i.e., [0, block_dim) and [0, thread_dim), so the
truncated quotient leaves the last partial step uncovered and gives the
wrong result. This patch replaces DivSIOp with CeilDivSIOp to get the
correct block and thread dimension values.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When converting affine.for to a GPU launch operator, we have to calculate
the block dimension and thread dimension for the launch operator.

The formula for the dimension size is

(upper_bound - lower_bound) / step_size

When the difference is not divisible by step_size, DivSIOp rounds the
quotient toward zero. However, the block and thread dimensions are
right-open ranges, i.e., [0, block_dim) and [0, thread_dim), so the
truncated quotient leaves the last partial step uncovered and gives the
wrong result. This patch replaces DivSIOp with CeilDivSIOp to get the
correct block and thread dimension values.</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][Pass] Include anchor op in -pass-pipeline</title>
<updated>2022-11-03T15:36:12+00:00</updated>
<author>
<name>rkayaith</name>
<email>rkayaith@gmail.com</email>
</author>
<published>2022-10-18T18:44:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=13bd41096286305ee603428f6adf161f52981827'/>
<id>13bd41096286305ee603428f6adf161f52981827</id>
<content type='text'>
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900
</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir] use strided layout in structured codegen-related tests</title>
<updated>2022-09-17T06:11:28+00:00</updated>
<author>
<name>Alex Zinenko</name>
<email>zinenko@google.com</email>
</author>
<published>2022-09-16T13:36:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f3fae035c7a16e1f4c7d96b115212714375a3d38'/>
<id>f3fae035c7a16e1f4c7d96b115212714375a3d38</id>
<content type='text'>
All relevant operations have been switched to primarily use the strided
layout, but still support the affine map layout. Update the relevant
tests to use the strided format instead for compatibility with how ops
now print by default.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134045
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All relevant operations have been switched to primarily use the strided
layout, but still support the affine map layout. Update the relevant
tests to use the strided format instead for compatibility with how ops
now print by default.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134045
</pre>
</div>
</content>
</entry>
<entry>
<title>[MLIR][GPU] Detect bounds with `arith.minsi` in loops-to-gpu</title>
<updated>2022-08-22T09:14:04+00:00</updated>
<author>
<name>Christian Sigg</name>
<email>csigg@google.com</email>
</author>
<published>2022-08-22T08:39:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=459fd3fb342d565bbaff48673838c5ea138128f8'/>
<id>459fd3fb342d565bbaff48673838c5ea138128f8</id>
<content type='text'>
Previously, only `arith.constant`, `arith.muli` and `affine.min` were supported when deriving upper loop bounds while converting parallel loops to GPU. This change also detects bounds defined with `arith.minsi`.

Reviewed By: akuegel

Differential Revision: https://reviews.llvm.org/D132354
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, only `arith.constant`, `arith.muli` and `affine.min` were supported when deriving upper loop bounds while converting parallel loops to GPU. This change also detects bounds defined with `arith.minsi`.

Reviewed By: akuegel

Differential Revision: https://reviews.llvm.org/D132354
</pre>
</div>
</content>
</entry>
</feed>
