<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)</title>
<updated>2025-11-22T22:03:14+00:00</updated>
<author>
<name>tyb0807</name>
<email>sontuan.vu119@gmail.com</email>
</author>
<published>2025-11-22T22:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=29d1e1857d445ca9a6e60c69fe2e1e5b30767e62'/>
<id>29d1e1857d445ca9a6e60c69fe2e1e5b30767e62</id>
<content type='text'>
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correct reconstruction of MIR with preload kernarg
SGPRs.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correct reconstruction of MIR with preload kernarg
SGPRs.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Ignore wavefront barrier latency during scheduling DAG mutation (#168500)</title>
<updated>2025-11-19T08:49:14+00:00</updated>
<author>
<name>Carl Ritson</name>
<email>carl.ritson@amd.com</email>
</author>
<published>2025-11-19T08:49:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=711a2954799e597c71b86aed8c93167765a5255f'/>
<id>711a2954799e597c71b86aed8c93167765a5255f</id>
<content type='text'>
Do not add latency for wavefront and singlethread scope fences during
barrier latency DAG mutation.
These scopes do not typically introduce any latency and adjusting
schedules based on them significantly impacts latency hiding.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not add latency for wavefront and singlethread scope fences during
barrier latency DAG mutation.
These scopes do not typically introduce any latency and adjusting
schedules based on them significantly impacts latency hiding.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Add amdgpu-lower-exec-sync pass to lower named-barrier globals (#165692)</title>
<updated>2025-11-17T04:38:40+00:00</updated>
<author>
<name>Chaitanya</name>
<email>Krishna.Sankisa@amd.com</email>
</author>
<published>2025-11-17T04:38:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=49d5bb0ad0cb31410184c462801c5049ad671517'/>
<id>49d5bb0ad0cb31410184c462801c5049ad671517</id>
<content type='text'>
This PR introduces the `amdgpu-lower-exec-sync` pass, which specifically
lowers named-barrier LDS globals introduced by #114550.

Changes include:

- Moving the logic of lowering named-barrier LDS globals from the
`amdgpu-lower-module-lds` pass to this new pass.

- Adding the pass to the pipeline and removing the existing lowering logic for
named-barrier LDS globals from `amdgpu-lower-module-lds`.

See #161827 for discussion on this topic.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This PR introduces the `amdgpu-lower-exec-sync` pass, which specifically
lowers named-barrier LDS globals introduced by #114550.

Changes include:

- Moving the logic of lowering named-barrier LDS globals from the
`amdgpu-lower-module-lds` pass to this new pass.

- Adding the pass to the pipeline and removing the existing lowering logic for
named-barrier LDS globals from `amdgpu-lower-module-lds`.

See #161827 for discussion on this topic.</pre>
</div>
</content>
</entry>
<entry>
<title>[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020)</title>
<updated>2025-11-02T00:12:33+00:00</updated>
<author>
<name>Jakub Kuderski</name>
<email>jakub@nod-labs.com</email>
</author>
<published>2025-11-02T00:12:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4c21d0cb14806fe1f5f42abd9d7e772013f625cb'/>
<id>4c21d0cb14806fe1f5f42abd9d7e772013f625cb</id>
<content type='text'>
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819)</title>
<updated>2025-10-30T07:02:32+00:00</updated>
<author>
<name>Pankaj Dwivedi</name>
<email>pankajkumar.divedi@amd.com</email>
</author>
<published>2025-10-30T07:02:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4d7093b80618e63af91a64c7a01a7c423b12841c'/>
<id>4d7093b80618e63af91a64c7a01a7c423b12841c</id>
<content type='text'>
This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline.
It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag
to enable or disable the pass.

See the PR: https://github.com/llvm/llvm-project/pull/116953</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline.
It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag
to enable or disable the pass.

See the PR: https://github.com/llvm/llvm-project/pull/116953</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265)</title>
<updated>2025-10-29T06:26:43+00:00</updated>
<author>
<name>Pankaj Dwivedi</name>
<email>pankajkumar.divedi@amd.com</email>
</author>
<published>2025-10-29T06:26:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=20532c0aab266b39cd6e1047f5985585e8c92551'/>
<id>20532c0aab266b39cd6e1047f5985585e8c92551</id>
<content type='text'>
There has been an issue (using function analysis inside the module pass
in OPM) integrating this pass into the LLC pipeline, which currently
lacks NPM support. I tried finding a way to get the per-function
analysis, but it seems that in OPM, we don't have that option.

So the best approach would be to make it a function pass.

Ref: https://github.com/llvm/llvm-project/pull/116953</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There has been an issue (using function analysis inside the module pass
in OPM) integrating this pass into the LLC pipeline, which currently
lacks NPM support. I tried finding a way to get the per-function
analysis, but it seems that in OPM, we don't have that option.

So the best approach would be to make it a function pass.

Ref: https://github.com/llvm/llvm-project/pull/116953</pre>
</div>
</content>
</entry>
<entry>
<title>[llvm] Make getEffectiveRelocModel helper consistent across targets. NFC (#165121)</title>
<updated>2025-10-26T04:20:20+00:00</updated>
<author>
<name>Sam Clegg</name>
<email>sbc@chromium.org</email>
</author>
<published>2025-10-26T04:20:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=7ebc3dbe8bbf1f7a6ae5af531d02dcfe745d92ef'/>
<id>7ebc3dbe8bbf1f7a6ae5af531d02dcfe745d92ef</id>
<content type='text'>
- On targets that don't require the Triple, don't pass it.
- Use `.value_or` where possible.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- On targets that don't require the Triple, don't pass it.
- Use `.value_or` where possible.</pre>
</div>
</content>
</entry>
<entry>
<title>[Passes] Report error when pass requires target machine (#142550)</title>
<updated>2025-10-23T04:57:03+00:00</updated>
<author>
<name>paperchalice</name>
<email>liujunchang97@outlook.com</email>
</author>
<published>2025-10-23T04:57:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f3df058b03867e64af0195001d1e455257a81603'/>
<id>f3df058b03867e64af0195001d1e455257a81603</id>
<content type='text'>
Fixes #142146
Do a nullptr check when a pass accepts `const TargetMachine &amp;` in its
constructor; the check is still not exhaustive.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes #142146
Do a nullptr check when a pass accepts `const TargetMachine &amp;` in its
constructor; the check is still not exhaustive.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Add DAG mutation to improve scheduling before barriers (#142716)</title>
<updated>2025-10-21T04:28:52+00:00</updated>
<author>
<name>Carl Ritson</name>
<email>carl.ritson@amd.com</email>
</author>
<published>2025-10-21T04:28:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=af6fa77a351e64563ef8abe4af2ab65e6aa669fa'/>
<id>af6fa77a351e64563ef8abe4af2ab65e6aa669fa</id>
<content type='text'>
Add scheduler DAG mutation to add data dependencies between atomic
fences and preceding memory reads. This allows some modelling of the
impact an atomic fence can have on outstanding memory accesses.

This is beneficial when a fence would cause wait count insertion, as
more instructions will be scheduled before the fence, hiding memory
latency.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add scheduler DAG mutation to add data dependencies between atomic
fences and preceding memory reads. This allows some modelling of the
impact an atomic fence can have on outstanding memory accesses.

This is beneficial when a fence would cause wait count insertion, as
more instructions will be scheduled before the fence, hiding memory
latency.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (#116953)</title>
<updated>2025-10-09T07:14:56+00:00</updated>
<author>
<name>Pankaj Dwivedi</name>
<email>pankajkumar.divedi@amd.com</email>
</author>
<published>2025-10-09T07:14:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=53aad35208d00c8382b62b1d23005938aea77469'/>
<id>53aad35208d00c8382b62b1d23005938aea77469</id>
<content type='text'>
This pass introduces optimizations for AMDGPU intrinsics by leveraging
the uniformity of their arguments. When an intrinsic's arguments are
detected as uniform, redundant computations are eliminated, and the
intrinsic calls are simplified accordingly.

By utilizing the UniformityInfo analysis, this pass identifies cases
where intrinsic calls are uniform across all lanes, allowing
transformations that reduce unnecessary operations and improve the IR's
efficiency.

These changes enhance performance by streamlining intrinsic usage in
uniform scenarios without altering the program's semantics.

For background, see PR #99878</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This pass introduces optimizations for AMDGPU intrinsics by leveraging
the uniformity of their arguments. When an intrinsic's arguments are
detected as uniform, redundant computations are eliminated, and the
intrinsic calls are simplified accordingly.

By utilizing the UniformityInfo analysis, this pass identifies cases
where intrinsic calls are uniform across all lanes, allowing
transformations that reduce unnecessary operations and improve the IR's
efficiency.

These changes enhance performance by streamlining intrinsic usage in
uniform scenarios without altering the program's semantics.

For background, see PR #99878</pre>
</div>
</content>
</entry>
</feed>
