llvm-project.git/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp, branch main

[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)

2025-11-22T22:03:14+00:00

- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together, these make it possible to correctly reconstruct MIR that uses
preload kernarg SGPRs.

[AMDGPU] Ignore wavefront barrier latency during scheduling DAG mutation (#168500)

2025-11-19T08:49:14+00:00

Do not add latency for wavefront and singlethread scope fences during the
barrier latency DAG mutation.
These scopes do not typically introduce any latency, and adjusting
schedules based on them significantly impacts latency hiding.
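
The rule described above can be illustrated with a minimal, self-contained sketch. It is not the code from the patch: the `SyncScope` enum, the `fenceLatency` helper, and the `BarrierLatency` parameter are hypothetical names chosen for illustration, and the real mutation works on scheduler DAG edges rather than plain integers.

```cpp
// Hypothetical stand-in for synchronization scopes; the names are
// illustrative and do not mirror LLVM's actual scope identifiers.
enum class SyncScope { SingleThread, Wavefront, Workgroup, Agent, System };

// Returns the extra latency to model for a fence of the given scope.
// Fences with singlethread or wavefront scope contribute no latency,
// so they do not perturb the schedule; wider scopes keep the penalty.
constexpr unsigned fenceLatency(SyncScope Scope, unsigned BarrierLatency) {
  switch (Scope) {
  case SyncScope::SingleThread:
  case SyncScope::Wavefront:
    return 0;
  default:
    return BarrierLatency;
  }
}
```

In this sketch the caller supplies the latency it would otherwise have applied, which keeps the scope filter separate from whatever tuning value the scheduler uses.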

[AMDGPU] Add amdgpu-lower-exec-sync pass to lower named-barrier globals (#165692)

2025-11-17T04:38:40+00:00

This PR introduces the `amdgpu-lower-exec-sync` pass, which specifically
lowers the named-barrier LDS globals introduced by #114550.

Changes include:

- Moving the logic for lowering named-barrier LDS globals from the
`amdgpu-lower-module-lds` pass to this new pass.

- Adding the pass to the pipeline and removing the existing lowering logic
for named-barrier LDS globals from `amdgpu-lower-module-lds`.

See #161827 for discussion on this topic.

[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020)

2025-11-02T00:12:33+00:00

Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
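
To show the call shape this change moves to, here is a small self-contained sketch. `MiniStringSwitch` and `classify` are hypothetical stand-ins written for this example, not LLVM's `StringSwitch`; the only point is that the case strings are passed as one braced list followed by the value, rather than as a variadic argument pack.

```cpp
#include <initializer_list>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical miniature of a string switch; not LLVM's implementation.
template <typename T> class MiniStringSwitch {
  std::string_view Str;
  std::optional<T> Result;

public:
  explicit MiniStringSwitch(std::string_view S) : Str(S) {}

  // Initializer-list form: all candidate strings in one braced list.
  MiniStringSwitch &Cases(std::initializer_list<std::string_view> Keys,
                          T Value) {
    if (!Result) {
      for (std::string_view Key : Keys) {
        if (Key == Str) {
          Result = std::move(Value);
          break;
        }
      }
    }
    return *this;
  }

  // Returns the first matched value, or the fallback if nothing matched.
  T Default(T Value) {
    return Result ? std::move(*Result) : std::move(Value);
  }
};

// Example use: map a mode string to a small integer code.
inline int classify(std::string_view Name) {
  return MiniStringSwitch<int>(Name)
      .Cases({"r", "read"}, 1)
      .Cases({"w", "write"}, 2)
      .Default(0);
}
```

With a braced list the number of strings is no longer encoded in the overload set, which is presumably part of what makes the variadic form removable.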

[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819)

2025-10-30T07:02:32+00:00

This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline.
It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag
to enable or disable the pass.

See the PR: https://github.com/llvm/llvm-project/pull/116953

[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265)

2025-10-29T06:26:43+00:00

There has been an issue integrating this pass into the llc pipeline, which
currently lacks NPM support: the pass uses a function analysis inside a
module pass, which does not work in the OPM. I tried to find a way to get
the per-function analysis, but the OPM does not seem to offer that option.

So the best approach is to make it a function pass.

Ref: https://github.com/llvm/llvm-project/pull/116953

[llvm] Make getEffectiveRelocModel helper consistent across targets. NFC (#165121)

2025-10-26T04:20:20+00:00

- On targets that don't require the Triple, don't pass it.
- Use `.value_or` where possible.
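
As an illustration of the two bullets above, a helper that needs no triple and uses `.value_or` could look like the sketch below. The `RelocModel` enum and the `PIC` default are assumptions made for this example; each real target chooses its own default, and some ignore the requested model entirely.

```cpp
#include <optional>

// Hypothetical relocation-model enum, standing in for the real one.
enum class RelocModel { Static, PIC, DynamicNoPIC };

// Takes only the optional requested model: no triple parameter, because
// this illustrative target does not need one to pick a default.
inline RelocModel getEffectiveRelocModel(std::optional<RelocModel> RM) {
  // value_or returns the contained model if present, else the default.
  return RM.value_or(RelocModel::PIC);
}
```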

[Passes] Report error when pass requires target machine (#142550)

2025-10-23T04:57:03+00:00

Fixes #142146
Add a nullptr check when a pass accepts `const TargetMachine &` in its
constructor. The check is still not exhaustive.

[AMDGPU] Add DAG mutation to improve scheduling before barriers (#142716)

2025-10-21T04:28:52+00:00

Add a scheduler DAG mutation that adds data dependencies between atomic
fences and preceding memory reads. This allows some modelling of the
impact an atomic fence can have on outstanding memory accesses.

This is beneficial when a fence would cause wait count insertion, as
more instructions will be scheduled before the fence, hiding memory
latency.
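
The idea of linking fences to earlier reads can be sketched on a toy instruction list. Everything here is hypothetical (`Kind`, `Edge`, `addFenceDeps`), and the sketch assumes for simplicity that each load is attached only to the next fence after it; the actual mutation operates on the scheduler's DAG nodes and may select predecessors differently.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction kinds; a real scheduler inspects machine instructions.
enum class Kind { Load, Store, Fence, Other };

// A dependency edge between two instruction indices.
struct Edge {
  std::size_t From;
  std::size_t To;
};

// For each fence, add an edge from every load seen since the previous
// fence. Simplifying assumption: a load is linked to one fence only.
inline std::vector<Edge> addFenceDeps(const std::vector<Kind> &Instrs) {
  std::vector<Edge> Edges;
  std::vector<std::size_t> PendingLoads;
  for (std::size_t I = 0; I < Instrs.size(); ++I) {
    if (Instrs[I] == Kind::Load) {
      PendingLoads.push_back(I);
    } else if (Instrs[I] == Kind::Fence) {
      for (std::size_t L : PendingLoads)
        Edges.push_back({L, I});
      PendingLoads.clear();
    }
  }
  return Edges;
}
```

Once such edges carry a latency, a scheduler has a reason to place independent work between the loads and the fence.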

[AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (#116953)

2025-10-09T07:14:56+00:00

This pass introduces optimizations for AMDGPU intrinsics by leveraging
the uniformity of their arguments. When an intrinsic's arguments are
detected as uniform, redundant computations are eliminated, and the
intrinsic calls are simplified accordingly.

By utilizing the UniformityInfo analysis, this pass identifies cases
where intrinsic calls are uniform across all lanes, allowing
transformations that reduce unnecessary operations and improve the IR's
efficiency.

These changes enhance performance by streamlining intrinsic usage in
uniform scenarios without altering the program's semantics.

For background, see PR #99878
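
Why uniformity makes such rewrites safe can be shown with a toy lane model. This is not the pass itself, which relies on the static UniformityInfo analysis rather than runtime values; `isUniform` and `broadcastFirstLane` are hypothetical helpers that model a value as one integer per lane.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// True if every lane holds the same value (an empty wave counts as uniform).
inline bool isUniform(const std::vector<int> &Lanes) {
  return std::adjacent_find(Lanes.begin(), Lanes.end(),
                            std::not_equal_to<int>()) == Lanes.end();
}

// Toy model of a "read first lane" operation: every lane of the result
// receives the value held by lane 0 of the input.
inline std::vector<int> broadcastFirstLane(const std::vector<int> &Lanes) {
  if (Lanes.empty())
    return {};
  return std::vector<int>(Lanes.size(), Lanes.front());
}
```

When `isUniform(V)` holds, `broadcastFirstLane(V)` equals `V`, so in this model the call could be replaced by its argument without changing any lane's result; for a non-uniform input the two differ and no such rewrite would be valid.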