path: root/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Age, Commit message, Author
2025-11-22[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)tyb0807
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correctly reconstructing MIR with preload kernarg SGPRs.
2025-11-19[AMDGPU] Ignore wavefront barrier latency during scheduling DAG mutation (#168500)Carl Ritson
Do not add latency for wavefront and singlethread scope fences during barrier latency DAG mutation. These scopes do not typically introduce any latency and adjusting schedules based on them significantly impacts latency hiding.
2025-11-17[AMDGPU] Add amdgpu-lower-exec-sync pass to lower named-barrier globals (#165692)Chaitanya
This PR introduces the `amdgpu-lower-exec-sync` pass, which specifically lowers named-barrier LDS globals introduced by #114550. Changes include:
- Moving the logic of lowering named-barrier LDS globals from the `amdgpu-lower-module-lds` pass to this new pass.
- Adding the pass to the pipeline and removing the existing lowering logic for named-barrier LDS in `amdgpu-lower-module-lds`.

See #161827 for discussion on this topic.
2025-11-02[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020)Jakub Kuderski
Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.
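The migration described above can be illustrated with a minimal sketch. `MiniStringSwitch` is a hypothetical, self-contained stand-in for `llvm::StringSwitch` (it is not the LLVM class), and `SchedKind`, `parseSchedKind`, and the strings are invented for illustration. Only the shape of the `.Cases({...}, Value)` call is meant to mirror the initializer-list overload this commit switches to.

```cpp
#include <initializer_list>
#include <optional>
#include <string_view>

// Hypothetical stand-in for llvm::StringSwitch, reduced to the calls shown here.
template <typename T> class MiniStringSwitch {
  std::string_view Str;
  std::optional<T> Result;

public:
  explicit MiniStringSwitch(std::string_view S) : Str(S) {}

  // Single-string case: the first match wins, later matches are ignored.
  MiniStringSwitch &Case(std::string_view S, T Value) {
    if (!Result && Str == S)
      Result = Value;
    return *this;
  }

  // Initializer-list overload: any number of strings map to one value.
  MiniStringSwitch &Cases(std::initializer_list<std::string_view> Strs,
                          T Value) {
    for (std::string_view S : Strs)
      Case(S, Value);
    return *this;
  }

  // Value returned when no case matched.
  T Default(T Value) const { return Result.value_or(Value); }
};

enum class SchedKind { Unknown, Fast, Balanced }; // hypothetical enum

// Hypothetical caller: the strings are grouped in braces rather than being
// passed as separate variadic arguments.
SchedKind parseSchedKind(std::string_view Name) {
  return MiniStringSwitch<SchedKind>(Name)
      .Cases({"fast", "quick"}, SchedKind::Fast)
      .Case("balanced", SchedKind::Balanced)
      .Default(SchedKind::Unknown);
}
```

With a single braced list, one overload covers any number of strings, which is presumably what makes the variadic form redundant.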
2025-10-30[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819)Pankaj Dwivedi
This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline. It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag to enable or disable the pass. See the PR: https://github.com/llvm/llvm-project/pull/116953
2025-10-29[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265)Pankaj Dwivedi
There has been an issue (using function analysis inside the module pass in OPM) integrating this pass into the LLC pipeline, which currently lacks NPM support. I tried finding a way to get the per-function analysis, but it seems that in OPM we don't have that option. So the best approach would be to make it a function pass. Ref: https://github.com/llvm/llvm-project/pull/116953
2025-10-25[llvm] Make getEffectiveRelocModel helper consistent across targets. NFC (#165121)Sam Clegg
- On targets that don't require the Triple, don't pass it.
- Use `.value_or` where possible.
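The `.value_or` cleanup can be sketched on a stand-alone `std::optional`. `RelocModelSketch` is a hypothetical enum standing in for the real relocation-model type, and the `Static` fallback is an assumption made for illustration; individual targets, AMDGPU included, may pick a different default or ignore the argument entirely.

```cpp
#include <optional>

// Hypothetical stand-in for the relocation-model enum; not the LLVM type.
enum class RelocModelSketch { Static, PIC, DynamicNoPIC };

// Before: explicit check on the optional.
RelocModelSketch effectiveRelocModelVerbose(
    std::optional<RelocModelSketch> RM) {
  if (!RM.has_value())
    return RelocModelSketch::Static; // assumed default, for illustration only
  return *RM;
}

// After: the same logic expressed with value_or.
RelocModelSketch effectiveRelocModel(std::optional<RelocModelSketch> RM) {
  return RM.value_or(RelocModelSketch::Static);
}
```

Both functions return the caller's choice when one is given and the fallback otherwise; the second form only removes the branch.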
2025-10-23[Passes] Report error when pass requires target machine (#142550)paperchalice
Fixes #142146. Do a nullptr check when a pass accepts `const TargetMachine &` in its constructor; the check is still not exhaustive.
2025-10-21[AMDGPU] Add DAG mutation to improve scheduling before barriers (#142716)Carl Ritson
Add scheduler DAG mutation to add data dependencies between atomic fences and preceding memory reads. This allows some modelling of the impact an atomic fence can have on outstanding memory accesses. This is beneficial when a fence would cause wait count insertion, as more instructions will be scheduled before the fence hiding memory latency.
2025-10-09[AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (#116953)Pankaj Dwivedi
This pass introduces optimizations for AMDGPU intrinsics by leveraging the uniformity of their arguments. When an intrinsic's arguments are detected as uniform, redundant computations are eliminated, and the intrinsic calls are simplified accordingly. By utilizing the UniformityInfo analysis, this pass identifies cases where intrinsic calls are uniform across all lanes, allowing transformations that reduce unnecessary operations and improve the IR's efficiency. These changes enhance performance by streamlining intrinsic usage in uniform scenarios without altering the program's semantics. For background, see PR #99878.
2025-10-08[AMDGPU] Add the missing enabling check of AMDGPUAttributor (#162420)Shilei Tian
2025-10-08AMDGPU: skip AMDGPUAttributor pass on R600 some more. (#162418)James Y Knight
This is a follow-up for #162207, where I neglected to skip the second use of AMDGPUAttributor for R600 targets. This use is covered by the test lld/test/ELF/lto/r600.ll.
2025-10-07AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600. (#162207)James Y Knight
These passes call `getSubtarget<GCNSubtarget>`, which doesn't work on R600 targets, as that uses an `R600Subtarget` type instead. Unfortunately, `TargetMachine::getSubtarget<ST>` does an unchecked static_cast to `ST&`, which makes it easy for this error to go undetected. The modifications here were verified by running check-llvm with an assert added to getSubtarget. However, that assert requires that RTTI is enabled, which LLVM doesn't use, so I've reverted the assert before sending this fix upstream. These errors have been present for some time, but were detected after #162040 caused an uninitialized memory read to be reported by asan/msan.
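The hazard described above, an unchecked `static_cast` to the wrong subtarget type, can be sketched with a miniature hierarchy. Every type and function below is hypothetical and only imitates the shape of the problem; the log does not say how the real fix detects R600, so the `Kind` tag used as a guard is an assumption.

```cpp
// Hypothetical miniature of the subtarget hierarchy; not the LLVM classes.
struct SubtargetSketch {
  enum class Kind { R600, GCN };
  Kind K;
  explicit SubtargetSketch(Kind K) : K(K) {}
};

struct R600SubtargetSketch : SubtargetSketch {
  R600SubtargetSketch() : SubtargetSketch(Kind::R600) {}
};

struct GCNSubtargetSketch : SubtargetSketch {
  unsigned WavefrontSize = 64; // illustrative field only
  GCNSubtargetSketch() : SubtargetSketch(Kind::GCN) {}
};

// Unchecked downcast, mirroring the described behaviour: nothing here stops
// a caller from asking for the wrong derived type.
template <typename ST>
const ST &getSubtargetUnchecked(const SubtargetSketch &S) {
  return static_cast<const ST &>(S);
}

// Guard applied by the caller before running a GCN-only pass (assumed shape).
bool shouldRunGCNOnlyPass(const SubtargetSketch &S) {
  return S.K == SubtargetSketch::Kind::GCN;
}

// Returns the wavefront size, or 0 when the GCN-only logic must be skipped.
unsigned wavefrontSizeOrZero(const SubtargetSketch &S) {
  if (!shouldRunGCNOnlyPass(S))
    return 0;
  return getSubtargetUnchecked<GCNSubtargetSketch>(S).WavefrontSize;
}
```

Without the guard, the cast on an R600 object would read a field that does not exist there, which is the kind of uninitialized read a sanitizer can report.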
2025-10-01[AMDGPU] Move LowerBufferFatPointers after LoadStoreVectorizer and remove the fixme (#161531)Gang Chen
Move LowerBufferFatPointers pass after CodegenPrepare and LoadStoreVectorizer pass, and remove the fixme about that.
2025-09-11[llvm] Move data layout string computation to TargetParser (#157612)Reid Kleckner
Clang and other frontends generally need the LLVM data layout string in order to generate LLVM IR modules for LLVM. MLIR clients often need it as well, since MLIR users often lower to LLVM IR. Before this change, the LLVM datalayout string was computed in the LLVM${TGT}CodeGen library in the relevant TargetMachine subclass. However, none of the logic for computing the data layout string requires any details of code generation. Clients who want to avoid duplicating this information were forced to link in LLVMCodeGen and all registered targets, leading to bloated binaries. This happened in PR #145899, which measurably increased binary size for some of our users. By moving this information to the TargetParser library, we can delete the duplicate datalayout strings in Clang, and retain the ability to generate IR for unregistered targets. This is intended to be a very mechanical LLVM-only change, but there is an immediately obvious follow-up to clang, which will be prepared separately. The vast majority of data layouts are computable with two inputs: the triple and the "ABI name". There is only one exception, NVPTX, which has a cl::opt to enable short device pointers. I invented a "shortptr" ABI name to pass this option through the target independent interface. Everything else fits. Mips is a bit awkward because it uses a special MipsABIInfo abstraction, which includes members with codegen-like concepts like ABI physical registers that can't live in TargetParser. I think the string logic of looking for "n32" "n64" etc is reasonable to duplicate. We have plenty of other minor duplication to preserve layering. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-05[AMDGPU] Register amdgpu-lower-vgpr-encoding pass in npm (#156971)Stanislav Mekhanoshin
2025-09-04[AMDGPU] High VGPR lowering on gfx1250 (#156965)Stanislav Mekhanoshin
2025-08-28AMDGPU: Refactor lowering of s_barrier to split barriers (#154648)Nicolai Hähnle
Let's do the lowering of non-split into split barriers in a new IR pass, AMDGPULowerIntrinsics. That way, there is no code duplication between SelectionDAG and GlobalISel. This simplifies some upcoming extensions to the code.
2025-08-22[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)Ivan Kosarev
Saves around 125-210 MB of compilation memory usage per source for roughly one third of our backend sources, ~60 MB on average.
2025-08-14[AMDGPU] Delete AMDGPU Unify Metadata pass (#153548)Shoreshen
Fixes #153150
2025-07-30Reapply "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#151098)Vikram Hegde
Reapplies https://github.com/llvm/llvm-project/pull/148114 and includes shared-library build failure fixes for AMDGPU and X86.
2025-07-28[HIPSTDPAR] Add handling for math builtins (#140158)Alex Voicu
When compiling in `--hipstdpar` mode, the builtins corresponding to the standard library might end up in code that is expected to execute on the accelerator (e.g. by using the `std::` prefixed functions from `<cmath>`). We do not have uniform handling for this in AMDGPU, and the errors that obtain are quite arcane. Furthermore, the user-space changes required to work around this tend to be rather intrusive. This patch adds an additional `--hipstdpar` specific pass which forwards to the run time component of HIPSTDPAR the intrinsics / libcalls which result from the use of the math builtins, and which are not properly handled. In the long run we will want to stop relying on this and handle things in the compiler, but it is going to be a rather lengthy journey, which makes this medium term escape hatch necessary. The paired change in the run time component is here <https://github.com/ROCm/rocThrust/pull/551>.
2025-07-28Revert "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#150883)Vikram Hegde
Reverts llvm/llvm-project#148114. Will update with a fixed PR.
2025-07-28[CodeGen][NPM] Stitch up loop passes in codegen pipeline (#148114)Vikram Hegde
Same as https://github.com/llvm/llvm-project/pull/133050 Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
2025-07-18AMDGPU: Add pass to replace constant materialize with AV pseudos (#149292)Matt Arsenault
If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate, replace it with a pseudo which writes to the combined AV_* class. This relaxes the operand constraints, which will allow the allocator to inflate the register class to AV_* to potentially avoid spilling. The allocator does not know how to replace an instruction to enable the change of register class. I originally tried to do this by changing all of the places we introduce v_mov_b32 with immediate, but it's a long tail of niche cases that require manual updating. Plus we can restrict this to only run on functions where we know we will be allocating AGPRs.
2025-07-17[AMDGPU][NPM] Fill in addPreSched2 passes (#148112)Vikram Hegde
Same as https://github.com/llvm/llvm-project/pull/139516 Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
2025-07-10[AMDGPU][NewPM] Port "AMDGPUResourceUsageAnalysis" to NPM (#130959)Vikram Hegde
2025-07-09[CodeGen][NPM] Differentiate pipeline-required and opt-required passes (#135752)Akshat Oke
"Required" passes relate to actually running the pass on the IR, regardless of whether they are in the pipeline. CGPassBuilder was mistakenly still adding them to the pipeline. The test `llc -stop-after=greedy -enable-new-pm` would still add `greedy` to the pipeline otherwise.
2025-07-09[AMDGPU][NPM] Complete optimized regalloc pipeline (#138491)Akshat Oke
Also fill in some other passes.
2025-07-09[CodeGen][NPM] Support CodeGenSCCOrder in pipeline (#136818)Akshat Oke
Wrap passes into Post order CGSCC pass manager in codegen pass builder. I am adding the pipeline test in this but it is not yet complete.
2025-06-27AMDGPU: Introduce a pass to replace VGPR MFMAs with AGPR (#145024)Matt Arsenault
In gfx90a-gfx950, it's possible to emit MFMAs which use AGPRs or VGPRs for vdst and src2. We do not want to use the AGPR form unless required by register pressure, as it requires cross bank register copies from most other instructions. Currently we select the AGPR or VGPR version depending on a crude heuristic for whether it's possible AGPRs will be required. We really need the register allocation to be complete to make a good decision, which is what this pass is for. This adds the pass, but does not yet remove the selection patterns for AGPRs. This is a WIP, and NFC-ish. It should be a no-op on any currently selected code. It also does not yet trigger on the real examples of interest, which require handling batches of MFMAs at once.
2025-06-23AMDGPU: Remove legacy pass manager version of AMDGPUAttributor (#145262)Matt Arsenault
2025-06-22AMDGPU: Use reportFatalUsageError for regalloc flag error (#145198)Matt Arsenault
2025-06-21AMDGPU: Really delete AMDGPUAnnotateKernelFeatures (#145136)Matt Arsenault
2025-06-20AMDGPU: Remove legacy pass manager version of AMDGPUUnifyMetadata (#144985)Matt Arsenault
This is only run in the new pass manager now.
2025-06-20AMDGPU: Remove legacy PM version of AMDGPUPromoteAllocaToVector (#144986)Matt Arsenault
This is only run in the middle end with the new pass manager now, so garbage collect the old PM version.
2025-06-17[llvm] annotate interfaces in llvm/Target for DLL export (#143615)Andrew Rogers
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build.

## Background

This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

A subset of these changes was generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`.

The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times.

In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC.

## Validation

Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-05[AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598)Diana Picus
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead of keeping two copies for DAG/Global ISel. Also remove isKernelCC, which doesn't agree with isKernel and doesn't seem very useful. While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and mark them constexpr.
2025-06-03[MISched] Add templates for creating custom schedulers (#141935)Pengcheng Wang
We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.
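The idea of a factory template whose `Strategy` parameter defaults to the generic implementation can be sketched as follows. All types here are hypothetical miniatures, and `createSchedLiveSketch` only borrows its name from the commit; the real functions presumably take scheduler context arguments that are omitted here.

```cpp
#include <memory>
#include <string>

// Hypothetical strategy interface; not the LLVM scheduler classes.
struct StrategySketch {
  virtual ~StrategySketch() = default;
  virtual std::string name() const = 0;
};

struct GenericStrategySketch : StrategySketch {
  std::string name() const override { return "generic"; }
};

struct CustomStrategySketch : StrategySketch {
  std::string name() const override { return "custom"; }
};

// Hypothetical scheduler that owns whichever strategy it was built with.
struct SchedulerSketch {
  std::unique_ptr<StrategySketch> Impl;
};

// Factory template: the strategy defaults to the generic one, so existing
// callers need no template argument and custom targets name their own type.
template <typename Strategy = GenericStrategySketch>
SchedulerSketch createSchedLiveSketch() {
  return SchedulerSketch{std::make_unique<Strategy>()};
}
```

A target with its own strategy would call `createSchedLiveSketch<CustomStrategySketch>()`, while `createSchedLiveSketch()` keeps the generic behaviour.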
2025-05-30Reapply "Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)""Shilei Tian
This reverts commit 37ea3b32cdcb6c0dcecbcc4bf844f5190c7378dd.
2025-05-30Revert "Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)""Shilei Tian
This reverts commit 4efc13f8ff1eaf4f9fb1fcea8d4552b3eca052ca.
2025-05-30Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)"Shilei Tian
This reverts commit 3c6211c183885afb5d89259a53c4f4f46a6bf399.
2025-05-30Revert "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)"Shilei Tian
This reverts commit 9bf6b2a8cb0467b62173659306e43a0346f063a2.
2025-05-30[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)Shilei Tian
2025-05-29[AMDGPU] Move InferAddressSpacesPass to middle end optimization pipeline (#138604)Shilei Tian
It will run twice in the non-LTO pipeline with `O1` or higher. In the LTO post-link pipeline, it will run once with `O2` or higher, since inline and SROA don't run in `O1`.
2025-05-26[AMDGPU] Cluster export instructions in PostRA Scheduler (#141399)Carl Ritson
DAG mutation needs to be applied post-RA to maintain order established during pre-RA scheduler.
2025-05-19[AMDGPU] Set AS8 address width to 48 bitsAlexander Richardson
Of the 128 bits of the buffer descriptor, only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logical conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-11[AMDGPU] Move kernarg preload logic to separate pass (#130434)Austin Kerbow
Moves kernarg preload logic to its own module pass. Cloned function declarations are removed when preloading hidden arguments. The inreg attribute is now added in this pass instead of AMDGPUAttributor. The rest of the logic is copied from AMDGPULowerKernelArguments, which now only checks whether an argument is marked inreg to avoid replacing direct uses of preloaded arguments. This change requires test updates to remove inreg from lit tests with kernels that don't actually want preloading.
2025-05-06Register assembly printer passes (#138348)Matthias Braun
Register assembly printer passes in the pass registry. This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests. Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow targets to use the `INITIALIZE_PASS` macros and register the pass in the pass registry. This currently has a default parameter so it won't break any targets that have not been updated.
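The defaulted `char &ID` constructor parameter can be sketched with a miniature of the pass-identity idiom, where a pass is identified by the address of a static `char`. All class names below are hypothetical. That the parameter defaults to the base class's own ID is an assumption; the log only says a default exists so that targets which were not updated keep building.

```cpp
// Hypothetical miniature of the pass-ID idiom; not the LLVM classes.
struct PassSketch {
  const void *PassID; // identity is the address of a static char
  explicit PassSketch(char &ID) : PassID(&ID) {}
};

struct AsmPrinterSketch : PassSketch {
  static char ID;
  // Defaulted parameter (assumed to be the base ID): subclasses that pass
  // nothing keep the shared identity.
  explicit AsmPrinterSketch(char &PID = ID) : PassSketch(PID) {}
};
char AsmPrinterSketch::ID = 0;

// An updated target supplies its own ID so it can be registered separately.
struct TargetAsmPrinterSketch : AsmPrinterSketch {
  static char ID;
  TargetAsmPrinterSketch() : AsmPrinterSketch(ID) {}
};
char TargetAsmPrinterSketch::ID = 0;

// A target that was not updated relies on the default argument.
struct LegacyAsmPrinterSketch : AsmPrinterSketch {
  LegacyAsmPrinterSketch() = default;
};
```

Only the updated target gets a distinct identity, which is what a registry would need to resolve a name such as `<target>-asm-printer` to one specific pass.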
2025-05-02[AMDGPU][Attributor] Add `ThinOrFullLTOPhase` as an argument (#123994)Shilei Tian