path: root/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
Age Commit message Author
2025-11-05AMDGPU: Do not infer implicit inputs for !nocallback intrinsicsMatt Arsenault
(#131759) This isn't really the right check; we want to know that the intrinsic does not perform a true function call to any code (in the module or not). nocallback appears to be the closest thing to this property we have now, though. Fixes theoretical miscompiles with intrinsics like statepoint, which hide a call to a real function. Also do the same for inferring no-agpr usage.
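A minimal sketch of the check described above, assuming a simplified model of a call target. The `Callee` struct and `intrinsicIsKnownCallFree` are hypothetical names for illustration and are not part of LLVM's API.

```cpp
// Hypothetical, simplified view of a call target; not LLVM's API.
struct Callee {
  bool IsIntrinsic; // the target is an intrinsic
  bool NoCallback;  // the target carries the nocallback attribute
};

// Only an intrinsic marked nocallback is treated as not hiding a call to
// real code. Anything else is handled conservatively, i.e. no inference
// that implicit inputs are unneeded.
bool intrinsicIsKnownCallFree(const Callee &C) {
  return C.IsIntrinsic && C.NoCallback;
}
```

Under this model an intrinsic such as statepoint, which lacks nocallback, is not considered call-free.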
2025-11-02[llvm] Remove "const" in the presence of "constexpr" (NFC) (#166109)Kazu Hirata
"const" is extraneous in the presence of "constexpr" for simple variables and arrays.
2025-10-08AMDGPU: Render non-0 values for amdgpu-agpr-allocMatt Arsenault
(#162300) This now tries to compute a lower bound on the number of registers for individual inline asm uses. Also starts using AACallEdges to handle indirect calls.
2025-10-08AMDGPU: Fix parsing wrong operand format for read_register/write_register ↵Matt Arsenault
(#162414) Apparently the IR verifier doesn't enforce the correct structure. Also I do not know why we have this extra level of wrapper in the intrinsic, it just makes it harder to get at the string. I also do not know why kokkos is using these intrinsics, but it shouldn't.
2025-10-08AMDGPU: Account for read/write register intrinsics for AGPR usage (#161988)Matt Arsenault
Fix the special case intrinsics that can directly reference a physical register. There's no reason to use this.
2025-10-08AMDGPU: Figure out required AGPR count for inline asm (#150910)Matt Arsenault
For now just try to compute the minimum number of AGPRs required to allocate the asm. Leave the attributor changes to turn this into an integer value for later.
2025-10-08AMDGPU: Stop inferring amdgpu-agpr-alloc on irrelevant targets (#161957)Matt Arsenault
This only matters for subtargets with configurable AGPR allocation.
2025-10-04AMDGPU: Fix using IRAttribute with nounwind for AMDGPUNoAGPR (#161954)Matt Arsenault
Don't think this did anything harmful, but it doesn't make sense to report this as implementing nounwind handling.
2025-10-03[AMDGPU][Attributor] Stop inferring amdgpu-no-flat-scratch-init in sanitized ↵Chaitanya
functions. (#161319) This PR stops the attributor pass from inferring `amdgpu-no-flat-scratch-init` for functions marked with a `sanitize_*` attribute.
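The eligibility rule can be sketched as follows, assuming a function is modelled as a plain list of attribute names. `hasSanitizeAttr` and `mayInferNoFlatScratchInit` are hypothetical helpers, and the real pass applies further conditions before it infers the attribute.

```cpp
#include <string>
#include <vector>

// Hypothetical model: a function is represented by its attribute names.
// Returns true if any attribute name starts with "sanitize_".
bool hasSanitizeAttr(const std::vector<std::string> &Attrs) {
  for (const std::string &A : Attrs)
    if (A.compare(0, 9, "sanitize_") == 0) // "sanitize_" has 9 characters
      return true;
  return false;
}

// Sanitized functions are excluded from amdgpu-no-flat-scratch-init
// inference. This only models the exclusion, not the inference itself.
bool mayInferNoFlatScratchInit(const std::vector<std::string> &Attrs) {
  return !hasSanitizeAttr(Attrs);
}
```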
2025-09-15[AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (#158076)Shilei Tian
2025-08-27[AMDGPU][Attributor] Remove final update of waves-per-eu after the ↵Shilei Tian
attributor run (#155246) We do not need this in the attributor, because `ST.getWavesPerEU` accounts for both the waves-per-eu and flat-workgroup-size attributes. If the waves-per-eu values are not valid, it drops them. In the attributor, we only need to propagate the values without using intermediate flat workgroup size values. Fixes SWDEV-550257.
2025-08-18Revert "[AMDGPU][Attributor] Infer inreg attribute in `AMDGPUAttributor` ↵Shilei Tian
(#146720)" This reverts commit 84ab301554f8b8b16b94263a57b091b07e9204f2 because it breaks several AMDGPU test bots.
2025-08-18[AMDGPU][Attributor] Infer inreg attribute in `AMDGPUAttributor` (#146720)Shilei Tian
This patch introduces `AAAMDGPUUniformArgument` that can infer `inreg` function argument attribute. The idea is, for a function argument, if the corresponding call site arguments are always uniform, we can mark it as `inreg` thus pass it via SGPR. In addition, this AA is also able to propagate the inreg attribute if feasible.
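The core idea can be sketched outside LLVM, assuming uniformity of each call-site argument is already known. `CallSiteUniformity` and `canMarkInReg` are hypothetical names. Answering "no" when there are no call sites is a conservative assumption of this sketch, not something the patch states.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: one entry per call site, holding one flag per argument
// that says whether the value passed at that site is uniform.
using CallSiteUniformity = std::vector<bool>;

// Returns true if argument ArgIdx is uniform at every known call site, in
// which case it could be marked inreg and passed in an SGPR.
bool canMarkInReg(const std::vector<CallSiteUniformity> &CallSites,
                  std::size_t ArgIdx) {
  if (CallSites.empty())
    return false; // assumption: no evidence, so stay conservative
  for (const CallSiteUniformity &CS : CallSites)
    if (ArgIdx >= CS.size() || !CS[ArgIdx])
      return false;
  return true;
}
```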
2025-07-28AMDGPU: Remove unused TargetPassConfig include from attributor (#150892)Matt Arsenault
2025-07-24[AMDGPU] Remove AAInstanceInfo from the AMDGPUAttributor (#150232)Juan Manuel Martinez Caamaño
Related to compile-time issue SWDEV-543240 and functional issue SWDEV-544256
2025-07-14[llvm] Remove unused includes (NFC) (#148768)Kazu Hirata
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-07-08Attributor: Infer noalias.addrspace metadata for memory instructions (#136553)Shoreshen
Add noalias.addrspace metadata for store, load and atomic instruction in AMDGPU backend.
2025-06-23AMDGPU: Remove legacy pass manager version of AMDGPUAttributor (#145262)Matt Arsenault
2025-05-23[IPO] Teach AbstractAttribute::getName to return StringRef (NFC) (#141313)Kazu Hirata
This patch addresses clang-tidy's readability-const-return-type by dropping const from the return type while switching to StringRef at the same time because these functions just return string constants.
2025-05-17[AMDGPU] Fix a warningKazu Hirata
This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:1139:17: error: unused variable 'Func' [-Werror,-Wunused-variable]
2025-05-17[AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` (#123995)Shilei Tian
Currently, we use `AAAMDWavesPerEU` to iteratively update values based on attributes from the associated function, potentially propagating user-annotated values, along with `AAAMDFlatWorkGroupSize`. Similarly, we have `AAAMDFlatWorkGroupSize`. However, since the value calculated through the flat workgroup size always dominates the user annotation (i.e., the attribute), running `AAAMDWavesPerEU` iteratively is unnecessary if no user-annotated value exists. This PR completely rewrites how the `amdgpu-waves-per-eu` attribute is handled in `AMDGPUAttributor`. The key changes are as follows:
- `AAAMDFlatWorkGroupSize` remains unchanged.
- `AAAMDWavesPerEU` now only propagates user-annotated values.
- A new function is added to check and update `amdgpu-waves-per-eu` based on the following rules:
  - No waves per eu, no flat workgroup size: Assume a flat workgroup size of `1,1024` and compute waves per eu based on this.
  - No waves per eu, flat workgroup size exists: Use the provided flat workgroup size to compute waves-per-eu.
  - Waves per eu exists, no flat workgroup size: This is a tricky case. In this PR, we assume a flat workgroup size of `1,1024`, but this can be adjusted if a different approach is preferred. Alternatively, we could directly use the user-annotated value.
  - Both waves per eu and flat workgroup size exist: If there’s a conflict, the value derived from the flat workgroup size takes precedence over waves per eu.
This PR also updates the logic for merging two waves per eu pairs. The current implementation, which uses `clampStateAndIndicateChange` to compute a union, might not be ideal. From the perspective of ensuring proper resource allocation, if one pair specifies a minimum of 2 waves per eu and another specifies a minimum of 4, we should guarantee that 4 waves per eu can be supported, as failing to do so could result in excessive resource allocation per wave. A similar principle applies to the upper bound.
Thus, the PR uses the following approach for merging two pairs, `lo_a,up_a` and `lo_b,up_b`: `max(lo_a, lo_b), max(up_a, up_b)`. This ensures that resource allocation adheres to the stricter constraints from both inputs. Fix #123092.
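The merge rule quoted above can be written as a small standalone function. This is a sketch only: `WavesPair` and `mergeWavesPerEU` are hypothetical names, not the types or functions used in `AMDGPUAttributor`.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical stand-in for a (lower, upper) waves-per-eu pair.
using WavesPair = std::pair<unsigned, unsigned>;

// Merge rule from the commit message:
//   max(lo_a, lo_b), max(up_a, up_b)
WavesPair mergeWavesPerEU(WavesPair A, WavesPair B) {
  return {std::max(A.first, B.first), std::max(A.second, B.second)};
}
```

For example, merging `2,8` with `4,6` under this rule gives `4,8`, which matches the commit's example of guaranteeing the larger minimum of 4.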
2025-05-11[AMDGPU] Move kernarg preload logic to separate pass (#130434)Austin Kerbow
Moves kernarg preload logic to its own module pass. Cloned function declarations are removed when preloading hidden arguments. The inreg attribute is now added in this pass instead of AMDGPUAttributor. The rest of the logic is copied from AMDGPULowerKernelArguments, which now only checks whether an argument is marked inreg to avoid replacing direct uses of preloaded arguments. This change requires test updates to remove inreg from lit tests with kernels that don't actually want preloading.
2025-05-02[AMDGPU][Attributor] Add `ThinOrFullLTOPhase` as an argument (#123994)Shilei Tian
2025-05-01[AMDGPU] Max. WG size-induced occupancy limits max. waves/EU (#137807)Lucas Ramirez
The default maximum waves/EU returned by the family of `AMDGPUSubtarget::getWavesPerEU` is currently the maximum number of waves/EU supported by the subtarget (only a valid occupancy range in "amdgpu-waves-per-eu" may lower that maximum). This ignores maximum achievable occupancy imposed by flat workgroup size and LDS usage, resulting in situations where `AMDGPUSubtarget::getWavesPerEU` produces a maximum higher than the one from `AMDGPUSubtarget::getOccupancyWithWorkGroupSizes`. This limits the waves/EU range's maximum to the maximum achievable occupancy derived from flat workgroup sizes and LDS usage. This only has an impact on functions which restrict flat workgroup size with "amdgpu-flat-work-group-size", since the default range of flat workgroup sizes achieves the maximum number of waves/EU supported by the subtarget. Improvements to the handling of "amdgpu-waves-per-eu" are left for a follow up PR (e.g., I think the attribute should be able to lower the full range of waves/EU produced by these methods).
2025-04-07[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)Rahul Joshi
- Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767
2025-03-19AMDGPU: Fix attributor not handling all trap intrinsics (#131758)Matt Arsenault
2025-03-06AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc (#129893)Matt Arsenault
This performs the minimal replacement of amdgpu-no-agpr to amdgpu-agpr-alloc=0. Most of the test diffs are due to the new attribute sorting later alphabetically. We could do better by trying to perform range merging in the attributor, and trying to pick non-0 values.
2024-12-20AMDGPU: Fix mishandling of search for constantexpr addrspacecasts (#120346)Matt Arsenault
2024-12-11[AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (#114438)Shilei Tian
2024-12-11[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute ↵Shilei Tian
(#114357) If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, use the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" to the assumed range, such that the assumed range can't be widened properly in `updateImpl` if the known range is not set properly for whatever reason. Another benefit of not touching the known range is that, if we indicate a pessimistic state, it also invalidates the AA so that `manifest` will not be called. Since we honor the attribute, we do not want to, and will not, add any half-baked attribute to a function.
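A minimal sketch of "honor the attribute, else start from the default", assuming the attribute value is a `min,max` string and that an empty string means the attribute is absent. `Range`, `parseUnsigned`, and `initialRange` are hypothetical helpers rather than the LLVM implementation, and overflow handling is omitted.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

using Range = std::pair<unsigned, unsigned>;

// Parses a non-empty run of decimal digits; returns false on anything else.
// Overflow is not checked in this sketch.
static bool parseUnsigned(const std::string &S, unsigned &Out) {
  if (S.empty())
    return false;
  unsigned V = 0;
  for (char C : S) {
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return false;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  Out = V;
  return true;
}

// Takes the attribute value directly when it is well formed ("min,max" with
// min <= max); otherwise falls back to the caller-supplied default range.
Range initialRange(const std::string &AttrValue, Range Default) {
  std::size_t Comma = AttrValue.find(',');
  if (Comma == std::string::npos)
    return Default;
  unsigned Min = 0, Max = 0;
  if (!parseUnsigned(AttrValue.substr(0, Comma), Min) ||
      !parseUnsigned(AttrValue.substr(Comma + 1), Max) || Min > Max)
    return Default;
  return {Min, Max};
}
```

The starting range is passed in by the caller rather than hard-coded, since the default depends on the subtarget.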
2024-12-10[AMDGPU] Re-enable closed-world assumption as an opt-in feature (#115371)Shilei Tian
Although the ABI (if one exists) doesn’t explicitly prohibit cross-code-object function calls—particularly since our loader can handle them—such calls are not actually allowed in any of the officially supported programming models. However, this limitation has some nuances. For instance, the loader can handle cross-code-object global variables, which complicates the situation further. Given this complexity, assuming a closed-world model at link time isn’t always safe. To address this, this PR introduces an option that enables this assumption, providing end users the flexibility to enable it for improved compiler optimizations. However, it is the user’s responsibility to ensure they do not violate this assumption.
2024-12-09Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in ↵Jun Wang
AMDGPUAttributor (#94647)" (#118907) This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5. This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-09AMDGPU: Propagate amdgpu-max-num-workgroups attribute (#113018)Matt Arsenault
I'm not sure what the interpretation of 0 is supposed to be, AMDGPUUsage doesn't say.
2024-12-04Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in ↵Philip Reames
AMDGPUAttributor (#94647)" This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36. Commit breaks "ninja check-llvm" on x86 host.
2024-12-04[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor ↵Jun Wang
(#94647) The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and "amdgpu-stack-objects" attributes, which are used to infer whether we need to initialize flat scratch. This is, however, not precise. Instead, we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
2024-10-31[NFC] clang-format -i llvm/lib/Target/AMDGPU/AMDGPUAttributor.cppShilei Tian
2024-10-30Revert "[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177)"Shilei Tian
This reverts commit 922a0d3dfe2db7a2ef50e8cef4537fa94a7b95bb.
2024-10-30[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177)Shilei Tian
Avoid calling TTI or other stuff unnecessarily
2024-10-29[AMDGPU][Attributor] Check the validity of a dependent AA before using its ↵Shilei Tian
value (#114165) Even though the Attributor framework will invalidate all its dependent AAs after the current iteration, a dependent AA can still use the worst state of a depending AA if it doesn't check the state of the depending AA in current iteration.
2024-09-12[NFC][AMDGPU][Attributor] Only iterate over filtered functions when creating ↵Shilei Tian
AAs (#108417)
2024-09-06[Attributor] Add support for atomic operations in `AAAddressSpace` (#106927)Shilei Tian
2024-09-01Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106889)Shilei Tian
We can't assume closed world even in full LTO post-link stage. It is only true if we are building a "GPU executable". However, AMDGPU does support "dynamic library". I'm not aware of any approach to tell if it is relocatable link when we create the pass. For now let's revert the patch as it is currently breaking things. We can re-enable it once we can handle it correctly.
2024-08-27[AMDGPU][Attributor] Remove uniformity check in the indirect call ↵Shilei Tian
specialization callback (#106177) This patch removes the conservative uniformity check in the indirect call specialization callback, as whether the function pointer is uniform doesn't matter too much. Instead, we add an argument to control specialization.
2024-08-25Revert "Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" ↵Anshil Gandhi
(#106000)" (#106001) This reverts commit 4b6c064dd124c70ff163411dff120c6174e0e022. Add a requirement for an amdgpu target in the test.
2024-08-25Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)Anshil Gandhi
This reverts commit 33f3ebc86e7d3afcb65c551feba5bbc2421b42ed.
2024-08-25[AMDGPU][LTO] Assume closed world after linking (#105845)Anshil Gandhi
2024-08-14Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor ↵Shilei Tian
(#100952)" This reverts commit 36467bfe89f231458eafda3edb916c028f1f0619.
2024-08-09[AMDGPU][Attributor] Add a pass parameter `closed-world` for ↵Shilei Tian
AMDGPUAttributor pass (#101760)
2024-08-07Revert "Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for ↵Shilei Tian
AMDAttributor (#100952)"" This reverts commit 7a68449a82ab1c1ab005caa72c1d986ca5deca36. https://lab.llvm.org/buildbot/#/builders/123/builds/3205
2024-08-06Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor ↵Shilei Tian
(#100952)" This reverts commit 874cd100a076f3b98aaae09f90ef224682501538.