path: root/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
Age | Commit message | Author
2025-11-21[clang][NFC] Inline Frontend/FrontendDiagnostic.h -> Basic/DiagnosticFrontend.h (#162883)Jordan Rupprecht
d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h. Doing this will help prevent future layering issues. See #162865. Frontend already depends on Basic, so no new deps need to be added anywhere except for places that do strict dep checking.
2025-10-30[AMDGPU][Clang] Support for type inferring extended image builtins for AMDGPU (#164358)Rana Pratap Reddy
Introduces the builtins for extended image insts for amdgcn.
2025-10-21[clang] Add support for cluster sync scope (#162575)macurtis-amd
From Sam Liu:
> CUDA supports thread block clusters https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-block-clusters
>
> In their atomic intrinsics, cluster scope is supported https://docs.nvidia.com/cuda/cuda-c-programming-guide/#nv-atomic-fetch-add-and-nv-atomic-add
>
> For compatibility, clang and hip need to support cluster scope.
2025-10-10[AMDGPU] Support for type inferring image load/store builtins for AMDGPU (#140210)Rana Pratap Reddy
Introduces the builtins for amdgcn_image_load/store/sample.
2025-09-29[AMDGPU][SPIRV] Use SPIR-V syncscopes for some AMDGCN BIs (#154867)Alex Voicu
AMDGCN flavoured SPIR-V allows AMDGCN specific builtins, including those for scoped fences and some specific RMWs. However, at present we don't map syncscopes to their SPIR-V equivalents, but rather use the AMDGCN ones. This ends up pessimising the resulting code as system scope is used instead of device (agent) or subgroup (wavefront), so we correct the behaviour, to ensure that we do the right thing during reverse translation.
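The scope remapping described in this commit message can be pictured with a small host-side sketch. Only the two pairings named in the message (agent to device, wavefront to subgroup) are taken from the source. The function name `toSpirvScope` is hypothetical, and passing every other scope through unchanged is an assumption made for illustration, not the behaviour of the actual patch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: translate an AMDGCN syncscope name to the SPIR-V
// name mentioned in the commit message. Any scope not listed is returned
// unchanged (an assumption, not taken from the patch).
constexpr std::string_view toSpirvScope(std::string_view amdgcnScope) {
  if (amdgcnScope == "agent")
    return "device";
  if (amdgcnScope == "wavefront")
    return "subgroup";
  return amdgcnScope;
}

static_assert(toSpirvScope("agent") == "device", "agent maps to device");
static_assert(toSpirvScope("wavefront") == "subgroup",
              "wavefront maps to subgroup");
```

Keeping the narrower scope instead of falling back to system scope is what avoids the pessimisation the message describes.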
2025-09-10[AMDGPU] Add builtins for wave reduction intrinsics (#150170)Aaditya
2025-09-04[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)Pierre van Houtryve
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-02[AMDGPU] Support cluster load instructions for gfx1250 (#156548)Changpeng Fang
2025-08-27clang/AMDGPU: Add __builtin_amdgcn_inverse_ballot_w{32,64} (#155724)Nicolai Hähnle
Add builtins that expose the underlying llvm.amdgcn.inverse.ballot intrinsic that we've had for a while. This allows more explicitly writing code that selects or branches in terms of lane masks, which can lead to better code quality.
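As a rough illustration of what "in terms of lane masks" means, the following is a host-side model, not the builtin itself: each lane of a wave reads its own bit of a wave-uniform mask. The names `inverseBallotModel` and `ballotModel` are hypothetical, and the code runs on the CPU purely to show the bit-level relationship. It assumes a wave size of at most 64 lanes.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model (hypothetical name): lane `lane` observes bit `lane`
// of the wave-uniform mask. Lanes outside 0..63 read false.
constexpr bool inverseBallotModel(std::uint64_t mask, unsigned lane) {
  return lane < 64 && ((mask >> lane) & 1u) != 0;
}

// Host-side model of the opposite direction: gather each lane's
// predicate back into a mask. Assumes waveSize <= 64.
constexpr std::uint64_t ballotModel(std::uint64_t mask, unsigned waveSize) {
  std::uint64_t out = 0;
  for (unsigned lane = 0; lane < waveSize; ++lane)
    if (inverseBallotModel(mask, lane))
      out |= std::uint64_t{1} << lane;
  return out;
}

static_assert(ballotModel(0xF0F0u, 32) == 0xF0F0u,
              "gathering the per-lane bits rebuilds the mask");
```

In this model, gathering the per-lane predicates reproduces the original mask for every bit below the wave size.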
2025-08-05[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194)Stanislav Mekhanoshin
2025-08-05[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)zGoldthorpe
This patch exposes builtins for atomic `add`, `max`, and `min` operations that operate over buffer resource pointers.
2025-08-04[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036)Stanislav Mekhanoshin
2025-07-24[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540)Changpeng Fang
2025-07-24[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627)Pierre van Houtryve
"amdgpu-as" is way too vague and doesn't give enough context. We may want to support it on normal atomics too, to control the synchronized (ordered) AS. If we do that, the name has to be less vague.
2025-07-21AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)Changpeng Fang
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-18[Clang][AMDGPU] Add the missing builtin `__builtin_amdgcn_sqrt_bf16` (#149447)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18[AMDGPU] Add support for `v_tanh_f16` on gfx1250 (#149439)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17[AMDGPU] Add support for `v_cos_bf16` on gfx1250 (#149355)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17[AMDGPU] Add support for `v_sin_bf16` on gfx1250 (#149241)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17[AMDGPU] Add support for `v_exp_bf16` on gfx1250 (#149229)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16[AMDGPU] Add support for `v_log_bf16` on gfx1250 (#149201)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16[AMDGPU] Add support for `v_rsq_bf16` on gfx1250 (#149194)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-15AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991)Changpeng Fang
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15[AMDGPU] Add support for `v_rcp_bf16` on gfx1250 (#148916)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425)Shilei Tian
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-29AMDGPU: Implement intrinsic/builtins for gfx1250 load transpose instructions (#146289)Changpeng Fang
2025-05-19[AMDGPU][clang] provide device implementation for __builtin_logb and __builtin_scalbn (#129347)choikwa
Clang generates library calls for __builtin_* functions, which can be a problem for GPUs that cannot handle them. This patch generates a call to the device implementation for __builtin_logb and the ldexp intrinsic for __builtin_scalbn.
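The reason an ldexp operation can stand in for scalbn is that the two compute the same value when the floating-point radix is 2. The sketch below checks that on the host with the standard `<cmath>` functions. It says nothing about the device code path the patch emits, and the helper name `scalbnViaLdexp` is hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// scalbn scales by FLT_RADIX^n and ldexp scales by 2^n, so they coincide
// only on radix-2 floating point, which this sketch assumes.
static_assert(FLT_RADIX == 2, "sketch assumes binary floating point");

// Hypothetical helper: compute x * 2^n through ldexp.
inline double scalbnViaLdexp(double x, int n) { return std::ldexp(x, n); }

// Host-side check that both routes give the same result.
inline bool agreesWithScalbn(double x, int n) {
  return scalbnViaLdexp(x, n) == std::scalbn(x, n);
}
```

For example, `scalbnViaLdexp(3.0, 4)` yields 48.0, and `std::logb(10.0)` yields 3.0, the unbiased binary exponent of 10.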
2025-05-19[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)Krzysztof Drewniak
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.
2025-04-18AMDGPU: Mark workitem ID intrinsics with range attribute (#136196)Matt Arsenault
This avoids the need to have special handling at every use site. Unfortunately this means we unnecessarily emit AssertZext in the DAG (where we already directly understand the range of the intrinsic), and we regress in undefined cases as we don't fold out asserts on undef.
2025-04-11[AMDGPU][Clang] Add builtins for gfx12 ray tracing intrinsics (#135224)Shilei Tian
2025-04-07Revert "Reland [Clang][Cmake] fix libtool duplicate member name warnings" (#134656)Farzon Lotfi
Reverts llvm/llvm-project#133850
2025-04-07Reland [Clang][Cmake] fix libtool duplicate member name warnings (#133850)Farzon Lotfi
fixes https://github.com/llvm/llvm-project/issues/133199 As of the third commit, the linker's missing references in `Targets/DirectX.cpp`, found in https://github.com/llvm/llvm-project/pull/133776, were fixed by moving `HLSLBufferLayoutBuilder.cpp` to `clang/lib/CodeGen/Targets/`. It fixes the circular reference issue found in https://github.com/llvm/llvm-project/pull/133619 for all `-DBUILD_SHARED_LIBS=ON` builds by removing `target_link_libraries` from the subdirectory cmake files. Testing for amdgpu offload was done via `cmake -B ../llvm_amdgpu -S llvm -GNinja -C offload/cmake/caches/Offload.cmake -DCMAKE_BUILD_TYPE=Release`. PR https://github.com/llvm/llvm-project/pull/132252 created a second file that shared <TargetName>.cpp in clang/lib/CodeGen/CMakeLists.txt. For example, there were two AMDGPU.cpp files, one in TargetBuiltins and the other in Targets. Even though these were in different directories, libtool warns that it might not distinguish them because they share the same base name. There are two potential fixes. The easy fix is to rename one of them and keep one cmake file. That solution, though, doesn't future-proof this problem in the event of a third <TargetName>.cpp, and it seems teams want to just use the target name https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483. The alternative fix that this PR went with is to separate the cmake files into their own subdirectories as static libs.
2025-03-31Revert "[Clang][Cmake] fix libtool duplicate member name warnings" (#133795)Farzon Lotfi
Reverts llvm/llvm-project#133619
2025-03-31[Clang][Cmake] fix libtool duplicate member name warnings (#133619)Farzon Lotfi
fixes #133199 PR #132252 created a second file that shared `<TargetName>.cpp` in `clang/lib/CodeGen/CMakeLists.txt`. For example, there were two `AMDGPU.cpp` files, one in `TargetBuiltins` and the other in `Targets`. Even though these were in different directories, `libtool` warns that it might not distinguish them because they share the same base name. There are two potential fixes. The easy fix is to rename one of them and keep one cmake file. That solution, though, doesn't future-proof this problem in the event of a third `<TargetName>.cpp`, and it seems teams want to just use the target name https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483. The alternative fix is to separate the cmake files into their own subdirectories. I chose to create static libraries. It might have been possible to build an OBJECT, but I only saw examples of this in compiler-rt and test directories, so I assumed there was a reason it wasn't used.
2025-03-28[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)Joseph Huber
Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality, which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly, which has its own handling (the name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code; this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.
2025-03-27[NFC][HLSL] Move emitter out of AMDGPU.cpp (#133251)Farzon Lotfi
- Move all HLSL code out of AMDGPU.cpp to CGHLSLBuiltins.cpp
- Fixes accidental reorganization of HLSL code into AMDGPU caused by (https://github.com/llvm/llvm-project/pull/132252, https://github.com/llvm/llvm-project/commit/7f920e2e5f70b)
2025-03-26[NFC][clang] Remove superfluous header files after refactor in #132252 (#132495)Jonathan Thackray
Remove superfluous header files after refactor in #132252
2025-03-21[NFC][clang] Split clang/lib/CodeGen/CGBuiltin.cpp into target-specific files (#132252)Jonathan Thackray
clang/lib/CodeGen/CGBuiltin.cpp is over 1MB long (>23k LoC), and can take minutes to recompile (depending on compiler and host system) when modified, and 5 seconds for clangd to update for every edit. Splitting this file was discussed in this thread: https://discourse.llvm.org/t/splitting-clang-s-cgbuiltin-cpp-over-23k-lines-long-takes-1min-to-compile/ and the idea has received a number of +1 votes, hence this change.