llvm-project.git/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp, branch main

[clang][NFC] Inline Frontend/FrontendDiagnostic.h -> Basic/DiagnosticFrontend.h (#162883)

2025-11-21T03:39:49+00:00

d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.

Doing this will help prevent future layering issues. See #162865.

Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.

[AMDGPU][Clang] Support for type inferring extended image builtins for AMDGPU (#164358)

2025-10-30T16:50:28+00:00

Introduces the builtins for extended image insts for amdgcn.

[clang] Add support for cluster sync scope (#162575)

2025-10-21T10:47:26+00:00

From Sam Liu:
> CUDA supports thread block clusters:
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-block-clusters
>
> In their atomic intrinsics, cluster scope is supported:
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/#nv-atomic-fetch-add-and-nv-atomic-add
>
> For compatibility, clang and hip need to support cluster scope.

[AMDGPU] Support for type inferring image load/store builtins for AMDGPU (#140210)

2025-10-10T09:56:08+00:00

Introduces the builtins for amdgcn_image_load/store/sample.

[AMDGPU][SPIRV] Use SPIR-V syncscopes for some AMDGCN BIs (#154867)

2025-09-29T21:50:15+00:00

AMDGCN flavoured SPIR-V allows AMDGCN specific builtins, including those
for scoped fences and some specific RMWs. However, at present we don't
map syncscopes to their SPIR-V equivalents, but rather use the AMDGCN
ones. This pessimises the resulting code, because system scope is used
instead of device (agent) or subgroup (wavefront). We correct the
behaviour so that the right thing happens during reverse translation.
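
As a rough illustration of the mapping described above, here is a minimal
C++17 sketch. The helper name toSpirvSyncScope is hypothetical and is not
taken from the Clang sources; only the two pairs named in this message
(agent -> device, wavefront -> subgroup) come from the text, and passing
every other name through unchanged is an assumption made for brevity.

```cpp
#include <string_view>

// Hypothetical helper, not the actual Clang implementation: translate an
// AMDGCN syncscope name into the SPIR-V name mentioned in the commit message.
constexpr std::string_view toSpirvSyncScope(std::string_view amdgcnScope) {
  if (amdgcnScope == "agent")
    return "device";    // agent scope corresponds to device scope
  if (amdgcnScope == "wavefront")
    return "subgroup";  // wavefront scope corresponds to subgroup scope
  // Assumption: any other scope name is passed through unchanged.
  return amdgcnScope;
}
```

With a mapping of this shape, a fence requested at agent scope would keep
device scope instead of being widened to system scope.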

[AMDGPU] Add builtins for wave reduction intrinsics (#150170)

2025-09-10T13:36:07+00:00

[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)

2025-09-04T09:19:25+00:00

- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using an MMO flag

[AMDGPU] Support cluster load instructions for gfx1250 (#156548)

2025-09-02T23:34:20+00:00

clang/AMDGPU: Add __builtin_amdgcn_inverse_ballot_w{32,64} (#155724)

2025-08-28T02:40:03+00:00

Add builtins that expose the underlying llvm.amdgcn.inverse.ballot
intrinsic that we've had for a while.

This allows more explicitly writing code that selects or branches in
terms of lane masks, which can lead to better code quality.
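
To make "selecting in terms of lane masks" concrete, the following is a
host-side C++ model, not device code and not the builtin itself. It assumes
that an inverse ballot turns a wave-wide mask into a per-lane boolean, with
lane i reading bit i of the mask. The function names inverseBallot and
selectByLaneMask are hypothetical, and the lane index is assumed to be
below 64.

```cpp
#include <cstdint>

// Host-side model (assumption): lane `lane` observes bit `lane` of `mask`.
// Precondition: lane < 64.
constexpr bool inverseBallot(std::uint64_t mask, unsigned lane) {
  return ((mask >> lane) & 1u) != 0;
}

// Per-lane select driven by a lane mask: lanes whose bit is set take
// `ifSet`, all other lanes take `ifClear`.
constexpr int selectByLaneMask(std::uint64_t mask, unsigned lane, int ifSet,
                               int ifClear) {
  return inverseBallot(mask, lane) ? ifSet : ifClear;
}
```

For example, with the mask 0x5, lanes 0 and 2 take the first value and
every other lane takes the second.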

[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194)

2025-08-05T22:15:21+00:00