llvm-project.git/llvm/lib/Target/AMDGPU/AMDGPULowerIntrinsics.cpp, branch main

[AMDGPU] Add s_cluster_barrier on gfx1250 (#159175)

2025-09-16T21:49:48+00:00

AMDGPU: Refactor lowering of s_barrier to split barriers (#154648)

2025-08-28T14:01:20+00:00

Let's do the lowering of non-split into split barriers in a new IR pass,
AMDGPULowerIntrinsics. That way, there is no code duplication between
SelectionDAG and GlobalISel. This simplifies some upcoming extensions to
the code.
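
As a rough illustration of the idea, the sketch below rewrites a monolithic barrier into a signal/wait pair on a toy instruction list. It is not the pass's actual code: `ToyBlock`, `lowerToSplitBarriers`, the operation strings, and the `-1` barrier id are all assumptions made for this example, and the real pass works on LLVM IR intrinsic calls rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a basic block; the real pass operates on LLVM IR.
using ToyBlock = std::vector<std::string>;

// Replace each monolithic barrier with a signal followed by a wait.
// The operation names and the "-1" barrier id are illustrative assumptions.
ToyBlock lowerToSplitBarriers(const ToyBlock &In) {
  ToyBlock Out;
  for (const std::string &Op : In) {
    if (Op == "s_barrier") {
      Out.push_back("s_barrier_signal -1");
      Out.push_back("s_barrier_wait -1");
    } else {
      Out.push_back(Op); // Everything else is copied through unchanged.
    }
  }
  // Lowering only ever expands operations, it never drops any.
  assert(Out.size() >= In.size());
  return Out;
}
```

Doing this rewrite once, before instruction selection, is what lets both selectors see only the split form.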

CodeGen: Expand memory intrinsics in PreISelIntrinsicLowering

2023-06-10T01:04:37+00:00

Expand large or unknown size memory intrinsics into loops in the
default lowering pipeline if the target doesn't have the corresponding
libfunc. Previously AMDGPU had a custom pass which existed to call the
expansion utilities.
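
Conceptually, the expansion replaces a call to a copy routine with an explicit loop. The sketch below shows that shape in plain C++, assuming a byte-wise loop and non-overlapping buffers; `copyWithLoop` is a hypothetical name, and the real expansion utilities emit IR and may pick wider load/store types with a residual loop.

```cpp
#include <cassert>
#include <cstddef>

// Copy N bytes with an explicit loop instead of calling a library memcpy.
// N may be a runtime value, which is the "unknown size" case.
// Assumes Dst and Src do not overlap (memcpy semantics, not memmove).
void copyWithLoop(unsigned char *Dst, const unsigned char *Src, std::size_t N) {
  assert((N == 0 || (Dst != nullptr && Src != nullptr)) &&
         "non-empty copies need valid pointers");
  for (std::size_t I = 0; I < N; ++I)
    Dst[I] = Src[I]; // One load and one store per iteration.
}
```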

With a default no-libcall option, we can remove the libfunc checks in
LoopIdiomRecognize for these, which never made any sense. This also
provides a path to lifting the immarg restriction on
llvm.memcpy.inline.

There seems to be a bug where TLI reports functions as available if
you use -march and not -mtriple.

AMDGPU: Remove r600 local id annotations in AMDGPULowerIntrinsics

2023-06-07T18:55:55+00:00

With these dropped and memory intrinsic moved into a generic pass, we
can drop the whole pass.

No tests fail with this removed. The new amdgcn intrinsics are
annotated in clang up front.  Theoretically may regress r600, but that
would need new testing and support work (r600 ideally would also
follow the clang handling). The regression would be any IR passes
making use of known bits between this point and codegen. The DAG
computeKnownBits understands the intrinsics directly now.

If we wanted to refine these values, a better place would be in
AMDGPUAttributor.

[iwyu] Handle regressions in libLLVM header include

2022-05-04T06:32:38+00:00

Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D124847

AMDGPU: Directly implement computeKnownBits for workitem intrinsics

2022-04-22T14:49:50+00:00

Currently the metadata is inserted in a late pass and is then lowered
to an AssertZext. The metadata would be more useful if it were
inserted earlier, after inlining but before codegen.

Probably shouldn't change anything now. Just replacing the
late metadata annotation needs more work, since we lose
out on optimizations after these are lowered to CopyFromReg.

Seems to be slightly better than relying on the AssertZext from the
metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using
-start-before=amdgpu-isel instead of running the usual codegen
pipeline.
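
To make the known-bits idea concrete: if a work-item ID can never exceed some maximum, its high bits are known to be zero. The sketch below computes that count for a 32-bit ID. The function names and the `MaxWorkGroupSize` parameter are assumptions for illustration only, not the backend's API.

```cpp
#include <cassert>
#include <cstdint>

// Number of high bits of a 32-bit value that are guaranteed to be zero,
// given that the value never exceeds MaxValue.
unsigned knownLeadingZeros(std::uint32_t MaxValue) {
  unsigned Zeros = 32;
  while (MaxValue != 0) {
    --Zeros;
    MaxValue >>= 1;
  }
  return Zeros;
}

// Work-item IDs range over [0, MaxWorkGroupSize - 1].
// MaxWorkGroupSize is a hypothetical input standing in for subtarget or
// function-attribute information.
unsigned workitemIdKnownZeroBits(std::uint32_t MaxWorkGroupSize) {
  assert(MaxWorkGroupSize > 0 && "a work-group has at least one work-item");
  return knownLeadingZeros(MaxWorkGroupSize - 1);
}
```

With a maximum work-group size of 1024, the largest ID is 1023, which needs 10 bits, so the remaining 22 high bits are known zero.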

[AArch64, AMDGPU] Use make_early_inc_range (NFC)

2021-11-03T16:22:51+00:00
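
For context, the pattern behind an early-increment range is to advance the iterator before the loop body runs, so the body may erase the current element safely. The sketch below shows the same idea with `std::list` and no LLVM dependency; `eraseEven` is a hypothetical example function, not code from this change.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Erase even values while iterating. The iterator is advanced *before*
// the current element is touched, so erasing it cannot invalidate the
// iterator used for the next step.
void eraseEven(std::list<int> &Values) {
  for (auto It = Values.begin(), End = Values.end(); It != End;) {
    auto Cur = It++; // Early increment: step past Cur first.
    if (*Cur % 2 == 0)
      Values.erase(Cur);
  }
  // Sanity check: no even value survives.
  assert(std::none_of(Values.begin(), Values.end(),
                      [](int V) { return V % 2 == 0; }));
}
```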

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

2021-01-20T19:22:45+00:00

... to reduce header dependencies.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036

[NFC][AMDGPU] Reduce include files dependency.

2021-01-07T19:22:05+00:00

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813

AMDGPU: Use caller subtarget, not intrinsic declaration

2020-08-27T20:42:09+00:00

Intrinsic declarations use the default subtarget, but this should be
using the subtarget for the calling function. I haven't been able to
come up with a case where it matters though.