llvm-project.git/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll, branch main

Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)"

2025-11-23T05:17:45+00:00

This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.

This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.

[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)

2025-11-23T02:11:24+00:00

This may be a bug introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9 that has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` because they both
come from the `Copy`.

AMDGPU: Select vector reg class for divergent build_vector (#168169)

2025-11-15T05:53:39+00:00

The main improvement is to the mfma tests. There are some
mild regressions scattered around, and a few major ones.
The worst regressions are in some of the bitcast tests;
these are cases where the SGPR argument list runs out
and uses VGPRs, and the copies-from-VGPR are misidentified
as divergent. Most of the shufflevector tests are also
regressions. These end up with cleaner MIR, but then get poor
regalloc decisions.

AMDGPU: Use v_mov_b32 to implement divergent zext i32->i64 (#168166)

2025-11-15T04:19:24+00:00

Some cases are relying on SIFixSGPRCopies to force VALU
reg_sequence inputs with SGPR inputs to use all VGPR inputs,
but this doesn't always happen if the reg_sequence isn't
invalid. Make sure we use a vgpr up-front here so we don't
rely on something later.
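
A rough illustration of the idea, not the actual selection pattern: the
zero high half is materialized in the register bank that matches the
divergence of the value, so the reg_sequence already has all-VGPR inputs
in the divergent case. The function names and the pseudo-MIR text format
are hypothetical and exist only for this sketch.

```python
# Toy model only: emits pseudo-MIR strings, not real LLVM MIR.
# lower_zext_i32_to_i64 and zext_value are hypothetical helper names.

def lower_zext_i32_to_i64(src_reg, divergent):
    """Return pseudo-MIR lines that zero-extend src_reg from i32 to i64."""
    if divergent:
        # Divergent value: put the zero in a VGPR up front.
        zero_def = "%zero:vgpr_32 = V_MOV_B32 0"
        wide_rc = "vreg_64"
    else:
        # Uniform value: an SGPR zero is fine.
        zero_def = "%zero:sreg_32 = S_MOV_B32 0"
        wide_rc = "sreg_64"
    seq = f"%wide:{wide_rc} = REG_SEQUENCE {src_reg}, sub0, %zero, sub1"
    return [zero_def, seq]


def zext_value(lo32):
    """Value semantics of the sequence: high half 0, low half unchanged."""
    return (0 << 32) | (lo32 & 0xFFFFFFFF)


if __name__ == "__main__":
    for line in lower_zext_i32_to_i64("%x", divergent=True):
        print(line)
```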

AMDGPU: Relax shouldCoalesce to allow more register tuple widening (#166475)

2025-11-11T21:50:57+00:00

Allow widening up to 128-bit registers or if the new register class
is at least as large as one of the existing register classes.

This was artificially limiting. In particular this was doing the wrong
thing with sequences involving copies between VGPRs and AV registers.
Nearly all test changes are improvements.

The coalescer does not just widen registers out of nowhere. If it's
trying to "widen" a register, it's generally packing a register into an
existing register tuple, or in a situation where the constraints imply
the wider class anyway. 067a11015 addressed the allocation failure
concern by rejecting coalescing if there are no available registers. The
original change in a4e63ead4b didn't include a realistic testcase to
judge if this is harmful for pressure. I would expect any issues from
this to be garden-variety subreg handling issues. We could use more
dynamic state information here if it really is an issue.

I get the best results by removing this override completely. This is
a smaller step for patch splitting purposes.

[AMDGPU] Rework GFX11 VALU Mask Write Hazard (#138663)

2025-10-28T07:09:28+00:00

Apply additional counter waits to address VALU writes to SGPRs. Rework
expiry detection and apply wait coalescing to mitigate some of the
additional waits.

[AMDGPU] Reland "Remove redundant s_cmp_lg_* sX, 0" (#164201)

2025-10-22T13:42:29+00:00

Reland PR https://github.com/llvm/llvm-project/pull/162352. Fix by
excluding SI_PC_ADD_REL_OFFSET from instructions that set SCC = DST!=0.
Passes check-libc-amdgcn-amd-amdhsa now.

Distribution of instructions that allowed a redundant S_CMP to be
deleted in check-libc-amdgcn-amd-amdhsa test:

```
S_AND_B32      485
S_AND_B64      47
S_ANDN2_B32    42
S_ANDN2_B64    277492
S_CSELECT_B64  17631
S_LSHL_B32     6
S_OR_B64       11
```
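
For a sense of proportion, the counts above can be turned into shares
with a few lines of Python. The numbers are copied from the table; the
script is only an illustration. S_ANDN2_B64 accounts for roughly 94% of
the deleted compares.

```python
# Counts copied from the table above.
counts = {
    "S_AND_B32": 485,
    "S_AND_B64": 47,
    "S_ANDN2_B32": 42,
    "S_ANDN2_B64": 277492,
    "S_CSELECT_B64": 17631,
    "S_LSHL_B32": 6,
    "S_OR_B64": 11,
}

total = sum(counts.values())
# Percentage of all deleted S_CMPs attributed to each defining opcode.
shares = {op: 100.0 * n / total for op, n in counts.items()}

if __name__ == "__main__":
    for op, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{op:<14}{n:>7}  {shares[op]:5.1f}%")
```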

---------

Signed-off-by: John Lu 
Co-authored-by: Matt Arsenault

Revert "[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 " (#164116)

2025-10-18T20:38:14+00:00

Reverts llvm/llvm-project#162352

Broke our buildbot:
https://lab.llvm.org/buildbot/#/builders/10/builds/15674
To reproduce:

cd llvm-project
cmake -S llvm -B thebuild -C offload/cmake/caches/AMDGPULibcBot.cmake -GNinja
cd thebuild
ninja
ninja check-libc-amdgcn-amd-amdhsa

[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 (#162352)

2025-10-18T14:33:47+00:00

Remove a redundant s_cmp_lg_* sX, 0 when a preceding SALU instruction
already sets SCC to sX != 0.
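
A toy model of the peephole, assuming a simplified instruction
representation. The `Inst` class, the helper names, and the two opcode
sets are hypothetical and cover only a small subset of SALU opcodes; the
real change operates on MachineInstrs.

```python
# Toy model only: not LLVM code. Opcode sets are an assumed small subset.
from dataclasses import dataclass
from typing import Optional, Tuple

# SALU opcodes assumed here to set SCC = (dst != 0).
SETS_SCC_DST_NONZERO = {"S_AND_B32", "S_OR_B32", "S_ANDN2_B32", "S_LSHL_B32"}
# Opcodes that write SCC with some other meaning.
OTHER_SCC_WRITERS = {"S_ADD_U32", "S_CMP_EQ_U32", "S_CMP_LG_U32"}


@dataclass(frozen=True)
class Inst:
    opcode: str
    dst: Optional[str] = None
    srcs: Tuple = ()


def scc_already_tests(prior, reg):
    """True if SCC still holds (reg != 0) after the instructions in prior."""
    for prev in reversed(prior):
        if prev.opcode in SETS_SCC_DST_NONZERO:
            # Most recent SCC writer; usable only if it defined reg.
            return prev.dst == reg
        if prev.opcode in OTHER_SCC_WRITERS or prev.dst == reg:
            # SCC was clobbered, or reg was redefined without updating SCC.
            return False
    return False


def remove_redundant_cmp(block):
    """Drop 's_cmp_lg sX, 0' when SCC already equals (sX != 0)."""
    out = []
    for inst in block:
        is_cmp_zero = (inst.opcode == "S_CMP_LG_U32"
                       and len(inst.srcs) == 2 and inst.srcs[1] == 0)
        if is_cmp_zero and scc_already_tests(out, inst.srcs[0]):
            continue  # redundant compare, skip it
        out.append(inst)
    return out


if __name__ == "__main__":
    block = [
        Inst("S_AND_B32", "s0", ("s1", "s2")),
        Inst("S_MOV_B32", "s5", (7,)),          # does not touch SCC
        Inst("S_CMP_LG_U32", None, ("s0", 0)),  # redundant
        Inst("S_CBRANCH_SCC1"),
    ]
    print([i.opcode for i in remove_redundant_cmp(block)])
```

The compare is kept whenever another SCC writer or a redefinition of sX
sits between the defining instruction and the compare.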

---------

Signed-off-by: John Lu

AMDGPU: Allow folding multiple uses of some immediates into copies (#154757)

2025-09-05T23:22:09+00:00

In some cases this will require an avoidable re-defining of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.
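
One plausible reading of the inline-constant condition, sketched as a
toy check: extracting sub0 or sub1 of a 64-bit immediate yields its low
or high 32 bits, and the fold is skipped when the 64-bit value is an
inline constant but the extracted half is not. The helper names are
hypothetical, and the inline-constant sets are simplified (integers -16
through 64 plus +/-0.5, 1.0, 2.0, 4.0; other encodings are omitted).

```python
# Toy model only: not the LLVM implementation. Inline-constant sets are
# a simplified assumption for illustration.
import struct

_FP_VALUES = (0.5, 1.0, 2.0, 4.0, -0.5, -1.0, -2.0, -4.0)
INLINE_FP32 = {struct.unpack("<I", struct.pack("<f", v))[0] for v in _FP_VALUES}
INLINE_FP64 = {struct.unpack("<Q", struct.pack("<d", v))[0] for v in _FP_VALUES}


def _signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def is_inline(value, bits):
    """True if value is an inline constant at the given width (32 or 64)."""
    fp_bits = INLINE_FP32 if bits == 32 else INLINE_FP64
    masked = value & ((1 << bits) - 1)
    return -16 <= _signed(value, bits) <= 64 or masked in fp_bits


def extract_half(imm64, subreg):
    """Value seen by a sub0 (low) or sub1 (high) extract of imm64."""
    shift = 0 if subreg == "sub0" else 32
    return (imm64 >> shift) & 0xFFFFFFFF


def can_fold_extract(imm64, subreg):
    """Refuse the fold if it turns an inline constant into a literal."""
    half = extract_half(imm64, subreg)
    return not (is_inline(imm64, 64) and not is_inline(half, 32))


if __name__ == "__main__":
    one_f64 = 0x3FF0000000000000  # bit pattern of double 1.0
    print(hex(extract_half(one_f64, "sub1")), can_fold_extract(one_f64, "sub1"))
```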

We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.