llvm-project.git/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp, branch users/mingmingl-llvm/samplefdo-profile-format

AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (#157845)

2025-09-10T15:57:15+00:00

Use the same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD.
Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD
should always be divergent.
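The rule can be pictured as a check on the opcode and address space alone. This is a minimal C++ sketch under that simplifying assumption. The enum `LoadOpcode` and the function `isAlwaysDivergentLoad` are hypothetical and are not LLVM's UniformityAnalysis API. The address-space numbers 0 (flat) and 5 (private) come from the message above.

```cpp
// Hypothetical, simplified model of the rule above; not LLVM's actual
// AMDGPU uniformity code. Only the opcode and address space are considered.
enum class LoadOpcode { G_LOAD, G_ZEXTLOAD, G_SEXTLOAD };

constexpr unsigned FlatAddrSpace = 0;    // addrspace(0)
constexpr unsigned PrivateAddrSpace = 5; // addrspace(5)

// Returns true if a load of this kind from AddrSpace must be treated as
// divergent, whether or not its address operand is uniform.
constexpr bool isAlwaysDivergentLoad(LoadOpcode Opc, unsigned AddrSpace) {
  switch (Opc) {
  case LoadOpcode::G_LOAD:
  case LoadOpcode::G_ZEXTLOAD:
  case LoadOpcode::G_SEXTLOAD:
    // The extending loads follow the same rule as G_LOAD.
    return AddrSpace == FlatAddrSpace || AddrSpace == PrivateAddrSpace;
  }
  return false;
}
```

Grouping the three opcodes under one `case` chain mirrors the intent of the change: the extending loads share G_LOAD's rule instead of having their own.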

[AMDGPU] Restrict scale operands of WMMA to low 256 VGPRs (#157526)

2025-09-08T22:44:51+00:00

These cannot accept high registers.
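The restriction amounts to a range check on the VGPR index. This is a minimal sketch that assumes a plain integer register index. `isLegalWMMAScaleVGPR` is a hypothetical helper, and the sketch does not model how the backend actually encodes the constraint.

```cpp
// Hypothetical helper: WMMA scale operands may only name one of the
// low 256 VGPRs (v0..v255).
constexpr unsigned MaxScaleOperandVGPRs = 256;

constexpr bool isLegalWMMAScaleVGPR(unsigned VGPRIndex) {
  return VGPRIndex < MaxScaleOperandVGPRs;
}
```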

CodeGen: Pass SubtargetInfo to TargetGenInstrInfo constructors (#157337)

2025-09-08T03:12:19+00:00

This will make it possible for tablegen to make subtarget-dependent
decisions without adding new arguments to every
target.

---------

Co-authored-by: Sergei Barannikov

AMDGPU: Allow folding multiple uses of some immediates into copies (#154757)

2025-09-05T23:22:09+00:00

In some cases this will require an avoidable redefinition of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.

We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.
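One possible reading of "unless it would break an inline constant" is this: a 64-bit value can be an inline constant, yet one of its 32-bit halves may only be encodable as a literal once it is split out. The sketch below assumes that reading. It also assumes a simplified inline-constant model: integers -16..64 and +/-0.5, +/-1.0, +/-2.0, +/-4.0, ignoring 1/(2*pi) and 16-bit operands. All function names are hypothetical and are not SIInstrInfo's API.

```cpp
#include <cstdint>

// Simplified inline-constant model (assumption): integers in [-16, 64] plus
// +/-0.5, +/-1.0, +/-2.0 and +/-4.0 in the operand's own width. Ignores
// 1/(2*pi), 16-bit operands and other target-specific cases.
constexpr bool isInlineImm64(uint64_t Bits) {
  if (Bits <= 64 || Bits >= 0xFFFFFFFFFFFFFFF0ull) // -16..64 in two's complement
    return true;
  switch (Bits) {
  case 0x3FE0000000000000ull: case 0xBFE0000000000000ull: // +/-0.5
  case 0x3FF0000000000000ull: case 0xBFF0000000000000ull: // +/-1.0
  case 0x4000000000000000ull: case 0xC000000000000000ull: // +/-2.0
  case 0x4010000000000000ull: case 0xC010000000000000ull: // +/-4.0
    return true;
  default:
    return false;
  }
}

constexpr bool isInlineImm32(uint32_t Bits) {
  if (Bits <= 64u || Bits >= 0xFFFFFFF0u) // -16..64 in two's complement
    return true;
  switch (Bits) {
  case 0x3F000000u: case 0xBF000000u: // +/-0.5f
  case 0x3F800000u: case 0xBF800000u: // +/-1.0f
  case 0x40000000u: case 0xC0000000u: // +/-2.0f
  case 0x40800000u: case 0xC0800000u: // +/-4.0f
    return true;
  default:
    return false;
  }
}

// Hypothetical fold check: allow folding the low (sub0) or high (sub1) half
// of a 64-bit immediate into a subregister extract, unless the full value is
// an inline constant whose extracted half would need a 32-bit literal.
constexpr bool canFoldImmIntoSubregExtract(uint64_t Imm64, bool HiHalf) {
  const uint32_t Half = HiHalf ? static_cast<uint32_t>(Imm64 >> 32)
                               : static_cast<uint32_t>(Imm64);
  return !(isInlineImm64(Imm64) && !isInlineImm32(Half));
}
```

For example, the double 1.0 (0x3FF0000000000000) is a 64-bit inline constant. Its high half, 0x3FF00000, is not a 32-bit inline constant, so the sketch refuses to fold that half. It still allows folding the low half, which is 0.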

AMDGPU: Remove flat special case in getRegClass (#156991)

2025-09-05T22:42:16+00:00

[AMDGPU] High VGPR lowering on gfx1250 (#156965)

2025-09-04T23:20:47+00:00

[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)

2025-09-04T09:19:25+00:00

- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using an MMO flag

[AMDGPU] Tail call support for whole wave functions (#145860)

2025-09-04T08:34:43+00:00

Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved, because we need a new pseudo for the
tail call return that patches up the EXEC mask).

Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog epilog insertion, since that's where we
patch up the EXEC mask.

AMDGPU: Remove the DS special case in getRegClass (#156696)

2025-09-04T06:14:17+00:00

These instructions should now have a proper representation,
with separate instructions for operands which must be paired.

AMDGPU: Special case align requirement for AV_MOV_B64_IMM_PSEUDO

2025-09-04T00:55:39+00:00

This should not require aligned registers. Fixes an expensive_checks
test failure. I don't see a better way until the new system
for specifying per-register alignment is done.