llvm-project.git/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp, branch main

[NFC] Check operand type instead of opcode (#168641)

2025-11-19T02:37:56+00:00

A follow-up to #168458.

[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit (#168458)

2025-11-18T22:11:10+00:00

On some targets, a packed f32 instruction can only read 32 bits from a
scalar operand (SGPR or literal) and replicates the bits to both
channels. In this case, we should not fold an immediate value if it
can't be replicated from its lower 32 bits.
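
The check described above can be sketched in isolation. This is only an
illustration, not the code from the patch: `isReplicatedFromLo32`,
`canFoldImm64`, and the `ReadsOnly32Bits` flag are hypothetical names, and the
sketch assumes "replicated" means that both 32-bit halves of the immediate
hold the same bit pattern.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when the high and low 32-bit
// halves of Imm are identical, i.e. the full 64-bit value could be rebuilt
// by replicating the low 32 bits into both channels.
constexpr bool isReplicatedFromLo32(uint64_t Imm) {
  return static_cast<uint32_t>(Imm) == static_cast<uint32_t>(Imm >> 32);
}

// Hypothetical fold guard. ReadsOnly32Bits models an operand slot that reads
// only 32 bits from a scalar source and replicates them to both channels.
// Folding is rejected only when that replication would change the value.
constexpr bool canFoldImm64(uint64_t Imm, bool ReadsOnly32Bits) {
  return !ReadsOnly32Bits || isReplicatedFromLo32(Imm);
}
```

Under these assumptions, a packed pair such as 0x3F8000003F800000 (1.0f in
both channels) stays foldable, while 0x400000003F800000 (2.0f and 1.0f) does
not, because replicating its low half would silently turn the 2.0f channel
into 1.0f.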

Fixes SWDEV-567139.

[AMDGPU] When shrinking and/or to bitset*, remove implicit scc def (#168128)

2025-11-15T15:21:43+00:00

When shrinking and/or to bitset*, remove the leftover implicit scc def:
bitset* instructions do not set scc.

Signed-off-by: John Lu

[TableGen] Split *GenRegisterInfo.inc. (#167700)

2025-11-14T16:30:51+00:00

Reduces memory usage when compiling backend sources, most notably for
AMDGPU, by ~98 MB per source on average.

AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.

Splitting .inc files also helps avoid extra ccache misses when
changes in .td files don't affect all parts of what was previously
a single .inc file.

The thinking is that, rather than building on top of the current
single-output-file design of TableGen, e.g., using `split-file`,
it is preferable to recognise the need for multi-file outputs
and give them proper first-class support directly in TableGen.

[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)

2025-11-14T11:00:57+00:00

AMDGPU: Remove wrapper around TRI::getRegClass (#159885)

2025-11-11T23:31:52+00:00

This shadows the member in the base class, but differs slightly
in behavior. The base method doesn't check for the invalid case.

CodeGen: Remove TRI argument from getRegClass (#158225)

2025-11-10T23:43:55+00:00

TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.

[AMDGPU][MachineVerifier] test failures in SIFoldOperands (#166600)

2025-11-08T05:12:19+00:00

After https://github.com/llvm/llvm-project/pull/151421 was merged, the
following test failures showed up in SIFoldOperands:

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In the folding code, if the folded operand is a register, ensure
earlyClobber is set.

---------

Co-authored-by: Matt Arsenault 
Co-authored-by: Shilei Tian

AMDGPU: Delete redundant recursive copy handling code (#157032)

2025-11-06T02:01:12+00:00

This fixes a regression exposed after
445415219708f9539801018e03282049ca33e0e2.
This introduces a few small regressions for true16. There are more cases
where the value can propagate through subregister extracts which need
new handling. They're also small enough that perhaps there's a way to
avoid needing to deal with this case in the first place.

AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272)

2025-10-08T02:19:54+00:00

This removes special-case processing in TargetInstrInfo::getRegClass that
fixed up register operands which, depending on the subtarget, support
AGPRs or require even-aligned registers.

This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side, this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.