path: root/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Age Commit message Author
2025-11-18[NFC] Check operand type instead of opcode (#168641)Shilei Tian
A follow-up to #168458.
2025-11-18[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit (#168458)Shilei Tian
On some targets, a packed f32 instruction can only read 32 bits from a scalar operand (SGPR or literal) and replicates the bits to both channels. In this case, we should not fold an immediate value if it can't be replicated from its lower 32-bit. Fixes SWDEV-567139.
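The replication condition can be illustrated with a minimal sketch outside of LLVM. It assumes the hardware duplicates the low 32 bits of the scalar operand into both channels, as the message describes; the helper name `isReplicatedFromLow32` is hypothetical and is not the function used in SIFoldOperands.cpp.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when a 64-bit immediate has identical
// high and low 32-bit halves, i.e. it can be rebuilt by replicating the
// low half into both channels of a packed f32 operand.
bool isReplicatedFromLow32(uint64_t Imm) {
  uint32_t Lo = static_cast<uint32_t>(Imm);
  uint32_t Hi = static_cast<uint32_t>(Imm >> 32);
  return Lo == Hi;
}
```

For example, `0x3F8000003F800000` (two copies of the bit pattern of 1.0f) passes the check, while `0x400000003F800000` (2.0f in the high half, 1.0f in the low half) does not.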
2025-11-15[AMDGPU] When shrinking and/or to bitset*, remove implicit scc def (#168128)LU-JOHN
When shrinking and/or to bitset* remove leftover implicit scc def. bitset* instructions do not set scc. Signed-off-by: John Lu <John.Lu@amd.com>
2025-11-14[TableGen] Split *GenRegisterInfo.inc. (#167700)Ivan Kosarev
Reduces memory usage when compiling backend sources, most notably for AMDGPU by ~98 MB per source on average. AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and is even larger downstream. At the same time, it is included in nearly all backend sources, typically just for a small portion of its content, resulting in compilation being unnecessarily memory-hungry, which in turn stresses buildbots and wastes their resources. Splitting .inc files also helps avoid extra ccache misses where changes in .td files don't cause changes in all parts of what previously was a single .inc file. It is thought that rather than building on top of the current single-output-file design of TableGen, e.g., using `split-file`, it would be preferable to recognise the need for multi-file outputs and give it proper first-class support directly in TableGen.
2025-11-14[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)Jay Foad
2025-11-11AMDGPU: Remove wrapper around TRI::getRegClass (#159885)Matt Arsenault
This shadows the member in the base class, but differs slightly in behavior. The base method doesn't check for the invalid case.
2025-11-10CodeGen: Remove TRI argument from getRegClass (#158225)Matt Arsenault
TargetInstrInfo now directly holds a reference to TargetRegisterInfo and does not need TRI passed in anywhere.
2025-11-07[AMDGPU][MachineVerifier] test failures in SIFoldOperands (#166600)Abhay Kanhere
After PR https://github.com/llvm/llvm-project/pull/151421 merged, the following failures in SIFoldOperands showed up:
LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
In the folding code, if the folded operand is a register, ensure earlyClobber is set.
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-11-05AMDGPU: Delete redundant recursive copy handling code (#157032)Matt Arsenault
This fixes a regression exposed after 445415219708f9539801018e03282049ca33e0e2. This introduces a few small regressions for true16. There are more cases where the value can propagate through subregister extracts which need new handling. They're also small enough that perhaps there's a way to avoid needing to deal with this case in the first place.
2025-10-08AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272)Matt Arsenault
This removes special case processing in TargetInstrInfo::getRegClass to fix up register operands which, depending on the subtarget, support AGPRs or require even-aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side, this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.
2025-10-03[AMDGPU][True16][CodeGen] fix v_mov_b16_t16 index in folding pass (#161764)Brox Chen
With true16 mode, v_mov_b16_t16 is added as a new foldable copy instruction, but its src operand is at a different index. Use the correct src index for v_mov_b16_t16.
2025-10-03AMDGPU: Fix constrain register logic for physregs (#161794)Matt Arsenault
We do not need to reconstrain physical registers. Enables an additional fold for constant physregs.
2025-09-27AMDGPU: Check if immediate is legal for av_mov_b32_imm_pseudo (#160819)Matt Arsenault
This is primarily to avoid folding a frame index materialized into an SGPR into the pseudo; this would end up looking like:
  %sreg = s_mov_b32 %stack.0
  %av_32 = av_mov_b32_imm_pseudo %sreg
Which is not useful. Match the check used for the b64 case. This is limited to the pseudo to avoid regression due to gfx908's special case - it is expecting to pass here with v_accvgpr_write_b32 for illegal cases, and stay in the intermediate state with an sgpr input. This avoids regressions in a future patch.
2025-09-26[AMDGPU] Avoid constraining RC based on folded into operand (NFC) (#160743)Josh Hutton
The RC of the folded operand does not need to be constrained based on the RC of the current operand we are folding into. The purpose of this PR is to facilitate this PR: https://github.com/llvm/llvm-project/pull/151033
2025-09-17[AMDGPU] Fold copies of constant physical registers into their uses (#154410)Stanislav Mekhanoshin
Co-authored-by: Jay Foad <Jay.Foad@amd.com> Co-authored-by: Jay Foad <Jay.Foad@amd.com>
2025-09-16CodeGen: Surface shouldRewriteCopySrc utility function (#158524)Matt Arsenault
Change shouldRewriteCopySrc to return the common register class and expose it as a utility function. I've found myself reproducing essentially the same logic in multiple places. The purpose of this function is to just work through the API constraints of which combination of register class and subreg indexes you have, i.e. you need to use a different function if you have 0, 1, or 2 subregister indexes involved in a pair of copy-like operations.
2025-09-12CodeGen: Remove MachineFunction argument from getRegClass (#158188)Matt Arsenault
This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.
2025-09-03AMDGPU: Fold 64-bit immediate into copy to AV class (#155615)Matt Arsenault
This is in preparation for patches which will introduce more copies to av registers.
2025-09-03AMDGPU: Avoid using exact class check in reg_sequence AGPR fold (#156135)Matt Arsenault
This does better in cases which mix align2 and non-align2 classes.
2025-09-02AMDGPU: Stop special casing aligned VGPR targets in operand folding (#155559)Matt Arsenault
Perform a register class constraint check when performing the fold
2025-08-27AMDGPU: Remove special case of SGPR_LO class in imm folding (#155518)Matt Arsenault
Previous change accidentally broke this which shows it's not doing anything.
2025-08-27AMDGPU: Fold mov imm to copy to av_32 class (#155428)Matt Arsenault
Previously we had special case folding into copies to AGPR_32, ignoring AV_32. Try folding into the pseudos. Not sure why the true16 case regressed.
2025-08-26AMDGPU: Replace copy-to-mov-imm folding logic with class compat checks (#154501)Matt Arsenault
This strengthens the check to ensure the new mov's source class is compatible with the source register. This avoids using the register size based checks in getMovOpcode, which don't quite understand AV superclasses correctly. As a side effect it also enables more folds into true16 movs. getMovOpcode should probably be deleted, or at least replaced with class check based logic. In this particular case other legality checks need to be mixed in with attempted IR changes, so I didn't try to push all of that into the opcode selection.
2025-08-18Revert "[AMDGPU] Fold copies of constant physical registers into their uses (#154183)" (#154219)Stanislav Mekhanoshin
This reverts commit 3395676a18ab580f21ebcd4324feaf1294a8b6d9. Fails libc/test/src/string/libc.test.src.string.memmove_test.__hermetic__
2025-08-18[AMDGPU] Fold copies of constant physical registers into their uses (#154183)Stanislav Mekhanoshin
With current codegen this only affects src_flat_scratch_base_lo/hi. Co-authored-by: Jay Foad <Jay.Foad@amd.com> Co-authored-by: Jay Foad <Jay.Foad@amd.com>
2025-08-07[AMDGPU] bf16 clamp folding (#152573)Stanislav Mekhanoshin
2025-08-04[AMDGPU] Fold into uses of splat REG_SEQUENCEs through COPYs. (#145691)Ivan Kosarev
2025-07-30[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415)Stanislav Mekhanoshin
2025-07-26AMDGPU: Fix not folding splat immediate into VGPR MFMA src2 (#150628)Matt Arsenault
2025-07-21[AMDGPU] Prevent folding of FI with scale_offset on gfx1250 (#149894)Stanislav Mekhanoshin
SS forms of SCRATCH_LOAD_DWORD do not support SCALE_OFFSET, so if this bit is used SCRATCH_LOAD_DWORD_SADDR cannot be formed. This generally shall not happen because FI is not supposed to be scaled, but add this as a precaution.
2025-07-16AMDGPU: Fix assert when multi operands to update after folding imm (#148205)macurtis-amd
In the original motivating test case, [FoldList](https://github.com/llvm/llvm-project/blob/d8a2141ff98ee35cd1886f536ccc3548b012820b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp#L1764) had entries:
```
#0: UseMI: %224:sreg_32 = S_OR_B32 %219.sub0:sreg_64, %219.sub1:sreg_64, implicit-def dead $scc
    UseOpNo: 1
#1: UseMI: %224:sreg_32 = S_OR_B32 %219.sub0:sreg_64, %219.sub1:sreg_64, implicit-def dead $scc
    UseOpNo: 2
```
After calling [updateOperand(#0)](https://github.com/llvm/llvm-project/blob/d8a2141ff98ee35cd1886f536ccc3548b012820b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp#L1773), [tryConstantFoldOp(#0.UseMI)](https://github.com/llvm/llvm-project/blob/d8a2141ff98ee35cd1886f536ccc3548b012820b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp#L1786) removed operand 1, and entry #1.UseOpNo was no longer valid, resulting in an [assert](https://github.com/llvm/llvm-project/blob/4a35214bddbb67f9597a500d48ab8c4fb25af150/llvm/include/llvm/ADT/ArrayRef.h#L452). This change defers constant folding until all operands have been updated so that UseOpNo values remain stable.
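The ordering problem can be sketched outside of LLVM. The example below is a simplified model, not the pass's code: `Operand`, `Inst`, `FoldCandidate`, `constantFold`, and `applyFolds` are hypothetical stand-ins, and the instruction is assumed to be a two-operand OR. It shows why operand indices recorded in the fold list stay valid only if simplification waits until every entry has been applied.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified operand: an immediate if Imm is set, otherwise a register.
struct Operand {
  std::optional<uint32_t> Imm;
};

// Simplified two-source OR instruction (hypothetical model).
struct Inst {
  std::vector<Operand> Srcs;
};

// One pending fold: write Imm into operand UseOpNo of UseMI.
struct FoldCandidate {
  Inst *UseMI;
  std::size_t UseOpNo;
  uint32_t Imm;
};

// Simplifies the OR. Every successful case removes an operand, so any
// operand index recorded before this call may no longer be valid.
bool constantFold(Inst &MI) {
  if (MI.Srcs.size() != 2)
    return false;
  const Operand A = MI.Srcs[0];
  const Operand B = MI.Srcs[1];
  if (A.Imm && B.Imm) { // or imm, imm -> single immediate
    MI.Srcs.assign(1, Operand{*A.Imm | *B.Imm});
    return true;
  }
  if (A.Imm && *A.Imm == 0) { // or 0, x -> copy x
    MI.Srcs.erase(MI.Srcs.begin());
    return true;
  }
  if (B.Imm && *B.Imm == 0) { // or x, 0 -> copy x
    MI.Srcs.pop_back();
    return true;
  }
  return false;
}

// Applies every recorded fold first, then simplifies the touched
// instructions, so UseOpNo values are never used after an operand is removed.
void applyFolds(const std::vector<FoldCandidate> &FoldList) {
  std::vector<Inst *> Touched;
  for (const FoldCandidate &Fold : FoldList) {
    Fold.UseMI->Srcs.at(Fold.UseOpNo).Imm = Fold.Imm; // index still valid
    Touched.push_back(Fold.UseMI);
  }
  for (Inst *MI : Touched)
    constantFold(*MI); // no pending entry refers to MI any more
}
```

In this model, calling `constantFold` right after the first fold of a zero immediate would erase an operand, and the second entry's index would then be out of range, which mirrors the failure described above.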
2025-06-26AMDGPU: Handle folding vector splats of inline split f64 inline immediates (#140878)Matt Arsenault
Recognize a reg_sequence with 32-bit elements that produce a 64-bit splat value. This enables folding f64 constants into mfma operands.
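A minimal sketch of this kind of splat recognition, written outside of LLVM. It assumes the reg_sequence inputs have already been resolved to a list of 32-bit immediates with the low half of each 64-bit element first; `getSplat64From32BitElts` is a hypothetical name, not the helper added by the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the 64-bit value if consecutive pairs of 32-bit elements
// (low half first) all encode the same 64-bit constant; otherwise nullopt.
std::optional<uint64_t>
getSplat64From32BitElts(const std::vector<uint32_t> &Elts) {
  // A 64-bit splat needs a non-empty, even number of 32-bit pieces.
  if (Elts.size() < 2 || Elts.size() % 2 != 0)
    return std::nullopt;

  const uint64_t First = (static_cast<uint64_t>(Elts[1]) << 32) | Elts[0];
  for (std::size_t I = 2; I < Elts.size(); I += 2) {
    const uint64_t Cur = (static_cast<uint64_t>(Elts[I + 1]) << 32) | Elts[I];
    if (Cur != First)
      return std::nullopt;
  }
  return First;
}
```

For instance, the elements `{0, 0x3FF00000, 0, 0x3FF00000}` form a splat of `0x3FF0000000000000`, the bit pattern of the f64 value 1.0. Whether that value is a legal inline immediate would still depend on the use, which this sketch does not model.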
2025-06-26AMDGPU: Fix tracking subreg defs when folding through reg_sequence (#140608)Matt Arsenault
We weren't fully respecting the type of a def of an immediate vs. the type at the use point. Refactor the folding logic to track the value to fold, as well as a subregister to apply to the underlying value. This is similar to how PeepholeOpt tracks subregisters (though only for pure copy-like instructions, no constants). Fixes #139317
2025-06-10AMDGPU: Try constant fold after folding immediate (#141862)Matt Arsenault
This helps avoid some regressions in a future patch. The or 0 pattern appears in the division tests because the combine that reduces a 64-bit bitwise operation to a 32-bit one when half of the operand is the identity value is only implemented for constants. We could fix that by using computeKnownBits. Additionally, the pattern disappears if I optimize the IR division expansion, so that IR should probably be emitted more optimally in the first place.
2025-05-30[AMDGPU] Fix SIFoldOperandsImpl::canUseImmWithOpSel() for VOP3 packed [B]F16 imms. (#142142)Daniil Fukalov
VOP3 instructions ignore opsel source modifiers, so a constant that contains two different [B]F16 imms cannot be encoded into an instruction with a src opsel. E.g. without the fix the following instructions
`s_mov_b32 s0, 0x40003c00 // <half 1.0, half 2.0>`
`v_cvt_scalef32_pk_fp8_f16 v0, s0, v2`
lose the `2.0` imm and are folded into
`v_cvt_scalef32_pk_fp8_f16 v1, 1.0, 1.0`
Fixes SWDEV-531672
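The constraint can be sketched as a check on the two 16-bit halves of the packed constant. This is a simplified illustration, not the logic of `canUseImmWithOpSel`; the helper name `hasSingleHalfValue` is hypothetical. The assumption is that, when opsel is ignored, only a constant whose two halves are identical can be represented by one 16-bit immediate.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when both 16-bit halves of a packed
// 32-bit constant hold the same bit pattern, so a single 16-bit immediate
// describes the whole value without needing opsel to pick a half.
bool hasSingleHalfValue(uint32_t Packed) {
  uint16_t Lo = static_cast<uint16_t>(Packed & 0xFFFFu);
  uint16_t Hi = static_cast<uint16_t>(Packed >> 16);
  return Lo == Hi;
}
```

With the constant from the message, `0x40003c00` has halves `0x3c00` (half 1.0) and `0x4000` (half 2.0), so the check fails; `0x3c003c00` would pass.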
2025-05-29AMDGPU: Remove redundant operand folding checks (#140587)Matt Arsenault
This was pre-filtering out a specific situation from being added to the fold candidate list. The operand legality will ultimately be checked with isOperandLegal before the fold is performed, so I don't see the plus in pre-filtering this one case.
2025-05-29AMDGPU: Delete seemingly dead s_fmaak_f32/s_fmamk_f32 folding code (#140580)Matt Arsenault
No tests fail with this. I'm not sure I understand the comment, there can't be any folding into an operand that had to already be a constant. I tried different combinations of immediates to these instructions but never hit the condition.
2025-05-27[AMDGPU] SIFoldOperands: Delay foldCopyToVGPROfScalarAddOfFrameIndex (#141558)Fabian Ritter
foldCopyToVGPROfScalarAddOfFrameIndex transforms s_adds whose results are copied to vector registers into v_adds. We don't want to do that if foldInstOperand (which so far runs later) can fold the sreg->vreg copy away. This patch therefore delays foldCopyToVGPROfScalarAddOfFrameIndex until after foldInstOperand. This avoids unnecessary movs in the flat-scratch-svs.ll test and also avoids regressions in an upcoming patch to enable ISD::PTRADD nodes.
2025-05-23[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101)Rahul Joshi
2025-05-19AMDGPU: Check for subreg match when folding through reg_sequence (#140582)Matt Arsenault
We need to consider the use instruction's interpretation of the bits, not the defined immediate without use context. This will regress some cases where we previously could match f64 inline constants. We can restore them by either using pseudo instructions to materialize f64 constants, or recognizing reg_sequence decomposed into 32-bit pieces for them (which essentially means recognizing every other input is a 0). Fixes #139908
2025-05-17AMDGPU: Move reg_sequence splat handling (#140313)Matt Arsenault
This code clunkily tried to find a splat reg_sequence by looking at every use of the reg_sequence, and then looking back at the reg_sequence to see if it's a splat. Extract this into a separate helper function to help clean this up. This now parses whether the reg_sequence forms a splat once, and defers the legal inline immediate check to the use check (which is really use context dependent). The one regression is in globalisel, which has an extra copy that should have been separately folded out. It was getting dealt with by the handling of foldable copies in tryToFoldACImm. This is preparation for #139908 and #139317
2025-05-08[AMDGPU][NFC] Remove unused operand types. (#139062)Ivan Kosarev
2025-05-05[AMDGPU] Handle MachineOperandType global address in SIFoldOperands. (#135424)Akhilesh Moorthy
This patch handles the global operand type properly, fixing the bug: Assertion `(isFI() || isCPI() || isTargetIndex() || isJTI()) && "Wrong MachineOperand accessor"` failed. Fixes SWDEV-504645
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-04[Target] Remove unused local variables (NFC) (#138443)Kazu Hirata
2025-04-30[AMDGPU] Fix register class constraints for si-fold-operands pass when folding immediate into copies (#131387)mssefat
Fixes https://github.com/llvm/llvm-project/issues/130020 This fixes an issue where the si-fold-operands pass would incorrectly fold immediate values into COPY instructions targeting av_32 registers. The pass now checks register class constraints before attempting to fold the immediate.
2025-04-22[AMDGPU] Do not fold COPY with implicit operands (#136003)Mariusz Sikora
Folding may remove COPY from inside of the divergent loop.
2025-04-19[AMDGPU] Construct SmallVector with iterator ranges (NFC) (#136415)Kazu Hirata
2025-04-02[AMDGPU][True16][CodeGen] fold clamp update for true16 (#128919)Brox Chen
Check through COPY for possible clamp folding for v_mad_mixhi_f16 isel
2025-04-02[AMDGPU][True16][CodeGen] Implement sgpr folding in true16 (#128929)Brox Chen
We haven't implemented 16-bit SGPRs. Currently allow 32-bit SGPRs to be folded into True16 instructions taking 16-bit values. Also use sgpr_32 when Imm is copied to sgpr_lo16 so it could be further folded. This improves generated code quality.
2025-04-01[AMDGPU] Fix SIFoldOperandsImpl::tryFoldZeroHighBits when met non-reg src1 operand. (#133761)Valery Pykhtin
This happens when a constant is propagated to a V_AND 0xFFFF, reg instruction. Fixes failures like:
```
llc: /github/llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:366: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
Stack dump:
0. Program arguments: /github/llvm-project/build/Debug/bin/llc -mtriple=amdgcn -mcpu=gfx1101 -verify-machineinstrs -run-pass si-fold-operands /github/llvm-project/llvm/test/CodeGen/AMDGPU/fold-zero-high-bits-skips-non-reg.mir -o -
1. Running pass 'Function Pass Manager' on module '/github/llvm-project/llvm/test/CodeGen/AMDGPU/fold-zero-high-bits-skips-non-reg.mir'.
2. Running pass 'SI Fold Operands' on function '@test_tryFoldZeroHighBits_skips_nonreg'
...
#12 0x00007f5a55005cfc llvm::MachineOperand::getReg() const /github/llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:0:5
#13 0x00007f5a555c6bf5 (anonymous namespace)::SIFoldOperandsImpl::tryFoldZeroHighBits(llvm::MachineInstr&) const /github/llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:1459:36
#14 0x00007f5a555c63ad (anonymous namespace)::SIFoldOperandsImpl::run(llvm::MachineFunction&) /github/llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:2455:11
#15 0x00007f5a555c6780 (anonymous namespace)::SIFoldOperandsLegacy::runOnMachineFunction
```