path: root/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
Age | Commit message | Author
2025-09-16 | [AMDGPU] Refactor out common exec mask opcode patterns (NFCI) (#154718) | Carl Ritson
Create utility mechanism for finding wave size dependent opcodes used to manipulate exec/lane masks.
2025-02-28 | [AMDGPU][True16][CodeGen] True16 Add OpSel when optimizing exec mask (#128928) | Brox Chen
True16 VOPCX instructions have the opsel argument. Add it when we create these instructions in SIOptimizeExecMasking. Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-02-12 | [TableGen] Emit OpName as an enum class instead of a namespace (#125313) | Rahul Joshi
- Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).
2025-01-20 | [AMDGPU][NewPM] Port SIOptimizeExecMasking to NPM (#123572) | Akshat Oke
2024-07-17 | [AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) | Jay Foad
2024-07-10 | [CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118) | paperchalice
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
2024-03-08 | Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338) | AtariDreams
This only converts the instances where all that is needed is to change the variable type name. Basically, anything that involves a function that LiveRegUnits does not directly have was skipped to play it safe. Reverts https://github.com/llvm/llvm-project/commit/7a0e222a17058a311b69153d0b6f1b4459414778
2024-03-07 | Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)" | Jay Foad
This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c. It was causing test failures on the expensive check builders.
2024-03-06 | Convert many LivePhysRegs uses to LiveRegUnits (#83905) | AtariDreams
2023-11-02 | [AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293) | Thomas Symalla
During the SIOptimizeExecMasking pass, we try to form V_CMPX instructions by detecting S_AND_SAVEEXEC and V_MOV instructions. Generally, we require the input operand of the V_MOV, which is the input operand to the to-be-formed V_CMPX, to be alive. This is forced by clearing the kill flags on the operand after V_CMPX has been generated. However, if we have a kill of a register set that contains said register, this will not be detected by clearKillFlags. With this change, possible additional kill-flag candidates will be detected during the final call to findInstrBackwards and then, the kill flag will be removed to keep all registers in the set alive. Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
2022-08-23 | [NFC][AMDGPU] Some cleanups in the SIOptimizeExecMasking pass. | Thomas Symalla
Fix typos and remove an unused argument. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132292
2022-07-21 | [AMDGPU] Combine s_or_saveexec, s_xor instructions. | Thomas Symalla
This patch merges a consecutive sequence of
    s_or_saveexec s_o, s_i
    s_xor exec, exec, s_o
into a single
    s_andn2_saveexec s_o, s_i
instruction. This patch also cleans up the SIOptimizeExecMasking pass a bit. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D129073
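The equivalence behind this merge can be checked with plain bit arithmetic. The following is a minimal sketch, not code from the pass: it assumes a wave64 exec mask modelled as a 64-bit integer, and the names `MaskState`, `orSaveexecThenXor`, `andn2Saveexec` and `checkSameResult` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (an assumption for illustration): the exec mask and the saved
// mask register s_o are both plain 64-bit integers.
struct MaskState {
  std::uint64_t exec;
  std::uint64_t saved;
};

// Models:  s_or_saveexec s_o, s_i   followed by   s_xor exec, exec, s_o
inline MaskState orSaveexecThenXor(std::uint64_t exec, std::uint64_t src) {
  MaskState s{exec, 0};
  s.saved = s.exec;          // s_or_saveexec: s_o = exec
  s.exec = src | s.exec;     //                exec = s_i | exec
  s.exec = s.exec ^ s.saved; // s_xor:         exec = exec ^ s_o
  return s;
}

// Models:  s_andn2_saveexec s_o, s_i
inline MaskState andn2Saveexec(std::uint64_t exec, std::uint64_t src) {
  MaskState s{exec, 0};
  s.saved = s.exec;        // s_o = exec
  s.exec = src & ~s.saved; // exec = s_i & ~exec
  return s;
}

// Asserts that both sequences leave exec and s_o in the same state.
inline void checkSameResult(std::uint64_t exec, std::uint64_t src) {
  const MaskState a = orSaveexecThenXor(exec, src);
  const MaskState b = andn2Saveexec(exec, src);
  assert(a.exec == b.exec && a.saved == b.saved);
}
```

The identity used is `(s_i | exec) ^ exec == s_i & ~exec`, which holds bit by bit, so the two-instruction sequence and the single instruction agree for any input masks in this model.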
2022-07-06 | [NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass. | Thomas Symalla
This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D129086
2022-06-24 | AMDGPU: Clear kill flags when optimizing vcmp save exec sequence | Konstantin Zhuravlyov
It was causing bad machine code for several blender scenes:
    *** Bad machine code: Using an undefined physical register ***
    - function: kernel_holdout_emission_blurring_pathtermination_ao
    - basic block: %bb.28 if.end40.i (0x7f84861a2320)
    - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec
    - operand 1: $vgpr3
Differential Revision: https://reviews.llvm.org/D127768
2022-04-08 | [AMDGPU] Increase detection range for s_mov, v_cmpx transformation. | Thomas Symalla
We found that it might be beneficial to have the SIOptimizeExecMasking pass detect more cases where v_cmp, s_and_saveexec patterns can be transformed to s_mov, v_cmpx patterns. Currently, the search range for finding a fitting v_cmp instruction is 5; this patch doubles it to 10. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D123367
2022-03-31 | [AMDGPU] Add missing use check in SIOptimizeExecMasking pass. | Thomas Symalla
Whenever a v_cmp, s_and_saveexec instruction sequence is to be transformed to an equivalent s_mov, v_cmpx sequence, the pass must check whether the v_cmp target register is used between the two instructions. The v_cmpx instruction no longer writes that register, so any such use would result in invalid code. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D122797
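The use check, together with the bounded backward search mentioned in the 2022-04-08 entry, can be sketched on a toy instruction list. This is an illustration under stated assumptions, not the pass's implementation: `ToyInstr`, `findDefBackwards`, `isUsedBetween` and `canFormCmpx` are hypothetical names, registers are plain strings, and the real pass checks further conditions that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: one optional def and a list of uses.
struct ToyInstr {
  std::string opcode;
  std::string def;               // register written ("" if none)
  std::vector<std::string> uses; // registers read
};

// Walk backwards from index `from` (exclusive), inspecting at most
// `maxInstrs` instructions, and return the nearest one that defines `reg`.
inline std::optional<std::size_t>
findDefBackwards(const std::vector<ToyInstr> &block, std::size_t from,
                 const std::string &reg, std::size_t maxInstrs) {
  assert(from <= block.size());
  std::size_t steps = 0;
  for (std::size_t i = from; i > 0 && steps < maxInstrs; ++steps) {
    --i;
    if (block[i].def == reg)
      return i;
  }
  return std::nullopt;
}

// True if any instruction strictly between the two indices reads `reg`.
inline bool isUsedBetween(const std::vector<ToyInstr> &block,
                          std::size_t defIdx, std::size_t useIdx,
                          const std::string &reg) {
  for (std::size_t i = defIdx + 1; i < useIdx; ++i)
    for (const std::string &u : block[i].uses)
      if (u == reg)
        return true;
  return false;
}

// The rewrite is only considered when the mask register is defined by a
// v_cmp within range and nothing in between reads that register.
inline bool canFormCmpx(const std::vector<ToyInstr> &block,
                        std::size_t saveexecIdx, const std::string &maskReg,
                        std::size_t maxInstrs) {
  const auto defIdx = findDefBackwards(block, saveexecIdx, maskReg, maxInstrs);
  if (!defIdx)
    return false;
  if (block[*defIdx].opcode.rfind("v_cmp", 0) != 0) // must start with v_cmp
    return false;
  return !isUsedBetween(block, *defIdx, saveexecIdx, maskReg);
}
```

In this model, an intervening instruction that reads the compare result blocks the rewrite, and a smaller `maxInstrs` makes the search give up before it reaches a distant v_cmp.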
2022-03-28 | [AMDGPU] Fix adding modifiers when creating v_cmpx instructions. | Thomas Symalla
Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modifiers are not correctly inherited from the original operands. The patch adds the source modifiers, if they exist, or sets them to 0. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D122489
2022-03-25 | [AMDGPU] Improve v_cmpx usage on GFX10.3. | Thomas Symalla
On GFX10.3 targets, the following instruction sequence
    v_cmp_* SGPR, ...
    s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Same as D119696 including a buildbot and MIR test fix. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D122332
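Why the two sequences are interchangeable can be seen in a small bitmask model. The sketch below is illustrative only and rests on assumptions: a wave32 exec mask is a 32-bit integer, `cmpAllLanes` stands for the comparison outcome on every lane as if all lanes were evaluated, inactive lanes contribute a zero bit, and the names `WaveState`, `cmpThenAndSaveexec`, `movThenCmpx` and `checkEquivalent` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy wave32 state: the exec mask, the saved copy of exec, and the SGPR
// that v_cmp writes (left at 0 when no such write happens).
struct WaveState {
  std::uint32_t exec;
  std::uint32_t saved;
  std::uint32_t cmpSgpr;
};

// Models:  v_cmp_* SGPR, ...   followed by   s_and_saveexec saved, SGPR
inline WaveState cmpThenAndSaveexec(std::uint32_t exec,
                                    std::uint32_t cmpAllLanes) {
  WaveState s{exec, 0, 0};
  s.cmpSgpr = cmpAllLanes & s.exec; // v_cmp: result bits for active lanes only
  s.saved = s.exec;                 // s_and_saveexec: saved = exec
  s.exec = s.cmpSgpr & s.exec;      //                 exec = SGPR & exec
  return s;
}

// Models:  s_mov saved, exec   followed by   v_cmpx_* ...
inline WaveState movThenCmpx(std::uint32_t exec, std::uint32_t cmpAllLanes) {
  WaveState s{exec, 0, 0};
  s.saved = s.exec;              // s_mov: saved = exec
  s.exec = cmpAllLanes & s.exec; // v_cmpx: exec = result for active lanes
  return s;
}

// Both sequences must agree on exec and on the saved mask. They differ in
// cmpSgpr, which only the first sequence defines.
inline void checkEquivalent(std::uint32_t exec, std::uint32_t cmpAllLanes) {
  const WaveState a = cmpThenAndSaveexec(exec, cmpAllLanes);
  const WaveState b = movThenCmpx(exec, cmpAllLanes);
  assert(a.exec == b.exec && a.saved == b.saved);
}
```

The difference in `cmpSgpr` is the reason the rewrite is only safe when nothing else reads the v_cmp result, which is what the later use-check fix addresses.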
2022-03-21 | Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." | Thomas Symalla
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64. Differential Revision: https://reviews.llvm.org/D122117
2022-03-21 | [AMDGPU] [NFC] Fix missing include. | Thomas Symalla
2022-03-21 | [AMDGPU] Improve v_cmpx usage on GFX10.3. | Thomas Symalla
On GFX10.3 targets, the following instruction sequence
    v_cmp_* SGPR, ...
    s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696
2022-02-18 | [AMDGPU] Return better Changed status from SIOptimizeExecMasking | Jay Foad
Differential Revision: https://reviews.llvm.org/D120024
2021-02-11 | [AMDGPU] Move kill lowering to WQM pass and add live mask tracking | Carl Ritson
Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94746
2021-01-20 | [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets | dfukalov
... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036
2021-01-07 | [NFC][AMDGPU] Reduce include files dependency. | dfukalov
Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813
2020-11-10 | [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term | Carl Ritson
If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90451
2020-09-18 | AMDGPU: Don't sometimes allow instructions before lowered si_end_cf | Matt Arsenault
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes not emit the or to exec at the beginning of the block, where it really has to be. If there is an instruction that defines one of the source operands, split the block and turn the si_end_cf into a terminator. This avoids regressions when regalloc fast is switched to inserting reloads at the beginning of the block, instead of spills at the end of the block. In a future change, this should always split the block.
2020-07-28 | AMDGPU: Optimize copies to exec with other insts after exec def | Matt Arsenault
It's possible to have terminator instructions after a write to exec, so skip over them to find it.
2020-07-28 | AMDGPU: Don't assume there is only one terminator copy | Matt Arsenault
This would stop on the first in reverse order, failing the verifier if there were more earlier in the block.
2019-12-27 | AMDGPU: Use Register | Matt Arsenault
2019-11-13 | Sink all InitializePasses.h includes | Reid Kleckner
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout:
    recompiles  touches  affected_files  header
    342380      95       3604            llvm/include/llvm/ADT/STLExtras.h
    314730      234      1345            llvm/include/llvm/InitializePasses.h
    307036      118      2602            llvm/include/llvm/ADT/APInt.h
    213049      59       3611            llvm/include/llvm/Support/MathExtras.h
    170422      47       3626            llvm/include/llvm/Support/Compiler.h
    162225      45       3605            llvm/include/llvm/ADT/Optional.h
    158319      63       2513            llvm/include/llvm/ADT/Triple.h
    140322      39       3598            llvm/include/llvm/ADT/StringRef.h
    137647      59       2333            llvm/include/llvm/Support/Error.h
    131619      73       1803            llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
2019-08-20 | Revert "AMDGPU: Fix iterator error when lowering SI_END_CF" | Matt Arsenault
This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417
2019-08-15 | Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM | Daniel Sanders
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041
2019-08-01 | Reapply "AMDGPU: Split block for si_end_cf" | Matt Arsenault
This reverts commit r359363, reapplying r357634 llvm-svn: 367500
2019-07-11 | [AMDGPU] gfx908 mfma support | Stanislav Mekhanoshin
Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824
2019-06-16 | [AMDGPU] gfx10 conditional registers handling | Stanislav Mekhanoshin
This is the cpp source part of wave32 support, excluding overridden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513
2019-04-27 | Revert "AMDGPU: Split block for si_end_cf" | Mark Searles
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363
2019-04-03 | AMDGPU: Split block for si_end_cf | Matt Arsenault
Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634
2019-01-19 | Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. | Chandler Carruth
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
2018-07-11 | AMDGPU: Refactor Subtarget classes | Tom Stellard
Summary: This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
2018-05-22 | AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers | Tom Stellard
Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge, so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
2018-05-14 | Rename DEBUG macro to LLVM_DEBUG. | Nicola Zaghen
The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually change DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
2018-04-23 | AMDGPU: Fix a corner case crash in SIOptimizeExecMasking | Nicolai Haehnle
Summary: See the new test case; this is really unlikely to happen with real code, but I ran into this while attempting to bugpoint-reduce a different issue. Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45885 llvm-svn: 330585
2018-02-23 | [MachineOperand][Target] MachineOperand::isRenamable semantics changes | Geoff Berry
Summary: Add a target option AllowRegisterRenaming that is used to opt in to post-register-allocation renaming of registers. This is set to 0 by default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq fields of all opcodes to be set to 1, causing MachineOperand::isRenamable to always return false. Set the AllowRegisterRenaming flag to 1 for all in-tree targets that have lit tests that were affected by enabling COPY forwarding in MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC, RISCV, Sparc, SystemZ and X86). Add some more comments describing the semantics of the MachineOperand::isRenamable function and how it is set and maintained. Change isRenamable to check the operand's opcode hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of relying on it being consistently reflected in the IsRenamable bit setting. Clear the IsRenamable bit when changing an operand's register value. Remove target code that was clearing the IsRenamable bit when changing registers/opcodes now that this is done conservatively by default. Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in one place covering all opcodes that have constant pipe read limit restrictions. Reviewers: qcolombet, MatzeB Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D43042 llvm-svn: 325931
2018-01-29 | [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regs | Geoff Berry
Summary: Fix a few places that were modifying code after register allocation to set the renamable bit correctly to avoid failing the validation added in D42449. llvm-svn: 323675
2017-12-15 | MachineFunction: Return reference from getFunction(); NFC | Matthias Braun
The Function can never be nullptr so we can return a reference. llvm-svn: 320884
2017-11-28 | [CodeGen] Rename functions PrintReg* to printReg* | Francis Visoiu Mistrih
LLVM Coding Standards: Function names should be verb phrases (as they represent actions), and command-like function should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()). Differential Revision: https://reviews.llvm.org/D40416 llvm-svn: 319168
2017-11-14 | AMDGPU: Fix producing saveexec when the copy is spilled | Matt Arsenault
If the register from the copy from exec was spilled, the copy before the spill was deleted, leaving a spill of an undefined register, which caused a verifier error and a miscompile. Check for other use instructions of the copy register. llvm-svn: 318132
2017-10-10 | AMDGPU: Fix missing skipFunction calls | Matt Arsenault
llvm-svn: 315361
2017-08-01 | [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused | Stanislav Mekhanoshin
With SI_END_CF elimination for some nested control flow we can now eliminate saved exec register completely by turning a saveexec version of instruction into just a logical instruction. Differential Revision: https://reviews.llvm.org/D36007 llvm-svn: 309766
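The reasoning behind dropping the save can be illustrated with a small model. This is a sketch under assumptions rather than the pass's code: exec is a 64-bit integer, and `ExecState`, `andSaveexec`, `plainAnd` and `checkExecUnchanged` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Toy state: the exec mask plus the register that receives the saved copy.
struct ExecState {
  std::uint64_t exec;
  std::uint64_t saved;
};

// Models s_and_saveexec_b64 saved, src: save the old exec, then AND it.
inline ExecState andSaveexec(std::uint64_t exec, std::uint64_t src) {
  ExecState s{exec, 0};
  s.saved = s.exec;      // saved = exec
  s.exec = src & s.exec; // exec = src & exec
  return s;
}

// Models s_and_b64 exec, exec, src: same exec update, no saved copy.
inline std::uint64_t plainAnd(std::uint64_t exec, std::uint64_t src) {
  return exec & src;
}

// If nothing reads `saved`, only exec is observable, and both forms agree.
inline void checkExecUnchanged(std::uint64_t exec, std::uint64_t src) {
  assert(andSaveexec(exec, src).exec == plainAnd(exec, src));
}
```

In this model the only difference between the two forms is the write to `saved`, so the replacement is safe exactly when that register has no later reader.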