path: root/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
Age Commit message Author
2025-11-14[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)Jay Foad
2025-11-04[AMDGPU][NFC] Avoid copying MachineOperands (#166293)LU-JOHN
Avoid copying machine operands. Signed-off-by: John Lu <John.Lu@amd.com>
2025-11-04[CodeGen] MachineVerifier to check early-clobber constraint (#151421)Abhay Kanhere
Currently, MachineVerifier does not verify the early-clobber operand constraint. The only other machine operand constraint, TiedTo, is already verified.
2025-10-31[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function (#165797)vangthao95
Follow-up patch to address the comments in https://github.com/llvm/llvm-project/pull/165355.
2025-10-30[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 (#165355)vangthao95
When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.
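The opcode choice described in this message can be sketched as a small decision function. This is only an illustration under stated assumptions: `SubtargetSketch`, its two fields, and `pickCopySccVccOpcode` are hypothetical names, not the selector's real API, and the opcodes are returned as plain mnemonic strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the subtarget queries a selector would make.
struct SubtargetSketch {
  bool IsWave64;          // true for wave64, false for wave32
  bool HasScalarCmpLgU64; // assumed false on gfx7, per the commit message
};

// Returns the mnemonic of the scalar instruction chosen for the copy.
std::string pickCopySccVccOpcode(const SubtargetSketch &ST) {
  if (!ST.IsWave64)
    return "S_CMP_LG_U32"; // wave32 case
  if (ST.HasScalarCmpLgU64)
    return "S_CMP_LG_U64"; // wave64 case where the compare exists
  // gfx7 workaround named in the commit message.
  return "S_OR_B64";
}
```

Keeping the fallback as the last branch means a wave64 target without the 64-bit compare can never reach `S_CMP_LG_U64`.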
2025-10-29[AMDGPU] Support image atomic no return instructions (#150742)Harrison Hao
Add support for no-return variants of image atomic operations (e.g. IMAGE_ATOMIC_ADD_NORTN, IMAGE_ATOMIC_CMPSWAP_NORTN). These variants are generated when the return value of the intrinsic is unused, allowing the backend to select the no-return forms of the instructions.
2025-10-20[AMDGPU] Enable volatile and non-temporal for loads to LDS (#153244)Krzysztof Drewniak
The primary purpose of this commit is to enable marking loads to LDS (global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux as with normal buffer loads) and to ensure that their !nontemporal annotations translate to appropriate settings of the cache control bits. However, in the process of implementing this feature, we also fixed:
- Incorrect handling of buffer loads to LDS in GlobalISel
- The handling of volatile on buffers in SIMemoryLegalizer: previously, the mapping of address spaces would cause volatile on buffer loads to be silently dropped on at least gfx10.
--------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-09-30AMDGPU: Use srcvalue and delete Ignore complex pattern (#161359)Petar Avramovic
2025-09-30AMDGPU: Fix gcc build break (#161354)Petar Avramovic
2025-09-30AMDGPU: Fix s_barrier_leave to write to scc (#161221)Petar Avramovic
s_barrier_leave implicitly defines $scc and does not use the immediate that represents the barrier type, so the isel pattern ignores the immediate operand of the LLVM intrinsic. Test whether SIInsertWaitcnts tracks this scc write.
2025-09-03[AMDGPU] Support cluster_load_async_to_lds instructions on gfx1250 (#156595)Changpeng Fang
2025-09-02[AMDGPU] Support cluster load instructions for gfx1250 (#156548)Changpeng Fang
2025-08-28AMDGPU: Refactor lowering of s_barrier to split barriers (#154648)Nicolai Hähnle
Let's do the lowering of non-split into split barriers in a new IR pass, AMDGPULowerIntrinsics. That way, there is no code duplication between SelectionDAG and GlobalISel. This simplifies some upcoming extensions to the code.
2025-08-12[AMDGPU] Add s_barrier_init|join|leave instructions (#153296)Stanislav Mekhanoshin
2025-08-12[AMDGPU][GISel] Only fold flat offsets if they are inbounds (#153001)Fabian Ritter
For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, ISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from address computations with the inbounds flag: If the address computation does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Relevant tests are in fold-gep-offset.ll. Analogous to #132353 for SDAG (which is not yet in a mergeable state, its progress is currently blocked by #146076). Fixes SWDEV-516125 for GISel.
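The rule in this message can be sketched as a predicate over the matched address computation. The sketch below is an assumption-laden illustration: `MemVariant`, `AddrComputation`, and `foldableImmOffset` are hypothetical names, and it deliberately ignores the separate question of whether the offset fits the instruction's immediate field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the memory instruction being selected.
enum class MemVariant { Flat, Global, Scratch };

// Hypothetical summary of a matched "base + constant" address computation.
struct AddrComputation {
  int64_t ConstOffset; // constant part of the address
  bool IsInBounds;     // true if the computation carried the inbounds flag
};

// Returns the offset to fold into the instruction's immediate field,
// or 0 if the offset has to stay in the base register computation.
int64_t foldableImmOffset(MemVariant Variant, const AddrComputation &Addr) {
  // Global and scratch instructions are not affected, per the commit message.
  if (Variant != MemVariant::Flat)
    return Addr.ConstOffset;
  // Flat: the aperture test ignores the immediate, so fold only when the
  // computation is known to stay inside the allocated object.
  return Addr.IsInBounds ? Addr.ConstOffset : 0;
}
```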
2025-08-04[AMDGPU] Use SDNodeXForm to select a few VOP3P modifiers, NFC (#151907)Changpeng Fang
It is not necessary to use ComplexPattern to select VOP3PModsNeg, VOP3PModsNegs and VOP3PModsNegAbs. We can use SDNodeXForm instead.
2025-07-30[AMDGPU] Fix destination op_sel for v_cvt_scale32_* and v_cvt_sr_* (#151411)Changpeng Fang
GFX950 uses OP_SEL[MSB:LSB] for both src reads and dest writes, so this patch essentially reverts the work from https://github.com/llvm/llvm-project/pull/151286 regarding dest writes.
2025-07-30[AMDGPU] Fix op_sel settings for v_cvt_scale32_* and v_cvt_sr_* (#151286)Changpeng Fang
For OPF_OPSEL_SRCBYTE: The vector instruction uses OPSEL[1:0] to specify a byte select for the first source operand. So op_sel [0, 0], [1, 0], [0, 1] and [1, 1] should map to byte 0, 1, 2 and 3, respectively. For OPF_OPSEL_DSTBYTE: OPSEL is used as a destination byte select. OPSEL[2:3] specify which byte of the destination to write to. Note that the order of the bits is different from that of OPF_OPSEL_SRCBYTE. So the mapping should be: op_sel [0, 0], [0, 1], [1, 0] and [1, 1] map to byte 0, 1, 2 and 3, respectively. Fixes: SWDEV-544901
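The two mappings in this message differ only in which op_sel entry is the low bit of the byte index. A minimal sketch follows, with hypothetical helper names and op_sel written as the pair [Sel0, Sel1]. It follows this commit message only; the later entry #151411 listed above states that the destination handling was subsequently changed back for GFX950.

```cpp
#include <cassert>

// Source byte select: [0,0], [1,0], [0,1], [1,1] -> byte 0, 1, 2, 3,
// so the first op_sel entry is the low bit of the byte index.
unsigned srcByteFromOpSel(unsigned Sel0, unsigned Sel1) {
  assert(Sel0 <= 1 && Sel1 <= 1 && "each op_sel entry is a single bit");
  return Sel0 + 2 * Sel1;
}

// Destination byte select as described in this message:
// [0,0], [0,1], [1,0], [1,1] -> byte 0, 1, 2, 3,
// so the first op_sel entry is the high bit of the byte index.
unsigned dstByteFromOpSel(unsigned Sel0, unsigned Sel1) {
  assert(Sel0 <= 1 && Sel1 <= 1 && "each op_sel entry is a single bit");
  return 2 * Sel0 + Sel1;
}
```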
2025-07-29[AMDGPU] Bitop3 opcodes for gfx1250 (#151235)Stanislav Mekhanoshin
2025-07-29[AMDGPU] Implement v_mad_u32/v_mad_nc_u|i64_u32 on gfx1250 (#151226)Stanislav Mekhanoshin
2025-07-29[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058)Changpeng Fang
2025-07-24[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540)Changpeng Fang
2025-07-24[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493)Stanislav Mekhanoshin
We have a choice between a scalar and a vector prefetch for a uniform pointer. Since we do not have scalar stores, our scalar cache is practically read-only. The rw argument of the prefetch intrinsic is used to force a vector operation even in the uniform case. On GFX12 scalar prefetch will be used anyway; it is still useful, but it will only bring data to L2.
2025-07-22[AMDGPU] Select scale_offset for scratch instructions on gfx1250 (#150111)Stanislav Mekhanoshin
2025-07-22[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107)Stanislav Mekhanoshin
Also switches immediate offset to signed for the subtarget.
2025-07-22[AMDGPU] Select scale_offset with SMEM instructions (#150078)Stanislav Mekhanoshin
2025-07-21[AMDGPU] ISel & PEI for whole wave functions (#145858)Diana Picus
Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs.
At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison.
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls).
--------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>
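The EXEC handling described for the two pseudos can be modelled with a toy wave state. This is a sketch only: `WaveSketch`, `setupWholeWaveFunc`, and `wholeWaveFuncReturn` are hypothetical names that mimic the described behaviour of SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN, with EXEC modelled as a 64-bit lane mask and no register spilling.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a wave: one EXEC bit per lane.
struct WaveSketch {
  uint64_t Exec;
};

// Models the prologue pseudo: enable every lane and hand back the
// original mask, which plays the role of `%active`.
uint64_t setupWholeWaveFunc(WaveSketch &W) {
  uint64_t Active = W.Exec;
  W.Exec = ~uint64_t{0}; // EXEC = -1
  return Active;
}

// Models the epilogue pseudo: restore the mask saved by the prologue.
void wholeWaveFuncReturn(WaveSketch &W, uint64_t Active) { W.Exec = Active; }
```

In this model the value returned by the prologue must reach the epilogue unchanged, which mirrors the message's point that `%active` needs to be passed into the return pseudo.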
2025-07-18[AMDGPU] Select flat GVS atomics on gfx1250 (#149554)Stanislav Mekhanoshin
2025-07-15AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957)Changpeng Fang
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-08[AMDGPU] Fix broken uses of isLegalFLATOffset and splitFlatOffset (#147469)Fabian Ritter
The last parameter of these functions used to be `Signed`, and it looks like a few calls weren't updated when that was changed to `FlatVariant`. Effectively, the functions were called with `FlatVariant=SALU` due to integer promotions, which doesn't make any sense.
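The failure mode is easy to reproduce in isolation: a `bool` argument converts silently to an integer parameter. The sketch below uses hypothetical flag values and a hypothetical `describeVariant` helper. It assumes that the stale call sites passed `true` and that the SALU flag occupies the lowest bit, which is what would make the value read as SALU.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for instruction flag bits.
namespace FlagsSketch {
constexpr uint64_t SALU = uint64_t{1} << 0;
constexpr uint64_t FlatGlobal = uint64_t{1} << 1;
constexpr uint64_t FlatScratch = uint64_t{1} << 2;
} // namespace FlagsSketch

// Reports which variant the callee actually received.
std::string describeVariant(uint64_t FlatVariant) {
  if (FlatVariant == FlagsSketch::SALU)
    return "SALU";
  if (FlatVariant == FlagsSketch::FlatGlobal)
    return "FlatGlobal";
  if (FlatVariant == FlagsSketch::FlatScratch)
    return "FlatScratch";
  return "unknown";
}

// A call written for the old `bool Signed` parameter still compiles:
// `true` converts to the integer 1, which equals the SALU bit here.
std::string staleCall() { return describeVariant(true); }

// The updated call names the variant explicitly.
std::string updatedCall() { return describeVariant(FlagsSketch::FlatGlobal); }
```

A scoped enum parameter would turn such a stale call into a compile error, because `bool` does not convert implicitly to an `enum class` type.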
2025-07-08[AMDGPU] Re-Re-apply: Implement vop3p complex pattern optmization for gisel ↵Shoreshen
(#146984) Reverts llvm/llvm-project#146982 Fix up reported building error for https://github.com/llvm/llvm-project/pull/136262 with: ``` FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths 
[-Werror,-Wreturn-type] 4566 | } | ^ 1 error generated. ninja: build stopped: subcommand failed. ``` --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-04Revert "[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for ↵Shoreshen
gisel" (#146982) Reverts llvm/llvm-project#136262 Due to building error: ``` FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] 4566 | } | ^ 1 error generated. 
ninja: build stopped: subcommand failed. ```
2025-07-04[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for gisel ↵Shoreshen
(#136262) This is a fix up for patch https://github.com/llvm/llvm-project/pull/130234, which is reverted in https://github.com/llvm/llvm-project/pull/136249 The main reason of building failure are: 1. ``` /home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::MachineOperand*, SrcStatus> > getSrcStats(const llvm::MachineOperand*, const llvm::MachineRegisterInfo&, searchOptions, int)’: /home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4669: error: could not convert ‘Statlist’ from ‘SmallVector<[...],4>’ to ‘SmallVector<[...],3>’ 4669 | return Statlist; ``` 2. ``` /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4554:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] 4554 | } | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4644:39: error: overlapping comparisons always evaluate to true [-Werror,-Wtautological-overlap-compare] 4644 | (Stat >= SrcStatus::NEG_START || Stat <= SrcStatus::NEG_END)) { | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4893:66: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4893 | [=](MachineInstrBuilder &MIB) { MIB.addImm(getAllKindImm(Op)); }, | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9: note: 'Op' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4894:52: error: captured structured bindings are a C++20 extension 
[-Werror,-Wc++20-extensions] 4894 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13: note: 'Mods' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4899:50: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4899 | [=](MachineInstrBuilder &MIB) { MIB.addReg(Op->getReg()); }, | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9: note: 'Op' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4900:50: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4900 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13: note: 'Mods' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ 6 errors generated. ``` Both error cannot be reproduced at my local machine, the fix applied are: 1. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function `getSrcStats` replace ``` SmallVector<std::pair<const MachineOperand *, SrcStatus>, 4> Statlist; ``` with ``` SmallVector<std::pair<const MachineOperand *, SrcStatus>> Statlist; ``` 2. 
In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function `AMDGPUInstructionSelector::selectVOP3PRetHelper` replace
```
auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
```
with
```
auto Results = selectVOP3PModsImpl(&Root, MRI, IsDOT);
const MachineOperand *Op = Results.first;
unsigned Mods = Results.second;
```
These changes have not been tested, since neither error can be reproduced locally.
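Two of the diagnostics in this message have small standalone equivalents. The sketch below uses hypothetical names (`selectModsSketch`, `makeRenderers`, `inRange`) and plain `int` in place of `MachineOperand`. It shows the pair-unpacking workaround for lambdas that capture by value, and a range check written with `&&` rather than the always-true `||` form flagged by `-Wtautological-overlap-compare`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for selectVOP3PModsImpl: returns an operand
// pointer together with modifier bits.
std::pair<const int *, unsigned> selectModsSketch(const int *Root) {
  return {Root, 0x3u};
}

// Builds callbacks that capture the results by value. Capturing names
// introduced by `auto [Op, Mods] = ...` is reported as a C++20 extension
// in the log above, so the pair members are copied into ordinary locals.
// The caller must keep the object behind Root alive while the callbacks
// are in use.
std::vector<std::function<unsigned()>> makeRenderers(const int *Root) {
  auto Results = selectModsSketch(Root);
  const int *Op = Results.first;
  unsigned Mods = Results.second;
  return {
      [=] { return static_cast<unsigned>(*Op); },
      [=] { return Mods; },
  };
}

// Inclusive range check; both bounds must hold, hence `&&`.
bool inRange(int Stat, int Start, int End) {
  return Stat >= Start && Stat <= End;
}
```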
2025-06-23AMDGPU: Avoid report_fatal_error on ds ordered intrinsics (#145202)Matt Arsenault
2025-06-19AMDGPU/GFX12: Fix s_barrier_signal_isfirst for single-wave workgroups (#143634)Nicolai Hähnle
Barrier instructions are no-ops in single-wave workgroups. This includes s_barrier_signal_isfirst, which will leave SCC unmodified. Model this correctly (via an implicit use of SCC) and ensure SCC==1 before the barrier instruction (if the wave is the only one of the workgroup, then it is the first). --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-28Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)Justin Bogner
This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.
2025-05-19[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)Krzysztof Drewniak
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.
2025-05-05AMDGPU: Fix -Wextra (#138539)Matt Arsenault
Another stupid gcc warning. Ideally we would directly use the enum type, but subregister indexes are emitted as an anonymous enum. Fixes #125548
2025-05-05[AMDGPU] Support arbitrary types in amdgcn.dead (#134841)Diana Picus
Legalize the amdgcn.dead intrinsic to work with types other than i32. It still generates IMPLICIT_DEFs. Remove some of the previous code for selecting/reg bank mapping it for 32-bit types, since everything is done in the legalizer now.
2025-04-24[AMDGPU] Use variadic isa<>. NFC. (#137016)Jay Foad
2025-04-17Revert "[AMDGPU] Implement vop3p complex pattern optmization for gisel" (#136249)Shoreshen
Reverts llvm/llvm-project#130234
2025-04-18[AMDGPU] Implement vop3p complex pattern optmization for gisel (#130234)Shoreshen
Seeking opportunities to optimize VOP3P instructions by altering opsel, opsel_hi, neg, neg_hi bits.
Test differences:
1. Fix op_sel_hi bit for inline constant:
   1. `CodeGen/AMDGPU/packed-fp32.ll`
2. Use neg bit to remove xor with 0x80008000:
   1. `CodeGen/AMDGPU/strict_fsub.f16.ll`
   2. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll`
   3. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll`
   4. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll`
   5. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll`
   6. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll`
   7. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll`
3. Remove xor 0x80008000, and use opsel, opsel_hi to remove alignbit:
   1. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll`
2025-04-02[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)Juan Manuel Martinez Caamaño
This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.
2025-03-29[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)Tim Gymnich
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future
2025-03-19[AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041)Mariusz Sikora
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-19[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038)Mariusz Sikora
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction.
- Add llvm_v10i32_ty and llvm_v10f32_ty
--------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-17[AMDGPU] Add intrinsics and MIs for ds_bvh_stack_* (#130007)Mariusz Sikora
New intrinsics / instructions:
int_amdgcn_ds_bvh_stack_push4_pop1_rtn / ds_bvh_stack_push4_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop1_rtn / ds_bvh_stack_push8_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop2_rtn / ds_bvh_stack_push8_pop2_rtn_b64
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-13[AMDGPU][True16][CodeGen] gisel true16 for ICMP (#128913)Brox Chen
GlobalISel true16 selection for ICMP
2025-03-06[AMDGPU][NFC] Update name for BVH Intersect Ray (#130036)Mariusz Sikora
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-04[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548)Mariusz Sikora