| Age | Commit message (Collapse) | Author |
|
|
|
Avoid copying machine operands.
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
Currently MachineVerifier is missing verifying early-clobber operand
constraint.
The only other machine operand constraint - TiedTo is already verified.
|
|
Follow-up patch to address the comments in
https://github.com/llvm/llvm-project/pull/165355.
|
|
When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or
S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do
not have the S_CMP_LG_U64 instruction. Work around this issue by using
S_OR_B64 instead.
|
|
Add support for no-return variants of image atomic operations
(e.g. IMAGE_ATOMIC_ADD_NORTN, IMAGE_ATOMIC_CMPSWAP_NORTN).
These variants are generated when the return value of the intrinsic is
unused, allowing the backend to select no return type instructions.
|
|
The primary purpose of this commit is to enable marking loads to LDS
(global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux
as with normal buffer loads) and to ensure that their !nontemporal
annotations translate to appropriate settings of te cache control bits.
However, in the process of implementing this feature, we also fixed
- Incorrect handling of buffer loads to LDS in GlobalISel
- Updating the handling of volatile on buffers in SIMemoryLegalizer:
previously, the mapping of address spaces would cause volatile on buffer
loads to be silently dropped on at least gfx10.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
|
|
|
|
s_barrier_leave implicitly defines $scc
and does not use imm that represents type of barrier,
isel pattern ignores imm operand from llvm intrinsic.
Test if SIInsertWaitcnts tracks this scc write.
|
|
|
|
|
|
Let's do the lowering of non-split into split barriers in a new IR pass,
AMDGPULowerIntrinsics. That way, there is no code duplication between
SelectionDAG and GlobalISel. This simplifies some upcoming extensions to
the code.
|
|
|
|
For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, ISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.
This patch only selects flat instructions with immediate offsets from address
computations with the inbounds flag: If the address computation does not leave
the bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.
Relevant tests are in fold-gep-offset.ll.
Analogous to #132353 for SDAG (which is not yet in a mergeable state, its
progress is currently blocked by #146076).
Fixes SWDEV-516125 for GISel.
|
|
It is not necessary to use ComplexPattern to select VOP3PModsNeg, VOP3PModsNegs
and VOP3PModsNegAbs. We can use SDNodeXForm instead.
|
|
GFX950 uses OP_SEL[MSB:LSB] for both src reads and dest writes. So this
patch essentially revert the work from
https://github.com/llvm/llvm-project/pull/151286 regarding dest writes.
|
|
For OPF_OPSEL_SRCBYTE: Vector instruction uses OPSEL[1:0] to specify a
byte
select for the first source operand. So op_sel [0, 0], [1, 0], [0, 1]
and [1, 1] should map
to byte 0, 1, 2 and 3, respectively.
For OPF_OPSEL_DSTBYTE: OPSEL is used as a destination byte select.
OPSEL[2:3]
specify which byte of the destination to write to. Note that the order
of the bits is different
from that of OPF_OPSEL_SRCBYT. So the mapping should be: op_sel [0, 0],
[0, 1], [1, 0]
and [1, 1] map to byte 0, 1, 2 and 3, respectively.
Fixes: SWDEV-544901
|
|
|
|
|
|
|
|
|
|
We have a choice to use a scalar or vector prefetch for an uniform
pointer. Since we do not have scalar stores our scalar cache is
practically readonly. The rw argument of the prefetch intrinsic is
used to force vector operation even for an uniform case. On GFX12
scalar prefetch will be used anyway, it is still useful but it will
only bring data to L2.
|
|
|
|
Also switches immediate offset to signed for the subtarget.
|
|
|
|
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
a SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of %active. Since the return may be in a different
basic block, it's difficult to add the virtual reg for %active to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
marks any used VGPRs as WWM registers, which are then spilled and
restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
|
|
The last parameter of these functions used to be `Signed`, and it looks
like a few calls weren't updated when that was changed to `FlatVariant`.
Effectively, the functions were called with `FlatVariant=SALU` due to
integer promotions, which doesn't make any sense.
|
|
(#146984)
Reverts llvm/llvm-project#146982
Fix up reported building error for
https://github.com/llvm/llvm-project/pull/136262 with:
```
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
4566 | }
| ^
1 error generated.
ninja: build stopped: subcommand failed.
```
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
|
gisel" (#146982)
Reverts llvm/llvm-project#136262
Due to building error:
```
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o
CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
4566 | }
| ^
1 error generated.
ninja: build stopped: subcommand failed.
```
|
|
(#136262)
This is a fix up for patch
https://github.com/llvm/llvm-project/pull/130234, which is reverted in
https://github.com/llvm/llvm-project/pull/136249
The main reason of building failure are:
1.
```
/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:
In function ‘llvm::SmallVector<std::pair<const llvm::MachineOperand*,
SrcStatus> > getSrcStats(const llvm::MachineOperand*, const
llvm::MachineRegisterInfo&, searchOptions, int)’:
/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4669:
error: could not convert ‘Statlist’ from ‘SmallVector<[...],4>’ to
‘SmallVector<[...],3>’
4669 | return Statlist;
```
2.
```
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4554:1:
error: non-void function does not return a value in all control paths
[-Werror,-Wreturn-type]
4554 | }
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4644:39:
error: overlapping comparisons always evaluate to true
[-Werror,-Wtautological-overlap-compare]
4644 | (Stat >= SrcStatus::NEG_START || Stat <= SrcStatus::NEG_END)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4893:66:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4893 | [=](MachineInstrBuilder &MIB) { MIB.addImm(getAllKindImm(Op)); },
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9:
note: 'Op' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4894:52:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4894 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13:
note: 'Mods' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4899:50:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4899 | [=](MachineInstrBuilder &MIB) { MIB.addReg(Op->getReg()); },
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9:
note: 'Op' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4900:50:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4900 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13:
note: 'Mods' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
6 errors generated.
```
Both error cannot be reproduced at my local machine, the fix applied
are:
1. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function
`getSrcStats` replace
```
SmallVector<std::pair<const MachineOperand *, SrcStatus>, 4> Statlist;
```
with
```
SmallVector<std::pair<const MachineOperand *, SrcStatus>> Statlist;
```
2. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function
`AMDGPUInstructionSelector::selectVOP3PRetHelper` replace
```
auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
```
with
```
auto Results = selectVOP3PModsImpl(&Root, MRI, IsDOT);
const MachineOperand *Op = Results.first;
unsigned Mods = Results.second;
```
These change hasn't be testified since both errors cannot be reproduced
in local
|
|
|
|
Barrier instructions are no-ops in single-wave workgroups. This includes
s_barrier_signal_isfirst, which will leave SCC unmodified.
Model this correctly (via an implicit use of SCC) and ensure SCC==1
before the barrier instruction (if the wave is the only one of the
workgroup, then it is the first).
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
|
|
This PR adds a amdgns_load_to_lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.
This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).
It also plumbs the intrinsic through to clang.
|
|
Another stupid gcc warning. Ideally we would directly use the enum type,
but subregister indexes are emitted as an anonymous enum.
Fixes #125548
|
|
Legalize the amdgcn.dead intrinsic to work with types other than i32. It
still generates IMPLICIT_DEFs.
Remove some of the previous code for selecting/reg bank mapping it for
32-bit types, since everything is done in the legalizer now.
|
|
|
|
(#136249)
Reverts llvm/llvm-project#130234
|
|
Seeking opportunities to optimize VOP3P instructions by altering opsel,
opsel_hi, neg, neg_hi bits
Tests differences:
1. fix op_sel_hi bit for inline constant:
1. `CodeGen/AMDGPU/packed-fp32.ll`
2. use neg bit to remove xor with 0x80008000
1. `CodeGen/AMDGPU/strict_fsub.f16.ll`
2. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll`
3. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll`
4. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll`
5. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll`
6. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll`
7. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll`
3. Remove xor 0x80008000, and use opsel, opsel_hi to remove alignbit
1. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll`
|
|
gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
|
|
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
|
|
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
|
|
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and
image_bvh_dual_intersect_ray machine instruction.
- Add llvm_v10i32_ty and llvm_v10f32_ty
---------
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
|
|
New intrinsics / instructions :
int_amdgcn_ds_bvh_stack_push4_pop1_rtn / ds_bvh_stack_push4_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop1_rtn / ds_bvh_stack_push8_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop2_rtn / ds_bvh_stack_push8_pop2_rtn_b64
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
|
|
GlobalIsel true16 selection for ICMP
|
|
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
|
|
|