path: root/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Age Commit message Author
2025-10-24 [AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122) Mirko Brkušanin
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM, following the same logic as SDag's expandFMINIMUM_FMAXIMUM. Update the AMDGPU legalization rules: pre-GFX12 targets now use the new lowering method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal to match SDag.
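For readers unfamiliar with the semantics being lowered, here is a minimal scalar sketch of IEEE 754-2019 `minimum`, the operation G_FMINIMUM models. It is not a transcription of expandFMINIMUM_FMAXIMUM or of the GlobalISel code; the function name `fminimumSketch` is hypothetical, and the sketch only illustrates the two ways `minimum` differs from a NaN-ignoring min such as `std::fmin`: NaN propagation and the ordering of signed zeros.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper illustrating IEEE 754-2019 minimum semantics on floats.
inline float fminimumSketch(float X, float Y) {
  // NaN-propagating: unlike std::fmin, any NaN input yields NaN.
  if (std::isnan(X) || std::isnan(Y))
    return std::numeric_limits<float>::quiet_NaN();
  // Signed zeros: -0.0 is treated as smaller than +0.0.
  if (X == 0.0f && Y == 0.0f)
    return std::signbit(X) ? X : Y;
  // Ordinary case: both inputs are ordered and not both zero.
  return std::fmin(X, Y);
}
```

A maximum variant would mirror this with `std::fmax` and prefer the zero whose sign bit is clear.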
2025-10-13 [AMDGPU] Remove NoInfsFPMath uses (#163028) paperchalice
Only `ninf` should be used.
2025-09-24 [AMDGPU] Add the support for 45-bit buffer resource (#159702) Shilei Tian
On new targets like `gfx1250`, the buffer resource (V#) now uses this format:
```
base (57-bit): resource[56:0]
num_records (45-bit): resource[101:57]
reserved (6-bit): resource[107:102]
stride (14-bit): resource[121:108]
```
This PR changes the type of `num_records` from `i32` to `i64` in both the builtin and the intrinsic, and also adds support for lowering the new format.
Fixes SWDEV-554034.
---------
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
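The field layout above can be sketched as plain bit packing into two 64-bit halves. This is only an illustration of the bit positions listed in the commit message, not the legalizer's lowering code; the struct `BufferResource` and the helpers `packResource` and `unpackNumRecords` are hypothetical names, and bits above 121 are left at zero because the message does not describe them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the 128-bit V#; only bits [121:0] are modeled.
struct BufferResource {
  uint64_t Lo; // resource[63:0]
  uint64_t Hi; // resource[127:64]
};

constexpr unsigned BaseBits = 57;       // resource[56:0]
constexpr unsigned NumRecordsBits = 45; // resource[101:57]
constexpr unsigned StrideBits = 14;     // resource[121:108]

// Pack base, num_records and stride; the reserved field [107:102] stays zero.
inline BufferResource packResource(uint64_t Base, uint64_t NumRecords,
                                   uint32_t Stride) {
  assert(Base < (1ULL << BaseBits));
  assert(NumRecords < (1ULL << NumRecordsBits));
  assert(Stride < (1U << StrideBits));
  BufferResource R;
  // base fills [56:0]; the low 7 bits of num_records fill [63:57].
  R.Lo = Base | (NumRecords << BaseBits);
  // The upper 38 bits of num_records land in [101:64], i.e. Hi[37:0];
  // stride starts at bit 108, i.e. Hi bit 44.
  R.Hi = (NumRecords >> (64 - BaseBits)) | (uint64_t(Stride) << (108 - 64));
  return R;
}

// Reassemble num_records from the two halves it straddles.
inline uint64_t unpackNumRecords(const BufferResource &R) {
  uint64_t Low = R.Lo >> BaseBits;           // 7 bits from the low half
  uint64_t High = R.Hi & ((1ULL << 38) - 1); // 38 bits from the high half
  return Low | (High << (64 - BaseBits));
}
```

Because `num_records` is 45 bits wide and straddles the 64-bit boundary, a value such as `(1ULL << 40) + 5` no longer fits in an `i32`, which is consistent with the type change described above.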
2025-09-16 [AMDGPU] Fix codegen to emit COPY instead of S_MOV_B64 for aperture regs (#158754) Stanislav Mekhanoshin
2025-09-12 [AMDGPU] Support lowering of cluster related intrinsics (#157978) Shilei Tian
Because much of the code is interconnected, this also changes how the workgroup ID is lowered.
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-09-04 [AMDGPU][Legalizer] Avoid pack/unpack for G_FSHR (#156796) Anshil Gandhi
Scalarize G_FSHR only if the subtarget does not support the V2S16 type.
2025-09-04 [AMDGPU][gfx1250] Add 128B cooperative atomics (#156418) Pierre van Houtryve
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using an MMO flag.
2025-08-28 [AMDGPU] Remove `ApproxFuncFPMath` uses (#155578) paperchalice
`ApproxFuncFPMath` is one of the options in `resetTargetOptions`; this removes its uses in the AMDGPU part.
2025-08-19 [AMDGPU] Narrow only on store to pow of 2 mem location (#150093) Tiger Ding
The GlobalISel lowering for AMDGPU previously always narrowed to i32 on a truncating store, regardless of memory size or scalar size. This caused issues with types like i65, which is first extended to i128 and then stored as i64 + i8 to i128 locations. Narrowing only on stores to power-of-2 memory locations ensures that narrowing to the memory size happens only near the end of legalization. This LLVM defect was identified via the AMD Fuzzing project.
2025-08-11 [AMDGPU] Per-subtarget DPP instruction classification (#153096) Stanislav Mekhanoshin
This is NFCI at this point.
2025-08-07 [AMDGPU] Fix buffer addressing mode matching (#152584) Stanislav Mekhanoshin
Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.
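A small numeric sketch may help show why this matters for offset matching. It contrasts a sum that wraps at 32 bits with the zero-extended 45-bit sum described above. The 32-bit wrapping variant is an assumption made purely for contrast (the commit message does not describe earlier targets), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed contrast case: the add is performed in 32 bits and may wrap.
inline uint32_t addOffsets32(uint32_t VOffset, uint32_t ImmOffset) {
  return VOffset + ImmOffset;
}

// Sum as described for gfx1250: both operands are zero-extended first, so a
// carry out of bit 31 is kept; the result is limited to 45 bits.
inline uint64_t addOffsetsZext45(uint32_t VOffset, uint32_t ImmOffset) {
  constexpr uint64_t Mask45 = (1ULL << 45) - 1;
  return (uint64_t(VOffset) + uint64_t(ImmOffset)) & Mask45;
}
```

With `VOffset = 0xFFFFFFFC` (that is, -4 as a 32-bit value) and `ImmOffset = 8`, the wrapping sum is 4 while the zero-extended sum is `0x100000004`, so a split of an address into voffset and immoffset that relies on 32-bit wraparound would not give the same address under the zero-extended rule.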
2025-08-05 [AMDGPU] Implement addrspacecast from flat <-> private on gfx1250 (#152218) Stanislav Mekhanoshin
2025-07-31 [AMDGPU] Remove `UnsafeFPMath` uses (#151079) paperchalice
Remove `UnsafeFPMath` in the AMDGPU part. It blocks some bugfixes related to clang, and the ultimate goal is to remove the `resetTargetOptions` method in `TargetMachine`; see the FIXME in `resetTargetOptions`.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast
https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-07-30 [AMDGPU][GISel] Use buildObjectPtrOffset instead of buildPtrAdd (#150899) Fabian Ritter
This concerns offset computations for kernargs and RegBankLegalizeHelper::splitLoad, which should all be within the bounds of a memory object. See #150392 for the motivation for introducing the buildObjectPtrOffset function. For SWDEV-516125.
2025-07-29 [AMDGPU] gfx1250 V_{MIN|MAX}_{I|U}64 opcodes (#151256) Stanislav Mekhanoshin
2025-07-29 [AMDGPU] Support f64 atomics on gfx1250 (#151172) Changpeng Fang
- BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64
- DS_ADD_F64
Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>
2025-07-23 [AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291) Stanislav Mekhanoshin
2025-07-15 [AMDGPU] gfx1250 64-bit relocations and fixups (#148951) Stanislav Mekhanoshin
2025-07-15 AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957) Changpeng Fang
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-08 [AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594) Stanislav Mekhanoshin
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2025-06-18 AMDGPU: Directly select minimumnum/maximumnum with ieee_mode=0 (#141903) Matt Arsenault
The hardware min/max follow the IR rules with IEEE mode disabled, so we can avoid canonicalizing the inputs. We lose the quieting of a signaling NaN if both inputs are NaNs, but we only require that with strictfp.
2025-06-08 [llvm] Compare std::optional<T> to values directly (NFC) (#143340) Kazu Hirata
This patch transforms `X && *X == Y` to `X == Y`, where X is of type std::optional<T> and Y is of type T or similar.
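The equivalence behind this rewrite follows from the standard `operator==` between `std::optional<T>` and a value, which returns false when the optional is empty. A minimal sketch, with hypothetical function names `matchesOld` and `matchesNew` and `int` chosen arbitrarily for T:

```cpp
#include <cassert>
#include <optional>

// Form before the patch: explicit engagement check, then dereference.
inline bool matchesOld(const std::optional<int> &X, int Y) {
  return X && *X == Y;
}

// Form after the patch: compare the optional to the value directly.
// An empty optional never compares equal to a value.
inline bool matchesNew(const std::optional<int> &X, int Y) {
  return X == Y;
}
```

Both forms agree for empty and engaged optionals, so the change is purely a simplification, consistent with the NFC tag.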
2025-06-06 AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (#141883) Changpeng Fang
The latest ASICs support the v_cvt_pk_f16_f32 instruction. However, the current implementation of vector fptrunc lowering fully scalarizes the vectors, and the scalar conversions may not always be combined to generate the packed one. We made v2f32 -> v2f16 legal in https://github.com/llvm/llvm-project/pull/139956. This work is an extension to handle wider vectors. Instead of fully scalarizing, we split the vector into packs (v2f32 -> v2f16) to ensure the packed conversion can always be generated.
2025-05-28 Warn on misuse of DiagnosticInfo classes that hold Twines (#137397) Justin Bogner
This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.
2025-05-23 [AMDGPU] Correct bitshift legality transformation for small vectors (#140940) zGoldthorpe
Fix for a bug found by the AMD fuzzing project. The legaliser would originally try to widen a small vector such as `<4 x i1>` to a single `i16` during the legalisation of bitshifts, as it was not originally written with consideration for vector operands. This patch simply adds a guard to prohibit this transformation and allow other legalisation transformations to step in.
2025-05-21 AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900) Matt Arsenault
This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).
2025-05-08 [GlobalISel][AMDGPU] Fix handling of v2i128 type for AND, OR, XOR (#138574) Chinmay Deshpande
Current behavior crashes the compiler. This bug was found using the AMDGPU Fuzzing project. Fixes SWDEV-508816.
2025-05-07 [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (#131308) Pierre van Houtryve
It's better to widen them to avoid them being lowered into a G_ASHR + G_SHL. With this change we just extend to i32, then truncate the result.
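To illustrate why the two strategies produce the same value, here is a scalar model of a 16-bit sign-extend-in-register done both ways: as a shift-left followed by an arithmetic shift-right (the G_SHL + G_ASHR shape), and as extend to 32 bits, sign-extend in register, then truncate. The function names are hypothetical and this models only the arithmetic, not the legalizer actions themselves.

```cpp
#include <cassert>
#include <cstdint>

// Shift-based form on 16 bits: move the field to the top, then shift back.
// Relies on arithmetic right shift of negative values (guaranteed since
// C++20, and the behavior of mainstream compilers before that).
inline uint16_t sextInRegViaShifts16(uint16_t V, unsigned Width) {
  assert(Width >= 1 && Width <= 16);
  unsigned Sh = 16 - Width;
  int16_t Top = static_cast<int16_t>(static_cast<uint16_t>(V << Sh));
  return static_cast<uint16_t>(Top >> Sh);
}

// Sign-extend the low Width bits of a 32-bit value.
inline uint32_t sextInReg32(uint32_t V, unsigned Width) {
  assert(Width >= 1 && Width <= 32);
  uint32_t Mask = Width == 32 ? ~0u : ((1u << Width) - 1);
  uint32_t SignBit = 1u << (Width - 1);
  uint32_t Field = V & Mask;
  return (Field ^ SignBit) - SignBit; // standard xor/subtract sign extension
}

// Widened form: extend to 32 bits, sign-extend in register, truncate.
inline uint16_t sextInReg16Widened(uint16_t V, unsigned Width) {
  assert(Width >= 1 && Width <= 16);
  uint32_t Wide = V; // the upper bits do not affect the truncated result
  return static_cast<uint16_t>(sextInReg32(Wide, Width));
}
```

For example, sign-extending the low 8 bits of `0x00FF` yields `0xFFFF` with either form.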
2025-05-05 [AMDGPU] Support arbitrary types in amdgcn.dead (#134841) Diana Picus
Legalize the amdgcn.dead intrinsic to work with types other than i32. It still generates IMPLICIT_DEFs. Remove some of the previous code for selecting/reg bank mapping it for 32-bit types, since everything is done in the legalizer now.
2025-04-17 AMDGPU: Fix the double rounding issue in v2f64 -> v2f16 conversion (#135659) Changpeng Fang
On targets that support the v_cvt_pk_f16_f32 instruction, if we make v2f64 -> v2f16 Legal, we will generate the following sequence of instructions:
    v_cvt_f32_f64_e32 v1, s[6:7]
    v_cvt_f32_f64_e32 v2, s[4:5]
    v_cvt_pk_f16_f32 v1, v2, v1
It may return imprecise results due to double rounding. This patch fixes the issue by not marking the conversion Legal. While we may still expect the above sequence of code when unsafe fpmath is set, I hope https://github.com/llvm/llvm-project/pull/134738 can address that performance concern.
Fixes: SWDEV-523856
2025-04-16 Reapply "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)" (#135758) Vikram Hegde
Reapply https://github.com/llvm/llvm-project/pull/132358, with tests updated.
2025-04-14 Revert "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)" Kazu Hirata
This reverts commit 62ef10a0f62c668e1fa7e357f56052f3364544c5. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/132358
2025-04-15 [AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358) Vikram Hegde
Fixes https://github.com/llvm/llvm-project/issues/128650. Also adds a few previously existing permlane64 tests that had somehow been removed in the meantime.
2025-03-29 [GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466) Tim Gymnich
- Rename `GISelKnownBits` to `GISelValueTracking` so it can analyze more than just `KnownBits` in the future
2025-03-19 [AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041) Mariusz Sikora
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-19 [AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038) Mariusz Sikora
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction.
- Add llvm_v10i32_ty and llvm_v10f32_ty
---------
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-18 [NFC][AMDGPU][GlobalISel] Make LLTs constexpr (#131673) Tim Gymnich
- static const -> constexpr
2025-03-17 [AMDGPU][GlobalISel] Enable vector reductions (#131413) Tim Gymnich
- Enable LLVM vector reductions for AMDGPU.
Fixes https://github.com/llvm/llvm-project/issues/114816
2025-03-06 [AMDGPU][NFC] Update name for BVH Intersect Ray (#130036) Mariusz Sikora
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-04 [AMDGPU] Don't store an immediate in a Register. NFC Craig Topper
2025-02-25 [AMDGPU][True16][CodeGen] uaddsat/usubsat true16 selection in gisel (#128233) Brox Chen
Enable gisel selection for uaddsat and usubsat in the true16 flow. This patch includes:
1. Add VGPR_16_Lo128/VGPR_16 to the register bank and update register info to recognize the 16-bit regclass ID and bit width
2. uaddsat/usubsat test update
2025-02-25 AMDGPU: Drop legacy r600.read.global.size intrinsics from amdgcn (#128700) Matt Arsenault
These ancient intrinsics were still consumed by the backend for libclc, which no longer uses them.
2025-02-20 Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)" Matt Arsenault
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed. This is not a sound approach to dealing with this instruction change. The new behavior is a different opcode pair, not a modifier on the existing opcode.
2025-02-19 AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711) Changpeng Fang
For targets that support IEEE fminimum_num/fmaximum_num, the corresponding *_min_num_fXY/*_max_num_fXY instructions already canonicalize their inputs. As a result, we do not need to explicitly canonicalize the inputs for fminnum/fmaxnum.
2025-02-17 AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) Matt Arsenault
These cannot be static compile errors, and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example:
```
void foo(volatile generic int* x) {
  __builtin_assume(is_shared(x));
  *x = 4;
}

void bar() {
  private int y;
  foo(&y); // violation, wrong address space
}
```
This could produce a compile time backend error or not depending on the optimization level. Similarly, the new test demonstrates a failure on a lowered atomicrmw which required inserting runtime address space checks. The invalid cases are dynamically dead, we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR.
This should go to the release branch. This fixes broken -O0 compiles with 64-bit atomics which would have started failing in 1d0370872f28ec9965448f33db1b105addaf64ae.
2025-02-11 [AMDGPU][NFC] Remove an unneeded return value. (#126739) Ivan Kosarev
And rename the function to disassociate it from the one where generating loading of the input value may actually fail.
2025-01-18 [CodeGen] Use Register/MCRegister::isPhysical. NFC Craig Topper
2024-12-12 [GlobalISel][NFC] Fix LLT Propagation (#119587) Tim Gymnich
Retain LLT type information by creating new LLTs from the original LLT instead of only using the original scalar size. This PR prepares for the [LLT FPInfo RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24) where LLTs will carry additional floating point type information in addition to the scalar size.
2024-12-08 Revert "[amdgpu][lds] Simplify error diag path - lds variable names are no longer special" Jon Chesterfield
Test case didn't run locally; investigating. This reverts commit 7bad469182ff2f6423ea209d5a1e81acca600568.
2024-12-08 [amdgpu][lds] Simplify error diag path - lds variable names are no longer special Jon Chesterfield