path: root/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
AgeCommit message (Collapse)Author
2025-11-18[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584)Craig Topper
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The lowering should be based on the element width. I noticed this by inspection. No tests in tree are currently affected, but I thought it would be good to fix so someone doesn't have to debug it in the future.
2025-11-18[GISel][RISCV] Compute CTPOP of small odd-sized integer correctly (#168559)Hongyu Chen
Fixes the assertion in #168523. This patch lifts the small, odd-sized integer to 8 bits, ensuring that the following lowering code behaves correctly.
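The idea of lifting an odd-sized value to 8 bits before counting can be sketched in plain C++. This is only an illustrative model, not the code in LegalizerHelper: `ctpopSmall` is a hypothetical helper, and the parallel bit-count sequence is the textbook one, assumed here to stand in for the generic lowering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): population count of a `bits`-wide
// value stored in the low bits of a byte. Masking models zero-extending the
// odd-sized integer to 8 bits before the generic bit-count sequence runs.
inline unsigned ctpopSmall(uint8_t value, unsigned bits) {
  assert(bits >= 1 && bits <= 8 && "only widths of 1..8 bits are modelled");
  unsigned v = value & ((1u << bits) - 1u); // zero-extend to 8 bits
  v = v - ((v >> 1) & 0x55u);               // sums of adjacent bit pairs
  v = (v & 0x33u) + ((v >> 2) & 0x33u);     // sums of adjacent nibbles
  v = (v + (v >> 4)) & 0x0Fu;               // total for the whole byte
  return v;
}
```

Because the masks above are written for a full byte, the sequence only gives the right answer once the input has been widened; that is the property the sketch is meant to show.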
2025-11-18[AArch64][GlobalISel] Add better basic legalization for llround. (#168427)David Green
This adds handling for f16 and f128 lround/llround under LP64 targets, promoting the f16 where needed and using a libcall for f128. This codegen is now identical to the selection dag version.
2025-11-14[AArch64][GlobalISel] Improve lowering of vector fp16 fpext (#165554)Ryan Cowan
This PR improves the lowering of vectors of fp16 when using fpext. Previously vectors of fp16 were scalarized leading to lots of extra instructions. Now, vectors of fp16 will be lowered when extended to fp64 via the preexisting lowering logic for extends. To make use of the existing logic, we need to add elements until we reach the next power of 2.
2025-11-05DAG: Avoid some libcall string name comparisons (#166321)Matt Arsenault
Move to the libcall impl based functions.
2025-10-25[Legalizer] Cache extracted element when lowering G_SHUFFLE_VECTOR. (#163893)Yunqing Yu
Cache extracted elements in lowerShuffleVector(). For example, when lowering
```
%0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1
%2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N )
```
Currently, we generate `N` `G_EXTRACT_VECTOR_ELT`, one for each element in the shufflemask. This is undesirable and bloats the code, especially for larger vectors. With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0` and reuse it for all `N` result elements.
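A minimal sketch of the caching idea, modelled on plain integer vectors rather than MIR. The names `LoweredShuffle` and `lowerShuffleWithCache` are hypothetical, and treating out-of-range or negative mask entries as a placeholder `0` is an assumption made for this example only.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical model of a lowered shuffle: the result lanes plus the number
// of "extract" operations that had to be emitted to produce them.
struct LoweredShuffle {
  std::vector<int> elements;
  std::size_t extractCount = 0;
};

inline LoweredShuffle lowerShuffleWithCache(const std::vector<int> &source,
                                            const std::vector<int> &mask) {
  LoweredShuffle result;
  std::unordered_map<int, int> cache; // mask index -> already extracted value
  for (int index : mask) {
    if (index < 0 || static_cast<std::size_t>(index) >= source.size()) {
      result.elements.push_back(0); // assumption: placeholder for undef lanes
      continue;
    }
    auto it = cache.find(index);
    if (it == cache.end()) {
      // First use of this lane: emit one extract and remember it.
      it = cache.emplace(index, source[static_cast<std::size_t>(index)]).first;
      ++result.extractCount;
    }
    result.elements.push_back(it->second); // later uses reuse the cached value
  }
  return result;
}
```

With a splat mask such as `{0, 0, 0, 0}`, the sketch performs a single extract regardless of the result width.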
2025-10-24[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122)Mirko Brkušanin
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same logic as in SDag's expandFMINIMUM_FMAXIMUM. Update AMDGPU legalization rules: pre-GFX12 now uses the new lowering method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal to match SDag.
2025-10-24[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508)David Green
I'm not sure if this is the best way forward or not, but we have a lot of issues with forgetting that shuffle_vectors can be scalar again and again. (There is another example in the recently added known-bits code.) As a scalar-dst shuffle vector is just an extract, and a scalar-source shuffle vector is just a build vector, this patch makes scalar shuffle vector illegal and adjusts the irbuilder to create the correct node as required. Most targets do this already through lowering or combines. Making scalar shuffles illegal simplifies gisel as a whole; it just requires that transforms that create shuffles of new sizes account for the scalar shuffle being illegal (mostly IRBuilder and LessElements).
2025-10-15[GISel] Use G_ZEXT when widening G_EXTRACT_VECTOR_ELT/G_INSERT_VECTOR_ELT index. (#163416)Craig Topper
2025-10-02[AArch64][GlobalISel] Add `G_FMODF` instruction (#160061)Ryan Cowan
This commit adds the intrinsic `G_FMODF` to GMIR & enables its translation, legalization and instruction selection in AArch64.
2025-09-25GlobalISel: Adjust insert point when expanding G_[SU]DIVREM (#160683)Matt Arsenault
The insert point management is messy here. We probably should have an insert point guard, and not have the dest operand utilities modify the insert point. Fixes #159716
2025-09-25[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889)AZero13
Same deal we use for determining ucmp vs scmp. Using selects on platforms that like selects is better than using usubo. Rename function to be more general fitting this new description.
2025-09-18[RISCV][GISel] Lower G_SSUBE (#157855)woruyu
Implement the lowering of G_SSUBE in LegalizerHelper::lower.
2025-09-11[RISCV][GlobalIsel] Lower G_FMINIMUMNUM, G_FMAXIMUMNUM (#157295)Shaoce SUN
Similar to the implementation in https://github.com/llvm/llvm-project/pull/104411 , the `fmin.s`/`fmax.s` instructions follow IEEE 754-2019 semantics, and `G_FMINIMUMNUM`/`G_FMAXIMUMNUM` are legal.
2025-09-11[RISCV][GISel] Lower G_SADDE (#156865)woruyu
Implement the lowering of G_SADDE in LegalizerHelper::lower.
2025-09-10[RISCV][GISel] Widen G_ABDS/G_ABDU before lowering when Zbb is enabled. (#157766)Craig Topper
This allows us to use G_SMIN/SMAX/UMIN/UMAX in the lowering.
2025-09-05[RISCV][GISel] Lower G_ABDS and G_ABDU (#155888)Shaoce SUN
Implementation follows the `ISD::ABDS` handling in `RISCVTargetLowering`.
2025-09-03[GlobalISel] Add multi-way splitting support for wide scalar shifts. (#155353)Amara Emerson
This patch implements direct N-way splitting for wide scalar shifts instead of recursive binary splitting. For example, an i512 G_SHL can now be split directly into 8 i64 operations rather than going through i256 -> i128 -> i64. The main motivation behind this is to alleviate (although not entirely fix) pathological compile time issues with huge types, like i4224. The problem we see is that the recursive splitting strategy combined with our messy artifact combiner ends up with terribly long compiles, as tons of intermediate artifacts are generated and then repeatedly fed to the combiner ad nauseam. Going directly from the large shifts to the destination types short-circuits a lot of these issues, but it's still an abuse of the backend and front-ends should never be doing this sort of thing.
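To illustrate what going directly to the destination type means, here is a small C++ sketch of a 512-bit left shift computed in one pass over eight 64-bit limbs. It models only the arithmetic, not the MIR the legalizer emits. `U512`, `makeU512` and `shl512` are hypothetical names, and returning zero for oversized shift amounts is an assumption of the sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 512-bit value as eight 64-bit limbs, least significant first.
using U512 = std::array<uint64_t, 8>;

inline U512 makeU512(uint64_t low) {
  U512 v{};
  v[0] = low;
  return v;
}

// Shift left by `amt` bits, producing every output limb directly from at most
// two input limbs. No intermediate 256-bit or 128-bit values are formed.
inline U512 shl512(const U512 &in, unsigned amt) {
  U512 out{};
  if (amt >= 512)
    return out; // assumption: oversized shifts yield zero in this sketch
  const std::size_t limbShift = amt / 64;
  const unsigned bitShift = amt % 64;
  for (std::size_t i = limbShift; i < out.size(); ++i) {
    const std::size_t src = i - limbShift;
    uint64_t limb = in[src] << bitShift;
    if (bitShift != 0 && src > 0)
      limb |= in[src - 1] >> (64 - bitShift); // bits carried from lower limb
    out[i] = limb;
  }
  return out;
}
```

Each output limb depends on at most two input limbs, which is why a direct split avoids the chain of intermediate artifacts that a recursive halving would create.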
2025-09-03[RISCV][GlobalISel] Lower G_ATOMICRMW_SUB via G_ATOMICRMW_ADD (#155972)Kane Wang
RISCV does not provide a native atomic subtract instruction, so this patch lowers `G_ATOMICRMW_SUB` by negating the RHS value and performing an atomic add. The legalization rules in `RISCVLegalizerInfo` are updated accordingly, with libcall fallbacks when `StdExtA` is not available, and intrinsic legalization is extended to support `riscv_masked_atomicrmw_sub`.
For example, lowering `%1 = atomicrmw sub ptr %a, i32 1 seq_cst` on riscv32a produces:
```
li a1, -1
amoadd.w.aqrl a0, a1, (a0)
```
On riscv64a, where the RHS type is narrower than XLEN, it currently produces:
```
li a1, 1
neg a1, a1
amoadd.w.aqrl a0, a1, (a0)
```
There is still a constant-folding or InstCombiner gap. For instance, lowering
```
%b = sub i32 %x, %y
%1 = atomicrmw sub ptr %a, i32 %b seq_cst
```
generates:
```
subw a1, a1, a2
neg a1, a1
amoadd.w.aqrl a0, a1, (a0)
```
This sequence could be optimized further to eliminate the redundant neg. Addressing this may require improvements in the Combiner or Peephole Optimizer in future work.
Co-authored-by: Kane Wang <kanewang95@foxmail.com>
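The same identity can be shown at the source level with `std::atomic`. This is a sketch of the arithmetic only: `atomicSubViaAdd` is a hypothetical helper, not part of the patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: an atomic subtract expressed as an atomic add of the
// negated operand. Unsigned negation wraps modulo 2^32, so the stored result
// matches fetch_sub for every operand, including 0 and 0x80000000.
inline uint32_t atomicSubViaAdd(std::atomic<uint32_t> &object, uint32_t rhs) {
  const uint32_t negated = 0u - rhs;
  return object.fetch_add(negated, std::memory_order_seq_cst);
}
```

As with `fetch_sub`, the call returns the value held before the update.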
2025-08-27[GlobalISel] Add support for scalarizing vector insert and extract elements (#153274)David Green
This adds scalarization handling for fewer vector elements of insert and extract, so that i128 and fp128 types can be handled if they make it past combines. Inserts are unmerged with the inserted element added to the remerged vector, extracts are unmerged then the correct element is copied into the destination. With a non-constant vector the usual stack lowering is used.
2025-08-13[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364)David Green
For a <8 x i32> -> <2 x i128> bitcast, which under aarch64 is split into two halves, the scalar i128 remainder was causing problems, leading to a crash with invalid vector types. This makes sure they are handled correctly in fewerElementsBitcast.
2025-07-29[GISel] Introduce MachineIRBuilder::(build|materialize)ObjectPtrOffset (#150392)Fabian Ritter
These functions are for building G_PTR_ADDs when we know that the base pointer and the result are both valid pointers into (or just after) the same object. They are similar to SelectionDAG::getObjectPtrOffset. This PR also changes call sites of the generic (build|materialize)PtrAdd functions that implement pointer arithmetic to split large memory accesses to the new functions. Since memory accesses have to fit into an object in memory, pointer arithmetic to an offset into a large memory access also yields an address in that object. Currently, these (build|materialize)ObjectPtrOffset functions only add "nuw" to the generated G_PTR_ADD, but I intend to introduce an "inbounds" MIFlag in a later PR (analogous to a concurrent effort in SDAG: #131862, related: #140017, #141725) that will also be set in the (build|materialize)ObjectPtrOffset functions. Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where offsets are now folded into scratch instructions, and cases where the behavior of the check regeneration script changed, resulting, e.g., in better checks for "nusw G_PTR_ADD" instructions, matched empty lines, and the use of "CHECK-NEXT" in MIPS tests. For SWDEV-516125.
2025-07-29[GlobalISel] Remove `UnsafeFPMath` references (#146319)paperchalice
This is the GlobalISel part to remove `UnsafeFPMath` flag in CodeGen pipeline.
2025-07-22[GlobalISel] Allow Legalizer to lower volatile memcpy family. (#145997)Pete Chou
This change updates legalizer to allow lowering volatile memcpy family as a target might rely on lowering to legalize them.
2025-07-11[NFC] Correct typo: invertion -> inversion (#147995)Fraser Cormack
2025-07-05[AArch64][GlobalISel] Fix lowering of i64->f32 itofp. (#132703)David Green
This is a GISel equivalent of #130665, preventing a double-rounding issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP. Because it is legal to convert i64->f16 itofp without double-rounding, but not a fpround f64->f16, that variant is lowered to build the two extends.
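Double rounding can be reproduced with scalar conversions in C++. The sketch below assumes an IEEE-754 target that converts with round-to-nearest-even; the function names and the witness constant are chosen for illustration and do not come from the patch.

```cpp
#include <cstdint>

// 2^60 + 2^36 + 1 lies just above the midpoint between two adjacent floats
// (2^60 and 2^60 + 2^37), so a single rounding must round up.
constexpr int64_t kDoubleRoundingWitness =
    (int64_t{1} << 60) + (int64_t{1} << 36) + 1;

// One rounding step: i64 -> f32.
inline float convertDirect(int64_t v) { return static_cast<float>(v); }

// Two rounding steps: i64 -> f64 drops the trailing +1 and lands exactly on
// the midpoint; f64 -> f32 then breaks the tie towards the even mantissa.
inline float convertViaDouble(int64_t v) {
  return static_cast<float>(static_cast<double>(v));
}
```

Under those assumptions the two functions disagree on `kDoubleRoundingWitness`, which is the kind of discrepancy the commit avoids.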
2025-06-26[GlobalISel] Remove dead code. (NFC) (#145811)Pete Chou
LegalizerHelper::lowerMemCpyFamily only expects G_MEMCPY, G_MEMMOVE, and G_MEMSET.
2025-06-25[X86][GlobalISel] Enable SINCOS with libcall mapping (#142438)JaydeepChauhan14
2025-06-23PowerPC: Stop reporting memcpy as an alias of memmove on AIX (#143836)Matt Arsenault
Instead of reporting ___memmove as an implementation of memcpy, make it unavailable and let the lowering logic consider memmove as a fallback path. This avoids a special case 1:N mapping for libcall implementations.
2025-06-23CodeGen: Emit error if getRegisterByName fails (#145194)Matt Arsenault
This avoids using report_fatal_error and standardizes the error message in a subset of the error conditions.
2025-06-21[GlobalISel] Widen vector loads from aligned ptrs (#144309)David Green
If the pointer is aligned to more than the size of the vector, we can widen the load up to next power of 2 size, as SDAG performs. Some of the v3 tests are currently worse - those should be addressed in other issues.
2025-06-15[GlobalISel] Split Legalizer debug output into paragraphs. NFC (#143427)David Green
This helps keep the legalizer output easier to read, splitting each instruction's legalization into a separate block.
2025-06-05[GlobalISel] support lowering of G_SHUFFLEVECTOR with pointer args (#141959)Stanley Gambarin
2025-05-21AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900)Matt Arsenault
This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).
2025-05-13[GISel][AArch64] Added more efficient lowering of Bitreverse (#139233)jyli0116
GlobalISel was previously inefficient in handling bitreverses of vector types. This deals with i16, i32, i64 vector types and converts them into i8 bitreverses and rev instructions.
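The decomposition described here, reversing the bits inside each byte and then reversing the byte order, can be checked with a scalar C++ sketch. The helper names are hypothetical, and the 32-bit scalar case stands in for one vector element.

```cpp
#include <cstdint>

// Reverse the bits of a single byte (models an i8 bitreverse).
inline uint8_t bitreverse8(uint8_t b) {
  unsigned v = b;
  v = ((v & 0xF0u) >> 4) | ((v & 0x0Fu) << 4); // swap nibbles
  v = ((v & 0xCCu) >> 2) | ((v & 0x33u) << 2); // swap bit pairs
  v = ((v & 0xAAu) >> 1) | ((v & 0x55u) << 1); // swap adjacent bits
  return static_cast<uint8_t>(v);
}

// Reverse the byte order of a 32-bit value (models a rev instruction).
inline uint32_t byteswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// A 32-bit bitreverse built from a byte swap plus four byte-wise reversals.
inline uint32_t bitreverse32(uint32_t v) {
  const uint32_t swapped = byteswap32(v);
  uint32_t out = 0;
  for (unsigned i = 0; i < 4; ++i) {
    const uint8_t byte = static_cast<uint8_t>(swapped >> (8 * i));
    out |= static_cast<uint32_t>(bitreverse8(byte)) << (8 * i);
  }
  return out;
}
```

The two steps commute, so the byte swap can run before or after the per-byte reversal.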
2025-05-06[GlobalISel][AArch64] Handles bitreverse to prevent falling back (#138150)jyli0116
Handles bitreverse for vector types which were previously falling back onto Selection DAG. Includes 8-bit element vectors greater than 128 bits and less than 64 bits: <32 x i8>, <4 x i8>, and odd vector types: <9 x i8>.
2025-05-05[CodeGen] Use range-based for loops (NFC) (#138488)Kazu Hirata
This is a reland of #138434 except that: - the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp have been dropped because they caused a test failure under asan, and - the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have been improved with structured bindings.
2025-05-04Revert "[CodeGen] Use range-based for loops (NFC) (#138434)"Nico Weber
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b. Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs (sanitizer builds; macOS; possibly more), see comments on https://github.com/llvm/llvm-project/pull/138434
2025-05-04[CodeGen] Remove unused local variables (NFC) (#138441)Kazu Hirata
2025-05-04[CodeGen] Use range-based for loops (NFC) (#138434)Kazu Hirata
2025-04-29[GlobalISel] Fix miscompile when narrowing vector loads/stores to non-byte-sized types (#136739)Tobias Stadler
LegalizerHelper::reduceLoadStoreWidth does not work for non-byte-sized types, because this would require (un)packing of bits across byte boundaries. Precommit tests: #134904
2025-04-16[llvm] Use llvm::append_range (NFC) (#136066)Kazu Hirata
This patch replaces: llvm::copy(Src, std::back_inserter(Dst)); with: llvm::append_range(Dst, Src); for brevity. One side benefit is that llvm::append_range eventually calls llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
2025-04-13[CodeGen] Use llvm::append_range (NFC) (#135567)Kazu Hirata
2025-03-29[CodeGen] Use llvm::append_range (NFC) (#133603)Kazu Hirata
2025-03-29[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)Tim Gymnich
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future
2025-03-22[llvm] Construct SmallVector with ArrayRef (NFC) (#132560)Kazu Hirata
2025-03-20[AArch64][GlobalISel] Legalize more CTPOP vector types. (#131513)David Green
Similar to other operations, s8, s16, s32 and s64 vector elements are clamped to legal vector sizes, odd numbers of elements are widened to the next power of 2, and s128 is scalarized. This helps legalize cttz as well as ctpop.
2025-03-19[AArch64][GlobalISel] Clean up CTLZ vector type legalization. (#131514)David Green
Similar to other operations, s8, s16 and s32 vector elements are clamped to legal vector sizes, but in this case s64 are scalarized to use the gpr instructions. This allows vector types to split as opposed to scalarizing.
2025-03-18[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526)David Green
From #106446, this adds a variant of getVectorIdxTy that returns an LLT. Many uses only look at the width, so a getVectorIdxWidth was added as the common base.
2025-03-02[GlobalISel] Use Register. NFCCraig Topper