| Age | Commit message (Collapse) | Author |
|
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.
I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
|
|
Fixes the assertion in #168523
This patch lifts the small, odd-sized integer to 8 bits, ensuring that
the following lowering code behaves correctly.
|
|
This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. This
codegen is now identical to the selection dag version.
|
|
This PR improves the lowering of vectors of fp16 when using fpext.
Previously vectors of fp16 were scalarized leading to lots of extra
instructions. Now, vectors of fp16 will be lowered when extended to fp64
via the preexisting lowering logic for extends. To make use of the
existing logic, we need to add elements until we reach the next power of
2.
|
|
Move to the libcall impl based functions.
|
|
Cache extracted elements in lowerShuffleVector(). For example, when
lowering
```
%0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1
%2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N )
```
Currently, we generate `N` `G_EXTRACT_VECTOR_ELT` for each element in
shufflemask. This is undesirable and bloats the code, especially for
larger vectors.
With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0`
and reuse it for all four result elements.
|
|
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update AMDGPU legalization rules: Pre GFX12 now uses new lowering method
and make G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.
|
|
I'm not sure if this is the best way forward or not, but we have a lot
of issues with forgetting that shuffle_vectors can be scalar again and
again. (There is another example from the recent known-bits code added
recently). As a scalar-dst shuffle vector is just an extract, and a
scalar-source shuffle vector is just a build vector, this patch makes
scalar shuffle vector illegal and adjusts the irbuilder to create the
correct node as required.
Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole, it just requires that
transforms that create shuffles of new sizes to account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
|
|
index. (#163416)
|
|
This commit adds the intrinsic `G_FMODF` to GMIR & enables its
translation, legalization and instruction selection in AArch64.
|
|
(#160683)
The insert point management is messy here. We probably should
have an insert point guard, and not have ths dest operand utilities
modify the insert point.
Fixes #159716
|
|
ucmp (#159889)
Same deal we use for determining ucmp vs scmp.
Using selects on platforms that like selects is better than using usubo.
Rename function to be more general fitting this new description.
|
|
### Summary
Try to implemente Lower G_SSUBE in LegalizerHelper::lower
|
|
Similar to the implementation in
https://github.com/llvm/llvm-project/pull/104411 , the `fmin.s`/`fmax.s`
instructions follow IEEE 754-2019 semantics, and
`G_FMINIMUMNUM`/`G_FMAXIMUMNUM` are legal.
|
|
### Summary
Try to implemente Lower G_SADDE in LegalizerHelper::lower
|
|
(#157766)
This allows us to use G_SMIN/SMAX/UMIN/UMAX in the lowering.
|
|
Implementation follows the `ISD::ABDS` handling in
`RISCVTargetLowering`.
|
|
This patch implements direct N-way splitting for wide scalar shifts
instead
of recursive binary splitting. For example, an i512 G_SHL can now be
split
directly into 8 i64 operations rather than going through i256 -> i128 ->
i64.
The main motivation behind this is to alleviate (although not entirely
fix)
pathological compile time issues with huge types, like i4224. The
problem
we see is that the recursive splitting strategy combined with our messy
artifact combiner ends up with terribly long compiles as tons of
intermediate
artifacts are generated, and then attempted to be combined ad-nauseum.
Going directly from the large shifts to the destination types
short-circuits
a lot of these issues, but it's still an abuse of the backend and
front-ends
should never be doing this sort of thing.
|
|
RISCV does not provide a native atomic subtract instruction, so this
patch lowers `G_ATOMICRMW_SUB` by negating the RHS value and performing
an atomic add. The legalization rules in `RISCVLegalizerInfo` are
updated accordingly, with libcall fallbacks when `StdExtA` is not
available, and intrinsic legalization is extended to support
`riscv_masked_atomicrmw_sub`.
For example, lowering
`%1 = atomicrmw sub ptr %a, i32 1 seq_cst`
on riscv32a produces:
```
li a1, -1
amoadd.w.aqrl a0, a1, (a0)
```
On riscv64a, where the RHS type is narrower than XLEN, it currently
produces:
```
li a1, 1
neg a1, a1
amoadd.w.aqrl a0, a1, (a0)
```
There is still a constant-folding or InstConbiner gap. For instance,
lowering
```
%b = sub i32 %x, %y
%1 = atomicrmw sub ptr %a, i32 %b seq_cst
```
generates:
```
subw a1, a1, a2
neg a1, a1
amoadd.w.aqrl a0, a1, (a0)
```
This sequence could be optimized further to eliminate the redundant neg.
Addressing this may require improvements in the Combiner or Peephole
Optimizer in future work.
---------
Co-authored-by: Kane Wang <kanewang95@foxmail.com>
|
|
(#153274)
This Adds scalarization handling for fewer vector elements of insert and
extract, so that i128 and fp128 types can be handled if they make it
past combines. Inserts are unmerged with the inserted element added to
the remerged vector, extracts are unmerged then the correct element is
copied into the destination. With a non-constant vector the usual stack
lowering is used.
|
|
For a <8 x i32> -> <2 x i128> bitcast, that under aarch64 is split into
two halfs, the scalar i128 remainder was causing problems, causing a
crash with invalid vector types. This makes sure they are handled
correctly in fewerElementsBitcast.
|
|
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.
This PR also changes call sites of the generic (build|materialize)PtrAdd
functions that implement pointer arithmetic to split large memory
accesses to the new functions. Since memory accesses have to fit into an
object in memory, pointer arithmetic to an offset into a large memory
access also yields an address in that object.
Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.
Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.
For SWDEV-516125.
|
|
This is the GlobalISel part to remove `UnsafeFPMath` flag in CodeGen
pipeline.
|
|
This change updates legalizer to allow lowering volatile memcpy family
as a target might rely on lowering to legalize them.
|
|
|
|
This is a GISel equivalent of #130665, preventing a double-rounding
issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the
changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP.
Because it is legal to convert i64->f16 itofp without double-rounding,
but not a fpround f64->f16, that variant is lowered to build the two
extends.
|
|
LegalizerHelper::lowerMemCpyFamily only execpts G_MEMCPY, G_MEMMOVE, and
G_MMSET.
|
|
|
|
Instead of reporting ___memmove as an implementation of memcpy,
make it unavailable and let the lowering logic consider memmove as
a fallback path.
This avoids a special case 1:N mapping for libcall implementations.
|
|
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
|
|
If the pointer is aligned to more than the size of the vector, we can
widen the load up to next power of 2 size, as SDAG performs.
Some of the v3 tests are currently worse - those should be addressed in
other issues.
|
|
This helps keep the legalizer output easier to read, splitting each
instructions legalization into a separate block.
|
|
|
|
This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow along closer with the existing
G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case
better.
Just re-use the existing lowering for the old semantics for
G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's
treatment,
nor try to handle the general expansion without an underlying min/max
variant (or with G_FMINIMUM/G_FMAXIMUM).
|
|
GlobalISel was previously inefficient in handling bitreverses of vector
types. This deals with i16, i32, i64 vector types and converts them into
i8 bitreverses and rev instructions.
|
|
Handles bitreverse for vector types which were previously falling back
onto Selection DAG. Includes 8-bit element vectors greater than 128 bits
and less than 64 bits: <32 x i8>, <4 x i8>, and odd vector types: <9 x
i8>.
|
|
This is a reland of #138434 except that:
- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
have been dropped because they caused a test failure under asan, and
- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
been improved with structured bindings.
|
|
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.
Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
|
|
|
|
|
|
non-byte-sized types (#136739)
LegalizerHelper::reduceLoadStoreWidth does not work for non-byte-sized
types, because this would require (un)packing of bits across byte
boundaries.
Precommit tests: #134904
|
|
This patch replaces:
llvm::copy(Src, std::back_inserter(Dst));
with:
llvm::append_range(Dst, Src);
for breavity.
One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
|
|
|
|
|
|
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
|
|
|
|
Similar to other operations, s8, s16 s32 and s64 vector elements are
clamped to legal vector sizes, odd number of elements are widened to the
next power-2 and s128 is scalarized.
This helps legalize cttz as well as ctpop.
|
|
Similar to other operations, s8, s16 and s32 vector elements are clamped
to legal vector sizes, but in this case s64 are scalarized to use the
gpr instructions. This allows vector types to split as opposed to
scalarizing.
|
|
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
|
|
|