| Age | Commit message | Author |
|
(#168661)"
This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.
This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.
|
|
This may be a bug that was introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9, and has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` because they
both come from the `Copy`.
|
|
While I am at it, this patch uses const l-value references for
std::shared_ptr. We don't need to increment the reference count by
passing std::shared_ptr by value.
Identified with llvm-use-ranges.
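A minimal sketch of the reference-count point above. The helper names
are hypothetical and are not taken from the patch; the only library
facts relied on are std::make_shared and std::shared_ptr::use_count.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: taking the pointer by value copies it, so the
// callee observes one extra owner for the duration of the call.
long ownersSeenByValue(std::shared_ptr<int> P) { return P.use_count(); }

// Hypothetical helper: taking it by const l-value reference leaves the
// reference count untouched.
long ownersSeenByConstRef(const std::shared_ptr<int> &P) {
  return P.use_count();
}

// Usage: with a single owner, the by-value call sees two owners and the
// by-reference call sees one.
void checkOwnerCounts() {
  auto P = std::make_shared<int>(42);
  assert(ownersSeenByConstRef(P) == 1);
  assert(ownersSeenByValue(P) == 2);
}
```

The copy is harmless for correctness, but the count is updated
atomically, so avoiding it saves work on every call.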
|
|
Function &F is the more standard abbreviation (~4000 uses in llvm versus
~300 uses).
|
|
truncation exists (#169022)
Fixes #169017
|
|
Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
|
|
We already know we're looking at BITREVERSE, we can match on the source
operand.
|
|
(#168786)
This reverts commit 6d5f87fc4284c4c22512778afaf7f2ba9326ba7b.
Previously this failed due to treating the unknown MachineMemOperand
value as known uniform.
|
|
To make life easier for future contributors. Note that formatting
changes are due to git clang-format on the touched whitespace-error
lines.
|
|
In functions that have been seriously deformed during optimisation,
there can be call instructions with line-zero immediately after frame
setup (see C reproducer in the test added). Our previous algorithms for
prologue_end ignored these, meaning someone entering a function at
prologue_end would break-in after a function call had completed. Prefer
instead to place prologue_end and the function scope-line on the line
zero call: this isn't false (it's the first meaningful instruction of the
function) and is approximately true. Given a less than ideal function,
this is an OK solution.
|
|
(#168777)
This prevents it from being optimized out in non-asserts builds.
Update X86 test to remove REQUIRES: asserts and check for LLVM ERROR.
Add FileCheck to RISC-V test and remove UNSUPPORTED.
This is the more complete fix for #168772 and #168525.
|
|
|
|
Attempt to only define used subregisters when creating IMPLICIT_DEF fix
ups for live interval subranges. This avoids the appearance at the MIR
level of entire (wide) registers becoming live rather than relying only
on transient LiveIntervals dead definitions for unused subregisters.
|
|
|
|
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
|
|
|
|
matchCombineBuildUnmerge (#168692)
This aims to fix the crash in #168495. My combine rule was missing a
check that the source vector was in fact a vector. This then caused the
legality check to fail in this example, as the concat was trying to
concatenate a non-vector.
I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.
Fixes #168495
|
|
Fixes #167710
|
|
This patch is just a small cleanup that unifies the various spots that
add a DWARF expression to the output.
|
|
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.
I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
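A small sketch of why the width matters, using a hypothetical helper
rather than any LLVM API. It assumes, for illustration only, that a
wrong lowering would use the total vector width (32 bits for four i8
lanes) where the element width (8 bits) is required.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: count leading zeros of Value treated as an
// integer that is Width bits wide (1 to 64).
unsigned countLeadingZeros(uint64_t Value, unsigned Width) {
  assert(Width >= 1 && Width <= 64 && "unsupported width");
  unsigned Count = 0;
  // Walk from the most significant bit of the Width-bit value downwards.
  for (unsigned Bit = Width; Bit-- > 0;) {
    if ((Value >> Bit) & 1)
      break;
    ++Count;
  }
  return Count;
}
```

For a lane holding 1, the element width gives 7 and the total vector
width would give 31, so the two choices disagree on every lane.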
|
|
If vector-unaligned-mem support is not enabled, we should not generate
loads/stores that are not aligned to their element size.
We already do this for non-VP vector loads/stores.
This code has been in our downstream for about a year and a half after
finding the vectorizer generating misaligned loads/stores. I don't think
that is unique to our downstream.
Doing this for masked vp.load/store requires widening the mask as well
which is harder to do.
NOTE: Because we have to scale the VL, this will introduce additional
vsetvli and the VL optimizer will not be effective at optimizing any
arithmetic that is consumed by the store.
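A sketch of the VL scaling mentioned in the note, under the assumption
that the unaligned access is rewritten as an access to byte elements.
The helper name is hypothetical and the patch itself may compute this
differently.

```cpp
#include <cassert>

// Hypothetical helper: if VL elements of EltBits bits each are accessed
// as bytes instead, the byte access needs VL * (EltBits / 8) elements.
unsigned scaleVLToBytes(unsigned VL, unsigned EltBits) {
  assert(EltBits % 8 == 0 && "element must be a whole number of bytes");
  return VL * (EltBits / 8);
}
```

Under that assumption, a store of 5 i32 elements becomes a store of 20
byte elements, which is why the VL has to be recomputed for the access.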
|
|
Fixes the assertion in #168523
This patch lifts the small, odd-sized integer to 8 bits, ensuring that
the following lowering code behaves correctly.
|
|
non-vectors (#168081)
Updates the demanded elements before recursing through copies in case
the type of the source register changes from a non-vector register to a
vector register.
Fixes #167842.
|
|
- Detect cases where the LHS and RHS values will not cause overflow
(when the Hi halves are zero).
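A sketch of the check, assuming the operation is an unsigned 64-bit
multiply whose operands are viewed as 32-bit halves. The commit title is
truncated here, so both the operation and the helper name are
assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: if the high 32 bits of both operands are zero,
// each operand is below 2^32, so the product is below 2^64 and cannot
// overflow. The check is conservative: returning false does not mean
// the multiply overflows.
bool mulCannotOverflow(uint64_t LHS, uint64_t RHS) {
  return (LHS >> 32) == 0 && (RHS >> 32) == 0;
}
```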
|
|
This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. This
codegen is now identical to the selection dag version.
|
|
This generates better code when using partial reductions with
predication.
```
partial_reduce_*mla(acc, sel(p, mul(*ext(a), *ext(b)), splat(0)), splat(1))
-> partial_reduce_*mla(acc, sel(p, a, splat(0)), b)
partial.reduce.*mla(acc, sel(p, *ext(op), splat(0)), splat(1))
-> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1)))
```
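A scalar model of one lane of the first fold above, to show why moving
the select inside the multiply keeps the value. It assumes i8 inputs
sign-extended to i32 and uses hypothetical function names; it does not
model the accumulation across lanes.

```cpp
#include <cassert>
#include <cstdint>

// One lane before the fold: sel(p, mul(ext(a), ext(b)), 0) * 1.
int32_t laneBefore(bool P, int8_t A, int8_t B) {
  const int32_t Product = int32_t(A) * int32_t(B);
  return (P ? Product : 0) * 1;
}

// One lane after the fold: ext(sel(p, a, 0)) * ext(b).
int32_t laneAfter(bool P, int8_t A, int8_t B) {
  const int8_t Selected = P ? A : int8_t(0);
  return int32_t(Selected) * int32_t(B);
}

// Exhaustively compare both forms over every i8 input pair.
bool foldHoldsForAllInputs() {
  for (int P = 0; P <= 1; ++P)
    for (int A = -128; A <= 127; ++A)
      for (int B = -128; B <= 127; ++B)
        if (laneBefore(P != 0, int8_t(A), int8_t(B)) !=
            laneAfter(P != 0, int8_t(A), int8_t(B)))
          return false;
  return true;
}
```

Inactive lanes contribute zero either way, because zero times b is
still zero.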
|
|
20a22a45e96bc94c3a8295cccc9031bd87552725 was supposed to fully remove
these, but left around the functionality to actually compute them and a
unittest that ensured they worked. These are not development features in
the sense of features used in development mode, but experimental
features that have been superseded by MIR2Vec.
|
|
undef) (#165539)
This PR adds a new combine to the `post-legalizer-combiner` pass. The
new combine checks for vectors being unmerged and subsequently padded
with `G_IMPLICIT_DEF` values by building a new vector. If such a case is
found, the vector being unmerged is instead just concatenated with a
`G_IMPLICIT_DEF` that is as wide as the vector being unmerged.
This removes unnecessary `mov` instructions in a few places.
|
|
This prevents a machine verifier error, where it "Expected implicit
register after groups".
Fixes #158661
|
|
- This patch detects cycles formed by phis and bails out if one is found.
- This prevents violating the DAG restrictions.
Pipelining is aborted in the case below:
%1 = phi i32 [ %a, %entry ], [ %3, %loop ]
%2 = phi i32 [ %a, %entry ], [ %1, %loop ]
%3 = phi i32 [ %b, %entry ], [ %2, %loop ]
---------
Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
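A simplified model of the cycle check, not the code from the patch.
Each phi is reduced to an index, and the hypothetical table Next records
which phi feeds it along the loop back-edge (-1 if the value is not a
phi).

```cpp
#include <cassert>
#include <vector>

// Next[I] is the index of the phi feeding phi I from the loop, or -1.
// Following the links from any phi for Next.size() steps without
// reaching -1 means some phi was visited twice, i.e. there is a cycle.
bool hasPhiCycle(const std::vector<int> &Next) {
  const int N = static_cast<int>(Next.size());
  for (int Start = 0; Start < N; ++Start) {
    int Cur = Start;
    for (int Steps = 0; Steps < N && Cur != -1; ++Steps) {
      assert(Cur >= 0 && Cur < N && "link must be -1 or a valid index");
      Cur = Next[Cur];
    }
    if (Cur != -1)
      return true;
  }
  return false;
}
```

For the example above, with %1, %2 and %3 numbered 0, 1 and 2, the
table is {2, 0, 1}: %1 is fed by %3, %2 by %1, and %3 by %2.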
|
|
|
|
target-specific constant offset. (#165591)
In the Dhrystone benchmark, I found that some adjacent globals are not
merged, whereas GCC's anchor optimization does handle them. Using
global-merge-max-offset to set the maximum offset can yield similar
results (still slightly different, but at least we can control the offset).
|
|
Reverts llvm/llvm-project#167909
|
|
EnableFSDiscriminator is declared in DebugInfoMetadata.h.
Identified with readability-redundant-declaration.
|
|
|
|
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned`
and inserts necessary casts.
The added `MCRegUnitToIndex` functor is used with `SparseSet`,
`SparseMultiSet` and `IndexedMap` in a few places.
`MCRegUnit` is opaque to users, so it didn't seem worth making it a
full-fledged class like `Register`.
Static type checking has detected one issue in
`PrologueEpilogueInserter.cpp`, where `BitVector` created for
`MCRegister` is indexed by both `MCRegister` and `MCRegUnit`.
The number of casts could be reduced by using `IndexedMap` in more
places and/or adding a `BitVector` adaptor, but the number of casts *per
file* is still small and `IndexedMap` has limitations, so it didn't seem
worth the effort.
Pull Request: https://github.com/llvm/llvm-project/pull/167943
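A sketch of the pattern described above, using hypothetical stand-ins
(RegUnit, RegUnitToIndex, isUnitLive) rather than the real LLVM
declarations. It shows a scoped enum used as an opaque typed integer,
plus a functor that turns it back into an index for table lookups.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MCRegUnit: a scoped enum with no
// enumerators, used purely as a strongly typed wrapper around unsigned.
enum class RegUnit : unsigned {};

// The point of the change: no implicit conversion to a plain integer.
static_assert(!std::is_convertible<RegUnit, unsigned>::value,
              "a scoped enum does not convert implicitly");

// Hypothetical stand-in for the index functor: the one place where the
// cast back to an index is spelled out.
struct RegUnitToIndex {
  unsigned operator()(RegUnit Unit) const {
    return static_cast<unsigned>(Unit);
  }
};

// Index a per-unit table through the functor instead of a raw cast.
bool isUnitLive(const std::vector<bool> &LiveUnits, RegUnit Unit) {
  const unsigned Index = RegUnitToIndex()(Unit);
  assert(Index < LiveUnits.size() && "register unit out of range");
  return LiveUnits[Index];
}
```

With this shape, indexing a table meant for one kind of id with another
kind fails to compile unless a cast is written out explicitly.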
|
|
Teach `SDNodeInfoEmitter` TableGen backend to process `SDTypeConstraint`
records and emit tables for them. The tables are used by
`SDNodeInfo::verifyNode()` to validate a node being created.
This PR only adds validation code for `SDTCisVT` and `SDTCVecEltisVT`
constraints to keep it smaller.
Pull Request: https://github.com/llvm/llvm-project/pull/150125
|
|
LOOP_DEPENDENCE_MASK (#168221)
TargetConstant nodes don't match TableGen ImmLeaf patterns during
instruction selection. When this zero constant flows into the AArch64
CCMP formation code, the machine verifier hits an assertion in expensive
checks.
Fixes: #168227
|
|
sorry, this was my mistake
|
|
I found that in some performance scenarios, such as under O2, this PR can help when loading a series of global variables.
|
|
|
|
|
|
This PR improves the lowering of vectors of fp16 when using fpext.
Previously vectors of fp16 were scalarized leading to lots of extra
instructions. Now, vectors of fp16 will be lowered when extended to fp64
via the preexisting lowering logic for extends. To make use of the
existing logic, we need to add elements until we reach the next power of
2.
|
|
As in the title. Without this, fpext behaves in SelectionDAG as if it
never had any fast-math flags.
|
|
Not all RegisterId values are registers, so Id is a more appropriate
name.
Use asMCReg() in some places that assumed it was a register.
|
|
To avoid scaling offsets back and forth. This is also what the
SelectionDAG equivalent (ComputeValueVTs) does, and it will allow
reusing ComputeValueTypes with less effort.
|
|
After the base branch was moved to main, this somehow ended up
adding a second definition of RTLCI, instead of modifying the
existing one.
Also fix another build error on the GCC bots.
|
|
This fixes the -fveclib flag getting lost on its way to the backend.
Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.
Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it
would directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.
RuntimeLibraryAnalysis could follow the same trick that
TargetLibraryInfo is using in the future, but a lot more boilerplate changes
are needed to thread that analysis through codegen. Ideally this would come
from an IR module flag, and nothing would be in TargetOptions. For now, it's
better for all of these sorts of controls to be consistent.
|
|
RegisterId can represent a physical register, an MCRegUnit, or
an index into a side structure that stores register masks. These 3
types were encoded by using the physical reg, stack slot, and
virtual register encoding partitions from the Register class.
This aliasing of encoding schemes wasn't well contained, so
Register::index2StackSlot and Register::stackSlotIndex appeared
in multiple places.
This patch gives RegisterRef its own encoding defines and separates
it from Register.
I've removed the generic idx() method in favor of getAsMCReg(),
getAsMCRegUnit(), and getMaskIdx() for some degree of type safety.
Some places used the RegisterId field of RegisterRef directly as a
register. Those have been updated to use getAsMCReg.
Some special cases for RegisterId 0 have been removed as it can
be treated like an MCRegister by existing code.
I think I want to rename the Reg field of RegisterRef to Id, but
I'll do that in another patch.
Additionally, callers of the RegisterRef constructor need to be
audited for implicit conversions from Register/MCRegister
to unsigned.
|
|
(#161501)
InlineAsmLowering rejected inline assembly with memory reference inputs
if the values passed to the inline asm weren't pointers. The DAG
lowering however handled them just fine.
This patch updates InlineAsmLowering to store such values on the stack,
and then use the stack pointer as the "indirect" version of the operand.
|