| Age | Commit message | Author |
|
This fixes verifier errors in this test after earlier passes start
introducing more subregister uses. This probably isn't adequately
tested but I know nothing about this pass.
|
|
This is directly available in TargetInstrInfo
|
|
|
|
TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.
|
|
Both conceptually belong to the same subtarget, so it should not
be necessary to pass in the context TargetRegisterInfo to any
TargetInstrInfo member. Add this reference so those superfluous
arguments can be removed.
Most targets placed their TargetRegisterInfo as a member
in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo,
so unify all targets to look the same.
|
|
In the register allocator we define non-trivial rematerialization as the
rematerialization of an instruction with virtual register uses.
We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overridden by
the target in `TargetInstrInfo::isReMaterializableImpl`. The
original reasoning, given by the comment in the default
implementation, is that we might increase the live range of a virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.
https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.
However https://github.com/llvm/llvm-project/pull/160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
https://github.com/llvm/llvm-project/pull/160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.
With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default, which removes a significant
number of spills and reloads across various targets.
For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:
- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
|
|
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ahead and remove both pieces in a single rename.
Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
|
|
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.
The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.
This just introduces the base infrastructure. I have ports of all
the target specific usage of PointerLikeRegClass ready.
|
|
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
|
|
getPointerRegClass is a layering violation. Its primary purpose
is to determine how to interpret an MCInstrDesc's operands RegClass
fields. This should be context free, and only depend on the subtarget.
The model of this is also wrong, since this should be an
instruction / operand specific property, not a global pointer class.
Remove the function argument to help stage removal of this hook
and avoid introducing any new obstacles to replacing it.
The remaining uses of the function were to get the subtarget, which
TargetRegisterInfo already belongs to. A few targets needed new
subtarget derived properties copied there.
|
|
canCombine (#157210)
We already have the register number from the user operand. Use it
instead of assuming it must be operand 0 of the producing instruction.
Fixes #157118
|
|
This fixes a latent miscompile. To understand why the flag can't be
preserved, consider the case where a0=0, a1=0, a2=-1, and s3=-1.
|
|
getOpcode() already returns unsigned.
|
|
This reduces the amount of boilerplate required when adding a new
field to MIMetadata and reduces the chance of bugs like the
one I fixed in TargetInstrInfo::reassociateOps.
Reviewers: arsenm, nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/133535
|
|
RegallocBase::cleanupFailedVReg hacks up the state of the liveness in
order to facilitate producing valid IR. During this process, we may end
up producing undef copies.
If the destination of these copies is a spill candidate, we will attempt
to fold the source register when issuing the spill. The undef of the
source is not propagated to storeRegToStackSlot, thus we end up
dropping the undef, issuing a spill, and producing an illegal liveness
state.
This checks for undef copies, and, if found, inserts a kill instead of
spill.
|
|
It seems `substituteRegister` checks `FromReg == ToReg` instead of
`TRI->isSubRegisterEq`.
This PR simply reverts the original PR
(https://github.com/llvm/llvm-project/pull/131361) to its initial
implementation (without using `substituteRegister`).
Not sure whether this is the desired fix (and I am by no means an
expert on the LLVM backend), but it does fix a numeric error on our
internal workload.
Original author: @sdesmalen-arm
|
|
When the RegisterCoalescer adds an implicit-def when coalescing
a SUBREG_TO_REG (#123632), this causes issues when removing other
COPY nodes by commuting the instruction because it doesn't take
the implicit-def into consideration. This PR fixes that.
|
|
(#140002)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to
MachineFunctionProperties.
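As a rough illustration of the shape of such an interface, the sketch below pairs a generic enum-indexed accessor with per-property wrappers. The class name `FunctionProperties`, the two properties shown, and the method names are assumptions made for this example; it is not the LLVM class and does not claim to match how the real accessors are generated.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical miniature of a properties class. The property names are
// illustrative only.
class FunctionProperties {
public:
  enum class Property : unsigned { IsSSA, NoPHIs, Last = NoPHIs };

  // Generic accessors that take the property as an argument.
  bool hasProperty(Property P) const { return Bits.test(index(P)); }
  FunctionProperties &set(Property P) {
    Bits.set(index(P));
    return *this;
  }
  FunctionProperties &reset(Property P) {
    Bits.reset(index(P));
    return *this;
  }

  // Per-property wrappers: shorter at the call site and chainable.
  bool hasIsSSA() const { return hasProperty(Property::IsSSA); }
  FunctionProperties &setIsSSA() { return set(Property::IsSSA); }
  FunctionProperties &resetIsSSA() { return reset(Property::IsSSA); }

  bool hasNoPHIs() const { return hasProperty(Property::NoPHIs); }
  FunctionProperties &setNoPHIs() { return set(Property::NoPHIs); }
  FunctionProperties &resetNoPHIs() { return reset(Property::NoPHIs); }

private:
  static unsigned index(Property P) { return static_cast<unsigned>(P); }
  std::bitset<static_cast<unsigned>(Property::Last) + 1> Bits;
};
```

With wrappers like these, a caller can write `Props.setIsSSA().setNoPHIs()` instead of spelling out the enum at every call.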
|
|
|
|
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
|
|
This just moves the x86 implementation into generic code since it appears
to be suitable for any target. The heart of this transform is inside
foldMemoryOperand so other targets won't actually kick in until they
implement said API. This just removes one piece to implement in the
process of enabling foldMemoryOperand.
|
|
instructions into a tree (#132728)
This pass is designed to increase ILP by performing accumulation into
multiple registers. It currently supports only the S/UABAL accumulation
instruction, but can be extended to support additional instructions.
Reland of #126060 which was reverted due to a conflict with #131272.
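The idea of accumulating into several registers can be modeled in scalar C++. This is only a sketch of the dependency structure, not of the pass or of the S/UABAL instructions themselves; the function names are made up for the example, and both inputs are assumed to have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Serial chain: each addition depends on the previous accumulator value,
// so the additions cannot overlap.
uint32_t sumAbsDiffChain(const std::vector<uint8_t> &A,
                         const std::vector<uint8_t> &B) {
  assert(A.size() == B.size());
  uint32_t Acc = 0;
  for (size_t I = 0; I < A.size(); ++I)
    Acc += std::abs(int(A[I]) - int(B[I]));
  return Acc;
}

// Two independent accumulators that are only combined at the end. The two
// partial chains have no dependency on each other.
uint32_t sumAbsDiffTree(const std::vector<uint8_t> &A,
                        const std::vector<uint8_t> &B) {
  assert(A.size() == B.size());
  uint32_t Acc0 = 0, Acc1 = 0;
  size_t I = 0;
  for (; I + 1 < A.size(); I += 2) {
    Acc0 += std::abs(int(A[I]) - int(B[I]));
    Acc1 += std::abs(int(A[I + 1]) - int(B[I + 1]));
  }
  if (I < A.size()) // odd length: fold the leftover element into Acc0
    Acc0 += std::abs(int(A[I]) - int(B[I]));
  return Acc0 + Acc1;
}
```

Both functions compute the same sum; the second form shortens the longest dependency chain roughly by half, which is where the ILP gain would come from.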
|
|
instructions into a tree to increase ILP (#126060) (#132607)
This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.
|
|
instructions into a tree to increase ILP (#126060)
This pattern shows up often in media libraries. The optimization should only
kick in for O3. Currently only supports a single family of accumulation
instructions, but can easily be expanded to support additional
instructions in the future.
|
|
interface. NFC (#131272)
|
|
Fixes the "use after poison" issue introduced by #121516 (see
<https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>).
The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.
The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
|
|
We want special handling for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
|
|
The renamable flag is useful during MachineCopyPropagation, but the
renamable flag is dropped after lowerCopy in some cases.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
|
|
Avoid getting this from the MachineFunction
|
|
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
|
|
This opens up a door for reusing reassociation optimizations on
target-specific binary operations with non-standard operand list.
This is effectively an NFC.
|
|
D33412/D33413 introduced this to support a clang pragma to set section
names for a symbol depending on if it would be placed in
bss/data/rodata/text, which may not be known until the backend. However,
for text we know that only functions will go there, so just directly set
the section in clang instead of going through a completely separate
attribute.
Autoupgrade the "implicit-section-name" attribute to directly setting
the section on a Function.
|
|
We split target-dependent MachineCombiner patterns into their target
folder.
This makes MachineCombiner much more target-independent.
Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc
Reviewed By: topperc, mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/87991
|
|
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
|
|
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on its own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
|
|
Follow up on a post-commit review of 9468de4 (TargetInstrInfo: make
getOperandLatency return optional (NFC)) by Bjorn Pettersson to fix a
couple of things that are not NFC:
- std::optional<T>::operator<= returns true if the first operand is a
std::nullopt and second operand is T. Fix a couple of places where we
assumed it would return false.
- In TargetSchedule, computeInstrCost could take another codepath,
returning InstrLatency instead of DefaultDefLatency. Fix one instance
not accounting for this behavior.
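The first point is standard library behavior and can be reproduced in isolation. In this sketch the helper names `naiveAtMost` and `checkedAtMost` are hypothetical and only illustrate the pitfall; they are not the code that was changed.

```cpp
#include <cassert>
#include <optional>

// Naive comparison: an empty optional compares as less than any value,
// so this returns true when Latency is std::nullopt.
bool naiveAtMost(std::optional<unsigned> Latency, unsigned Limit) {
  return Latency <= Limit;
}

// Checked comparison: a missing latency is never "at most Limit".
bool checkedAtMost(std::optional<unsigned> Latency, unsigned Limit) {
  return Latency.has_value() && *Latency <= Limit;
}
```

Code that previously compared an `int` latency of -1 against a non-negative limit has to pick one of these two meanings explicitly once the type becomes `std::optional<unsigned>`.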
|
|
In commit b05335989239 ("[X86InstrInfo] support memfold on spillable
inline asm (#70832)"), I had a last minute fix to update the
memoperands. I originally did this in the parent foldInlineAsmMemOperand
call, updated the mir test via update_mir_test_checks.py, but then
decided to move it to the child call of foldInlineAsmMemOperand.
But I forgot to rerun update_mir_test_checks.py. That last minute change
caused the same memoperand to be added twice when recursion occurred
(for tied operands). I happened to get lucky that trailing content
omitted from the CHECK line doesn't result in test failure.
But rerunning update_mir_test_checks.py on the mir test added in that
commit produces updated output. This is resulting in updates to the test
that:
1. conflate additions to the test in child commits with simply updating
the test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
deduplicate memoperands when added). Example:
INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)
Fix the bug, so that in child commits, we don't have additional
unrelated test changes (which would be wrong anyways) from simply
running update_mir_test_checks.py.
Link: #20571
|
|
|
|
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
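A small sketch of why the sentinel is surprising. The functions below are hypothetical stand-ins, not the real hooks, and the latency value 3 is arbitrary.

```cpp
#include <cassert>
#include <optional>

// Old style (hypothetical): -1 means "no latency information".
int oldLatency(bool Known) { return Known ? 3 : -1; }

// New style (hypothetical): the absence of a value is explicit.
std::optional<unsigned> newLatency(bool Known) {
  if (!Known)
    return std::nullopt;
  return 3u;
}

// Arithmetic on the sentinel silently yields a plausible-looking number.
int oldTotal(bool Known, int Extra) { return oldLatency(Known) + Extra; }

// With std::optional the caller has to choose a fallback before adding.
unsigned newTotal(bool Known, unsigned Extra, unsigned Default) {
  return newLatency(Known).value_or(Default) + Extra;
}
```

Here `oldTotal(false, 2)` evaluates to 1, which looks like a valid latency, while `newTotal` cannot be written without deciding what an unknown latency should mean.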
|
|
This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.
Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
Link: https://github.com/llvm/llvm-project/issues/20571
|
|
reassociation as X86/PowerPC/RISCV. (#72820)
Don't blindly copy the original flags from the pre-reassociated
instructions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
Fixes #72777.
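The flag policy described above (drop the integer poison flags, keep only the FP flags common to both instructions) amounts to a mask and an intersection. The flag names and bit positions below are invented for the illustration and do not correspond to LLVM's actual flag values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, for illustration only.
enum : uint32_t {
  NoSignedWrap = 1u << 0,   // integer poison-generating flag
  NoUnsignedWrap = 1u << 1, // integer poison-generating flag
  FastNoNaNs = 1u << 2,     // FP fast-math flag
  FastReassoc = 1u << 3,    // FP fast-math flag
};

constexpr uint32_t IntPoisonFlags = NoSignedWrap | NoUnsignedWrap;

// Flags for an instruction produced by reassociating two instructions with
// flags A and B: intersect, then clear the integer poison flags.
uint32_t flagsAfterReassociation(uint32_t A, uint32_t B) {
  return (A & B) & ~IntPoisonFlags;
}
```

Even when both inputs carry the same wrap flag, it is cleared, because the intermediate values of the reassociated expression may overflow where the original ones did not.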
|
|
(#72910)
This reverts commit 42204c94ba9fcb0b4b1335e648ce140a3eef8a9d.
It was accidentally backed out.
#20571
#70743
|
|
This reverts commit 99ee2db198d86f685bcb07a1495a7115ffc31d7e.
It's causing ICEs in the ARM tests. See the comment here:
https://github.com/llvm/llvm-project/commit/99ee2db198d86f685bcb07a1495a7115ffc31d7e
|
|
foldMemoryOperand looks at pairs of instructions (generally a load to
virt reg then use of the virtreg, or def of a virtreg then a store) and
attempts to combine them. This can reduce register pressure.
A prior commit added the ability to mark such a MachineOperand as
foldable. In terms of INLINEASM, this means that "rm" was used (rather
than just "r") to denote that the INLINEASM may use a memory operand
rather than a register operand. This effectively undoes decisions made
by the instruction selection framework. Callers will be added in the
register allocation frameworks. This has been tested with all of the
above (which will come as follow up patches).
Thanks to @topperc who suggested this at last year's LLVM US Dev Meeting
and @qcolombet who confirmed this was the right approach.
Link: https://github.com/llvm/llvm-project/issues/20571
|
|
When using the inline asm constraint string "rm" (or "g"), we generally
would like the compiler to choose "r", but it is permitted to choose "m"
if there's register pressure. This is distinct from "r" in which the
register is not permitted to be spilled to the stack.
The decision of which to use must be made at some point. Currently, the
instruction selection frameworks (ISELs) make the choice, and the
register allocators had better be able to handle the result.
Steal a bit from Storage when using register operands to disambiguate
between the two cases. Add helpers/getters/setters, and print in MIR
when such a register is foldable.
The getter will later be used by the register allocation frameworks (and
asserted by the ISELs) while the setters will be used by the instruction
selection frameworks.
Link: https://github.com/llvm/llvm-project/issues/20571
|
|
https://github.com/llvm/llvm-project/commit/28b912687900bc0a67cd61c374fce296b09963c4
introduced the path cloning format in the basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
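The three validity conditions listed above can be sketched over a toy CFG. The `Block` and `CFG` types and the function name are assumptions for this example and are much simpler than the real machine-function data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy CFG node. All names here are hypothetical.
struct Block {
  std::set<unsigned> Successors; // ids of successor blocks
  bool HasIndirectBranch = false;
};

using CFG = std::map<unsigned, Block>;

bool isValidPathCloning(const CFG &G, const std::vector<unsigned> &Path) {
  for (size_t I = 0; I < Path.size(); ++I) {
    auto It = G.find(Path[I]);
    // 1. Every bb id must map to an existing block.
    if (It == G.end())
      return false;
    bool IsLast = I + 1 == Path.size();
    // 3. A block with indirect branches may only appear last.
    if (It->second.HasIndirectBranch && !IsLast)
      return false;
    // 2. Consecutive ids must be connected by a CFG edge.
    if (!IsLast && !It->second.Successors.count(Path[I + 1]))
      return false;
  }
  return true;
}
```

This sketch accepts an empty path; whether the real implementation does so is not stated in the message above.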
|
|
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will
squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
|
|
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
|
|
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
|
|
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch any of
this code using bitwise operations.
Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
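A minimal sketch of what a bitfield-based flag type can look like, compared with packing and unpacking a raw `unsigned` by hand. The field names, widths, and the `Kind` values are assumptions for this example and are not the actual InlineAsm encoding.

```cpp
#include <cassert>

// Hypothetical operand kinds.
enum class Kind : unsigned { RegUse = 1, RegDef = 2, Mem = 3 };

// Hypothetical flag word expressed as named bitfields instead of shifts
// and masks on a raw unsigned.
class Flag {
  unsigned KindBits : 3;
  unsigned NumOperands : 13;
  unsigned MayFold : 1; // the "stolen" bit gets a name and an accessor

public:
  Flag(Kind K, unsigned NumOps)
      : KindBits(static_cast<unsigned>(K)), NumOperands(NumOps), MayFold(0) {}

  Kind getKind() const { return static_cast<Kind>(KindBits); }
  unsigned getNumOperands() const { return NumOperands; }
  bool mayFold() const { return MayFold != 0; }
  void setMayFold(bool V) { MayFold = V ? 1u : 0u; }
};
```

Adding a field to a type like this does not require touching every call site that previously shifted and masked the integer.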
|