llvm-project.git/llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll, branch main

Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)"

2025-11-23T05:17:45+00:00

This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.

This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.

[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)

2025-11-23T02:11:24+00:00

This may be a bug introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9 that has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` because both come
from the `Copy`.

ARM: Enable terminal rule (#165958)

2025-11-10T20:49:01+00:00

[RegAlloc] Remove default restriction on non-trivial rematerialization (#159211)

2025-10-04T22:50:44+00:00

In the register allocator we define non-trivial rematerialization as the
rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overridden by
the target in `TargetInstrInfo::isReMaterializableImpl`. The comment in
the default implementation gives the original reasoning: we might
increase the live range of a virtual register. However, we don't
actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.
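
A minimal sketch of the availability check described above, using a toy
live-range model rather than LLVM's real `LiveInterval`/`SlotIndex` types.
All names here (`Segment`, `RematCandidate`, `allUsesAvailableAtSketch`) are
hypothetical, and the extra requirement that the value number matches at both
points is an assumption of this sketch:

```cpp
#include <map>
#include <optional>
#include <vector>

// Toy model (not LLVM's types): one segment [Start, End) of a virtual
// register's live range, carrying a value number.
struct Segment {
  int Start;
  int End;
  int ValNo;
};
using LiveRange = std::vector<Segment>;

// Value number live at Idx, or nullopt if the register is dead there.
std::optional<int> valueAt(const LiveRange &LR, int Idx) {
  for (const Segment &S : LR)
    if (S.Start <= Idx && Idx < S.End)
      return S.ValNo;
  return std::nullopt;
}

// An instruction defined at OrigIdx that reads the listed virtual registers.
struct RematCandidate {
  int OrigIdx;
  std::vector<int> VRegUses;
};

// Rematerializing at UseIdx is allowed only if every virtual register the
// instruction reads is live at UseIdx with the same value it had at OrigIdx,
// so no live range has to be extended.
bool allUsesAvailableAtSketch(const RematCandidate &C, int UseIdx,
                              const std::map<int, LiveRange> &Ranges) {
  for (int VReg : C.VRegUses) {
    auto It = Ranges.find(VReg);
    if (It == Ranges.end())
      return false;
    std::optional<int> AtOrig = valueAt(It->second, C.OrigIdx);
    std::optional<int> AtUse = valueAt(It->second, UseIdx);
    if (!AtOrig || !AtUse || *AtOrig != *AtUse)
      return false;
  }
  return true;
}
```

In this model, a candidate whose operand dies before the use site is simply
rejected, which is why allowing non-trivial rematerialization does not
lengthen any live range.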

https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.

However, https://github.com/llvm/llvm-project/pull/160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
https://github.com/llvm/llvm-project/pull/160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.

With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default, which significantly reduces
the number of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:

- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%

[PHIElimination] Revert #131837 #146320 #146337 (#146850)

2025-07-03T11:48:08+00:00

Reverting because of miscompiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337

[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)

2025-06-29T18:28:42+00:00

The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout; see the regression test in the first
commit.

This change affects many architectures, but the total number of
instructions in the test cases seems to be slightly lower.

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

2023-10-09T11:31:32+00:00

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.

[CodeGen] Really renumber slot indexes before register allocation (#67038)

2023-10-09T10:44:41+00:00

PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
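
A toy illustration of the effect described above, assuming a simplified index
list in which erased instructions leave stale entries behind. The types and
the spacing constant are hypothetical and are not the real `SlotIndexes`
implementation:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy index list entry (hypothetical, not LLVM's IndexListEntry).
struct IndexEntry {
  std::string Instr; // name of the instruction this entry was created for
  bool Erased;       // true if the instruction has since been erased
  unsigned Index;    // slot index assigned by the last renumbering
};

constexpr unsigned SlotSpacing = 16; // assumed gap between adjacent entries

// Renumbers every entry, including stale ones left by erased instructions.
void renumberAll(std::vector<IndexEntry> &List) {
  unsigned Next = 0;
  for (IndexEntry &E : List) {
    E.Index = Next;
    Next += SlotSpacing;
  }
}

// Drops stale entries first, so the numbering depends only on the
// instructions that still exist.
void renumberPacked(std::vector<IndexEntry> &List) {
  List.erase(std::remove_if(List.begin(), List.end(),
                            [](const IndexEntry &E) { return E.Erased; }),
             List.end());
  renumberAll(List);
}

// Index distance between two live instructions; assumes both are present
// and that A precedes B.
unsigned distanceBetween(const std::vector<IndexEntry> &List,
                         const std::string &A, const std::string &B) {
  unsigned IA = 0, IB = 0;
  for (const IndexEntry &E : List) {
    if (E.Erased)
      continue;
    if (E.Instr == A)
      IA = E.Index;
    if (E.Instr == B)
      IB = E.Index;
  }
  return IB - IA;
}
```

With two erased entries between a def and its use, `renumberAll` leaves the
pair three slots apart while `renumberPacked` leaves them adjacent, so any
heuristic based on index distance sees a different live range length for the
same code.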

[InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold

2023-06-15T12:17:58+00:00

InstCombine tries to swap compare operands to match sub instructions
in order to expose "CSE opportunities". However, it doesn't really
make sense to perform this transform in the middle-end, as we cannot
actually CSE the instructions there.

The backend already performs this fold in
https://github.com/llvm/llvm-project/blob/18f5446a45da5a61dbfb1b7667d27fb441ac62db/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L4236
on the SDAG level, however this only works within a single basic block.

To handle cross-BB cases, we do need to handle this in the IR layer.
This patch moves the fold from InstCombine to CGP in the backend,
while keeping the same (somewhat dubious) heuristic.
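
As a rough sketch of the kind of fold being moved, assuming a counting
heuristic that swaps the compare operands when more subtractions use the
reversed operand order than the original one. The `Cmp`, `Sub`, and
`swapMayExposeCSE` names below are hypothetical stand-ins on a toy
representation, not the LLVM API:

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy signed comparison predicates.
enum class Pred { LT, GT, LE, GE, EQ, NE };

// Predicate that preserves meaning once the two operands are exchanged.
Pred swappedPred(Pred P) {
  switch (P) {
  case Pred::LT: return Pred::GT;
  case Pred::GT: return Pred::LT;
  case Pred::LE: return Pred::GE;
  case Pred::GE: return Pred::LE;
  case Pred::EQ: return Pred::EQ;
  case Pred::NE: return Pred::NE;
  }
  return P; // unreachable for valid predicates
}

struct Sub { std::string LHS, RHS; };         // models "sub LHS, RHS"
struct Cmp { Pred P; std::string LHS, RHS; }; // models "icmp P LHS, RHS"

// Assumed heuristic: count subtractions over the same two values and prefer
// the operand order that most of them already use.
bool swapMayExposeCSE(const Cmp &C, const std::vector<Sub> &Subs) {
  int GoodToSwap = 0;
  for (const Sub &S : Subs) {
    if (S.LHS == C.RHS && S.RHS == C.LHS)
      ++GoodToSwap; // sub uses the reversed order
    else if (S.LHS == C.LHS && S.RHS == C.RHS)
      --GoodToSwap; // sub already matches the compare
  }
  return GoodToSwap > 0;
}

// Returns the compare, with operands and predicate swapped if that lines it
// up with an existing subtraction.
Cmp maybeSwap(Cmp C, const std::vector<Sub> &Subs) {
  if (swapMayExposeCSE(C, Subs)) {
    std::swap(C.LHS, C.RHS);
    C.P = swappedPred(C.P);
  }
  return C;
}
```

For example, `icmp LT a, b` next to `sub b, a` becomes `icmp GT b, a`, so a
backend that sets flags as part of the subtraction can reuse them for the
compare.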

Differential Revision: https://reviews.llvm.org/D152541

[ARM] Convert active.lane.masks to vctp with non-zero starts

2023-03-29T13:17:10+00:00

This attempts to expand the logic in the MVETailPredication pass to convert
active lane masks that the vectorizer produces to vctp instructions that the
backend can later turn into tail predicated loops, especially for addrecs with
non-zero starts that can be created from epilog vectorization. There is some
adjustment to the logic to handle this, moving some of the code to check the
addrec earlier so that we can get the start value. This start value is then
incorporated into the logic of checking that the new vctp is valid, and there
is a newly added check that it is known to be a multiple of the VF as we expect.
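
A small model of the equivalence involved, assuming a fixed VF of 4 and
64-bit arithmetic so that overflow does not need to be modeled. The function
names are hypothetical. The sketch mirrors the multiple-of-VF requirement only
by rejecting other start values; it does not model why the pass needs it:

```cpp
#include <array>
#include <cstdint>

constexpr unsigned VF = 4; // assumed vectorization factor
using Mask = std::array<bool, VF>;

// Models llvm.get.active.lane.mask(Base, N): lane I is active iff Base + I < N.
Mask activeLaneMask(uint64_t Base, uint64_t N) {
  Mask M{};
  for (unsigned I = 0; I < VF; ++I)
    M[I] = Base + I < N;
  return M;
}

// Models a vctp on the remaining element count: lane I is active iff
// I < Remaining.
Mask vctp(uint64_t Remaining) {
  Mask M{};
  for (unsigned I = 0; I < VF; ++I)
    M[I] = I < Remaining;
  return M;
}

// Walks a vector loop whose induction variable starts at Start and steps by
// VF, checking that the lane mask and the vctp on a down-counting element
// count agree on every iteration.
bool masksAgree(uint64_t Start, uint64_t N) {
  if (Start % VF != 0 || Start > N)
    return false; // precondition modeled on the commit's new check
  uint64_t Remaining = N - Start;
  for (uint64_t IV = Start; IV < N; IV += VF) {
    if (activeLaneMask(IV, N) != vctp(Remaining))
      return false;
    Remaining = Remaining > VF ? Remaining - VF : 0;
  }
  return true;
}
```

With `Start = 4` and `N = 10`, the loop sees 6 and then 2 remaining elements,
and both forms produce a full mask followed by a mask with only the first two
lanes active.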

Differential Revision: https://reviews.llvm.org/D146517