| Age | Commit message (Collapse) | Author |
|
LoopPeel sometimes proves that, when reached, the original loop always
executes at least two iterations. LoopPeel then unconditionally executes
both the remaining loop's initial iteration and the peeled final
iteration. But that increases the latter's frequency above its frequency
in the original loop. To maintain the total frequency, this patch
compensates by decreasing the remaininng loop's latch probability.
This is another step in issue #135812 and was discussed at
<https://github.com/llvm/llvm-project/pull/166858#discussion_r2528968542>.
|
|
PR #159163's probability computation for epilogue loops does not handle
the possibility of an original loop probability of one. Runtime loop
unrolling does not make sense for such an infinite loop, and a division
by zero results. This patch works around that case.
Issue #165998.
|
|
As another step in issue #135812, this patch fixes block frequencies for
partial loop unrolling with an epilogue remainder loop. It does not
fully handle the case when the epilogue loop itself is unrolled. That
will be handled in the next patch.
For the guard and latch of each of the unrolled loop and epilogue loop,
this patch sets branch weights derived directly from the original loop
latch branch weights. The total frequency of the original loop body,
summed across all its occurrences in the unrolled loop and epilogue
loop, is the same as in the original loop. This patch also sets
`llvm.loop.estimated_trip_count` for the epilogue loop instead of
relying on the epilogue's latch branch weights to imply it.
This patch fixes branch weights in tests that PR #157754 adversely
affected.
|
|
Followup metadata for remainder loops is handled by two implementations,
both added by 7244852557ca6:
1. `tryToUnrollLoop` in `LoopUnrollPass.cpp`.
2. `CloneLoopBlocks` in `LoopUnrollRuntime.cpp`.
As far as I can tell, 2 is useless: I added `assert(!NewLoopID)` for the
`NewLoopID` returned by the `makeFollowupLoopID` call, and it never
fails throughout check-all for my build.
Moreover, if 2 were useful, it appears it would have a bug caused by
7cd826a321d9. That commit skips adding loop metadata to a new remainder
loop if the remainder loop itself is to be completely unrolled because
it will then no longer be a loop. However, that commit incorrectly
assumes that `UnrollRemainder` dictates complete unrolling of a
remainder loop, and thus it skips adding loop metadata even if the
remainder loop will be only partially unrolled.
To avoid further confusion here, this patch removes 2. check-all
continues to pass for my build. If 2 actually is useful, please advise
so we can create a test that covers that usage.
Near 2, this patch retains the `UnrollRemainder` guard on the
`setLoopAlreadyUnrolled` call, which adds `llvm.loop.unroll.disable` to
the remainder loop. That behavior exists both before and after
7cd826a321d9. The logic appears to be that remainder loop unrolling
(whether complete or partial) is opt-in. That is, unless
`UnrollRemainder` is true, `UnrollRuntimeLoopRemainder` skips running
remainder loop unrolling, and `llvm.loop.unroll.disable` suppresses any
later attempt at it.
This patch also extends testing of remainder loop followup metadata to
be sure remainder loop partial unrolling is handled correctly by 1.
|
|
The original loop (OL) that serves as input to LoopUnroll has basic
blocks that are arranged as follows:
```
OLPreHeader
OLHeader <-.
... |
OLLatch ---'
OLExit
```
In this depiction, every block has an implicit edge to the next block
below, so any explicit edge indicates a conditional branch.
Given OL and unroll count N, LoopUnroll sometimes creates an unrolled
loop (UL) with a remainder loop (RL) epilogue arranged like this:
```
,-- ULGuard
| ULPreHeader
| ULHeader <-.
| ... |
| ULLatch ---'
| ULExit
`-> RLGuard -----.
RLPreHeader |
,-> RLHeader |
| ... |
`-- RLLatch |
RLExit |
OLExit <-----'
```
Each UL iteration executes N OL iterations, but each RL iteration
executes 1 OL iteration. ULGuard or RLGuard checks whether the first
iteration of UL or RL should execute, respectively. If so, ULLatch or
RLLatch checks whether to execute each subsequent iteration.
Once reached, OL always executes its first iteration but not necessarily
the next N-1 iterations. Thus, ULGuard is always required before the
first UL iteration. However, when control flows from ULGuard directly to
RLGuard, the first OL iteration has yet to execute, so RLGuard is then
redundant before the first RL iteration.
Thus, this patch makes the following changes:
- Adjust ULGuard to branch to RLPreHeader instead of RLGuard, thus
eliminating RLGuard's unnecessary branch instruction for that path.
- Eliminate the creation of RLGuard phi node poison values. Without this
patch, RLGuard has such a phi node for each value that is defined by any
OL iteration and used in OLExit. The poison value is required where
ULGuard is the predecessor. The poison value indicates that control flow
from ULGuard to RLGuard to Exit has no counterpart in OL because the
first OL iteration must execute either in UL or RL.
- Simplify the CFG by not splitting ULExit and RLGuard because, without
the ULGuard predecessor, the single block can now be a dedicated UL
exit.
- To RLPreHeader, add an `llvm.assume` call that asserts the RL trip
count is non-zero. Without this patch, RLPreHeader is reachable only
when RLGuard guarantees that assertion is true. With this patch, RLGuard
guarantees it only when RLGuard is the predecessor, and the OL structure
guarantees it when ULGuard is the predecessor. If RL itself is unrolled
later, this guarantee somehow prevents ScalarEvolution from giving up
when trying to compute a maximum trip count for RL. That maximum trip
count enables the branch instruction in the final unrolled instance of
RLLatch to be eliminated. Without the `llvm.assume` call, some existing
unroll tests start to fail because that instruction is not eliminated.
The original motivation for this patch is to facilitate later patches
that fix LoopUnroll's computation of branch weights so that they
maintain the block frequency of OL's body (see #135812). Specifically,
this patch ensures RLGuard's branch weights do not affect RL's
contribution to the block frequency of OL's body in the case that
ULGuard skips UL.
|
|
Update UnrollRuntimeLoopRemainder to always give priority to the
UnrollRuntimeMultiExit option, if provided.
After ad9da92cf6f7357 (https://github.com/llvm/llvm-project/pull/124462),
we would ignore the option if the backend indicates multi-exit is profitable.
This means it cannot be used to disable runtime unrolling.
To be consistent with canProfitablyRuntimeUnrollMultiExitLoop, always
respect the option.
This surfaced while discussing https://github.com/llvm/llvm-project/pull/131998.
PR: https://github.com/llvm/llvm-project/pull/134259
|
|
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.
This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.
PR: https://github.com/llvm/llvm-project/pull/124462
|
|
Add an extra know to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.
PR: https://github.com/llvm/llvm-project/pull/118316
|
|
Identified with misc-include-cleaner.
|
|
Use the more aggressive invalidation method in a number of places
that add incoming values to lcssa phi nodes. It is likely that
it's possible to construct cases with incorrect SCEV preservation
similar to https://github.com/llvm/llvm-project/issues/109333 for
these.
|
|
|
|
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
|
|
|
|
|
|
|
|
- There is no restriction on a loop with controlled convergent
operations when
the relevant tokens are defined and used within the loop.
- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
runtime checks.
- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
lying outside the loop. Such unrolling is disabled for now.
- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.
- Spell out the initializer for UnrollLoopOptions to improve
readability.
Original implementation [D85605] by Nicolai Haehnle
<nicolai.haehnle@amd.com>.
|
|
We need to remap any DbgRecord, not just DbgVariableRecords.
This is the followup to #91447.
Co-authored-by: PietroGhg <pietro.ghiglio@codeplay.com>
|
|
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
|
|
(#84793)
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.
This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
|
|
For integers larger than 64-bit, this would zero-extend a -1
value, instead of sign-extending it.
Fixes https://github.com/llvm/llvm-project/issues/80289.
|
|
This patch adds support for CloneBasicBlock duplicating the DPValues
attached to instructions, and adds facilities to remap them into their new
context. The plumbing to achieve this is fairly straightforwards and
mechanical.
I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch
and SimplifyCFG. The former only updates for the epilogue right now so I've
added CHECK lines just for the end of an unrolled loop (further updates
coming later). SimpleLoopUnswitch had no debug-info tests so I've added a
new one. The two modified parts of SimplifyCFG are covered by the two
modified SimplifyCFG tests.
These are scenarios where we have to do extra cloning for copying of
DPValues because they're no longer instructions, and remap them too.
|
|
Make sure every conditional branch constructed by `LoopUnrollRuntime`
code sets branch weights.
- Add new 1:127 weights for the conditional jumps checking whether the
whole (unrolled) loop should be skipped in the generated prolog or
epilog code.
- Remove `updateLatchBranchWeightsForRemainderLoop` function and just
add weights immediately when constructing the relevant branches. This
leads to simpler code and makes the code more obvious as every call
to `CreateCondBr` now has a `BranchWeights` parameter.
- Rework formula for epilogue latch weights, to assume equal
distribution of remainders and remove `assert` (as I was able to
reach this code when forcing small unroll factors on the commandline).
Differential Revision: https://reviews.llvm.org/D158642
|
|
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
|
|
Relax condition on runtime trip count unrolling loops with 1 non-latch exit
that leads to a deop block.
There are cases when the deopt blocks are common exits for different loops.
LoopSimplify pass splits such edges to the common deopting blocks to make
sure that all exit nodes of the loop only have predecessors that are inside
of the loop (See simplifyOneLoop()). This breaks the current condition for
unrolling. This patch allows the split transitive blocks that still lead to
the deopting blocks.
Differential Revision: https://reviews.llvm.org/D152639
|
|
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
|
|
Function::splice()
This is part of a series of patches that aim at making Function::getBasicBlockList() private.
Differential Revision: https://reviews.llvm.org/D139984
|
|
|
|
|
|
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
|
|
This reverts commit 300c9a78819b4608b96bb26f9320bea6b8a0c4d0.
We will reland once these issues are ironed out.
|
|
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
|
|
|
|
ConnectProlog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47ec51.
Fix is analog to cfc741bc0e029.
Fixes #56286.
|
|
ConnectEpilog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47ec51.
Fix is analog to cfc741bc0e029.
Fixes #56282.
|
|
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
|
|
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
|
|
|
|
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
|
|
This is a followup to D125754. We introduce two branches, one
before the unrolled loop and one before the epilogue (and similar
for the prologue case). The previous patch only froze the
condition on the first branch.
Rather than independently freezing the second condition, this patch
instead freezes TripCount and bases BECount on it. These are the
two quantities involved in the conditions, and this ensures that
both work on a consistent, non-poisonous trip count.
Differential Revision: https://reviews.llvm.org/D125896
|
|
When performing runtime unrolling with multiple exits, one of the
earlier (non-latch) exits may exit the loop on the first iteration,
such that we never branch on the latch exit condition. As such, we
need to freeze the condition of the new branch that is introduced
before the loop, as it now executes unconditionally.
Differential Revision: https://reviews.llvm.org/D125754
|
|
Estimation on the impact on preprocessor output:
before: 1065307662
after: 1064800684
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
|
|
BECounts are guaranteed to be integers nowadays.
|
|
This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change. If anyone sees actually interesting code differences out of this, please let me know. I'm not expecting this to have much impact at all.
Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block. We can only hit this with a poorly documented debug flag.
Case 2 - Why should we treat epilog cases differently from prolog cases? Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled?
Sorry for the lack of tests here. These are both *exceedingly* narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags. Note that the legality cases for each are already exercised with force flags.
|
|
All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.
|
|
|
|
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.
Why does this matter? A couple of reasons:
* SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.)
Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.
* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
|
|
|
|
|
|
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop. When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.
Differential Revision: https://reviews.llvm.org/D108476
|
|
Requested in review comment on D108476
|