| Age | Commit message (Collapse) | Author |
|
Previously, SimplifyCFG only simplified unconditional branches when they
met a pattern (`swicth` -> `icmp` -> `br` -> `phi`) as follows:
```LLVM
switch i8 %A, label %DEFAULT [ i8 1, label %end i8 2, label %end ]
DEFAULT:
%tmp = icmp eq i8 %A, 92
br label %end
end:
... = phi i1 [ true, %entry ], [ %tmp, %DEFAULT ], [ true, %entry ]
```
This PR supports a new and more generic pattern (`switch` -> `icmp` ->
`select` -> `br` -> `phi` ) to simplify unconditional branches as
follows:
```LLVM
; BEFORE
case1:
switch i32 %x, label %DEFAULT [
i32 0, label %end
i32 1, label %case2
]
case2:
br label %end
DEFAULT:
%tmp = icmp eq i32 %x, 2
%val = select i1 %tmp, i32 V3, i32 V4
br label %end
end:
... = phi i32 [ V1, %case1 ], [ V2, %case2 ], [ %val, %DEFAULT ]
```
We prefer to split the edge to 'end' so that there are TWO entries of
V3/V4 to the PHI, merging the icmp & select into the switch, as follows:
```LLVM
; AFTER
case1:
switch i32 %x, label %DEFAULT [
i32 0, label %end
i32 1, label %case2
i32 2, label %case3
]
case2:
br label %end
case3:
br label %end
DEFAULT:
br label %end
end:
... = phi i32 [ V1, %case1 ], [ V2, %case2 ], [ V3, %case3 ], [ V4, %DEFAULT]
```
Alive2 Proof: https://alive2.llvm.org/ce/z/jYHM4f
Promising Optimization Impact:
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3006
|
|
Continued from #165804
This maintains the BFI of the default branch. Originally `10/63`, post-pass, it ends up being `5/63 + 58/63 * 5/58`(first term is from `PROF`, second is the probability of going to the switch lookup times the probability, there, of taking the default branch)
Issue #147390
|
|
(#166032)
This picks up from #166028, making the `Function` argument optional:
most cases don't need to provide it, but in e.g. InstCombine's case,
where the instruction (select, branch) is not attached to a function
yet, the function needs to be passed explicitly.
Co-authored-by: Florian Hahn <flo@fhahn.com>
|
|
ConstantRange uses `[-1, -1)` as the canonical form of a full set.
Therefore, the `for (APInt I = Lower; I != Upper; ++I)` idiom doesn't
work for full ranges. This patch fixes the value enumeration in
`ConstantComparesGatherer` to prevent missing values for full sets.
Closes https://github.com/llvm/llvm-project/issues/166369.
|
|
conditions (#165748)
In simplifycfg/cvp/sccp, we eliminate dead edges of switches according
to the knownbits/range info of conditions. However, these approximations
may not meet the real-world needs when the domain of condition values is
sparse. For example, if the condition can only be either -3 or 3, we
cannot prove that the condition never evaluates to 1 (knownbits:
???????1, range: [-3, 4)).
This patch adds a helper function `collectPossibleValues` to enumerate
all the possible values of V. To fix the motivating issue,
`eliminateDeadSwitchCases` will use the result to remove dead edges.
Note: In
https://discourse.llvm.org/t/missed-optimization-due-to-overflow-check/88700
I proposed a new value lattice kind to represent such values. But I find
it hard to apply because the transition becomes much complicated.
Compile-time impact looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=32d6b2139a6c8f79e074e8c6cfe0cc9e79c4c0c8&to=e47c26e3f1bf9eb062684dda4fafce58438e994b&stat=instructions:u
This patch removes many dead error-handling codes:
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3012
Closes https://github.com/llvm/llvm-project/issues/165179.
|
|
A switch on `umin` can eliminate the default case by making the `umin`'s
constant the default case.
Proof: https://alive2.llvm.org/ce/z/_N6nfs
Fixes: https://github.com/llvm/llvm-project/issues/162111
|
|
`turnSwitchRangeIntoICmp` (#165931)
PR #161000 introduced a bug whereby the IR would become invalid by having an unconditional branch have `!prof`attached to it. This only became evident in PR #165744, because the IR of `test/Transforms/SimplifyCFG/pr165301.ll`was simple enough to both (1) introduce the unconditional branch, and (2) survive in that fashion until the end of the pass (simplifycfg) and thus trip the verifier.
|
|
`simplifySwitchOfPowersOfTwo` converts (when applicable, see `00f5a1e30b`) a switch to a conditional branch. Its false case goes to the `default` target of the former switch, and the true case goes to a BB performing a `cttz`. We can calculate the branch weights from the branch weights of the old switch.
Issue #147390
|
|
nodes (#165744)
`PHINode::removeIncomingValue` removes itself when there are no incoming
edges. Then we cannot use it to retrieve the next instruction.
Closes https://github.com/llvm/llvm-project/issues/165301.
|
|
Previously, `hoistCommonCodeFromSuccessors` returned early if one of the
succ of BB has >1 predecessors.
However, if the succ is an unreachable BB, we can relax the condition to
perform `hoistCommonCodeFromSuccessors` based on the assumption of not
reaching UB.
See discussion https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2989
for details.
Alive2 proof: https://alive2.llvm.org/ce/z/OJOw0s
Promising optimization impact:
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2995
|
|
(#165335)
A recent commit 00f5a1e30b modified simplifySwitchOfPowersOfTwo to
generate a branch to handle the non-power-of-2 case when appropriate,
but does not set a DebugLoc on the new branch instruction; this patch
propagates the switch's DebugLoc to the new branch, as for the other
instructions generated in the same block.
Found using #107279.
|
|
In `simplifyBranchOnICmpChain`, if we can merge the comparisons into a
range check, use a conditional branch instead. This change also breaks
the cycle found in https://github.com/llvm/llvm-project/issues/165088.
Closes https://github.com/llvm/llvm-project/issues/165088.
Detailed description of the cycle:
```
define void @pr165088_cycle_1(i8 %x) {
entry:
%switch = icmp uge i8 %x, 2
%cond1 = icmp ugt i8 %x, 1
%or.cond = and i1 %switch, %cond1
br i1 %or.cond, label %block3, label %block2
block1:
%cond2 = icmp ugt i8 %x, 1
br i1 %cond2, label %block3, label %block2
block2:
br label %block3
block3:
%cond3 = icmp eq i8 %x, 0
br i1 %cond3, label %exit, label %block1
exit:
ret void
}
```
`simplifyBranchOnICmpChain` folds the branch in `entry` to a switch.
Then we get:
```
entry:
switch i8 %x, label %block3 [
i8 1, label %block2
i8 0, label %block2
]
...
```
`performValueComparisonIntoPredecessorFolding` redirects the default
target `block3` into `block1` because `%x` is never zero.
```
entry:
switch i8 %x, label %block1 [
i8 1, label %block2
i8 0, label %block2
]
...
```
Then `turnSwitchRangeIntoICmp` will convert the switch back into a
branch on icmp.
```
entry:
%switch = icmp ult i8 %x, 2
br i1 %switch, label %block2, label %block1
...
```
Since `block1` and `block2` share the same successor `block3`,
`performBranchToCommonDestFolding` merges the conditions of `entry` and
`block1`, resulting in the original pattern again.
|
|
Favour a `cttz`-indexed table lookup over an indirect jump table when
the default switch case is reachable, by branching non-power-of-two
inputs to the default case.
Proofs: https://alive2.llvm.org/ce/z/HeRAtf.
|
|
(#162683)
Use `llvm::DenseMapInfo` to define `DenseMapInfo` specializations
instead of surrounding it with `namespace llvm {}`.
|
|
unreachable (#162677)
Fixes #162585.
#161000 changed `br i1 true, label %if, label %else` to `br label %if`,
so we should remove one more incoming value.
|
|
While we do not want to form actual lookup tables early, we do want to
perform some optimizations, as they may enable inlining of the much
simpler form.
Builds on https://github.com/llvm/llvm-project/pull/156477, which
originally included this change as well. This PR makes two changes on
top of it:
* Do not perform the optimization early if it requires adding a mask
check. These make the resulting IR less analyzable.
* Add a new SimplifyCFG option that controls switch-to-arithmetic
conversion separately from switch-to-lookup conversion. Enable the new
flag at the end of the function simplification pipeline. This means that
we attempt the arithmetic conversion before inlining, but avoid it in
the early pipeline, where it may lose information.
|
|
Handle branch weights for `indirectbr` simplification: if we drop branches that aren't taken, we just need to remove the corresponding branch weight (which is presumably 0).
Issue #147390
|
|
Fixes #157113.
Take the following IR as an example; we know the destination of the `[1,
3]` cases is `%else`.
```llvm
define i32 @src(i8 range(i8 0, 6) %arg) {
switch i8 %arg, label %else [
i8 0, label %if
i8 4, label %if
i8 5, label %if
]
if:
ret i32 0
else:
ret i32 1
}
```
We can first try the non-wrapping range for both destinations, but I
don't see how that would be any better.
Proof: https://alive2.llvm.org/ce/z/acdWD4.
|
|
(#161739)
The switch becomes a conditional branch, one edge going to what was the default target of the switch, the other to a BB that performs a lookup in a table. The branch weights are accurately determinable from the ones of the switch.
Issue #147390
|
|
-> switch` (#161549)
We cannot calculate the weights of the switch precisely, but we do know the probability of the default branch. We then split equally the remaining probability over the rest of the cases. If we did nothing, the static estimation could be considerably poorer.
Issue #147390
|
|
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
|
|
The main difference between SimplifyCFG's `setBranchWeights` and the ProfDataUtils' is that the former doesn't propagate all-zero weights. That seems like a sensible thing to do, so updated the latter accordingly, and added a flag to control the behavior.
Also moved to ProfDataUtils the logic fitting 64-bit weights to 32-bit.
As side-effect, this fixes some profcheck failures.
|
|
`foldSwitchToSelect`
Make sure selects do exist prior to assigning weights to edges.
Fixes: https://github.com/llvm/llvm-project/issues/161137.
|
|
Propagate `!prof` from `switch` instructions.
Issue #147390
|
|
This is an overly broad check, the transformation made here can be done
safely for pointers with index!=repr width. This fixes the codegen
regression introduced by https://github.com/llvm/llvm-project/pull/105735
and should be beneficial for AMDGPU code-generation once the datalayout
there no longer uses the overly strict `ni:` specifier.
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/159890
|
|
Co-authored-by: Nikita Popov <npopov@redhat.com>
|
|
This patch fixes:
llvm/lib/Transforms/Utils/SimplifyCFG.cpp:338:6: error: unused
function 'isSelectInRoleOfConjunctionOrDisjunction'
[-Werror,-Wunused-function]
|
|
simplifications (#154426)
There’s a pattern where a branch is conditioned on a conjunction or disjunction that ends up being modeled as a `select` where the first operand is set to `true` or the second to `false`. If the branch has known branch weights, they can be copied to the `select`. This is worth doing in case later the `select` gets transformed to something else (i.e. if we know the profile, we should propagate it).
Issue #147390
|
|
(#154841)
|
|
In some cases, we can replace a switch with simpler instructions or a
lookup table.
For instance, if every case results in the same value, we can simply
replace the switch
with that single value.
However, lookup tables are not always supported.
Targets, function attributes and compiler options can deactivate lookup
table creation.
Currently, even simpler switch replacements like the single value
optimization do not
get applied, because we only enable these transformations if lookup
tables are enabled.
This PR enables the other kinds of replacements, even if lookup tables
are not supported.
First, it checks if the potential replacements are lookup tables.
If they are, then check if lookup tables are supported and whether to
continue.
If they are not, then we can apply the other transformations.
Originally, lookup table creation was delayed until late stages of the
compilation pipeline, because
it can result in difficult-to-analyze code and prevent other
optimizations.
As a side effect of this change, we can also enable the simpler
optimizations much earlier in the
compilation process.
|
|
Proof: https://alive2.llvm.org/ce/z/cpXuCb
|
|
(#155734)
The branch weights capture probability. The probability has everything to do with the (SSA) value the condition is predicated on, and nothing to do with the position in the CFG.
|
|
(#155602)
This PR is the first part to solve the issue in #149937.
The end goal is enabling more switch optimizations on targets that do
not support lookup tables.
SimplifyCFG has the ability to replace switches with either a few simple
calculations, a single value, or a lookup table.
However, it only considers these options if the target supports lookup
tables, even if the final result is not a LUT, but a few simple
instructions like muls, adds and shifts.
To enable more targets to use these other kinds of optimization, this PR
restructures the code in `switchToLookup`.
Previously, code was generated even before choosing what kind of
replacement to do. However, we need to know if we actually want to
create a true LUT or not before generating anything. Then we can check
for target support only if any LUT would be created.
This PR moves the code so it first determines the replacement kind and
then generates the instructions.
A later PR will insert the target support check after determining the
kind of replacement. If the result is not a LUT, then even targets
without LUT support can replace the switch with something else.
|
|
proof: https://alive2.llvm.org/ce/z/5PNCds
|
|
We cannot form phis/selects of token type, so this should be checked
inside canReplaceOperandWithVariable().
|
|
`mergeConditionalStoreToAddress` (#155058)
This is about code readability. The operands in the disjunction forming the combined predicate in `mergeConditionalStoreToAddress` could sometimes be negated twice. This patch addresses that.
2 tests needed updating because they exposed the double negation and now they don’t.
|
|
`ICI->getOperand(0)` is non-null.
|
|
condition. (#154007)
Proof: https://alive2.llvm.org/ce/z/TozSD6
|
|
Split out from https://github.com/llvm/llvm-project/pull/154007 as it
showed compile time improvements
NFC as there needs to be at least two icmps that is part of the chain.
|
|
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.
Fixes: https://github.com/llvm/llvm-project/issues/151144
|
|
proof: https://alive2.llvm.org/ce/z/WVt4-F
|
|
getAssignmentMarkers was for debug intrinsics. getDVRAssignmentMarkers
is used for DbgRecords.
|
|
|
|
Extend jump-threading to allow local defs that are live outside of the
threaded block. Allow threading to destinations where the local defs are
not live.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
This was always undesirable, and after #149310 it is illegal and will
result in a verifier error.
Fix this by moving SimplifyCFG's check for this into
canReplaceOperandWithVariable(), so it's shared with GVNSink.
|
|
Avoid repeatedly querying `getUniquePredecessor` for already-visited
switch successors so as not to incur quadratic runtime.
Fixes: https://github.com/llvm/llvm-project/issues/147239.
|
|
one case (#145233)
Fix #141753 .
This patch introduces a new check, that tries to decide if the
conjunction of all the values uniquely identify the accepted values by
the switch.
|
|
(#146207)
Generate the GEP with the index type that InstCombine will cast it to but use the knowledge that the index is unsigned.
|
|
We should be able to allow `simplifySwitchOfPowersOfTwo` transform
to take place, as, on recent X86 targets, the weighted latency-size
appears to be 2. This favours computing trailing zeroes and indexing
into a smaller value table, over generating a jump table with an
indirect branch, which overall should be more efficient.
|
|
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
|