llvm-project.git/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp, branch main

Cleanup the LLVM exported symbols namespace (#161240)

2025-10-01T22:32:07+00:00

There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.

[InstCombine][nfc] Fix assert failure with function entry count equal to zero

2025-09-21T04:33:09+00:00

We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832

[FunctionSpecialization] Fix profile count preserving logic (#157939)

2025-09-11T06:22:21+00:00

The previous fix in #157768 had a bug; instead of subtracting the
original function's call count per call site of a specialization, we
were subtracting the running total of the specialization's calls.

Tracking issue: #147390

[FunctionSpecialization] Preserve call counts of specialized functions (#157768)

2025-09-10T01:27:24+00:00

A function that has been specialized will have its function entry counts
preserved as follows:

* Each specialization's count is the sum of each call site's basic
block's number of entries as computed by `BlockFrequencyInfo`.
* The original function's count will be decreased by the counts of its
specializations.

Tracking issue: #147390

[FuncSpec] Skip SCCP on blocks of dead functions and poison their callsites (#154668)

2025-08-27T17:04:52+00:00

Fixes #153295.
For test case below:
```llvm
define i32 @caller() {
entry:
  %call1 = call i32 @callee(i32 1)
  %call2 = call i32 @callee(i32 0)
  %cond = icmp eq i32 %call2, 0
  br i1 %cond, label %common.ret, label %if.then

common.ret:                                       ; preds = %entry
  ret i32 0

if.then:                                         ; preds = %entry
  %unreachable_call = call i32 @callee(i32 2)
  ret i32 %unreachable_call
}

define internal i32 @callee(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}
```
Before specialization, the SCCP solver determines that
`unreachable_call` is unexecutable, as the value of `callee` can only be
zero.
After specializing the call sites `call1` and `call2`, FnSpecializer
announces `callee` is a dead function since all executable call sites
are specialized. However, the unexecutable call sites can become
executable again after solving specialized calls.
In this testcase, `call2` is considered `Overdefined` after
specialization, making `cond` also `Overdefined`. Thus,
`unreachable_call` becomes executable.
This patch skips SCCP on the blocks in dead functions, and poisons the
call sites of dead functions.

[llvm] Remove unused includes (NFC) (#154051)

2025-08-18T06:46:35+00:00

These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.

[PredicateInfo] Use bitcast instead of ssa.copy (#151174)

2025-08-11T07:25:01+00:00

PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
    
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.

[llvm] Lower latency bonus threshold in function specialization. (#143954)

2025-06-17T23:13:42+00:00

Related to #143219.

Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops causing the latency bonus to be lower than
the current 40% threshold.

While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.

I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would be executed had they not been
"eliminated" due to the constant propagation). This total
latency should identify the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not given me a good
heuristics either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does match very well
with the performance gain achieved by function specialization
for exchange2, so it seemd like another artificial heuristic
not better than the current one.

It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.

With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for `noalias` attribute.

With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.

This threshold has been changed before also due to improved
alias information:
https://github.com/llvm/llvm-project/commit/2fb51fba8ca904a6d3ddf30ae94228ecf9e6a231#diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd

Please let me know what testing I should run to make sure this change
is safe. As I understand, it may affect the compilation time
performance,
and I will appreciate it if someone points out which benchmarks
need to be checked before merging this.

[CostModel] Remove optional from InstructionCost::getValue() (#135596)

2025-04-23T06:46:27+00:00

InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.

[IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284)

2025-01-25T10:59:50+00:00

Fixes an issue reported from #114964, where metadata arguments were
attempted to be accessed as constants.