llvm-project.git/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp, branch main

Remove unused standard headers: memory, unordered_* (#167297)

2025-11-12T17:21:06+00:00

Cleanup the LLVM exported symbols namespace (#161240)

2025-10-01T22:32:07+00:00

There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.

[SampleProfile] Fix UB in Demangler invocation. (#137659)

2025-04-28T17:28:56+00:00

Currently the backing buffer of a `std::vector` is passed[1] to
`Demangler.getFunctionBaseName`. However, deeply inside the call stack
`OutputBuffer::grow` will call[2] `std::realloc` if it needs to grow the
buffer, leading to UB.

The demangler APIs specify[3] that "`Buf` and `N` behave like the second
and third parameters to `__cxa_demangle`" and the docs for the latter
say[4] that the output buffer must be allocated with `malloc` (but can
also be `NULL` and will then be realloced accordingly).

Note: PR #135863 changed this from a stack array to a `std::vector` and
increased the size to 65K, but this can still lead to a crash if the
demangled name is longer than that - yes, I'm surprised that a >65K-long
function name happens in practice...

[1]:
https://github.com/llvm/llvm-project/blob/d7e631c7cd6d9c13b9519991ec6becf08bc6b8aa/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp#L744
[2]:
https://github.com/llvm/llvm-project/blob/d7e631c7cd6d9c13b9519991ec6becf08bc6b8aa/llvm/include/llvm/Demangle/Utility.h#L50
[3]:
https://github.com/llvm/llvm-project/blob/d7e631c7cd6d9c13b9519991ec6becf08bc6b8aa/llvm/include/llvm/Demangle/Demangle.h#L92-L93
[4]:
https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.3/a01696.html

[SampleFDO] Extend the function base name max size (#135863)

2025-04-16T20:11:55+00:00

The function base name could be way long which overflows and leads to a
crash. Update to extend the max size.

Also changed to use heap allocation( `std::vector` ) to avoid
stack overflow.

[SampleFDO] Match functions with the same base function name (#126688)

2025-03-24T17:11:55+00:00

Sometimes, there may be no matched anchors but the functions still
match. e.g. if the function’s template typename changes, all the
callsites that use the type are mismatched and the caller function that
contains those callsite are mismatched. Introduce a check to match the
functions if their demangled base names are the same.

[StaleProfileMatching] Use only profile anchor size for similarity calculation (#126783)

2025-02-13T23:38:08+00:00

We observed that the number of IR anchors is usually greater than the
number of profile anchors, because IR anchors can be optimized away
later and llvm-profgen might not generate profiles for cold callsites.
This can cause missing function matches. I’m changing the similarity
calculation to use only the profile anchor size. In another point of
view, It also makes sense to reuse as many profile anchors as possible
regardless of the new functions in the IR.

[CSSPGO] Turn on call-graph matching by default for CSSPGO (#125938)

2025-02-06T19:54:59+00:00

Tested call-graph matching on some of Meta's large services, it works to
reuse some renamed function profiles, no negative perf or significant
build speed regression observed. Turned it on by default for CSSPGO
mode.

[SampleProf] Templatize longestCommonSequence (NFC) (#114633)

2024-11-04T21:20:25+00:00

This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template.  The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator.  To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

  std::unordered_map

does not work because we cannot infer the hash functor
LineLocationHash.  I could define std::hash.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo.  This way:

  DenseMap

automatically picks up DenseMapInfo.

[SampleFDO] Read call-graph matching recovered top-level function profile (#101053)

2024-09-04T18:08:40+00:00

With extbinary profile format, initial profile loading only reads
profile based on current function names in the module. However, if a
function is renamed, sample loader skips to load its original
profile(which has a different name), we will miss this case. To address
this, we load the top-level profile candidate explicitly for the
matching. If a match is found later, the function profile will be
further preserved for use by the sample loader.

[SampleFDO] Stale profile call-graph matching (#95135)

2024-07-17T17:33:00+00:00

Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.

Some noteworthy details:

1. Extend the LCS based CFG level matching to identify new function. 
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2.  Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible  pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)

3.  Process the matching in top-down function order 
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Added a new switch --salvage-unused-profile to control this, default
is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)