path: root/llvm/lib/Transforms/IPO/SampleProfile.cpp
Age | Commit message | Author
2025-11-09 | Remove unused <set> and <map> inclusion (#167175) | serge-sans-paille
2025-11-08 | [llvm] Remove unused local variables (NFC) (#167106) | Kazu Hirata
Identified with bugprone-unused-local-non-trivial-variable.
2025-10-01 | Cleanup the LLVM exported symbols namespace (#161240) | Nicolai Hähnle
There's a pattern throughout LLVM of cl::opts being exported. That in itself is probably a bit unfortunate, but what's especially bad about it is that a lot of those symbols are in the global namespace. Move them into the llvm namespace. While doing this, I noticed some other variables in the global namespace and moved them as well.
2025-10-01 | [SimplifyCFG][PGO] Reuse existing `setBranchWeights` (#160629) | Mircea Trofin
The main difference between SimplifyCFG's `setBranchWeights` and the ProfDataUtils' is that the former doesn't propagate all-zero weights. That seems like a sensible thing to do, so updated the latter accordingly, and added a flag to control the behavior. Also moved to ProfDataUtils the logic fitting 64-bit weights to 32-bit. As a side effect, this fixes some profcheck failures.
2025-09-19 | [SampleProfile] Always use FAM to get ORE | Aiden Grossman
The split in this code path was left over from when we had to support the old PM and the new PM at the same time. Now that the legacy pass has been dropped, this simplifies the code a little bit and swaps pointers for references in a couple places. Reviewers: aeubanks, efriedma-quic, wlei-llvm Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/159858
2025-05-24 | [llvm] Use std::tie to implement comparison functors (NFC) (#141353) | Kazu Hirata
2025-05-15 | [llvm] Use std::optional::value_or (NFC) (#140014) | Kazu Hirata
2025-04-28 | Clean up external users of GlobalValue::getGUID(StringRef) (#129644) | Owen Rodley
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context.

This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class.

This does the following:
* Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit.

There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.
2025-03-29 | [Transforms] Use llvm::append_range (NFC) (#133607) | Kazu Hirata
2025-03-28 | [NFC][SampleFDO] Clean the unneeded field and the related loop (#132376) | Jinjie Huang
Clean the unneeded field 'TotalCollectedSamples' and the unnecessary loop. The field seems to have been introduced in https://reviews.llvm.org/D31952, and its uses were removed in https://reviews.llvm.org/D19287, but the field and the unnecessary calculation were not cleaned up. This patch removes the unneeded code.
2025-03-22 | [IPO] Avoid repeated hash lookups (NFC) (#132588) | Kazu Hirata
2025-02-06 | [CSSPGO] Turn on call-graph matching by default for CSSPGO (#125938) | Lei Wang
Tested call-graph matching on some of Meta's large services: it successfully reuses some renamed function profiles, and no negative perf or significant build speed regression was observed. Turned it on by default for CSSPGO mode.
2025-01-27 | [NFC][DebugInfo] Make some block-start-position methods return iterators (#124287) | Jeremy Morse
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption.

Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction*'s and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.
2025-01-08 | [LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) | Ryan Mansfield
2024-12-13 | [PseudoProbe] Fix cleanup for pseudo probe after annotation (#119660) | Haohai Wen
When using -sample-profile-remove-probe, pseudo probe desc should also be removed and dwarf discriminator for call instruction should be restored.
2024-11-08 | [SampleFDO] Support enabling sample loader pass in O0 mode (#113985) | Lei Wang
Add support for enabling the sample loader pass in O0 mode (under `-fsample-profile-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy (predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy.
- Explicitly disable the sample loader inlining to ensure it only emits sampling annotation.
- Use flattened profile for O0 mode.
- Add the pass after the `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.
2024-11-03 | [IPO] Remove unused includes (NFC) (#114716) | Kazu Hirata
Identified with misc-include-cleaner.
2024-09-15 | [Instrumentation] Move out to Utils (NFC) (#108532) | Antonio Frighetto
Utility functions have been moved out to Utils. Minor opportunity to drop the header where not needed.
2024-07-18 | Fix assertion of null pointer samples in inline replay mode (#99378) | Lei Wang
Fix https://github.com/llvm/llvm-project/issues/97108. In inline replay mode, `CalleeSamples` may be null and the order doesn't matter.
2024-07-17 | [SampleFDO] Stale profile call-graph matching (#95135) | Lei Wang
Profile staleness could be due to function renaming. Given that the sample profile loader relies on exact string matching, a trivial change in the function signature (such as `int foo()` --> `long foo()`) can make the mangled name different, and the function profile (including all nested children profiles) becomes unavailable. This patch introduces stale profile call-graph level matching, aimed at identifying trivial function renaming and reusing the old function profile.

Some noteworthy details:
1. Extend the LCS-based CFG-level matching to identify new functions.
   - Extend it to match a function and a profile that have different names, instead of requiring exact function name matching. This leverages LCS: while finding callsite anchor matches, when two function names differ, try matching the functions instead of returning.
   - In LCS, the equal-function check is replaced by `functionMatchesProfile`.
   - Only try matching functions that are new (those with no counterpart on the other side). This reduces the matching scope, as we don't need to match the originally matched functions.
2. Determine the matching by a call-site anchor similarity check.
   - A new function `functionMatchesProfile(IRFunc, ProfFunc)` checks for renaming of a possible <IRFunc, ProfFunc> pair. It uses the LCS (diff) matching to compute the equal set, and we define `Similarity = |equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if the similarity is above a threshold (`-func-profile-similarity-threshold`).
3. Process the matching in top-down function order.
   - When a caller is done matching, the new function names are saved for later use; using top-down order maximizes the reused results.
   - `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.
5. Added a new switch `--salvage-unused-profile` to control this; the default is false.

Verified on one of Meta's big internal services; confirmed that 90%+ of the found renaming pairs are good. (There could be incorrect renaming pairs if the number of anchors is small, but we checked that those functions are simple cold functions.)
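
The similarity measure described in this commit message can be illustrated with a small standalone sketch. This is not the code from the patch: `lcsLength`, `anchorSimilarity`, and `looksRenamed` are hypothetical names, anchors are modeled as plain strings, and treating the threshold as inclusive is an assumption (the message only says "above a threshold").

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of two anchor sequences,
// computed with the classic O(|A| * |B|) dynamic program.
std::size_t lcsLength(const std::vector<std::string> &A,
                      const std::vector<std::string> &B) {
  std::vector<std::vector<std::size_t>> DP(
      A.size() + 1, std::vector<std::size_t>(B.size() + 1, 0));
  for (std::size_t I = 1; I <= A.size(); ++I)
    for (std::size_t J = 1; J <= B.size(); ++J)
      DP[I][J] = A[I - 1] == B[J - 1]
                     ? DP[I - 1][J - 1] + 1
                     : std::max(DP[I - 1][J], DP[I][J - 1]);
  return DP[A.size()][B.size()];
}

// Similarity = 2 * |equalSet| / (|A| + |B|), a value in [0, 1].
double anchorSimilarity(const std::vector<std::string> &A,
                        const std::vector<std::string> &B) {
  if (A.empty() && B.empty())
    return 1.0; // Assumption: two empty anchor lists count as identical.
  return 2.0 * static_cast<double>(lcsLength(A, B)) /
         static_cast<double>(A.size() + B.size());
}

// Hypothetical threshold check; the comparison being inclusive is an
// assumption, and no default threshold value is implied.
bool looksRenamed(const std::vector<std::string> &IRAnchors,
                  const std::vector<std::string> &ProfAnchors,
                  double Threshold) {
  assert(Threshold >= 0.0 && Threshold <= 1.0 && "threshold is a ratio");
  return anchorSimilarity(IRAnchors, ProfAnchors) >= Threshold;
}
```

For example, the anchor lists {foo, bar, baz} and {foo, qux, baz} share the subsequence {foo, baz}, giving a similarity of 4/6, so the pair would pass a threshold of 0.6 but not 0.8.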
2024-07-09 | [NFC] Coding style fixes: SampleProf (#98208) | Mircea Trofin
Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I *think* the style there tries to match `std::error_category`. Also left `hash_value` as-is, because it matches what we do in Hashing.h
2024-06-30 | [Transforms] Migrate to a new version of getValueProfDataFromInst (#96380) | Kazu Hirata
2024-06-13 | [Transforms] Migrate to a new version of getValueProfDataFromInst (#95485) | Kazu Hirata
Note that the version of getValueProfDataFromInst that returns bool has been "deprecated" since: commit 1e15371dd8843dfc52b9435afaa133997c1773d8 Author: Mingming Liu <mingmingl@google.com> Date: Mon Apr 1 15:14:49 2024 -0700
2024-06-12 | Reapply "[llvm][IR] Extend BranchWeightMetadata to track provenance o… (#95281) | Paul Kirth
…f weights" #95136

Reverts #95060, and relands #86609, with the unintended code generation changes addressed.

This patch implements the changes to LLVM IR discussed in https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032

In this patch, we add an optional field to MD_prof metadata nodes for branch weights, which can be used to distinguish weights added from llvm.expect* intrinsics from those added via other methods, e.g. from profiles or inserted by the compiler.

One of the major motivations is for use with MisExpect diagnostics, which need to know if branch_weight metadata originates from an llvm.expect intrinsic. Without that information, we end up checking branch weights multiple times in the case of ThinLTO + SampleProfiling, leading to some inaccuracy in how we report MisExpect related diagnostics to users.

Since we change the format of MD_prof metadata in a fundamental way, we need to update code handling branch weights in a number of places. We also update the lang ref for branch weights to reflect the change.
2024-06-11 | Revert "[llvm][IR] Extend BranchWeightMetadata to track provenance of weights" (#95060) | Paul Kirth
Reverts llvm/llvm-project#86609

This change causes compile-time regressions for stage2 builds (https://llvm-compile-time-tracker.com/compare.php?from=3254f31a66263ea9647c9547f1531c3123444fcd&to=c5978f1eb5eeca8610b9dfce1fcbf1f473911cd8&stat=instructions:u). It also introduced unintended changes to `.text` which should be addressed before relanding.
2024-06-10 | [llvm][IR] Extend BranchWeightMetadata to track provenance of weights (#86609) | Paul Kirth
This patch implements the changes to LLVM IR discussed in https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032

In this patch, we add an optional field to MD_prof metadata nodes for branch weights, which can be used to distinguish weights added from `llvm.expect*` intrinsics from those added via other methods, e.g. from profiles or inserted by the compiler.

One of the major motivations is for use with MisExpect diagnostics, which need to know if branch_weight metadata originates from an llvm.expect intrinsic. Without that information, we end up checking branch weights multiple times in the case of ThinLTO + SampleProfiling, leading to some inaccuracy in how we report MisExpect related diagnostics to users.

Since we change the format of MD_prof metadata in a fundamental way, we need to update code handling branch weights in a number of places. We also update the lang ref for branch weights to reflect the change.
2024-05-28 | [Sample Profile] Check hot callsite threshold when inlining a function with a sample profile (#93286) | William Junda Huang
Currently, if a callsite is hot as determined by the sample profile, it is unconditionally inlined barring invalid cases (such as recursion). The inline cost check should still apply, because a function's hotness and its inline cost are two different things. For example, if a function calls another very large function multiple times (on different code paths), the large function should not be inlined even if it's hot.
2024-05-08 | [SampleProfileLoader] Fix integer overflow in generateMDProfMetadata (#90217) | Nabeel Omer
This patch fixes an integer overflow in the SampleProfileLoader pass. The issue occurs when weights are saturated and Profi isn't being used. This patch also adds a newline to a debug message to make it more readable.
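
To illustrate the class of bug involved, here is a minimal sketch of overflow-safe weight handling. It is not the code from the patch: `saturatingAdd` and `fitWeights` are hypothetical helpers, and the scaling scheme (dividing every weight by the smallest divisor that makes the largest one fit in 32 bits) is an assumption chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Adds two counts, clamping to UINT64_MAX instead of wrapping around.
std::uint64_t saturatingAdd(std::uint64_t X, std::uint64_t Y) {
  const std::uint64_t Max = std::numeric_limits<std::uint64_t>::max();
  return X > Max - Y ? Max : X + Y;
}

// Scales 64-bit weights so that the largest one fits in 32 bits while
// keeping the ratios between weights approximately intact.
std::vector<std::uint32_t> fitWeights(const std::vector<std::uint64_t> &W) {
  const std::uint64_t Limit = std::numeric_limits<std::uint32_t>::max();
  std::uint64_t MaxW = 0;
  for (std::uint64_t X : W)
    MaxW = std::max(MaxW, X);
  // Smallest convenient divisor that brings MaxW down to at most Limit.
  const std::uint64_t Scale = MaxW <= Limit ? 1 : MaxW / Limit + 1;
  assert(MaxW / Scale <= Limit && "scaled maximum must fit in 32 bits");
  std::vector<std::uint32_t> Out;
  Out.reserve(W.size());
  for (std::uint64_t X : W)
    Out.push_back(static_cast<std::uint32_t>(X / Scale));
  return Out;
}
```

The point of the sketch is that a saturated 64-bit weight must be scaled before narrowing; casting or incrementing it directly would wrap.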
2024-04-29 | [PseudoProbe] Add an option to remove pseudo probes after profile annotation (#90293) | Lei Wang
This can be used for testing perf overhead of pseudo-probe.
2024-04-09 | Remove unused variable (#88223) | Lei Wang
fix the CI
2024-03-28 | [SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988) | Lei Wang
Move all the stale profile matching stuffs into new files so that it can be shared for unit testing.
2024-03-27 | [CSSPGO] Fix the issue of missing callee profile matches (#85715) | Lei Wang
Two fixes related to the callee/inlinee profile:
1. Fix the bug that the matching results were not distributed to the callee profiles (they should be passed by reference).
2. Narrow imported function matching to checksum-mismatched functions. More context: previously we ran matching for all imported functions even when checksums matched. However, after fixing 1) we got a regression, likely because the matching is not a no-op for checksum-matched functions, so we want to make it consistent and only run matching for checksum-mismatched (imported) functions. Since the metadata (pseudo_probe_desc) is dropped for imported functions, we leverage the function attribute mechanism and add a new function attribute (`profile-checksum-mismatch`) to transfer the info from pre-link to post-link.
2024-03-27 | [CSSPGO] Reject high checksum mismatched profile (#84097) | Lei Wang
Error out the build if the checksum mismatch is extremely high; it's better to drop the profile than to apply a bad profile. Note that the check is on a module level: the user could make big changes to functions in one single module, but those changes might not be performance-significant for the whole binary, so we want to be conservative and only expect to catch big perf regressions. To do this, we select a set of the "hot" functions for the check. We use two parameters (`hot-func-cutoff-for-staleness-error` and `min-functions-for-staleness-error`) to control the function selection, making sure the selected functions are hot enough and the number of functions is not small. Tuned the parameters on our internal services; it works to catch big perf regressions due to a high mismatch.
2024-03-19 | [CSSPGO] Fix the issue of preinliner import function list (#85719) | Lei Wang
By design, when the nested profile is pre-inliner based, we should fully honor the pre-inliner decision; fix it by setting the threshold to zero. We observed a perf win on one internal service and no negative impact on other big services.
2024-02-19 | [CSSPGO] Compute and report profile matching recovered callsites and samples (#79090) | Lei Wang
This change adds support for computing and reporting the staleness metrics after stale profile matching, so that we can know how effective the fuzzy matching is, i.e. how many callsites and samples are recovered by the matching.

Some implementation notes:
- The function checksum mismatch metrics are not applicable here, as they are function-level metrics: checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples.
- Added two new counters, `NumRecoveredCallsites` and `RecoveredCallsiteSamples`, for this and removed `TotalCallsiteSamples`, as we can now use `TotalFuncHashSamples` as the base; also renamed some counters.
- In profile matching, we changed to use a state machine to represent the callsite's matching state changes. See `MatchState` for the states. A new function `recordCallsiteMatchStates` computes and records the callsite's match state changes before and after the matching; the result is compressed and saved into a `FuncCallsiteMatchStates` map for later counting.
- Changed the counting function to run at module level and moved it to the end of the whole process (`computeAndReportProfileStaleness`). The reason is that previously callsites were only counted in top-level functions; this change extends it to count (recursively) the inlined functions and samples, which is more accurate.
2023-12-24 | [ProfileData] Copy CallTargetMaps a bit less. NFCI | Benjamin Kramer
2023-11-16 | Add setBranchWeigths convenience function. NFC (#72446) | Matthias Braun
Add `setBranchWeights` convenience function to ProfDataUtils.h and use it where appropriate.
2023-11-10 | [SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479) | William Junda Huang
Normally SampleContext does not allow using an empty StringRef to construct an object; this is to prevent bugs reading the profile. However, empty names may be emitted by a function whose name is intentionally set to empty, or by a bug in the remapper that returns an empty string. Regardless, converting it to FunctionId first will prevent the assert. That assert check is unnecessary and will be addressed in another patch.
2023-10-17 | [llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164) | William Junda Huang
This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740

In the previous implementation, when an MD5 Sample Profile is read, the reader first converts the MD5 values to strings and then creates a StringRef as if the numerical strings were regular function names; later on, IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations.

In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when used in associative containers.

ProfileFuncRef guarantees that the same function name in string form or in MD5 form has the same hash value, which also fixes a few issues in IPO passes where function matching/lookup only checks the function name string and returns a no-match if the profile is MD5.

When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec on average, and reading the function offset table from 0.78s to 0.7s.
2023-09-01 | [llvm] Fix duplicate word typos. NFC | Fangrui Song
Those fixes were taken from https://reviews.llvm.org/D137338
2023-08-31 | [CSSPGO] Refactoring findIRAnchors | wlei
Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both callsite and BB probes, we can leverage this to unify the callsite handling code.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D159169
2023-08-31 | [CSSPGO] Silence -Wunused-but-set-variable warning without asserts (NFC) | Jie Fu
/data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable]
  bool IsFuncHashMismatch = false;
       ^
1 error generated.
2023-08-30 | [CSSPGO] Skip reporting staleness metrics for imported functions | wlei
Accumulating the staleness metrics at pre-link time is less accurate than doing it at post-link time (assuming we use the offline profile mismatch as the baseline). The reason is that there are some duplicated reports for the same functions: for example, one template function could be included in multiple TUs, but at post-thin-link time only one function is kept (linkonce_odr) and the others are marked as available-externally functions. Hence, this change skips reporting the metrics for imported functions (available-externally). I saw that the post-link number is now very close to the offline number (dump the mismatched functions and count the metrics offline based on the entire profile), slightly smaller than the offline number due to some missing inlined functions.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D156725
2023-08-30 | [CSSPGO] Compute checksum mismatch recursively on nested profile | wlei
Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158900
2023-08-30 | [CSSPGO] Retire FlattenProfileForMatching | wlei
- Always use the flattened profile to find the profile anchors. Since profiles under different contexts may have different inlined callsites, to get more profile anchors we use a merged profile from all the contexts (the flattened profile) to find callsite anchors.
- Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profiles are dropped. (TODO: in the future, we can improve this to reuse the valid children profiles.)
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D158891
2023-08-30 | [CSSPGO] Support stale profile matching for LTO | wlei
Because callsites could be optimized out by inlining at pre-link time, we don't have those original call targets in the IR at LTO time. Additionally, since the inlined code doesn't actually belong to the original function, the IR locations or pseudo probes parsed from it are incorrect and could mislead the matching later. This change adds support for extracting the original IR location info from the inlined code. Specifically, it makes sure to skip all the inlined code that doesn't belong to the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target (name). Measured on some stale profile instances; all showed some perf improvements.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D156722
2023-08-30 | [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction | wlei
- Rename `IRLocation` --> `IRAnchors`, `ProfileLocation` --> `ProfileAnchors`.
- Reorganize runOnFunction; factor out the code that finds IR anchors into `findIRAnchors`.
- Introduce a new function `findProfileAnchors` to populate the profile-related anchors. The result is saved into `ProfileAnchors` and is later used for both the mismatch report and matching, which avoids parsing `getBodySamples` and `getCallsiteSamples` multiple times.
- Move the `MatchedCallsiteLocs` handling from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics are computed in one function.
- Move all matching-related code into `runStaleProfileMatching`, and move all mismatch reporting into `countProfileMismatches`.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D158817
2023-08-16 | [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples | William Huang
SampleProfileLoader::promoteMergeNotInlinedContextSamples adds certain uninlined functions to the sample profile map (an unordered_map, previously read from a profile file). This action may cause the map to be rehashed, invalidating all pointers to FunctionSamples used by many members of SampleProfileLoader, while the existing code did nothing to guard against that. This bug is theoretical, since adding a few new functions to a large profile usually won't trigger a rehash, or even if there's a rehash std::unordered_map tries its best to expand its capacity in place. This bug will trigger if the container type of the sample profile map is changed to llvm::DenseMap or another implementation, such as in D147740, for SampleProfReader performance reasons.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D157061
2023-07-07 | [SamplePGO] Fix ICE that callee samples returns null while finding import functions | wlei
We found that in a special condition, the input callee `Samples` is null for `findExternalInlineCandidate`, which caused an ICE.

In some rare cases, a call instruction could be changed after being pushed into the inline candidate queue; this is because earlier inlining may expose constant propagation, which can change an indirect call to a direct call. When this happens, we may fail to find matching function samples for the candidate later (for example if the profile is stale), even if a match was found when the candidate was enqueued.

See this reduced program:

file1.c:
```
int a;                   /* accumulator; declaration missing in the original snippet */
int bar(int x);
int (*foo(void))(int) {  /* returns a pointer to bar */
  return bar;
}
void func(void) {
  int (*fptr)(int);
  fptr = foo();
  a += (*fptr)(10);
}
```
file2.c:
```
int bar(int x) { return x + 1; }
```

The two calls, `foo` and `(*fptr)`, are pushed into the queue at the beginning; say `foo` is hotter and is popped first for inlining. During the inlining of `foo`, constant propagation is performed for the function pointer `bar`, which changes `(*fptr)` to a direct call `bar(..)`. Note that at this time, `(*fptr)/bar` is still in the queue; later, when it is popped for inlining, it uses a different target name (bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, this leads to a null callee sample being returned.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D154637
2023-06-29 | [CSSPGO] Enable stale profile matching by default for CSSPGO | wlei
We tested stale profile matching on several of Meta's internal services, and all results are positive. For instance, in one service that refreshed its profile every one or two weeks, it consistently gave a 1~2% performance improvement. We also observed an instance where a trivial refactoring caused a 2% regression and the matching successfully recovered the whole regression. Therefore, we'd like to turn it on by default for CSSPGO.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D154027