path: root/llvm/lib/ProfileData/MemProf.cpp
Age | Commit message | Author
2025-05-19 | [NFC][MemProf] Move IndexedMemProfData to its own header. (#140503) | Snehasish Kumar
Part of a larger refactoring with the following goals:
1. Reduce the size of MemProf.h
2. Avoid including ModuleSummaryIndex just for a couple of types
2025-05-19 | [NFC][MemProf] Move getGUID out of IndexedMemProfRecord (#140502) | Snehasish Kumar
Part of a larger refactoring with the following goals:
1. Reduce the size of MemProf.h
2. Avoid including ModuleSummaryIndex just for a couple of types
2025-05-19 | [NFC][MemProf] Move Radix tree methods to their own header and cpp. (#140501) | Snehasish Kumar
Part of a larger refactoring with the following goals:
1. Reduce the size of MemProf.h
2. Avoid including ModuleSummaryIndex just for a couple of types
2025-05-01 | [MemProf] Add v4 which contains CalleeGuids to CallSiteInfo. (#137394) | Snehasish Kumar
This patch adds CalleeGuids to the serialized format and increments the version number to 4. The unit tests are updated to include a new test for v4 and the YAML format is also updated to be able to roundtrip the v4 format.
2025-04-28 | [MemProf][NFC] Hoist size computation out of the loop for v3 (#137479) | Snehasish Kumar
Similar to the suggestion in #137394. In this case apply it to the current binary format (v3).
2025-04-28 | Clean up external users of GlobalValue::getGUID(StringRef) (#129644) | Owen Rodley
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so it is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following:
* Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This simply makes explicit at the callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit.
There are a few cases where neither of the above is possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.
2025-03-12 | [MemProf] Extend CallSite information to include potential callees. (#130441) | Snehasish Kumar
* Added YAML traits for `CallSiteInfo`
* Updated the `MemProfReader` to pass `Frames` instead of the entire `CallSiteInfo`
* Updated test cases to use `testing::Field`
* Add YAML sequence traits for CallSiteInfo in MemProfYAML
* Also extend IndexedMemProfRecord
* XFAIL the MemProfYaml round trip test until we update the profile format
For now we only read and write the additional information from the YAML format. The YAML round trip test will be enabled when the serialized format is updated.
2024-12-18 | [memprof] Move Frame::hash and hashCallStack to IndexedMemProfData (NFC) (#120365) | Kazu Hirata
Now that IndexedMemProfData::{addFrame,addCallStack} are the only callers of Frame::hash and hashCallStack, respectively, this patch moves those functions into IndexedMemProfData and makes them private. With this patch, we can obtain FrameId and CallStackId only through addFrame and addCallStack, respectively.
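A minimal C++ sketch of the encapsulation described above, under stated assumptions: `MiniIndexedData`, the simplified `Frame` struct, and both hash functions are hypothetical stand-ins, not LLVM's `IndexedMemProfData` or its actual hashing. It only illustrates that when the hash helpers are private, the only way to obtain an ID is through `addFrame` or `addCallStack`, which also interns the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified frame; LLVM's Frame carries more fields.
struct Frame {
  std::string Function;
  uint32_t LineOffset;
  uint32_t Column;
};

using FrameId = uint64_t;
using CallStackId = uint64_t;

class MiniIndexedData {
public:
  // The only way to obtain a FrameId: hash the frame and intern it.
  FrameId addFrame(const Frame &F) {
    FrameId Id = hashFrame(F);
    Frames.emplace(Id, F); // collisions are not detected in this sketch
    return Id;
  }
  // The only way to obtain a CallStackId: hash the stack and intern it.
  CallStackId addCallStack(const std::vector<FrameId> &CS) {
    CallStackId Id = hashCallStack(CS);
    CallStacks.emplace(Id, CS);
    return Id;
  }
  size_t numFrames() const { return Frames.size(); }
  size_t numCallStacks() const { return CallStacks.size(); }

private:
  // Private, so callers cannot compute an ID without also recording the data.
  static FrameId hashFrame(const Frame &F) {
    size_t H = std::hash<std::string>{}(F.Function);
    H = H * 31 + F.LineOffset;
    H = H * 31 + F.Column;
    return H;
  }
  // Illustrative FNV-1a-style mixing over 64-bit frame IDs.
  static CallStackId hashCallStack(const std::vector<FrameId> &CS) {
    uint64_t H = 1469598103934665603ull;
    for (FrameId Id : CS) {
      H ^= Id;
      H *= 1099511628211ull;
    }
    return H;
  }

  std::map<FrameId, Frame> Frames;
  std::map<CallStackId, std::vector<FrameId>> CallStacks;
};
```

Because the hashes are private, a caller cannot end up holding a FrameId or CallStackId whose contents were never recorded, and adding the same frame or call stack twice yields the same ID.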
2024-11-24 | [memprof] Speed up llvm-profdata (#117446) | Kazu Hirata
CallStackRadixTreeBuilder::build takes the parameter MemProfFrameIndexes by value, which involves copies:
    std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>> MemProfFrameIndexes
Then "build" makes another copy of MemProfFrameIndexes and passes it to encodeCallStack for every call stack, which is painfully slow. This patch changes the type to a pointer so that we don't have to make a copy every time we pass the argument. Without this patch, it takes 553 seconds to run "llvm-profdata merge" on a large MemProf raw profile. This patch shortens that to 67 seconds.
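A sketch of the copy problem, assuming `std::unordered_map` as a stand-in for `llvm::DenseMap` and a hypothetical `FrameIndexMap` wrapper that counts its own copies. The `build*` and `encode*` names are illustrative only. Passing `std::optional<FrameIndexMap>` by value copies the map once at the call and once more per call stack; a nullable `const` pointer expresses "no map" without copying anything.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using FrameIdTy = uint64_t;
using LinearFrameId = uint32_t;

// Counts how often the index map is copied (illustration only).
static int MapCopies = 0;

// Hypothetical wrapper around the FrameId -> LinearFrameId map.
struct FrameIndexMap {
  std::unordered_map<FrameIdTy, LinearFrameId> Map;
  FrameIndexMap() = default;
  FrameIndexMap(const FrameIndexMap &O) : Map(O.Map) { ++MapCopies; }
  FrameIndexMap(FrameIndexMap &&) = default;
};

using Stack = std::vector<FrameIdTy>;
using Encoded = std::vector<LinearFrameId>;

// Slow pattern: the optional map is a by-value parameter, so each call copies it.
static Encoded encodeCopying(const Stack &CS, std::optional<FrameIndexMap> Indexes) {
  Encoded Out;
  for (FrameIdTy F : CS)
    Out.push_back(Indexes ? Indexes->Map.at(F) : static_cast<LinearFrameId>(F));
  return Out;
}

static std::vector<Encoded> buildCopying(const std::vector<Stack> &Stacks,
                                         std::optional<FrameIndexMap> Indexes) {
  std::vector<Encoded> Result;
  for (const Stack &CS : Stacks)
    Result.push_back(encodeCopying(CS, Indexes)); // one map copy per call stack
  return Result;
}

// Faster pattern: a null pointer means "no map", and nothing is copied.
static Encoded encodeByPointer(const Stack &CS, const FrameIndexMap *Indexes) {
  Encoded Out;
  for (FrameIdTy F : CS)
    Out.push_back(Indexes ? Indexes->Map.at(F) : static_cast<LinearFrameId>(F));
  return Out;
}

static std::vector<Encoded> buildByPointer(const std::vector<Stack> &Stacks,
                                           const FrameIndexMap *Indexes) {
  std::vector<Encoded> Result;
  for (const Stack &CS : Stacks)
    Result.push_back(encodeByPointer(CS, Indexes));
  return Result;
}
```

With three call stacks, the by-value version copies the map four times (once into `buildCopying`, then once per stack), while the pointer version produces the same output with zero copies. On a real profile with millions of call stacks and a large map, that per-stack copy dominates.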
2024-11-22 | [memprof] Remove verifyIndexedMemProfRecord and verifyFunctionProfileData (#117412) | Kazu Hirata
This patch removes two functions that verify the consistency between:
- IndexedAllocationInfo::CallStack
- IndexedAllocationInfo::CSId
Now that MemProf format Version 1 has been removed, IndexedAllocationInfo::CallStack doesn't participate in either serialization or deserialization, so we don't care about the consistency between the two fields in IndexedAllocationInfo. Subsequent patches will remove uses of the old field and eventually remove the field.
2024-11-22 | Reapply "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395) (#117404) | Teresa Johnson
This reverts commit fdb050a5024320ec29d2edf3f2bc686c3a84abaa, and restores ccb4702038900d82d1041ff610788740f5cef723, with a fix for build bot failures. Specifically, add ProfileData to the dependencies of the BitWriter library; the missing dependency was causing shared library builds of LLVM to fail. Reproduced the failure with a shared library build and confirmed this change fixes that build failure.
2024-11-22 | Revert "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395) | Teresa Johnson
Reverts llvm/llvm-project#117066
This is causing some build bot failures that need investigation.
2024-11-22 | [MemProf] Use radix tree for alloc contexts in bitcode summaries (#117066) | Teresa Johnson
Leverage the support added to represent allocation contexts in a more compact way via a radix tree in the indexed profile to similarly reduce sizes of the bitcode summaries. For a large target, this reduced the size of the per-module summaries by about 18% and in the distributed combined index files by 28%.
2024-11-22 | [memprof] Remove MemProf format Version 1 (#117357) | Kazu Hirata
This patch removes MemProf format Version 1 now that Version 2 and 3 are working well.
2024-11-20 | [MemProf] Templatize CallStackRadixTreeBuilder (NFC) (#117014) | Teresa Johnson
Prepare for usage in the bitcode reader/writer where we already have a LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional
We plan to use the same radix format in the ThinLTO summary records, where we already have a LinearFrameId.
2024-11-15 | [memprof] Remove MemProf format Version 0 (#116442) | Kazu Hirata
This patch removes MemProf format Version 0 now that version 2 and 3 seem to be working well. I'm not touching version 1 for now because some tests still rely on version 1. Note that Version 0 is identical to Version 1 except that the MemProf section of the indexed format has a MemProf version field.
2024-06-10 | [memprof] Fix comment typos (NFC) | Kazu Hirata
2024-06-07 | [memprof] Remove extraneous memprof:: (NFC) (#94825) | Kazu Hirata
2024-06-07 | [memprof] Improve deserialization performance in V3 (#94787) | Kazu Hirata
We call llvm::sort in a couple of places in the V3 encoding:
- We sort Frames by FrameIds for stability of the output.
- We sort call stacks in the dictionary order to maximize the length of the common prefix between adjacent call stacks.
It turns out that we can improve the deserialization performance by modifying the comparison functions -- without changing the format at all. Both places take advantage of the histogram of Frames -- how many times each Frame occurs in the call stacks.
- Frames: We serialize popular Frames in the descending order of popularity for improved cache locality. For two equally popular Frames, we break a tie by serializing one that tends to appear earlier in call stacks. Here, "earlier" means a smaller index within llvm::SmallVector<FrameId>.
- Call Stacks: We sort the call stacks to reduce the number of times we follow pointers to parents during deserialization. Specifically, instead of comparing two call stacks in the strcmp style -- integer comparisons of FrameIds, we compare two FrameIds F1 and F2 with Histogram[F1] < Histogram[F2] at respective indexes. Since we encode from the end of the sorted list of call stacks, we tend to encode popular call stacks first.
Since the two places use the same histogram, we compute it once and share it in the two places. Sorting the call stacks reduces the number of "jumps" by 74% when we deserialize all MemProfRecords. The cycle and instruction counts go down by 10% and 1.5%, respectively. If we sort the Frames in addition to the call stacks, then the cycle and instruction counts go down by 14% and 1.6%, respectively, relative to the same baseline (that is, without this patch).
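A sketch of the two orderings described above, assuming call stacks are `std::vector<uint64_t>` and that index 0 is the "earliest" position. `FrameStat`, `computeFrameHistogram`, `orderFrames`, and `sortCallStacks` as written here are illustrative and are not LLVM's implementation; the final `FrameId` tie-breaks are an added assumption to keep the order deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using FrameId = uint64_t;
using CallStack = std::vector<FrameId>;

struct FrameStat {
  uint64_t Count = 0;       // occurrences of the frame across all call stacks
  uint64_t PositionSum = 0; // sum of its indexes within those call stacks
};

std::map<FrameId, FrameStat> computeFrameHistogram(const std::vector<CallStack> &Stacks) {
  std::map<FrameId, FrameStat> Hist;
  for (const CallStack &CS : Stacks)
    for (size_t I = 0; I < CS.size(); ++I) {
      FrameStat &S = Hist[CS[I]];
      ++S.Count;
      S.PositionSum += I;
    }
  return Hist;
}

// Frames: most popular first; among equally popular frames, the one that
// tends to appear earlier (smaller position sum, hence smaller average) wins.
std::vector<FrameId> orderFrames(const std::map<FrameId, FrameStat> &Hist) {
  std::vector<FrameId> Order;
  for (const auto &Entry : Hist)
    Order.push_back(Entry.first);
  std::sort(Order.begin(), Order.end(), [&](FrameId L, FrameId R) {
    const FrameStat &SL = Hist.at(L), &SR = Hist.at(R);
    if (SL.Count != SR.Count)
      return SL.Count > SR.Count;
    if (SL.PositionSum != SR.PositionSum)
      return SL.PositionSum < SR.PositionSum;
    return L < R; // deterministic output
  });
  return Order;
}

// Call stacks: lexicographic order in which individual frames compare by
// popularity, so stacks made of popular frames sort toward the end.
void sortCallStacks(std::vector<CallStack> &Stacks,
                    const std::map<FrameId, FrameStat> &Hist) {
  auto FrameLess = [&](FrameId F1, FrameId F2) {
    uint64_t H1 = Hist.at(F1).Count, H2 = Hist.at(F2).Count;
    if (H1 != H2)
      return H1 < H2;
    return F1 < F2;
  };
  std::sort(Stacks.begin(), Stacks.end(), [&](const CallStack &A, const CallStack &B) {
    return std::lexicographical_compare(A.begin(), A.end(), B.begin(), B.end(), FrameLess);
  });
}
```

The histogram is computed once and passed to both sorts, matching the "compute it once and share it" point above. For the stacks `{1,2,3}`, `{1,2,4}`, `{5,2}`, frame 2 is the most popular and is placed first, and `{5,2}` sorts first because frame 5 is rarer than frame 1.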
2024-06-07 | [ProfileData] Add const to a few places (NFC) (#94803) | Kazu Hirata
2024-06-06 | [memprof] Add CallStackRadixTreeBuilder (#93784) | Kazu Hirata
Call stacks are a huge portion of the MemProf profile, taking up 70+% of the profile file size. This patch implements a radix tree to compress call stacks, which are known to have long common prefixes. Specifically, CallStackRadixTreeBuilder, introduced in this patch, takes call stacks in the MemProf profile, sorts them in the dictionary order to maximize the common prefix between adjacent call stacks, and then encodes a radix tree into a single array that is ready for serialization. The resulting radix array is essentially a concatenation of call stack arrays, each encoded with its length followed by the payload, except that these arrays contain "instructions" like "skip 7 elements forward" to borrow common prefixes from other call stacks. This patch does not integrate with the MemProf serialization/deserialization infrastructure yet. Once integrated, the radix tree is expected to roughly halve the file size of the MemProf profile.
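The real radix array, with its embedded "skip forward" instructions, is more involved than fits here. As a simpler stand-in that shows why dictionary-order sorting pays off, this sketch uses front coding: after sorting, each call stack stores only the length of the prefix it shares with the previous stack plus its remaining frames. It assumes frames are listed root-first, so that shared prefixes correspond to shared callers; `FrontCoded`, `frontCode`, and `frontDecode` are hypothetical names, and this is not CallStackRadixTreeBuilder's format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using FrameId = uint64_t;
using CallStack = std::vector<FrameId>;

struct FrontCoded {
  size_t SharedPrefix;         // frames reused from the previous call stack
  std::vector<FrameId> Suffix; // frames stored explicitly
};

// Sort in dictionary order, then store each stack relative to its predecessor.
std::vector<FrontCoded> frontCode(std::vector<CallStack> Stacks) {
  std::sort(Stacks.begin(), Stacks.end());
  std::vector<FrontCoded> Out;
  const CallStack *Prev = nullptr;
  for (const CallStack &CS : Stacks) {
    size_t P = 0;
    if (Prev)
      while (P < Prev->size() && P < CS.size() && (*Prev)[P] == CS[P])
        ++P;
    auto Skip = static_cast<std::ptrdiff_t>(P);
    Out.push_back({P, CallStack(CS.begin() + Skip, CS.end())});
    Prev = &CS;
  }
  return Out;
}

// Rebuild each stack from the previous decoded stack plus its own suffix.
std::vector<CallStack> frontDecode(const std::vector<FrontCoded> &Coded) {
  std::vector<CallStack> Out;
  for (const FrontCoded &E : Coded) {
    CallStack CS;
    if (!Out.empty()) {
      auto Skip = static_cast<std::ptrdiff_t>(E.SharedPrefix);
      CS.assign(Out.back().begin(), Out.back().begin() + Skip);
    }
    CS.insert(CS.end(), E.Suffix.begin(), E.Suffix.end());
    Out.push_back(std::move(CS));
  }
  return Out;
}

// Number of frames actually stored after front coding.
size_t storedFrames(const std::vector<FrontCoded> &Coded) {
  size_t N = 0;
  for (const FrontCoded &E : Coded)
    N += E.Suffix.size();
  return N;
}
```

For the stacks `{1,2,3,4}`, `{1,2,5}`, `{1,2,3,6}`, sorting places `{1,2,3,4}` next to `{1,2,3,6}`, so only 6 of the 10 frames need to be stored. A radix tree goes further by letting a stack borrow a prefix from any earlier stack, not just its immediate neighbor.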
2024-06-06 | [memprof] Use std::vector<Frame> instead of llvm::SmallVector<Frame> (NFC) (#94432) | Kazu Hirata
This patch replaces llvm::SmallVector<Frame> with std::vector<Frame>. llvm::SmallVector<Frame> sets aside one inline element. Meanwhile, when I sort all call stacks by their lengths, the length at the first percentile is already 2. That is, 99 percent of call stacks do not take advantage of the inline element. Using std::vector<Frame> reduces the cycle and instruction counts by 11% and 22%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.
2024-05-31 | [memprof] Replace uint32_t with LinearCallStackId where appropriate (NFC) (#94023) | Kazu Hirata
This patch replaces uint32_t with LinearCallStackId where appropriate. I'm replacing uint64_t with LinearCallStackId in writeMemProfCallStackArray, but that's OK because it's a value to be used as LinearCallStackId anyway.
2024-05-31 | [memprof] Use uint32_t for linear call stack IDs (#93924) | Kazu Hirata
This patch switches to uint32_t for linear call stack IDs as uint32_t is sufficient to index into the call stack array.
2024-05-30 | [memprof] Use linear IDs for Frames and call stacks (#93740) | Kazu Hirata
With this patch, we stop using on-disk hash tables for Frames and call stacks. Instead, we'll write out all the Frames as a flat array while maintaining mappings from FrameIds to the indexes into the array. Then we serialize call stacks in terms of those indexes. Likewise, we'll write out all the call stacks as another flat array while maintaining mappings from CallStackIds to the indexes into the call stack array. One minor difference from Frames is that the indexes into the call stack array are not contiguous because call stacks are variable-length objects. Then we serialize IndexedMemProfRecords in terms of the indexes into the call stack array. Now, we describe each call stack with 32-bit indexes into the Frame array (as opposed to the 64-bit FrameIds in Version 2). The use of the smaller type cuts down the profile file size by about 40% relative to Version 2. The departure from the on-disk hash tables contributes a little bit to the savings, too. For now, IndexedMemProfRecords refer to call stacks with 64-bit indexes into the call stack array. As a follow-up, I'll change that to uint32_t, including necessary updates to RecordWriterTrait.
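A sketch of the linearization, with simplifications that are assumptions: frames are plain strings, the input tables are `std::map`s, and each call stack is laid out as a 32-bit length followed by its 32-bit frame indexes (the exact on-disk layout is not given in the entry above). `LinearizedProfile` and `linearize` are hypothetical names. It shows why frame indexes are contiguous while call stack offsets are not.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using FrameId = uint64_t;
using CallStackId = uint64_t;
using LinearFrameId = uint32_t;
using LinearCallStackId = uint32_t;

struct LinearizedProfile {
  std::vector<std::string> FrameArray;           // indexed by LinearFrameId
  std::map<FrameId, LinearFrameId> FrameIndexes; // FrameId -> index into FrameArray
  std::vector<uint32_t> CallStackArray;          // per stack: length, then frame indexes
  std::map<CallStackId, LinearCallStackId> CallStackIndexes; // CallStackId -> offset
};

LinearizedProfile
linearize(const std::map<FrameId, std::string> &Frames,
          const std::map<CallStackId, std::vector<FrameId>> &CallStacks) {
  LinearizedProfile P;
  // Each frame takes one slot, so frame indexes are contiguous: 0, 1, 2, ...
  for (const auto &[Id, F] : Frames) {
    P.FrameIndexes[Id] = static_cast<LinearFrameId>(P.FrameArray.size());
    P.FrameArray.push_back(F);
  }
  // Call stacks vary in length, so each offset advances by 1 + length.
  for (const auto &[CSId, CS] : CallStacks) {
    P.CallStackIndexes[CSId] = static_cast<LinearCallStackId>(P.CallStackArray.size());
    P.CallStackArray.push_back(static_cast<uint32_t>(CS.size()));
    for (FrameId F : CS)
      P.CallStackArray.push_back(P.FrameIndexes.at(F)); // throws if F is unknown
  }
  return P;
}
```

Each frame reference shrinks from a 64-bit FrameId to a 32-bit index, which is where the size saving described above comes from; records would then refer to a call stack by its offset in `CallStackArray`.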
2024-05-28 | [memprof] Add MemProf format Version 3 (#93608) | Kazu Hirata
This patch adds Version 3 for development purposes. For now, this patch adds V3 as a copy of V2. For the most part, this patch adds "case Version3:" wherever "case Version2:" appears. One exception is writeMemProfV3, which is copied from writeMemProfV2 but updated to write out memprof::Version3 to the MemProf header. We'll incrementally modify writeMemProfV3 in subsequent patches.
2024-05-28 | [memprof] Remove const from the return type of toMemProfRecord (#93415) | Kazu Hirata
The "const" that this patch removes prevents move semantics from being used in:
    AI.CallStack = Callback(IndexedAI.CSId);
With this patch, on an indexed MemProf Version 2 profile, the cycle count and instruction count go down by 13.3% and 26.3%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.
2024-05-24 | [memprof] Call llvm::SmallVector::reserve (#93324) | Kazu Hirata
2024-05-23 | [memprof] Use std::move in toMemProfRecord (#93133) | Kazu Hirata
std::move and reserve here result in a measurable speed-up in llvm-profdata modified to deserialize all MemProfRecords. The cycle count goes down by 7.1% while the instruction count goes down by 21%.
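This entry and the 2024-05-28 entry that removes `const` from the return type of toMemProfRecord both come down to when C++ can move rather than copy. The sketch below uses a hypothetical `Tracked` type that counts copies and moves: reserving once and moving each element avoids all copies, while a `const`-qualified return value cannot bind to the move assignment operator and is copied instead. The helper names are illustrative, not LLVM code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Payload that counts copies and moves (illustration only).
struct Tracked {
  static inline int Copies = 0;
  static inline int Moves = 0;
  std::vector<int> Data;
  Tracked() = default;
  Tracked(const Tracked &O) : Data(O.Data) { ++Copies; }
  Tracked(Tracked &&O) noexcept : Data(std::move(O.Data)) { ++Moves; }
  Tracked &operator=(const Tracked &O) { Data = O.Data; ++Copies; return *this; }
  Tracked &operator=(Tracked &&O) noexcept { Data = std::move(O.Data); ++Moves; return *this; }
  static void reset() { Copies = Moves = 0; }
};

// Reserve once (no reallocation while filling), then move each element.
std::vector<Tracked> takeAll(std::vector<Tracked> &Src) {
  std::vector<Tracked> Dst;
  Dst.reserve(Src.size());
  for (Tracked &T : Src)
    Dst.push_back(std::move(T));
  return Dst;
}

// A const-qualified return type turns the caller's assignment into a copy,
// because a const rvalue cannot bind to Tracked&&.
const Tracked makeConst() { Tracked T; T.Data = {1, 2, 3}; return T; }
Tracked makeNonConst() { Tracked T; T.Data = {1, 2, 3}; return T; }
```

Moving three elements into a reserved vector performs exactly three moves and no copies. Assigning from `makeConst()` performs one full copy of the payload, while assigning from `makeNonConst()` only moves it, which mirrors why dropping `const` from a return type can matter for performance.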
2024-05-15 | [memprof] Pass FrameIdConverter and CallStackIdConverter by reference (#92327) | Kazu Hirata
CallStackIdConverter sets LastUnmappedId when a mapping failure occurs. Now, since toMemProfRecord takes an instance of CallStackIdConverter by value, namely std::function, the caller of toMemProfRecord never receives the mapping failure that occurs inside toMemProfRecord. The same problem applies to FrameIdConverter. The patch fixes the problem by passing FrameIdConverter and CallStackIdConverter by reference, namely llvm::function_ref. While I am at it, this patch deletes the copy constructor and copy assignment operator to avoid accidental copies.
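A sketch of the lost-state bug, with a hypothetical `IdConverter` standing in for CallStackIdConverter and a hypothetical `convertAll` standing in for toMemProfRecord. Passing the converter to a by-value `std::function` parameter copies it, so `LastUnmappedId` is set on the copy. LLVM's fix uses `llvm::function_ref`; this standalone sketch uses `std::ref` to get the same non-owning behavior from `std::function`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <vector>

// Hypothetical converter that remembers the last ID it failed to map.
struct IdConverter {
  std::map<uint64_t, int> Mapping;
  std::optional<uint64_t> LastUnmappedId;
  int operator()(uint64_t Id) {
    auto It = Mapping.find(Id);
    if (It == Mapping.end()) {
      LastUnmappedId = Id; // record the failure for the caller to inspect
      return -1;
    }
    return It->second;
  }
};

// The std::function parameter owns whatever target it is given. If that target
// is a copy of the converter, failures are recorded in the copy only.
std::vector<int> convertAll(const std::vector<uint64_t> &Ids,
                            std::function<int(uint64_t)> Convert) {
  std::vector<int> Out;
  for (uint64_t Id : Ids)
    Out.push_back(Convert(Id));
  return Out;
}
```

Calling `convertAll(Ids, C)` leaves `C.LastUnmappedId` empty even when a lookup fails, whereas `convertAll(Ids, std::ref(C))` updates `C` itself. Deleting IdConverter's copy constructor, in the spirit of the commit, would turn the first call into a compile-time error, because `std::function` requires a copy-constructible target.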
2024-04-25 | [memprof] Move getFullSchema and getHotColdSchema outside PortableMemInfoBlock (#90103) | Kazu Hirata
These functions do not operate on PortableMemInfoBlock. This patch moves them outside the class.
2024-04-23 | [memprof] Take Schema into account in PortableMemInfoBlock::serializedSize (#89824) | Kazu Hirata
PortableMemInfoBlock::{serialize,deserialize} take Schema into account, allowing us to serialize/deserialize a subset of the fields. However, PortableMemInfoBlock::serializedSize does not. That is, it assumes that all fields are always serialized and deserialized. In other words, if we choose to serialize/deserialize a subset of the fields, serializedSize would claim more storage than we actually need. This patch fixes the problem by teaching serializedSize to take Schema into account. For now, this patch has no effect on the actual indexed MemProf profile because we serialize/deserialize all fields, but that might change in the future. Aside from check-llvm, I tested this patch by verifying that llvm-profdata generates bit-wise identical files for each version for a large raw MemProf file I have.
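A sketch of a schema-aware size computation, assuming a hypothetical four-field block with made-up byte widths (the real PortableMemInfoBlock has many more fields). `Field`, `Block`, `serialize`, and both size functions are illustrative names. The invariant it checks is the one the fix restores: the size reported for a schema equals the number of bytes `serialize` actually writes for that schema.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fields and byte widths for illustration.
enum Field : size_t { AllocCount, TotalSize, MinLifetime, MaxLifetime, NumFields };
constexpr size_t FieldWidth[NumFields] = {4, 8, 4, 4};

using Schema = std::bitset<NumFields>;

struct Block {
  uint64_t Values[NumFields] = {};
};

// Append the low Width bytes of V in little-endian order.
void writeLE(std::vector<uint8_t> &Out, uint64_t V, size_t Width) {
  for (size_t B = 0; B < Width; ++B)
    Out.push_back(static_cast<uint8_t>(V >> (8 * B)));
}

// Writes only the fields selected by the schema.
void serialize(const Block &Blk, const Schema &S, std::vector<uint8_t> &Out) {
  for (size_t I = 0; I < NumFields; ++I)
    if (S.test(I))
      writeLE(Out, Blk.Values[I], FieldWidth[I]);
}

// Schema-unaware size: assumes every field is written (the old behavior).
size_t serializedSizeAllFields() {
  size_t Size = 0;
  for (size_t W : FieldWidth)
    Size += W;
  return Size;
}

// Schema-aware size: agrees with what serialize() writes for the same schema.
size_t serializedSize(const Schema &S) {
  size_t Size = 0;
  for (size_t I = 0; I < NumFields; ++I)
    if (S.test(I))
      Size += FieldWidth[I];
  return Size;
}
```

With a two-field schema, `serialize` writes 12 bytes and `serializedSize` reports 12, while the schema-unaware version would report 20. With the full schema both agree, which is why, as noted above, the fix does not change today's files.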
2024-04-16 | [llvm] Drop unaligned from calls to readNext (NFC) (#88841) | Kazu Hirata
Now readNext defaults to unaligned accesses. This patch drops unaligned to improve readability.
2024-04-16 | [memprof] Use CSId to construct MemProfRecord (#88362) | Kazu Hirata
We are in the process of referring to call stacks with CallStackId in IndexedMemProfRecord and IndexedAllocationInfo instead of holding call stacks inline (both in memory and the serialized format). Doing so deduplicates call stacks and reduces the MemProf profile file size. Before we can eliminate the two fields holding call stacks inline:
- IndexedAllocationInfo::CallStack
- IndexedMemProfRecord::CallSites
we need to eliminate all the read operations on them. This patch is a step in that direction. Specifically, we eliminate the read operations in the context of MemProfReader and RawMemProfReader. A subsequent patch will eliminate the read operations during the serialization.
2024-04-15 | [memprof] Fix typos in serializedSizeV0 and serializedSizeV2 (#88629) | Kazu Hirata
The first field to serialize is the size of IndexedMemProfRecord::AllocSites. It has nothing to do with GlobalValue::GUID. This happens to work because of:
    using GUID = uint64_t;
2024-04-07 | [memprof] Use static instead of anonymous namespaces (#87889) | Kazu Hirata
This patch replaces anonymous namespaces with static as per LLVM Coding Standards.
2024-04-03 | [memprof] Add Version2 of IndexedMemProfRecord serialization (#87455) | Kazu Hirata
I'm currently developing a new version of the indexed memprof format where we deduplicate call stacks in IndexedAllocationInfo::CallStack and IndexedMemProfRecord::CallSites. We refer to call stacks with integer IDs, namely CallStackId, just as we refer to Frame with FrameId. The deduplication will cut down the profile file size by 80% in a large memprof file of mine. As a step toward the goal, this patch teaches IndexedMemProfRecord::{serialize,deserialize} to speak Version2. A subsequent patch will add Version2 support to llvm-profdata. The essence of the patch is to replace the serialization of a call stack, a vector of FrameIDs, with that of a CallStackId. That is:
    const IndexedAllocationInfo &N = ...;
    ...
    LE.write<uint64_t>(N.CallStack.size());
    for (const FrameId &Id : N.CallStack)
      LE.write<FrameId>(Id);
becomes:
    LE.write<CallStackId>(N.CSId);
2024-03-25 | [memprof] Compute CallStackId when deserializing IndexedAllocationInfo (#86421) | Kazu Hirata
There are two ways to create in-memory instances of IndexedAllocationInfo -- deserialization of the raw MemProf data and that of the indexed MemProf data. With:
    commit 74799f424063a2d751e0f9ea698db1f4efd0d8b2
    Author: Kazu Hirata <kazu@google.com>
    Date: Sat Mar 23 19:50:15 2024 -0700
we compute CallStackId for each call stack in IndexedAllocationInfo while deserializing the raw MemProf data. This patch does the same while deserializing the indexed MemProf data. As with the patch above, this patch does not add any use of CallStackId yet.
2024-03-23 | [memprof] Add call stack IDs to IndexedAllocationInfo (#85888) | Kazu Hirata
The indexed MemProf file has a huge amount of redundancy. In a large internal application, 82% of call stacks, stored in IndexedAllocationInfo::CallStack, are duplicates. We should work toward deduplicating call stacks by referring to them with unique IDs with actual call stacks stored in a separate data structure, much like we refer to memprof::Frame with memprof::FrameId. At the same time, we need to facilitate a graceful transition from the current version of the MemProf format to the next. We should be able to read (but not write) the current version of the MemProf file even after we move onto the next one. With those goals in mind, I propose to have an integer ID next to CallStack in IndexedAllocationInfo to refer to a call stack in a succinct manner. We'll gradually increase the areas of the compiler where IDs and call stacks have one-to-one correspondence and eventually remove the existing CallStack field. This patch adds call stack ID, named CSId, to IndexedAllocationInfo and teaches the raw profile reader to compute unique call stack IDs and store them in the new field. It does not introduce any user of the call stack IDs yet, except in verifyFunctionProfileData.
2023-11-29 | [MemProf][NFC] Correct comment about stripping of suffixes in profile (#73840) | Teresa Johnson
The comment about the stripping of suffixes when creating the indexed MemProf profile was partially incorrect, as we do not strip ".__uniq." suffixes by default (by design). Update the comment accordingly.
2023-10-13 | Use llvm::endianness::{big,little,native} (NFC) | Kazu Hirata
Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class. This patch replaces {big,little,native} with llvm::endianness::{big,little,native}. This patch completes the migration to llvm::endianness and llvm::endianness::{big,little,native}. I'll post a separate patch to remove the migration helpers in llvm/Support/Endian.h:
    using endianness = llvm::endianness;
    constexpr llvm::endianness big = llvm::endianness::big;
    constexpr llvm::endianness little = llvm::endianness::little;
    constexpr llvm::endianness native = llvm::endianness::native;
2023-08-29 | [memprof] Canonicalize the function name prior to hashing. | Snehasish Kumar
Canonicalize the function name (strip suffixes etc.) to ensure that function name suffixes added by late stage passes do not cause mismatches when memprof profile data is consumed.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D159132
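A sketch of suffix stripping before hashing. The suffix list (`.llvm.`, `.part.`), its order, the "strip only when no further '.' follows" rule, and the name `canonicalizeFnName` are assumptions for illustration, not LLVM's exact canonicalization; `.__uniq.` is deliberately absent to match the 2023-11-29 note that those suffixes are not stripped by default.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical suffixes appended by late-stage passes, checked in this order.
// ".__uniq." is intentionally absent, so those suffixes are preserved.
constexpr std::array<std::string_view, 2> StrippedSuffixes = {".llvm.", ".part."};

std::string canonicalizeFnName(std::string_view Name) {
  for (std::string_view Suffix : StrippedSuffixes) {
    size_t Pos = Name.rfind(Suffix);
    if (Pos == std::string_view::npos)
      continue;
    // Strip only when no further '.' follows the suffix.
    std::string_view Rest = Name.substr(Pos + Suffix.size());
    if (Rest.find('.') == std::string_view::npos)
      Name = Name.substr(0, Pos);
  }
  return std::string(Name);
}
```

The canonical name, rather than the raw symbol name, would then be hashed, so `foo` and `foo.llvm.123` map to the same profile entry while `baz.__uniq.456` keeps its distinct identity.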
2022-04-08 | [memprof] Deduplicate and outline frame storage in the memprof profile. | Snehasish Kumar
The current implementation of memprof information in the indexed profile format stores the representation of each calling context frame inline. This patch uses an interned representation where the frame contents are stored in a separate on-disk hash table. The table is indexed via a hash of the contents of the frame. With this patch, the compressed size of a large memprof profile is reduced by ~22%.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D123094
2022-03-22 | Reland "[memprof] Store callsite metadata with memprof records." | Snehasish Kumar
This reverts commit f4b794427e8037a4e952cacdfe7201e961f31a6f. Reland with underlying msan issue fixed in D122260.
2022-03-21 | Revert "[memprof] Store callsite metadata with memprof records." | Mitch Phillips
This reverts commit 0d362c90d335509c57c0fbd01ae1829e2b9c3765. Reason: Causes the MSan buildbot to fail (see comments on https://reviews.llvm.org/D121179 for more information).
2022-03-21 | [memprof] Store callsite metadata with memprof records. | Snehasish Kumar
To ease profile annotation, each of the callsites in a function can be annotated with profile data - "IR metadata format for MemProf" [1]. This patch extends the on-disk serialized record format to store the debug information for allocation callsites, including inline frames. This change is incompatible with the existing format, i.e. indexed profiles must be regenerated; raw profiles are unaffected.
[1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D121179
2022-02-24 | Cleanup includes: ProfileData | serge-sans-paille
Estimation of the impact on preprocessor output:
before: 1067349756
after: 1065940348
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120434
2022-02-17 | Reland "[memprof] Extend the index prof format to include memory profiles." | Snehasish Kumar
This patch adds support for optional memory profile information to be included with an indexed profile. The indexed profile header adds a new field which points to the offset of the memory profile section (if present) in the indexed profile. For users who do not utilize this feature, the only overhead is a 64-bit offset in the header. The memory profile section contains (1) profile metadata describing the information recorded for each entry and (2) an on-disk hashtable containing the profile records indexed via llvm::md5(function_name). We chose to introduce a separate hash table instead of the existing one since the indexing for the instrumented FDO hash table is based on a CFG hash which itself is perturbed by memprof instrumentation. This commit also includes the changes reviewed separately in D120093.
Differential Revision: https://reviews.llvm.org/D120103
2022-02-17 | Revert "Reland "[memprof] Extend the index prof format to include memory profiles."" | Snehasish Kumar
This reverts commit 807ba7aace188ada83ddb4477265728e97346af1.
2022-02-17 | Revert "[memprof] Fix frame deserialization on big endian systems." | Snehasish Kumar
This reverts commit c74389b4b58d8db3f8262ce15b9d514d62fe265c. This broke the ml-opt-x86-64 build. https://lab.llvm.org/buildbot#builders/9/builds/4127