llvm-project.git/bolt/lib/Profile/DataAggregator.cpp, branch users/xur-llvm/spr/bolt

[bolt] Use a fixed name for the kernel image

2024-12-20T00:51:05+00:00

Use a fixed name for the kernel image when processing kernel profiles,
regardless of whether a build ID is present.

This addresses the case where the provided kernel image lacks a
matching build ID. The name, "[kernel.kallsyms]", is the default
for kernel DSOs in the Linux kernel source code
(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/dso.c#n428).

While "[guest.kernel.kallsyms]" is the kernel DSO name for a guest kernel,
support for VM profiles is currently limited, so this name is
skipped for now.
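
A minimal sketch of the name-based selection described above, written in
Python for brevity rather than taken from the BOLT sources. The helper names
and the exact-match comparison are assumptions made for illustration; only the
two DSO name strings come from the text above.

```python
# Default name perf uses for the host kernel DSO (from the commit message).
KERNEL_DSO_NAME = "[kernel.kallsyms]"


def is_kernel_mapping(mmap_name: str) -> bool:
    """Return True if an MMAP entry name refers to the host kernel image.

    Hypothetical helper: the comparison is by name only, so it does not
    depend on whether a build ID is available. Exact matching is an
    assumption of this sketch.
    """
    return mmap_name == KERNEL_DSO_NAME


def select_kernel_mappings(mmap_names):
    """Keep only the entries that name the host kernel image."""
    return [name for name in mmap_names if is_kernel_mapping(name)]
```

With this rule, "[guest.kernel.kallsyms]" entries are simply not selected,
which mirrors the decision to skip guest kernels for now.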

Merge branch 'main' into users/xur-llvm/spr/bolt

2024-12-20T00:36:27+00:00

Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023)

2024-12-02T09:20:40+00:00

When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.

Introduces flag 'perf-script-events' for testing, which allows passing
perf events without BOLT having to parse them by invoking 'perf script'.
The flag is used to pass a mock perf profile that has two memory
mappings for a mock binary that has two text segments. The mapping
size is updated as `parseMMapEvents` now processes all text segments.
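
The size rule above can be sketched as follows. This is an illustrative
Python model, not the DataAggregator code: the `Mapping` type, its field
names, and `combined_size` are hypothetical, and raising an error on a base
address mismatch is an assumption about how the constraint might be enforced.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    base_address: int  # base address the binary is assumed to be loaded at
    mmap_address: int  # start address of this mapped text segment
    size: int          # size of this mapped text segment in bytes


def combined_size(mappings):
    """Size spanning all text segments, measured from the shared base address."""
    if not mappings:
        raise ValueError("no text segment mappings")
    base = mappings[0].base_address
    # All text segments must agree on the base address.
    if any(m.base_address != base for m in mappings):
        raise ValueError("text segments have different base addresses")
    # Last address covered by any text segment.
    last_address = max(m.mmap_address + m.size for m in mappings)
    return last_address - base
```

For two segments at 0x400000 (size 0x1000) and 0x600000 (size 0x2000) sharing
base 0x400000, the sketch yields 0x202000.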

Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815)"

2024-11-26T13:59:30+00:00

This caused test failures, see comment on the PR:

  Failed Tests (2):
    BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
    BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0

> When a binary has multiple text segments, the Size is computed as the
> difference of the last address of these segments from the BaseAddress.
> The base addresses of all text segments must be the same.
>
> Introduces flag 'perf-script-events' for testing. It allows passing perf events
> without BOLT having to parse them using 'perf script'. The flag is used to
> pass a mock perf profile that has two memory mappings for a mock binary
> that has two text segments. The size of the mapping is updated as, with this
> change, `parseMMapEvents` processes all text segments.

This reverts commit 4b71b3782d217db0138b701c4514bd2168ca1659.

[BOLT] DataAggregator support for binaries with multiple text segments (#92815)

2024-11-25T13:12:43+00:00

When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.

Introduces flag 'perf-script-events' for testing. It allows passing perf events
without BOLT having to parse them using 'perf script'. The flag is used to
pass a mock perf profile that has two memory mappings for a mock binary
that has two text segments. The size of the mapping is updated as, with this
change, `parseMMapEvents` processes all text segments.

[BOLT] Set call to continuation count in pre-aggregated profile

2024-11-08T00:20:19+00:00

#109683 identified an issue with pre-aggregated profiles where the count for a
call-to-continuation fallthrough edge is missing (profile discontinuity).

This issue only affects pre-aggregated profiles, not perf data: the
LBR stack has the information needed to determine whether a trace
(fallthrough) starts at a call continuation, whereas a pre-aggregated
fallthrough lacks this information.

The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to the call-to-continuation
fallthrough when:
- BranchFrom is in another function or DSO,
- BranchTo may be a call continuation site:
  - not an entry point/landing pad.

Note that we can't directly check whether BranchFrom corresponds to a return
instruction if it's in an external DSO.
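
The conditions above can be sketched as a small predicate. This is a
hypothetical Python model, not BOLT's implementation: functions are
represented by plain names, `None` stands for an address outside the profiled
binary, and the entry point and landing pad offsets are passed in as sets.

```python
def may_be_return_to_call_continuation(from_func, to_func, to_offset,
                                       entry_offsets, landing_pad_offsets):
    """Heuristic check that a branch record looks like a return.

    from_func / to_func: function names, or None for an external DSO.
    to_offset: offset of BranchTo within to_func.
    entry_offsets / landing_pad_offsets: offsets in to_func that are
    entry points or landing pads (illustrative inputs).
    """
    if to_func is None:
        return False  # the target must be inside the profiled binary
    if from_func == to_func:
        return False  # BranchFrom must be in another function or DSO
    if to_offset in entry_offsets:
        return False  # a branch to an entry point is a call, not a return
    if to_offset in landing_pad_offsets:
        return False  # landing pads are reached by unwinding
    return True
```

The check is deliberately phrased in terms of the target only, since the
source instruction cannot be inspected when it lives in an external DSO.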

Keep call continuation handling for perf data (`getFallthroughsInTrace`)
[1] as-is due to marginally better performance. The difference is that
a return-converted call-to-continuation fallthrough is slightly more
frequent than other fallthroughs, since the former requires only one LBR
address while the latter needs two that belong to the profiled binary.
Hence return-converted fallthroughs have a larger "weight", which affects
code layout.

[1] `DataAggregator::getFallthroughsInTrace`
https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915

Test Plan: added callcont-fallthru.s

Reviewers: maksfb, ayermolo, ShatianWang, dcci

Reviewed By: maksfb, ShatianWang

Pull Request: https://github.com/llvm/llvm-project/pull/109486

[Bolt] Fix address translation for KASLR kernel

2024-10-30T16:33:39+00:00

This patch enables BOLT to analyze kernel addresses that
have been randomized by KASLR. It parses memory map (MMap)
entries in perf files to find the address mapping.
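
One way to picture the translation is as a constant-offset shift between the
randomized runtime mapping and the addresses in the kernel image. The sketch
below is an assumption-laden Python illustration, not the patch itself: the
function name, parameters, and the single-offset model are hypothetical.

```python
def to_link_time_address(runtime_addr, mmap_start, image_start):
    """Translate a KASLR-randomized address back to an image address.

    runtime_addr: sampled kernel address (randomized).
    mmap_start:   runtime start of the kernel mapping, as read from an
                  MMap entry (assumed input).
    image_start:  start address of the same region in the kernel image
                  (assumed input).
    """
    if runtime_addr < mmap_start:
        raise ValueError("address is below the kernel mapping")
    # The KASLR slide cancels out: keep the offset within the mapping and
    # rebase it onto the image's own start address.
    return runtime_addr - mmap_start + image_start
```

For example, with a mapping starting at 0xffffffff9a000000 and an image
starting at 0xffffffff81000000, the sample 0xffffffff9a001234 translates to
0xffffffff81001234.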

[BOLT] Add profile density computation

2024-10-25T01:30:59+00:00

Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
  pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
  to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating
    their respective sample counts,
  - profile density is the smallest density covering 99% of total sample
    count.

In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in the tail 1% of
the sample count) that is sufficient to optimize the binary well.
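
The definition above can be sketched in a few lines. This is an illustrative
Python model rather than the perf2bolt code: the input tuple layout, the
function name, and the integer percentage parameter are assumptions.

```python
def profile_density(functions, cutoff_percent=99):
    """Smallest function density covering cutoff_percent of all samples.

    functions: iterable of (executed_bytes, size_bytes, sample_count)
    tuples, one per function (hypothetical layout).
    """
    # Function density: dynamically executed bytes / static size in bytes.
    entries = [(executed / size, samples)
               for executed, size, samples in functions if size > 0]
    total = sum(samples for _, samples in entries)
    if total == 0:
        return 0.0
    # Sort by density in decreasing order and accumulate sample counts.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    accumulated = 0
    density = 0.0
    for function_density, samples in entries:
        accumulated += samples
        density = function_density
        # Integer comparison avoids floating-point rounding at the cutoff.
        if accumulated * 100 >= cutoff_percent * total:
            break
    return density
```

Functions in the low-density tail that together hold at most 1% of the
samples never influence the result, because the loop stops before reaching
them.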

The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking resulting
profile density and performance. The threshold is conservative.

perf2bolt prints a warning if the density is below the threshold
and suggests increasing the sampling duration and/or frequency to reach
the required density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```
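
A sketch of how such a message could be derived, assuming the multiplier is
the ratio of the threshold to the measured density. That ratio is an
assumption of this illustration and is not stated in the commit message; the
function name is hypothetical, and only the threshold of 60 and the message
text come from the description above.

```python
DENSITY_THRESHOLD = 60  # threshold quoted in the commit message


def density_warning(density, threshold=DENSITY_THRESHOLD):
    """Return a warning string if density is below threshold, else None."""
    if density <= 0 or density >= threshold:
        return None
    # Assumed model: samples must scale by threshold / density.
    factor = threshold / density
    return ("BOLT-WARNING: BOLT is estimated to optimize better with "
            f"{factor:.1f}x more samples.")
```

Under this assumption, a measured density of 20 would call for three times as
many samples.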

Test Plan: updated pre-aggregated-perf.test

Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe, wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/101094

[BOLT] Set RawBranchCount in DataAggregator

2024-10-25T01:28:44+00:00

Align DataAggregator (Linux perf and pre-aggregated profile reader) to
DataReader (fdata profile reader) behavior: set BF->RawBranchCount which
is used in profile density computation (#101094).

Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/101093

[perf2bolt] Improve heuristic to map in-process addresses to specific… (#109397)

2024-09-23T13:14:51+00:00

… segments in Elf binary.

The heuristic is improved by also taking into account that only
executable segments should contain instructions.
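
The idea can be sketched as a lookup that ignores non-executable segments.
This is a simplified Python illustration, not the perf2bolt heuristic itself:
the segment tuple layout and function name are hypothetical, while `PF_X` is
the standard ELF program header flag for executable segments.

```python
PF_X = 0x1  # ELF program header flag: segment is executable


def find_executable_segment(address, segments):
    """Return the index of the executable segment containing address.

    segments: list of (start, size, flags) tuples (hypothetical layout).
    Returns None if no executable segment contains the address.
    """
    for index, (start, size, flags) in enumerate(segments):
        if not flags & PF_X:
            continue  # only executable segments should contain instructions
        if start <= address < start + size:
            return index
    return None
```

An address that falls only inside a data segment is therefore left unmapped
instead of being attributed to the wrong segment.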

Fixes #109384.