path: root/bolt/lib/Rewrite/RewriteInstance.cpp
Age | Commit message | Author
2025-11-14[BOLT][print] Add option '--print-only-file' (NFC) (#168023)YongKang Zhu
With this option, the names of functions to be printed can be passed to BOLT through a file instead of specifying them all on the command line.
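A minimal sketch of how such a name file might be parsed. The commit message does not describe the file format, so the one-name-per-line layout and the helper `parseFunctionNames` are assumptions made for illustration, not BOLT's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of BOLT): extract function names from the
// contents of a file, assuming one name per line. Surrounding whitespace is
// trimmed and blank lines are skipped.
std::vector<std::string> parseFunctionNames(const std::string &Contents) {
  std::vector<std::string> Names;
  std::istringstream Stream(Contents);
  std::string Line;
  while (std::getline(Stream, Line)) {
    const std::size_t First = Line.find_first_not_of(" \t\r");
    if (First == std::string::npos)
      continue; // Blank or whitespace-only line.
    const std::size_t Last = Line.find_last_not_of(" \t\r");
    assert(Last >= First && "a non-blank line has a non-empty trimmed range");
    Names.push_back(Line.substr(First, Last - First + 1));
  }
  return Names;
}
```

The resulting list could then be treated the same way as names given directly on the command line.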
2025-11-09[BOLT] Support restartable sequences in tcmalloc (#167195)Maksim Panchenko
Add `RSeqRewriter` to detect code references from the `__rseq_cs` section and ignore functions referenced from that section. Code references are detected via relocations (static or dynamic). Note that the abort handler is preceded by a 4-byte signature, and we cannot relocate the handler without that signature; otherwise the application may crash. Thus we ignore the function, i.e. we make sure it is not separated from its signature.
2025-11-08[BOLT] Refactor tracking internals of BinaryFunction. NFCI (#167074)Maksim Panchenko
In addition to tracking offsets inside a `BinaryFunction` that are referenced by data relocations, we need to track those relocations too. Plus, we will need to map symbols referenced by such relocations back to the containing function. This change introduces `BinaryFunction::InternalRefDataRelocations` to track the aforementioned relocations and expands `BinaryContext::SymbolToFunctionMap` to include local/temp symbols involved in relocation processing. There is no functional change introduced that should affect the output. Future PRs will use the new tracking capabilities.
2025-11-07[BOLT] Remove redundant declarations (NFC) (#166893)Kazu Hirata
In C++17, static constexpr members are implicitly inline, so they no longer require an out-of-line definition. Identified with readability-redundant-declaration.
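The language rule behind this cleanup can be illustrated with a small standalone example. The names `Limits`, `MaxEntries`, and `clampEntries` are made up for illustration and do not come from BOLT; the example assumes a C++17 (or later) compiler.

```cpp
#include <algorithm>
#include <cassert>

struct Limits {
  // Since C++17 this in-class declaration is also an implicitly inline
  // definition, so no separate definition is needed anywhere.
  static constexpr int MaxEntries = 16;
};

// Before C++17, odr-using MaxEntries required an out-of-line definition in
// exactly one translation unit:
//   constexpr int Limits::MaxEntries;
// In C++17 that line is redundant, which is what the clang-tidy check flags.

int clampEntries(int Requested) {
  assert(Requested >= 0 && "illustrative precondition");
  // std::min takes its arguments by const reference, which odr-uses
  // MaxEntries; this links without the out-of-line definition in C++17.
  return std::min(Requested, Limits::MaxEntries);
}
```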
2025-10-25[BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (#165065)Maksim Panchenko
CreatePastEnd parameter had no effect on the label creation. Remove it.
2025-10-20[ADT] Prepare for deprecation of StringSwitch cases with 4+ args. NFC. (#164173)Jakub Kuderski
Update `.Cases` and `.CasesLower` with 4+ args to use the `initializer_list` overload. The deprecation of these functions will come in a separate PR. For more context, see: https://github.com/llvm/llvm-project/pull/163405.
2025-10-14[bolt] Fix typos discovered by codespell (#124726)Christian Clauss
https://github.com/codespell-project/codespell

```bash
codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \
  --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas
```
2025-10-09[BOLT] Support fragment symbol mapped to the parent address (#162727)Amir Ayupov
Observed in GCC-produced binary. Emit a warning for the user. Test Plan: added bolt/test/X86/fragment-alias.s
2025-10-09[BOLT] Modify warning when --use-old-text fails. NFC (#162731)Maksim Panchenko
When `--use-old-text` fails, we emit all code meant for the original `.text` section into the new section. This can be more bytes than are emitted without `--use-old-text`, especially under `--lite`. As a result, `--use-old-text` produces a larger binary, not a smaller one, which could confuse the user. Add more information to the warning, including a recommendation to rebuild without `--use-old-text` for a smaller binary.
2025-10-08Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (#162435)Gergely Bálint
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)" (#162353)
This reverts commit c7d776b06897567e2d698e447d80279664b67d47. #120064 was reverted for breaking builders. Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.
---
Original message:
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records whether the current return address has been signed with PAC. OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
2025-10-07Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)Gergely Bálint
Reverts llvm/llvm-project#120064. @gulfemsavrun reported that the patch broke toolchain builders.
2025-10-07[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)Gergely Bálint
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records if the current return address has been signed with PAC. OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
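The two-pass idea can be sketched on a heavily simplified model. This is not BOLT's code: instructions are reduced to indices, the annotation is a plain `bool`, the function names merely echo the pass names, and the return address is assumed to be unsigned at function entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1 sketch: NegateBefore[I] is true when the input binary has an
// OpNegateRAState CFI immediately before instruction I. The result records,
// per instruction, whether the return address is signed at that point.
std::vector<bool> markRAStates(const std::vector<bool> &NegateBefore) {
  std::vector<bool> Signed(NegateBefore.size());
  bool State = false; // Assumption: unsigned at function entry.
  for (std::size_t I = 0; I < NegateBefore.size(); ++I) {
    if (NegateBefore[I])
      State = !State;
    Signed[I] = State;
  }
  return Signed;
}

// Pass 2 sketch: Layout lists original instruction indices in their new
// order. Returns the layout positions before which a new OpNegateRAState
// would have to be emitted, i.e. where the state changes between neighbours.
std::vector<std::size_t>
insertNegateRAStates(const std::vector<bool> &Signed,
                     const std::vector<std::size_t> &Layout) {
  std::vector<std::size_t> NegateAt;
  bool State = false; // Same entry-state assumption as in pass 1.
  for (std::size_t Pos = 0; Pos < Layout.size(); ++Pos) {
    assert(Layout[Pos] < Signed.size() && "layout refers to a known instruction");
    if (Signed[Layout[Pos]] != State) {
      NegateAt.push_back(Pos);
      State = !State;
    }
  }
  return NegateAt;
}
```

In this model, reordering can change the number of CFIs needed: an input with two state changes may need only one after the signed instructions are moved to the end.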
2025-10-06[BOLT] Always treat function entry as code (#160161)Maksim Panchenko
If an address has both a data marker "$d" and a function symbol associated with it, treat it as code.
2025-10-03[BOLT][AArch64] Refuse to run CDSplit pass (#159351)Paschalis Mpeis
LongJmp does not support warm blocks. On builds without assertions, this may lead to unexpected crashes. This patch exits with a clear message.
2025-10-03[BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note (#161206)Gergely Bálint
This commit adds the GNUPropertyRewriter, which parses features from the .note.gnu.property section. Currently we only read the bit indicating BTI support (GNU_PROPERTY_AARCH64_FEATURE_1_BTI). As BOLT does not add BTI landing pads to targets of indirect branches/calls, we have to emit a warning that the output binary may be corrupted.
2025-10-01[BOLT] Remove unused parameter. NFC (#161617)Maksim Panchenko
The `Skip` parameter is not used or set inside `analyzeRelocation()`.
2025-09-25[BOLT] Don't check address past end of function for data/code marker annotation (#159210)YongKang Zhu
We want to annotate a function with data and code markers whose addresses fall within the range of the function, so set `CheckPastEnd` to false.
2025-08-27[BOLT][AArch64] Fix another cause of extra entry point misidentification (#155055)YongKang Zhu
2025-08-22[BOLT] Add dump-dot-func option for selective function CFG dumping (#153007)YafetBeyene
## Change:
* Added the `--dump-dot-func` command-line option, which lets users dump CFGs only for specific functions instead of dumping all functions (previously the only available option was `--dump-dot-all`).
## Usage:
* Users can now specify function names or regex patterns (e.g., `--dump-dot-func=main,helper` or `--dump-dot-func="init.*"`) to generate .dot files only for functions of interest.
* Aims to save time when analysing specific functions in large binaries (e.g., only dumping graphs for performance-critical functions identified through profiling) and to reduce output clutter from generating thousands of unnecessary .dot files.
## Testing
The introduced test `dump-dot-func.test` confirms the new option does the following:
- [x] 1. `dump-dot-func` correctly filters a specified function
- [x] 2. Can achieve the above with regexes
- [x] 3. Can do 1. with a list of functions
- [x] No option specified creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
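The filtering behaviour listed above can be approximated with the standard library. This is a hedged sketch, not BOLT's implementation: `shouldDump` is a hypothetical helper, and treating each comma-separated entry as a regex that must match the whole function name is an assumption.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not BOLT's code): decide whether a function's CFG
// should be dumped, given the raw value of a --dump-dot-func style option.
// Each comma-separated entry is treated as a regex that must match the
// entire function name (an assumption of this sketch).
bool shouldDump(const std::string &FunctionName,
                const std::string &OptionValue) {
  assert(!FunctionName.empty() && "function names are assumed non-empty");
  std::istringstream Stream(OptionValue);
  std::string Pattern;
  while (std::getline(Stream, Pattern, ',')) {
    if (Pattern.empty())
      continue; // Ignore empty entries such as "a,,b".
    // Note: std::regex throws std::regex_error on a malformed pattern.
    if (std::regex_match(FunctionName, std::regex(Pattern)))
      return true;
  }
  return false; // Empty option value: nothing is dumped.
}
```

With this model, `shouldDump("init_tables", "init.*")` holds, while an empty option value selects nothing, mirroring the "no option specified creates no dot files" item.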
2025-08-20[BOLT] Validate extra entry point by querying data marker symbols (#154611)YongKang Zhu
Look up marker symbols and decide whether a candidate is really an extra entry point in `adjustFunctionBoundaries()`.
2025-08-19[BOLT] Keep X86 HLT instruction as a terminator in user mode (#154402)Maksim Panchenko
This is a follow-up to #150963. X86 HLT instruction may appear in the user-level code, in which case we should treat it as a terminator. Handle it as a non-terminator in the Linux kernel mode.
2025-07-29[BOLT][AArch64] Compensate for missing code markers (#151060)Maksim Panchenko
Code written in assembly can have missing code markers. In BOLT, we can compensate by recognizing that a function entry point should start a code sequence. Such code was seen in the LuaJIT library.
2025-07-25[BOLT] Require CFG in BAT mode (#150488)Amir Ayupov
`getFallthroughsInTrace` requires CFG for functions not covered by BAT, even in BAT/fdata mode. BAT-covered functions go through special handling in fdata (`BAT->getFallthroughsInTrace`) and YAML (`DataAggregator::writeBATYAML`) modes.
Since all modes (BAT/no-BAT, YAML/fdata) now need disassembly/CFG construction:
- drop special BAT/fdata handling that omitted disassembly/CFG in `RewriteInstance::run`, enabling *CFG for all non-BAT functions*,
- switch `getFallthroughsInTrace` to check if a function has CFG,
- which *allows emitting profile for non-simple functions* in all modes.
Previously, traces in non-simple functions were reported as invalid/mismatching disassembled function contents. This change reduces the number of such invalid traces and increases the number of profiled functions. These functions may participate in function reordering via call graph profile.
Test Plan: updated unclaimed-jt-entries.s
2025-07-24[BOLT] More refactoring of PHDR handling. NFC (#148932)Maksim Panchenko
Replace ad-hoc adjustment of the program header count with info from the new segment list.
2025-07-02[BOLT] Decouple new segment creation from PHDR rewrite. NFCI (#146111)Maksim Panchenko
Refactor handling of PHDR table rewrite to make modifications easier.
2025-06-30[BOLT] Refactor mapCodeSections(). NFC (#146434)Maksim Panchenko
Factor out non-relocation specific code into a separate function.
2025-06-28[BOLT] Push code to higher addresses under options (#146180)Maksim Panchenko
When --hot-functions-at-end is used in combination with --use-old-text, allocate code at the highest possible addresses within old .text. This feature is mostly useful for HHVM, where it is beneficial to have hot static code placed as close as possible to jitted code.
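The address arithmetic behind "highest possible addresses" can be sketched as follows. This is an illustrative model only: `placeAtEnd` and its parameters are hypothetical, and BOLT's actual allocation logic is likely to take more constraints into account.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not BOLT's code): choose the start address for Size
// bytes of code so that it sits as high as possible inside the half-open
// range [TextStart, TextEnd), aligned down to Alignment (a power of two).
// Returns std::nullopt when the code does not fit.
std::optional<std::uint64_t> placeAtEnd(std::uint64_t TextStart,
                                        std::uint64_t TextEnd,
                                        std::uint64_t Size,
                                        std::uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  assert(TextStart <= TextEnd && "range must be well-formed");
  if (Size > TextEnd - TextStart)
    return std::nullopt;
  // Highest start that still fits, rounded down to the alignment boundary.
  const std::uint64_t Start = (TextEnd - Size) & ~(Alignment - 1);
  if (Start < TextStart)
    return std::nullopt; // Rounding down pushed the start out of the range.
  return Start;
}
```

Rounding down rather than up keeps the end of the code at or below `TextEnd`, at the cost of a few unused bytes after it.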
2025-06-27[BOLT] Skip creation of new segments (#146023)Maksim Panchenko
When all section contents are updated in-place, we can skip creation of new segment(s), save disk space, and free up low memory addresses. Currently, this feature only works with --use-gnu-stack.
2025-06-26[BOLT] Refactor NewTextSegmentAddress handling (#145950)Maksim Panchenko
Refactor the code for NewTextSegmentAddress to correctly point at the true start of the segment when PHDR table is placed at the beginning. We used to offset NewTextSegmentAddress by PHDR table plus cache line alignment. NFC for proper binaries. Some YAML binaries from our tests will diverge due to bad segment address/offset alignment.
2025-06-20[BOLT][NFCI] Use FileSymbols for local symbol disambiguation (#89088)Amir Ayupov
Remove SymbolToFileName mapping from every local symbol to its containing FILE symbol name, and reuse FileSymbols to disambiguate local symbols instead. Also removes the check for `ld-temp.o` file symbol which was added to prevent LTO build mode from affecting the disambiguated name. This may cause incompatibility when using the profile collected on a binary built in a different mode than the input binary. Addresses #90661.
Speeds up file object discovery by 5-10% for large binaries:
- binary with ~1.2M symbols: 12.6422s -> 12.0297s
- binary with ~4.5M symbols: 48.8851s -> 43.7315s
2025-06-20[BOLT][NFCI] Use heuristic for matching split global functions (#90429)Amir Ayupov
This change speeds up fragment matching for large BOLTed binaries where all fragments of global parent functions are put under `bolt-pseudo.o` file symbol:
- before: iterating over symbols under `bolt-pseudo.o` only to fail to find a parent,
- after: bail out immediately and use a global parent by name.
Test Plan: NFC, updated register-fragments-bolt-symbols.s
2025-06-02[BOLT] Fix references in ignored functions in CFG state (#140678)Maksim Panchenko
When we call setIgnored() on functions that already have CFG built, these functions are not going to get emitted and we risk missing external function references being updated. To mitigate the potential issues, run scanExternalRefs() on such functions to create patches/relocations. Since scanExternalRefs() relies on function relocations, we have to preserve relocations until the function is emitted. As a result, the memory overhead without debug info update could reach up to 2%.
2025-05-29[BOLT][AArch64] Detect veneers with missing data markers (#142069)Maksim Panchenko
The linker may omit data markers for long absolute veneers causing BOLT to treat data as code. Detect such veneers and introduce data markers artificially before BOLT's disassembler kicks in.
2025-05-17[BOLT] Remove unused local variables (NFC) (#140421)Kazu Hirata
While I'm at it, this patch removes GetExecutablePath, which becomes unused after removing the sole use.
2025-05-14[BOLT][heatmap] Add synthetic hot text section (#139824)Amir Ayupov
In heatmap mode, report samples and utilization of the section(s) between hot text markers `[__hot_start, __hot_end)`. The intended use is with multi-way splitting where there are several sections that contain "hot" code (e.g. `.text.warm` with CDSplit). Addresses the comment on #139193 https://github.com/llvm/llvm-project/pull/139193#pullrequestreview-2835274682 Test Plan: updated heatmap-preagg.test
2025-05-13[BOLT] Print heatmap from perf2bolt (#139194)Amir Ayupov
Add perf2bolt `--heatmap` option to produce heatmaps during profile aggregation. Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode (`perf2bolt --heatmap`), which impacts perf.data handling: exclusive mode covers all addresses, whereas optional mode consumes attached profile only covering function addresses.
Test Plan: updated perf2bolt tests:
- pre-aggregated-perf.test: pre-aggregated data,
- bolt-address-translation-yaml.test: pre-aggregated + BOLTed input,
- perf_test.test: no-LBR perf data.
2025-05-13[BOLT][heatmap] Compute section utilization and partition score (#139193)Amir Ayupov
Heatmap groups samples into buckets of configurable size (`--block-size` flag, with a default of 64 bytes, the X86 cache line size). Buckets are mapped to containing sections; buckets that cover multiple sections are attributed to the first overlapping section. Buckets not mapped to a section are reported as unmapped.
Heatmap reports **section hotness**, which is the percentage of samples attributed to the section. Define **section utilization** as the percentage of buckets with non-zero samples relative to the total number of section buckets. Also define section **partition score** as the product of section hotness (where the total excludes unmapped buckets) and mapped utilization, ranging from 0 to 1 (higher is better).
The intended use of the new metrics is with a **production profile** collected from a BOLT-optimized binary. In this case the partition score of .text (hot text if function splitting is enabled) reflects **optimization profile** representativeness and the quality of hot-cold splitting. A partition score of 1 means that all samples fall into hot text and all buckets (cache lines) in hot text are exercised, which is equivalent to perfect hot-cold splitting.
Test Plan: updated heatmap-preagg.test
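A small worked sketch of the three metrics, under stated assumptions: values are returned as fractions in [0, 1] rather than percentages, "mapped utilization" is read as the section utilization defined above, and the names `SectionMetrics` and `computeSectionMetrics` are hypothetical rather than taken from BOLT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative result type; all values are fractions in [0, 1].
struct SectionMetrics {
  double Hotness;        // Section samples / samples in all mapped buckets.
  double Utilization;    // Buckets with samples / all buckets of the section.
  double PartitionScore; // Hotness * Utilization.
};

// BucketSamples holds the sample count of every bucket attributed to one
// section. MappedTotal is the sample count over all buckets that map to
// some section (unmapped buckets are excluded).
SectionMetrics computeSectionMetrics(
    const std::vector<std::uint64_t> &BucketSamples,
    std::uint64_t MappedTotal) {
  assert(!BucketSamples.empty() && MappedTotal > 0 && "need non-empty input");
  std::uint64_t SectionSamples = 0;
  std::size_t UsedBuckets = 0;
  for (const std::uint64_t Samples : BucketSamples) {
    SectionSamples += Samples;
    if (Samples != 0)
      ++UsedBuckets;
  }
  assert(SectionSamples <= MappedTotal && "section is part of the total");
  const double Hotness =
      static_cast<double>(SectionSamples) / static_cast<double>(MappedTotal);
  const double Utilization = static_cast<double>(UsedBuckets) /
                             static_cast<double>(BucketSamples.size());
  return {Hotness, Utilization, Hotness * Utilization};
}
```

For example, a section with bucket counts {10, 0, 30, 0} out of 80 mapped samples has hotness 0.5, utilization 0.5, and partition score 0.25; a section that receives every sample and touches every bucket scores 1.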
2025-05-10[BOLT] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#139403)Kazu Hirata
2025-04-29[BOLT][RelVTable] Skip special handling on non virtual function pointer relocations (#137406)YongKang Zhu
Besides virtual function pointers, a vtable could contain other kinds of entries, like those for RTTI data, that also require relocations. We need to skip special handling on relocations for non virtual function pointers in a relative vtable. Co-authored-by: Maksim Panchenko <maks@meta.com>
2025-04-18[BOLT] Add --custom-allocation-vma flag (#136385)Rafael Auler
Add an advanced-user flag so we are able to rewrite binaries when we fail to identify a suitable location to put new code. The user can then supply a custom location via --custom-allocation-vma. This happens more obviously if the binary has segments mapped to very high addresses.
2025-04-18[BOLT] Don't choke on nobits symbols (#136384)Rafael Auler
2025-04-16[BOLT][Instrumentation] Initial instrumentation support for RISCV64 (#133882)wangjue
This patch adds code generation for RISCV64 instrumentation. The work involves the following three points:
a) Implement support for instrumenting direct function calls and jumps on RISC-V, which relies on atomic instructions (used to increment counters); these are only available on RISC-V when the A extension is used.
b) Implement support for instrumenting indirect function calls by implementing the createInstrumentedIndCallHandlerEntryBB and createInstrumentedIndCallHandlerExitBB interfaces. In this process, we need to accurately record the target address and IndCallID to ensure the correct recording of the indirect call counters.
c) Implement the RISCV64 BOLT runtime library, with some system call interfaces implemented through embedded assembly. Get the difference between the runtime address of the .text section and the static address in the section header table, which in turn can be used to search for the indirect call description.
However, the community code currently has problems with relocation in some scenarios, but this has nothing to do with instrumentation. We may continue to submit patches to fix the related bugs.
2025-04-15[BOLT] Enable hugify for AArch64 (#117158)alekuz01
Add required hugify instrumentation and runtime libraries support for AArch64. Fixes #58226 Unblocks #62695
2025-04-14[BOLT] Support relative vtable (#135449)YongKang Zhu
To handle relative vtables, which are enabled with the clang option `-fexperimental-relative-c++-abi-vtables`, we look for PC-relative relocations whose fixup locations fall in vtable address ranges. For such relocations, the actual target is just the virtual function itself, and the addend records the distance between the vtable slot for the target virtual function and the first virtual function slot in the vtable, which matches the generated code that calls the virtual function. So we can skip the logic of handling "function + offset" and directly save such relocations for future fixup after the new layout is known.
2025-04-04[BOLT][AArch64] Fix symbolization of unoptimized TLS access (#134332)Maksim Panchenko
TLS relocations may not have a valid BOLT symbol associated with them. While symbolizing the operand, we were checking for the symbol value, and since there was no symbol the check resulted in a crash. Handle TLS case while performing operand symbolization on AArch64.
2025-04-03[BOLT] Gadget scanner: detect non-protected indirect calls (#131899)Anatoly Trosinenko
Implement the detection of non-protected indirect calls and branches similar to pac-ret scanner.
2025-03-27[BOLT][AArch64] Add partial support for lite mode (#133014)Maksim Panchenko
In lite mode, we only emit code for a subset of functions while preserving the original code in .bolt.org.text. This requires updating code references in non-emitted functions to ensure that:
* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.
On x86-64, we can update code references in-place using "pending relocations" added in scanExternalRefs(). However, on AArch64, this is not always possible due to address range limitations and linker address "relaxation".
There are two types of code-to-code references: control transfer (e.g., calls and branches) and function pointer materialization. AArch64-specific control transfer instructions are covered by #116964.
For function pointer materialization, simply changing the immediate field of an instruction is not always sufficient. In some cases, we need to modify a pair of instructions, such as undoing linker relaxation and converting NOP+ADR into ADRP+ADD sequence. To achieve this, we use the instruction patch mechanism instead of pending relocations. Instruction patches are emitted via the regular MC layer, just like regular functions. However, they have a fixed address and do not have an associated symbol table entry. This allows us to make more complex changes to the code, ensuring that function pointers are correctly updated. Such mechanism should also be portable to RISC-V and other architectures.
To summarize, for AArch64, we extend the scanExternalRefs() process to undo linker relaxation and use instruction patches to partially overwrite unoptimized code.
2025-03-21[BOLT] Gadget scanner: streamline issue reporting (#131896)Anatoly Trosinenko
In preparation for adding more gadget kinds to detect, streamline issue reporting. Rename classes representing issue reports. In particular, rename `Annotation` base class to `Report`, as it has nothing to do with "annotations" in `MCPlus` terms anymore. Remove references to "return instructions" from variable names and report messages, use generic terms instead. Rename NonPacProtectedRetAnalysis to PAuthGadgetScanner. Remove `GeneralDiagnostic` as a separate class, make `GenericReport` (former `GenDiag`) store `std::string Text` directly. Remove unused `operator=` and `operator==` methods, as `Report`s are created on the heap and referenced via `shared_ptr`s. Introduce `GadgetKind` class - currently, it only wraps a `const char *` description to display to the user. This description is intended to be a per-gadget-kind constant (or a few hard-coded constants), so no need to store it to `std::string` field in each report instance. To handle both free-form `GenericReport`s and statically-allocated messages without unnecessary overhead, move printing of the report header to the base class (and take the message argument as a `StringRef`).
2025-03-19[BOLT] Support computed goto and allow map addrs inside functions (#120267)Ash Dobrescu
Create entry points for addresses referenced by dynamic relocations and allow getNewFunctionOrDataAddress to map addresses inside functions. By adding addresses referenced by dynamic relocations as entry points, this patch fixes an issue where BOLT fails on code using computed gotos. This also fixes a mapping issue with the bugfix from this PR: https://github.com/llvm/llvm-project/pull/117766.
2025-03-14[BOLT] Pass unfiltered relocations to disassembler. NFCI (#131202)Maksim Panchenko
Instead of filtering and modifying relocations in readRelocations(), preserve the relocation info and use it in the symbolizing disassembler. This change mostly affects AArch64, where we need to look at original linker relocations in order to properly symbolize instruction operands.