path: root/bolt
Age  Commit message  Author
2025-11-19  [NFCI][bolt][test] Use AT&T syntax explicitly (#167225)  Raul Tambre
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files (i.e. a global preference for Intel syntax). `-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
2025-11-14  [BOLT][print] Add option '--print-only-file' (NFC) (#168023)  YongKang Zhu
With this option, the names of functions to be printed can be passed to BOLT through a file instead of being specified individually on the command line.
2025-11-11  [BOLT] Move call probe information to CallSiteInfo  Amir Ayupov
Pseudo probe matching (#100446) needs callee information for call probes. Embed call probe information (probe id, inline tree node, indirect flag) into CallSiteInfo. As a consequence:
- Remove call probes from PseudoProbeInfo to avoid duplication, making it only contain block probes.
- Probe grouping across inline tree nodes becomes more potent and allows block id 1 (the common case) to be elided unambiguously.

Block mask (blx) encoding becomes a low-ROI optimization and will be replaced by a more compact encoding leveraging simplified PseudoProbeInfo in #166680.

The size increase is ~3% for an XL profile (461->475MB). Compact block probe encoding shrinks it by ~6%.

Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test
Reviewers: paschalis-mpeis, ayermolo, yota9, yozhu, rafaelauler, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/165490
2025-11-11  [BOLT][DWARF] Slice .debug_str from the DWP for each CU (#159540)  Liu Ke
Slice .debug_str from the DWP for each CU using .debug_str_offsets and emit it, instead of directly copying the global .debug_str, in order to address the bloat issue of DWO after updates (more details in #155766).
2025-11-10  [BOLT][AArch64] Add more heuristics on epilogue determination (#167077)  YongKang Zhu
Add more heuristics to check if a basic block is an AArch64 epilogue. We treat instructions that load from the stack or adjust the stack pointer as a valid epilogue code sequence if and only if they immediately precede the branch instruction that ends the basic block.
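The heuristic can be illustrated with a small model. This is a hypothetical sketch, not BOLT's implementation: instructions are represented as `(mnemonic, kind)` pairs, the kind labels and the name `is_epilogue_block` are invented for illustration, and it assumes that at least one stack load or stack-pointer adjustment must directly precede the terminating branch.

```python
# Hypothetical model: each instruction is a (mnemonic, kind) pair, where kind
# is one of "stack_load", "sp_adjust", "branch", or "other".
EPILOGUE_KINDS = {"stack_load", "sp_adjust"}


def is_epilogue_block(block):
    """Return True if stack loads / SP adjustments immediately precede
    the branch that ends the block."""
    if not block or block[-1][1] != "branch":
        return False
    # Walk backwards from the instruction just before the terminating branch
    # and count the contiguous run of epilogue-like instructions.
    run = 0
    for _, kind in reversed(block[:-1]):
        if kind not in EPILOGUE_KINDS:
            break
        run += 1
    return run > 0
```

In this model, a block such as `ldp; add sp; ret` qualifies, while `ldp; mov; ret` does not, because an unrelated instruction sits between the stack load and the branch.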
2025-11-10  [BOLT] Simplify RAState helpers (NFCI) (#162820)  Gergely Bálint
- unify isRAStateSigned and isRAStateUnsigned to a common getRAState,
- unify setRASigned and setRAUnsigned into setRAState(MCInst, bool),
- update users of these to match the new implementations.
2025-11-09  [BOLT] Support restartable sequences in tcmalloc (#167195)  Maksim Panchenko
Add `RSeqRewriter` to detect code references from the `__rseq_cs` section and ignore functions referenced from that section. Code references are detected via relocations (static or dynamic). Note that the abort handler is preceded by a 4-byte signature, and we cannot relocate the handler without that signature; otherwise the application may crash. Thus we ignore the function, i.e. make sure it is not separated from its signature.
2025-11-08  [BOLT] Use DenseMap::contains (NFC) (#167169)  Kazu Hirata
Identified with readability-container-contains.
2025-11-08  [BOLT] Refactor tracking internals of BinaryFunction. NFCI (#167074)  Maksim Panchenko
In addition to tracking offsets inside a `BinaryFunction` that are referenced by data relocations, we need to track those relocations too. Plus, we will need to map symbols referenced by such relocations back to the containing function. This change introduces `BinaryFunction::InternalRefDataRelocations` to track the aforementioned relocations and expands `BinaryContext::SymbolToFunctionMap` to include local/temp symbols involved in relocation processing. There is no functional change introduced that should affect the output. Future PRs will use the new tracking capabilities.
2025-11-07  [BOLT] Refactor undefined symbols handling. NFCI (#167075)  Maksim Panchenko
Remove internal undefined symbol tracking and instead rely on the emission state of `MCSymbol` while processing data-to-code relocations. Note that the `CleanMCState` pass resets the state of all `MCSymbol`s prior to code emission.
2025-11-07  [BOLT] Remove redundant declarations (NFC) (#166893)  Kazu Hirata
In C++17, static constexpr members are implicitly inline, so they no longer require an out-of-line definition. Identified with readability-redundant-declaration.
2025-11-06  [BOLT][AArch64] Skip as many zeros as possible in padding validation (#166467)  YongKang Zhu
We were skipping four zeros at a time when validating code padding, in case the next zero was part of an instruction or constant island. For functions with a large amount of padding (e.g. due to hugify), this could be very slow. The validation now skips as many zeros as possible, though the count must still be an exact multiple of 4. No valid instruction is encoded as 0x00000000, and even if we stumble into a constant island, the API `BinaryFunction::isInConstantIsland()` has been made to find the size between the queried address and the end of the island (#164037), so this should be safe.
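The "as many as possible, but a multiple of 4" rule can be sketched in a few lines. This is a simplified, hypothetical model operating on a raw byte string; the function name is invented, and the constant-island handling mentioned in the commit message is deliberately left out.

```python
def skippable_zero_bytes(data: bytes, offset: int) -> int:
    """Count the zero bytes starting at `offset`, rounded down to a multiple of 4.

    AArch64 instructions are 4 bytes wide, so only whole 4-byte groups of
    zeros are treated as padding in this model.
    """
    end = offset
    while end < len(data) and data[end] == 0:
        end += 1
    run = end - offset
    return run - (run % 4)
```

For ten zero bytes followed by a non-zero byte, the function returns 8: two whole 4-byte groups are skipped in one step, and the remaining two zeros are left for the caller to examine.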
2025-11-06  [BOLT] Adding a unittest that covers Arm SPE PBT aggregation (#160095)  Ádám Kallai
When the SPE Previous Branch Target address (FEAT_SPE_PBT) feature is available, an SPE sample that includes PBT has two entries. Arm SPE records the SRC/DEST addresses of the latest sampled branch operation and stores them in the first entry. PBT records the target address of the most recently taken branch in program order before the sampled operation and places it in the second entry. Together they form a chain of two consecutive branches, where:
- The previous branch operation (PBT) is always taken.
- In the SPE entry, the current source branch (SRC) may be either fall-through or taken, and the target address (DEST) of the recorded branch operation is always what was architecturally executed.

However, PBT doesn't provide as much information as SPE does. It lacks the source branch address, the branch type, and the prediction bit; these fields are always filled with zero in the PBT entry. Therefore BOLT cannot evaluate the prediction and source branch fields, and it leaves them zero during the aggregation process.

The tests include a fully expanded example.
2025-11-05  [BOLT][AArch64] Fix printing of relocation types (#166621)  Maksim Panchenko
Enumeration of relocation types is not always sequential, e.g. on AArch64 the first real relocation type is 0x101. As such, the existing code in `Relocation::print()` was crashing while printing AArch64 relocations. Fix it by using `llvm::object::getELFRelocationTypeName()`.
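To see why sparse relocation values call for a lookup by value rather than an index into a dense table, consider the following sketch. It is only a model: the actual fix calls `llvm::object::getELFRelocationTypeName()`, while the dictionary and the name `reloc_type_name` here are illustrative. The three numeric values are taken from the AArch64 ELF ABI.

```python
# A small excerpt of AArch64 relocation codes; the first real code is 0x101,
# so a dense array indexed from 0 would not contain these entries.
AARCH64_RELOC_NAMES = {
    0x101: "R_AARCH64_ABS64",
    0x113: "R_AARCH64_ADR_PREL_PG_HI21",
    0x115: "R_AARCH64_ADD_ABS_LO12_NC",
}


def reloc_type_name(reloc_type: int) -> str:
    """Look the type up by value and fall back gracefully instead of crashing."""
    return AARCH64_RELOC_NAMES.get(reloc_type, f"<unknown:{reloc_type:#x}>")
```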
2025-11-05  [BOLT][AArch64] Fix LDR relocation type in ADRP+LDR sequence (#166391)  YongKang Zhu
`R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the `ADRP+ADD` sequence. For the `ADRP+LDR` sequence generated in LDR relaxation, the relocation type for `LDR` should be `R_AARCH64_LDST64_ABS_LO12_NC` if it is a 64-bit integer load or `R_AARCH64_LDST32_ABS_LO12_NC` if it is 32-bit. Sorry, this should have been included in #165787.
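The selection rule stated in this message amounts to a mapping from load width to relocation type. A minimal sketch follows; the function name is hypothetical and only the two widths named in the commit message are handled.

```python
def lo12_reloc_for_ldr(load_bits: int) -> str:
    """Pick the low-12-bit relocation for the LDR in an ADRP+LDR pair."""
    if load_bits == 64:
        return "R_AARCH64_LDST64_ABS_LO12_NC"
    if load_bits == 32:
        return "R_AARCH64_LDST32_ABS_LO12_NC"
    # Other widths are outside the scope described in the commit message.
    raise ValueError(f"unsupported load width: {load_bits}")
```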
2025-11-05  [BOLT][NFC] Rename functions with _negative suffix to _unknown when the size is unknown (#166536)  Elvina Yakubova
Keep the _negative suffix only for test cases where the size is negative.
2025-11-05  [BOLT][AArch64] Fix search to proceed upwards from memcpy call (#166182)  Elvina Yakubova
The search should proceed from CallInst to the beginning of BB since X2 can be rewritten and we need to catch the most recent write before the call. Patch by Yafet Beyene alulayafet@gmail.com
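The direction of the search matters because only the nearest write before the call determines the value the call sees. A hypothetical sketch, with instructions modelled as dictionaries carrying a `writes` set (this representation and the function name are assumptions, not BOLT's data structures):

```python
def last_write_before_call(block, call_index, reg="x2"):
    """Scan upwards from the call and return the index of the nearest
    instruction that writes `reg`, or None if the block has no such write."""
    for i in range(call_index - 1, -1, -1):
        if reg in block[i]["writes"]:
            return i
    return None
```

With a block that writes `x2` at indices 0 and 2 and calls `memcpy` at index 3, the function returns 2. A scan from the top of the block would stop at index 0 and pick up a stale value.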
2025-11-04  [BOLT] Fix impute-fall-throughs (#166305)  Amir Ayupov
BOLT expects pre-aggregated profile entries to be unique, which holds for externally aggregated traces (or branches+fall-through ranges). Therefore, BOLT doesn't merge duplicate entries for faster processing. However, such traces are not expressly prohibited and could come from concatenated pre-aggregated profiles or otherwise. Relax the assumption about no duplicate (branch-only) traces in fall-through imputing.

Test Plan: updated callcont-fallthru.s
2025-11-04  [BOLT][AArch64] Run LDR relaxation (#165787)  YongKang Zhu
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`, which, besides the existing ADR relaxation, will also run LDR relaxation that for now only handles these two forms of LDR instructions: `ldr Xt, [label]` and `ldr Wt, [label]`.
2025-11-04  [BOLT][NFC] Clean up the outdated option --write-dwp in doc (#166150)  Jinjie Huang
Since the "--write-dwp" option has been removed in [PR](https://github.com/llvm/llvm-project/pull/100771), this patch also cleans up the corresponding documentation and test to avoid confusion.
2025-11-03  Update BOLT's README.md example optimization flag (#166251)  Rafael Auler
Drop hfsort in favor of a more modern function reordering algorithm.
2025-11-03  [BOLT] Add an option for constant island cloning (#165778)  YongKang Zhu
Avoiding constant island cloning helps reduce app size, especially for BOLT optimization, in which cloning happens when a function is split into multiple fragments. Add an option to make the cloning optional; a new pass will be introduced to handle the reference-too-far error that may result from disabling constant island cloning (#165787).
2025-11-03  [BOLT] Issue error on unclaimed PC-relative relocation (#166098)  Maksim Panchenko
Replace the assert with an error and improve the report when an unclaimed PC-relative relocation is left in strict mode.
2025-11-02  [ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020)  Jakub Kuderski
Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-01  [ADT] Use a dedicated empty type for StringSet (NFC) (#165967)  Kazu Hirata
This patch introduces StringSetTag, a dedicated empty struct to serve as the "value type" for llvm::StringSet. This change is part of an effort to reduce the use of std::nullopt_t outside the context of std::optional.
2025-10-31  [BOLT] Refactor handling of branch targets. NFCI (#165828)  Maksim Panchenko
Refactor code that verifies external branch destinations and creates secondary entry points.
2025-10-31  [BOLT] Add constant island check in scanExternalRefs() (#165577)  Jinjie Huang
The [previous patch](https://github.com/llvm/llvm-project/pull/163418) added a check to prevent adding an entry point into a constant island, but only for successfully disassembled functions. Because scanExternalRefs() is also called when a function fails to be disassembled or is skipped, it can still attempt to add an entry point at a constant island. Without a check there, the same issue may occur, so this patch adds the complementary 'constant island' check to scanExternalRefs().
2025-10-29  [BOLT][NFC] Drop unused profile staleness stats (#165489)  Amir Ayupov
Whether the stale profile and the binary have an equal number of blocks in a function, or of instructions in a block, isn't used in the matching. Remove these stats to declutter the output.

Test Plan: NFC
2025-10-28  [BOLT] Fix thread-safety of MarkRAStates (#165368)  Gergely Bálint
The pass calls setIgnored() on functions in parallel, but setIgnored is not thread safe. This patch adds a std::mutex to guard setIgnored calls. Fixes: #165362
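The pattern of guarding a non-thread-safe call with a mutex can be sketched as follows. This is a Python analogue using `threading.Lock` in place of `std::mutex`; the `Function` class, the shared list, and `mark_ignored_in_parallel` are all hypothetical stand-ins rather than BOLT code.

```python
import threading


class Function:
    def __init__(self, name):
        self.name = name
        self.ignored = False


def mark_ignored_in_parallel(functions, workers=4):
    ignored_names = []          # shared state touched by set_ignored
    lock = threading.Lock()     # plays the role of the std::mutex

    def set_ignored(fn):
        # Stand-in for a call that mutates shared state and is not thread safe.
        fn.ignored = True
        ignored_names.append(fn.name)

    def work(chunk):
        for fn in chunk:
            with lock:          # serialize every call to set_ignored
                set_ignored(fn)

    threads = [
        threading.Thread(target=work, args=(functions[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ignored_names
```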
2025-10-28  [DebugInfo] Support to get TU for hash from .debug_types.dwo section in DWARF4. (#161067)  Liu Ke
Using the DWP's cu_index/tu_index only loads the DWO units from the .debug_info.dwo section for hash, which works fine in DWARF5. However, tu_index points to the .debug_types.dwo section in DWARF4, which can cause the type unit to be lost due to the incorrect loading target. (Related discussion in [811b60f](https://github.com/llvm/llvm-project/commit/811b60f0b99dad4b2989d21dde38d49155b0c4f9).) This patch adds support for getting the type unit for hash from the .debug_types.dwo section in DWARF4.
2025-10-25  [BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (#165065)  Maksim Panchenko
The CreatePastEnd parameter had no effect on the label creation. Remove it.
2025-10-25  [BOLT] Avoid extra function dump on invalid BBs found by UCE (NFC) (#165111)  YongKang Zhu
2025-10-23  [BOLT] Add --ba flag to deprecate --nl (#164257)  Paschalis Mpeis
The `--nl` flag, originally for Non-LBR mode, is deprecated and will be replaced by `--basic-events` (alias `--ba`). `--nl` remains as a deprecated alias for backward compatibility.
2025-10-22  [BOLT][AArch64] Validate code padding (#164037)  YongKang Zhu
Check whether AArch64 function code padding is valid, and add an option to treat invalid code padding as an error.
2025-10-21  [BOLT] Check entry point address is not in constant island (#163418)  Asher Dobrescu
There are cases where `addEntryPointAtOffset` is called with a given `Offset` that points to an address within a constant island. This triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT to crash. This patch adds a check which ignores functions that would add such entry points and warns the user.
2025-10-20  [ADT] Prepare for deprecation of StringSwitch cases with 4+ args. NFC. (#164173)  Jakub Kuderski
Update `.Cases` and `.CasesLower` with 4+ args to use the `initializer_list` overload. The deprecation of these functions will come in a separate PR. For more context, see: https://github.com/llvm/llvm-project/pull/163405.
2025-10-20  [BOLT][NFC] Use brstack in guides and user outputs (#163950)  Paschalis Mpeis
Update guides to use brstack, with a mention to BRBE for AArch64. Use brstack in user-facing outputs.

Co-authored-by: Amir Ayupov <aaupov@fb.com>
2025-10-16  [BOLT] Replace LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] (NFC) (#163700)  Kazu Hirata
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]], introduced as part of C++17.
2025-10-16  [BOLT][NFC] Add MCPlusBuilder unittests for PAuth helpers (#162251)  Gergely Bálint
PR #120064 added several MCPlusBuilder helpers for recognising instructions which sign or authenticate the link register. This patch adds MCPlusBuilder unittests for these helpers.
2025-10-15  [BOLT][NFC] Rename getNames for PLT, TailDuplication (#119870)  Paschalis Mpeis
2025-10-14  [bolt] Fix typos discovered by codespell (#124726)  Christian Clauss
https://github.com/codespell-project/codespell
```bash
codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \
    --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas
```
2025-10-14  [BOLT][NFC] Fix for a dangling reference UB (#163344)  Slava Gurevich
Fix UB caused by accessing the top element of the stack via a dangling reference after a call to .pop(). This is tripping static analysis. No functional changes. The performance impact is negligible, but an alternative implementation of the fix is possible if needed.

Testing: Both functional and unit tests are passing.
2025-10-09  [BOLT] Support fragment symbol mapped to the parent address (#162727)  Amir Ayupov
Observed in a GCC-produced binary. Emit a warning for the user.

Test Plan: added bolt/test/X86/fragment-alias.s
2025-10-09  [BOLT] Modify warning when --use-old-text fails. NFC (#162731)  Maksim Panchenko
When `--use-old-text` fails, we emit all code meant for the original `.text` section into the new section. This can be more bytes than are emitted without `--use-old-text`, especially under `--lite`. As a result, `--use-old-text` results in a larger binary, not a smaller one, which could be confusing to the user. Add more information to the warning, including a recommendation to rebuild without `--use-old-text` for a smaller binary size.
2025-10-09  [Bolt] Use fully qualified docker image name (NFC) (#162154)  Baranov Victor
Based on https://github.com/llvm/llvm-project/pull/162007#issuecomment-3373161948, we should avoid having short links in docker images.
2025-10-08  [BOLT][AArch64] Fix perf2bolt-spe test (#162312)  Paschalis Mpeis
Lit recently started failing on some machines when using a subshell. The test used a subshell to handle kernels where SPE's brstack option is unavailable (<6.14), but it appears to work without it now. Tested with perf 6.8 and 6.17.
2025-10-08  Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (#162435)  Gergely Bálint
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)" (#162353)

This reverts commit c7d776b06897567e2d698e447d80279664b67d47.

#120064 was reverted for breaking builders. Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.

Original message:

OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records whether the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
2025-10-07  Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)  Gergely Bálint
Reverts llvm/llvm-project#120064. @gulfemsavrun reported that the patch broke toolchain builders.
2025-10-07  [BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)  Gergely Bálint
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records if the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
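The two-pass idea can be modelled on a flat instruction stream. This is a deliberately simplified, hypothetical sketch: instructions are plain strings, the marker `"negate_ra_state"` stands in for the CFI, the function names are invented, and it assumes every function starts with an unsigned return address. The real passes work on BOLT's own instruction and CFI representations; see `bolt/docs/PacRetDesign.md` for the actual design.

```python
NEGATE = "negate_ra_state"


def mark_ra_states(instructions):
    """Pass 1 (before reordering): annotate each instruction with whether
    the return address is signed at that point, and drop the CFI markers."""
    signed = False
    annotated = []
    for inst in instructions:
        if inst == NEGATE:
            signed = not signed  # the CFI toggles the state for what follows
            continue
        annotated.append((inst, signed))
    return annotated


def insert_negate_ra_states(annotated):
    """Pass 2 (after reordering): emit a negate marker wherever the
    annotated state differs from the state of the preceding instruction."""
    out = []
    state = False  # assumption: the stream begins in the unsigned state
    for inst, signed in annotated:
        if signed != state:
            out.append(NEGATE)
            state = signed
        out.append(inst)
    return out
```

Running both passes on an unmodified stream reproduces the original markers. If the annotated instructions are reordered in between, the second pass places the markers according to the new layout, which is why the annotations rather than the original CFI positions are carried through optimization.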
2025-10-06  [BOLT] Always treat function entry as code (#160161)  Maksim Panchenko
If an address has both a data marker "$d" and a function symbol associated with it, treat it as code.
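The precedence rule can be expressed as a small classifier. This is a hypothetical sketch: symbols at one address are modelled as `(name, is_function)` pairs, the function name and the `"unknown"` fallback are invented, and `$d.<suffix>` is accepted as a data marker in line with the Arm ELF mapping-symbol convention.

```python
def classify_address(symbols):
    """Classify one address from its symbols, given as (name, is_function) pairs."""
    has_function = any(is_func for _, is_func in symbols)
    has_data_marker = any(
        name == "$d" or name.startswith("$d.") for name, _ in symbols
    )
    if has_function:
        return "code"  # a function symbol wins over a "$d" marker
    return "data" if has_data_marker else "unknown"
```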