| Age | Commit message (Collapse) | Author |
|
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's
Clang config files (i.e. a global preference for Intel syntax).
`-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
|
|
With this option we can pass BOLT the names of functions to be printed
via a file instead of specifying them all on the command line.
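A minimal sketch of reading such a list, assuming one function name per
line (the file format is not spelled out in this message);
`readFunctionNames` is a hypothetical helper, not part of BOLT:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect function names from a stream, one name per
// line, skipping blank lines. The one-name-per-line format is an assumption.
std::vector<std::string> readFunctionNames(std::istream &In) {
  std::vector<std::string> Names;
  std::string Line;
  while (std::getline(In, Line)) {
    // Strip surrounding spaces, tabs and a trailing carriage return.
    const auto First = Line.find_first_not_of(" \t\r");
    if (First == std::string::npos)
      continue; // blank line
    const auto Last = Line.find_last_not_of(" \t\r");
    Names.push_back(Line.substr(First, Last - First + 1));
  }
  return Names;
}
```

Any `std::istream` works as input, e.g. a `std::ifstream` opened on the
list file or a `std::istringstream` in a test.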
|
|
Pseudo probe matching (#100446) needs callee information for call probes.
Embed call probe information (probe id, inline tree node, indirect flag)
into CallSiteInfo. As a consequence:
- Remove call probes from PseudoProbeInfo to avoid duplication, making
it only contain block probes.
- Probe grouping across inline tree nodes becomes more effective and
allows unambiguously eliding block id 1 (the common case).
Block mask (blx) encoding becomes a low-ROI optimization and will be
replaced by a more compact encoding leveraging simplified PseudoProbeInfo
in #166680.
The size increase is ~3% for an XL profile (461->475MB). Compact block
probe encoding shrinks it by ~6%.
Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test
Reviewers: paschalis-mpeis, ayermolo, yota9, yozhu, rafaelauler, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/165490
|
|
Slice .debug_str from the DWP for each CU using .debug_str_offsets and
emit the slice instead of copying the global .debug_str directly. This
addresses the DWO bloat issue after updates (see #155766 for details).
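The slicing idea can be sketched as follows. This is an illustration
only: `SlicedStrings` and `sliceDebugStr` are hypothetical names, and the
string table and offsets are modelled as plain containers rather than
DWARF sections.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Result of slicing: a compact string table plus rewritten offsets.
struct SlicedStrings {
  std::string Table;             // NUL-terminated strings, back to back
  std::vector<uint32_t> Offsets; // new offsets, same order as the input
};

// Hypothetical sketch: copy only the strings a CU references (through its
// string offsets) out of the global string table, deduplicating repeats.
SlicedStrings sliceDebugStr(const std::string &GlobalStr,
                            const std::vector<uint32_t> &CUOffsets) {
  SlicedStrings Out;
  std::map<uint32_t, uint32_t> Remap; // old offset -> new offset
  for (uint32_t Old : CUOffsets) {
    assert(Old < GlobalStr.size() && "offset outside the string table");
    auto It = Remap.find(Old);
    if (It == Remap.end()) {
      const uint32_t New = static_cast<uint32_t>(Out.Table.size());
      // Appending a C string copies bytes up to the terminating NUL.
      Out.Table.append(GlobalStr.c_str() + Old);
      Out.Table.push_back('\0');
      It = Remap.emplace(Old, New).first;
    }
    Out.Offsets.push_back(It->second);
  }
  return Out;
}
```

Strings that no offset of the CU refers to are never copied, which is
where the size saving comes from.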
|
|
Add more heuristics to check if a basic block is an AArch64 epilogue. We
treat instructions that load from the stack or adjust the stack pointer
as a valid epilogue code sequence if and only if they immediately precede
the branch instruction that ends the basic block.
|
|
- unify isRAStateSigned and isRAStateUnsigned to a common getRAState,
- unify setRASigned and setRAUnsigned into setRAState(MCInst, bool),
- update users of these to match the new implementations.
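The shape of the unified accessors could look like the sketch below.
`SketchInst` is a stand-in for `MCInst`, and the `std::optional<bool>`
return type of `getRAState` is an assumption made for illustration; the
message itself only fixes the `setRAState(MCInst, bool)` signature.

```cpp
#include <optional>

// Stand-in for an instruction carrying an optional RA-state annotation.
// This is NOT llvm::MCInst; it only illustrates the shape of the API.
struct SketchInst {
  std::optional<bool> RASigned; // empty: no annotation yet
};

// One setter replaces the setRASigned/setRAUnsigned pair.
void setRAState(SketchInst &Inst, bool IsSigned) { Inst.RASigned = IsSigned; }

// One getter replaces isRAStateSigned/isRAStateUnsigned. Returning
// std::optional<bool> is an assumption of this sketch.
std::optional<bool> getRAState(const SketchInst &Inst) {
  return Inst.RASigned;
}
```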
|
|
Add `RSeqRewriter` to detect code references from `__rseq_cs` section
and ignore functions referenced from that section. Code references are
detected via relocations (static or dynamic).
Note that the abort handler is preceded by a 4-byte signature byte
sequence and we cannot relocate the handler without the signature,
otherwise the application may crash. Thus we are ignoring the function,
i.e. making sure it's not separated from its signature.
|
|
Identified with readability-container-contains.
|
|
In addition to tracking offsets inside a `BinaryFunction` that are
referenced by data relocations, we need to track those relocations too.
Plus, we will need to map symbols referenced by such relocations back to
the containing function.
This change introduces `BinaryFunction::InternalRefDataRelocations` to
track the aforementioned relocations and expands
`BinaryContext::SymbolToFunctionMap` to include local/temp symbols
involved in relocation processing.
There is no functional change introduced that should affect the output.
Future PRs will use the new tracking capabilities.
|
|
Remove internal undefined symbol tracking and instead rely on the
emission state of `MCSymbol` while processing data-to-code relocations.
Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior
to code emission.
|
|
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.
Identified with readability-redundant-declaration.
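A small illustration of the language rule, with hypothetical names
(`Limits`, `clampDepth`):

```cpp
#include <algorithm>

struct Limits {
  // Implicitly inline since C++17: this declaration is also the definition.
  static constexpr int MaxDepth = 8;
};

// Before C++17, ODR-using the member (e.g. binding it to a reference)
// required an out-of-line definition in exactly one source file:
//   constexpr int Limits::MaxDepth;
// Under C++17 that line is redundant.

int clampDepth(int Depth) {
  // std::min takes its arguments by const reference, which ODR-uses MaxDepth.
  return std::min(Depth, Limits::MaxDepth);
}
```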
|
|
We were skipping four zeros at a time when validating code padding, in
case the next zero was part of an instruction or constant island. For
functions that have a large amount of padding (e.g. due to hugify), this
could be very slow. The validation now skips as many zeros as possible,
as long as the count is an exact multiple of 4. No valid instruction is
encoded as 0x00000000, and even if we stumble into a constant island,
`BinaryFunction::isInConstantIsland()` now finds the size between the
queried address and the end of the island (#164037), so this should be
safe.
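The skipping rule can be sketched as below. `skipZeroPadding` is a
hypothetical helper operating on a plain byte vector; it does not model
constant islands or BOLT's actual data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: starting at Offset, measure the run of zero bytes and
// return how many of them can be skipped, rounded down to a multiple of 4
// (the size of one AArch64 instruction).
std::size_t skipZeroPadding(const std::vector<uint8_t> &Bytes,
                            std::size_t Offset) {
  std::size_t End = Offset;
  while (End < Bytes.size() && Bytes[End] == 0)
    ++End;
  const std::size_t Run = End - Offset;
  return Run - Run % 4; // keep only whole 4-byte words
}
```

A single scan over the run replaces the repeated four-byte steps, so the
cost is still linear in the padding size but without the per-word
instruction and island checks.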
|
|
When the SPE Previous Branch Target address (FEAT_SPE_PBT) feature is
available, an SPE sample combined with PBT has two entries.
Arm SPE records the SRC/DEST addresses of the latest sampled branch
operation and stores them in the first entry. PBT records the target
address of the most recently taken branch in program order before the
sampled operation and places it in the second entry. Together they form
a chain of two consecutive branches.
Where:
- The previous branch operation (PBT) is always taken.
- In the SPE entry, the current source branch (SRC) may be either
fall-through or taken, and the target address (DEST) of the recorded
branch operation is always what was architecturally executed.
However, PBT doesn't provide as much information as SPE does. It lacks
the address of the source branch, the branch type, and the prediction
bit. These fields are always filled with zero in the PBT entry.
Therefore BOLT cannot evaluate the prediction and source branch fields
and leaves them zero during the aggregation process.
Tests include a fully expanded example.
|
|
Enumeration of relocation types is not always sequential, e.g. on
AArch64 the first real relocation type is 0x101. As such, the existing
code in `Relocation::print()` was crashing while printing AArch64
relocations. Fix it by using `llvm::object::getELFRelocationTypeName()`.
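The failure mode and the fix can be illustrated with a value-based
lookup. `relocName` below is a hypothetical stand-in for
`llvm::object::getELFRelocationTypeName()`, covering only three types:

```cpp
#include <cstdint>
#include <string>

// A few AArch64 ELF relocation types; note the gap between 0 and 0x101.
enum : uint32_t {
  R_AARCH64_NONE = 0,
  R_AARCH64_ABS64 = 0x101,
  R_AARCH64_ABS32 = 0x102,
};

// Look the name up by value instead of indexing a name array with the type,
// so sparse numbering cannot cause an out-of-bounds read.
std::string relocName(uint32_t Type) {
  switch (Type) {
  case R_AARCH64_NONE:
    return "R_AARCH64_NONE";
  case R_AARCH64_ABS64:
    return "R_AARCH64_ABS64";
  case R_AARCH64_ABS32:
    return "R_AARCH64_ABS32";
  default:
    return "Unknown";
  }
}
```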
|
|
`R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the
`ADRP+ADD` sequence. For the `ADRP+LDR` sequence generated in LDR
relaxation, the relocation type for `LDR` should be
`R_AARCH64_LDST64_ABS_LO12_NC` for a 64-bit integer load or
`R_AARCH64_LDST32_ABS_LO12_NC` for a 32-bit one.
Sorry, this should have been included in #165787.
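The selection rule reads naturally as a small mapping. The helper name
`ldrLo12RelocName` and the use of strings instead of relocation type
constants are assumptions of this sketch:

```cpp
#include <string>

// Hypothetical helper: pick the low-12-bit relocation for the LDR in an
// ADRP+LDR pair from the width of the integer load.
std::string ldrLo12RelocName(unsigned LoadBits) {
  switch (LoadBits) {
  case 64:
    return "R_AARCH64_LDST64_ABS_LO12_NC";
  case 32:
    return "R_AARCH64_LDST32_ABS_LO12_NC";
  default:
    return ""; // widths not covered by this commit message
  }
}
```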
|
|
(#166536)
…e size is unknown
Keep _negative suffix only for test cases when the size is negative
|
|
The search should proceed from CallInst to the beginning of BB since X2
can be rewritten and we need to catch the most recent write before the
call.
Patch by Yafet Beyene alulayafet@gmail.com
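A sketch of the backward search, with a basic block modelled as a vector
of hypothetical `SketchInst` records rather than real `MCInst`s:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for an instruction; only records whether it writes X2.
struct SketchInst {
  bool WritesX2;
};

// Walk backward from the call toward the start of the block and return the
// index of the closest preceding instruction that writes X2, if any.
std::optional<std::size_t>
findLastX2WriteBefore(const std::vector<SketchInst> &BB, std::size_t CallIdx) {
  for (std::size_t I = CallIdx; I-- > 0;)
    if (BB[I].WritesX2)
      return I;
  return std::nullopt;
}
```

Searching forward from the start of the block would instead return the
earliest write, which is stale if X2 is rewritten before the call.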
|
|
BOLT expects pre-aggregated profile entries to be unique, which holds
for externally aggregated traces (or branches+fall-through ranges).
Therefore, BOLT doesn't merge duplicate entries for faster processing.
However, such traces are not expressly prohibited and could come from
concatenated pre-aggregated profiles or otherwise.
Relax the assumption about no duplicate (branch-only) traces in
fall-through imputing.
Test Plan: updated callcont-fallthru.s
|
|
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`,
which, besides the existing ADR relaxation, will also run LDR relaxation
that for now only handles these two forms of LDR instructions:
`ldr Xt, [label]` and `ldr Wt, [label]`.
|
|
Since the "--write-dwp" option has been removed in
[PR](https://github.com/llvm/llvm-project/pull/100771), this patch also
cleans up the corresponding documentation and test to avoid misleading
users.
|
|
Drop hfsort in favor of a more modern function reordering algorithm.
|
|
Avoiding cloning of constant islands helps reduce app size, especially
for BOLT optimization, where cloning happens when a function is split
into multiple fragments. Add an option to make the cloning optional. A
new pass will be introduced to handle the reference-too-far error that
may result from disabling constant island cloning (#165787).
|
|
Replace the assert with an error and improve the report when an
unclaimed PC-relative relocation is left in strict mode.
|
|
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.
For more context, see https://github.com/llvm/llvm-project/pull/163117.
|
|
This patch introduces StringSetTag, a dedicated empty struct to serve
as the "value type" for llvm::StringSet. This change is part of an
effort to reduce the use of std::nullopt_t outside the context of
std::optional.
|
|
Refactor code that verifies external branch destinations and creates
secondary entry points.
|
|
The [previous patch](https://github.com/llvm/llvm-project/pull/163418)
added a check to prevent adding an entry point into a constant
island, but only for successfully disassembled functions.
Because scanExternalRefs() is also called when a function fails to be
disassembled or is skipped, it can still attempt to add an entry point
at constant islands. Without a check there, the same issue may occur.
So, this patch adds the complementary 'constant island' check in
scanExternalRefs().
|
|
Whether the stale profile and the binary have an equal number of blocks
in a function or instructions in a block isn't used in the matching.
Remove these stats to declutter the output.
Test Plan: NFC
|
|
The pass calls setIgnored() on functions in parallel, but setIgnored is
not thread safe. This patch adds a std::mutex to guard setIgnored calls.
Fixes: #165362
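The pattern can be sketched with a small class. `IgnoredFunctions` is a
hypothetical stand-in; in this simplification the lock lives inside the
setter, whereas the patch guards the calls made by the pass.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for state shared by parallel workers.
class IgnoredFunctions {
  std::mutex Mutex;
  std::set<std::string> Names;

public:
  // Serialize writers: std::set is not safe for concurrent modification.
  void setIgnored(const std::string &Name) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Names.insert(Name);
  }

  // Readers take the same lock so they never observe a half-updated set.
  std::size_t size() {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Names.size();
  }
};
```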
|
|
DWARF4. (#161067)
Using the DWP's cu_index/tu_index only loads the DWO units from the
.debug_info.dwo section for hash, which works fine in DWARF5. However,
tu_index points to .debug_types.dwo section in DWARF4, which can cause
the type unit to be lost due to the incorrect loading target. (Related
discussion in
[811b60f](https://github.com/llvm/llvm-project/commit/811b60f0b99dad4b2989d21dde38d49155b0c4f9))
This patch adds support for getting the type unit for hash from the
.debug_types.dwo section in DWARF4.
|
|
CreatePastEnd parameter had no effect on the label creation. Remove it.
|
|
|
|
The `--nl` flag, originally for Non-LBR mode, is deprecated and will be
replaced by `--basic-events` (alias `--ba`).
`--nl` remains as a deprecated alias for backward compatibility.
|
|
Check whether AArch64 function code padding is valid,
and add an option to treat invalid code padding as an error.
|
|
There are cases where `addEntryPointAtOffset` is called with a given
`Offset` that points to an address within a constant island. This
triggers `assert(!isInConstantIsland(EntryPointAddress))` and causes BOLT
to crash. This patch adds a check which ignores functions that would add
such entry points and warns the user.
|
|
Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.
For more context, see: https://github.com/llvm/llvm-project/pull/163405.
|
|
Update guides to use brstack, with a mention of BRBE for AArch64. Use
brstack in user-facing outputs.
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>
|
|
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]],
introduced as part of C++17.
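A typical use of the attribute, shown on a hypothetical function
(`checkedHalf` is not from the patch):

```cpp
#include <cassert>

int checkedHalf(int Value) {
  // Only read by assert(); without the attribute, builds with NDEBUG defined
  // could warn about an unused variable.
  [[maybe_unused]] const bool IsEven = Value % 2 == 0;
  assert(IsEven && "expected an even value");
  return Value / 2;
}
```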
|
|
PR #120064 added several MCPlusBuilder helpers for recognising
instructions which sign or authenticate the link register.
This patch adds MCPlusBuilder unittests for these helpers.
|
|
|
|
https://github.com/codespell-project/codespell
```bash
codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \
--ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas
```
|
|
Fix UB caused by accessing the top element of the stack via a dangling
reference after a call to .pop(). This was tripping static analysis.
No functional changes. Performance impact is negligible, but an
alternative implementation of the fix is possible if needed.
Testing: Both functional and unit tests are passing.
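The general pattern behind this kind of fix, sketched on `std::stack`
with a hypothetical helper `popTop` (the container and element type used
in BOLT are not stated in this message):

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <utility>

// Take ownership of the top element before pop(); a reference obtained from
// top() dangles once pop() destroys the element.
std::string popTop(std::stack<std::string> &Stack) {
  assert(!Stack.empty() && "cannot pop from an empty stack");
  std::string Top = std::move(Stack.top()); // own the value first
  Stack.pop();                              // the slot can now be destroyed
  return Top;
}
```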
|
|
Observed in GCC-produced binary. Emit a warning for the user.
Test Plan: added bolt/test/X86/fragment-alias.s
|
|
When `--use-old-text` fails, we emit all code meant for the
original `.text` section into the new section. This could be more bytes
than are emitted without `--use-old-text`, especially under
`--lite`. As a result, `--use-old-text` results in a larger binary, not
a smaller one, which could be confusing to the user.
Add more information to the warning, including a recommendation to
rebuild without `--use-old-text` for smaller binary size.
|
|
Based on
https://github.com/llvm/llvm-project/pull/162007#issuecomment-3373161948,
we should avoid having short links in docker images.
|
|
Lit recently started failing on some machines when using a subshell. The
test used a subshell to handle kernels where SPE's brstack option is
unavailable (<6.14), but it appears to work without it now.
Tested with perf 6.8 and 6.17.
|
|
binaries with pac-ret hardening" (#162353) (#162435)
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)
This reverts commit c7d776b06897567e2d698e447d80279664b67d47.
#120064 was reverted for breaking builders.
Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.
---
Original message:
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
|
|
with pac-ret hardening" (#162353)
Reverts llvm/llvm-project#120064.
@gulfemsavrun reported that the patch broke toolchain builders.
|
|
pac-ret hardening (#120064)
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
if the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
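The second pass can be pictured with a simplified model. The sketch
below assumes each instruction already carries a signed/unsigned
annotation in final layout order and that the function is entered in the
unsigned state; `negateRAStatePoints` is a hypothetical name, and the
real passes work on BOLT's instruction annotations and CFIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: given the per-instruction RA state (true = signed) in
// final layout order, return the indices at which the state differs from the
// state in effect so far, i.e. where a negate-RA-state CFI would be needed.
std::vector<std::size_t> negateRAStatePoints(const std::vector<bool> &Signed) {
  std::vector<std::size_t> Points;
  bool Current = false; // assumption: unsigned on function entry
  for (std::size_t I = 0; I < Signed.size(); ++I) {
    if (Signed[I] != Current) {
      Points.push_back(I);
      Current = Signed[I];
    }
  }
  return Points;
}
```

Because the result depends only on the order of the annotated
instructions, it can be recomputed after any block reordering.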
|
|
If an address has both a data marker "$d" and a function symbol
associated with it, treat it as code.
|