llvm-project.git/llvm/lib/DebugInfo/GSYM/GsymReader.cpp, branch users/chapuni/cov/single/switch

[GSYM] Add support for querying merged functions in llvm-gsymutil (#120991)

2025-01-06T19:55:27+00:00

Adds the ability to lookup and display all merged functions for an
address in llvm-gsymutil.

Now, when `--merged-functions` is used in combination with
`--address/--addresses-from-stdin`, lookup results will contain
information about merged functions, if available.

To support printing merged function information when using the
`--verbose` option, the `LookupResult` data structure also had to be
extended with pointers to the raw function data and raw merged function
data. This is because merged functions share the same address range, so
it's not easy to look up the raw merged function data for a particular
`LookupResult` that is based on a merged function.

[llvm-gsymutil] Fix dumping of call sites for merged functions (#119759)

2024-12-13T23:59:57+00:00

Currently, when dumping the contents of a GSYM there are three issues:
- Callsite information is not displayed for merged functions - this is
because of a bug in `CallSiteInfoLoader::buildFunctionMap` where when
enumerating through `Func.MergedFunctions` - we enumerate by value
instead of by reference.
- There is no variable indent for printing callsite info - meaning that
when printing callsites for merged functions, the indent will be
different than the other info of the merged function. To address this we
add configurable indent for printing callsite info
- Callsite info is printed right after merged function info. Meaning
that if the merged function also has call site information, the parent's
callsite info will appear right after the merged function's callsite
info - leading to confusion. To address this we print the callsite info
first, then the merged functions info.

This change addresses all the above 3 issues. 
Example of old vs new:

[GSYM] Callsites: Add data format support and loading from YAML (#109781)

2024-11-27T00:07:40+00:00

This PR adds support in the gSYM format for call site information and
adds support for loading call sites from a YAML file. The support for
YAML input is mostly for testing purposes - so we have a way to test the
functionality.

Note that this data is not currently used in the gSYM tooling - the
logic to use call sites will be added in a later PR.

The reason why we need call site information in gSYM files is so that we
can support better call stack function disambiguation in the case where
multiple functions have been merged due to optimization (linker ICF).
When resolving a merged function on the callstack, we can use the call
site information of the calling function to narrow down the actual
function that is being called, from the set of all merged functions.

See [this
RFC](https://discourse.llvm.org/t/rfc-extending-gsym-format-with-call-site-information-for-merged-function-disambiguation/80682)
for more details on this change.

[DebugInfo] Remove unused includes (NFC) (#116551)

2024-11-17T18:37:33+00:00

Identified with misc-include-cleaner.

[gSYM] Add support merged functions in gSYM format (#101604)

2024-08-07T21:34:20+00:00

This patch introduces support for storing debug info for merged
functions in the GSYM debug info. It allows GSYM to represent multiple
functions that share the same address range, which occur when multiple
functions are merged during linker ICF.

The core of this functionality is the new `MergedFunctionsInfo` class,
which is integrated into the existing `FunctionInfo` structure. During
GSYM creation, functions with identical address ranges are now grouped
together, with one function serving as the "master" and the others
becoming "merged" functions. This organization is preserved in the GSYM
format and can be read back and displayed when dumping GSYM information.

Old readers will only see the master function, and ther "merged"
functions will not be processed.

Note: This patch just adds the functionality to the gSYM format -
additional changes to the gsym format and algorithmic changes to logic
existing tooling are needed to take advantage of this data.

Exact output of `llvm-gsymutil --verify --verbose` for the included
test:
[gist](https://gist.github.com/alx32/b9c104d7f87c0b3e7b4171399fc2dca3)

Modify llvm-gsymutil lookups to handle overlapping ranges correctly. (#72350)

2023-11-17T18:31:12+00:00

llvm-gsymutil allows address ranges to overlap. There was a bug where if
we had debug info for a function with a range like [0x100-0x200) and a
symbol at the same start address yet with a larger range like
[0x100-0x300), we would randomly get either only information from the
first or second entry. This could cause lookups to fail due to the way
the binary search worked.

This patch makes sure that when lookups happen we find the first address
table entry that can match an address, and also ensures that we always
select the first FunctionInfo that could match. FunctionInfo entries are
sorted such that the most debug info rich entries come first. And if we
have two ranges that have the same start address, the smaller range
comes first and the larger one comes next. This patch also adds the
ability to iterate over all function infos with the same start address
to always find a range that contains the address.

Added a unit test to test this functionality that failed prior to this
fix and now succeeds.

Also fix an issue when dumping an entire GSYM file that has duplicate address entries where it used to always print out the binary search match for the FunctionInfo, not the actual data for the address index.

Use llvm::endianness::{big,little,native} (NFC)

2023-10-13T04:21:45+00:00

Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.

[Support] Deprecate system_endianness (#68279)

2023-10-05T16:17:09+00:00

system_endianness() just returns llvm::endianness::native, a
compile-time constant equivalent to std::native in C++20.  This patch
deprecates system_endianness() while replacing all invocations of
system_endianness() with llvm::endianness::native.

While we are at it, this patch replaces
llvm::support::endianness::{big,little} with
llvm::endianness::{big,little} in those statements that happen to call
system_endianness().  It does not go out of its way to replace other
occurrences of llvm::support::endianness::{big,little}.

[GSYM] Fix the initialization of DataExtractor (#67904)

2023-10-01T19:11:10+00:00

Without this patch, we pass Endian as one of the parameters to the
constructor of DataExtractor.  The problem is that Endian is of:

  enum endianness {big, little, native};

whereas the constructor is expecting "bool IsLittleEndian".  That is,
we are relying on an implicit conversion to convert big and little to
false and true, respectively.

When we migrate llvm::support::endianness to std::endian in future, we
can no longer rely on an implicit conversion because std::endian is
declared with "enum class".  Even if we could, the conversion would
not be guaranteed to work because, for example, libcxx defines:

  enum class endian {
    little = 0xDEAD,
    big = 0xFACE,
    :

where big and little are not boolean values.

This patch fixes the problem by properly converting Endian to a
boolean value.

[DebugInfo] llvm::Optional => std::optional

2022-12-05T00:09:22+00:00

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716