| Age | Commit message (Collapse) | Author |
|
Add the `--{no-}separate-cstring-literal-sections` option to emit
cstring literals into sections defined by their section name. This
allows for changes like https://github.com/swiftlang/swift/pull/84300
and https://github.com/swiftlang/swift/pull/84236 to actually have an
effect. The default behavior has not changed.
This is useful because strings in different sections might
have different access patterns at runtime. By splitting these strings
into separate sections, we may reduce the number of page faults during
startup. For example, the ObjC runtime accesses all strings in
`__objc_classname` before main.
|
|
|
|
This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
https://github.com/llvm/llvm-project/pull/120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
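The interleaving described above amounts to a two-pointer walk over two arrays that are each already sorted by address. The following is a minimal sketch, not the actual lld code: `Entry` and `forEachInterleaved` are hypothetical names standing in for `ConcatInputSection` and the map file printing loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an input section with an output address.
struct Entry {
  uint64_t addr;
  std::string name;
};

// Visit the entries of two address-sorted arrays in combined address
// order without allocating a merged array.
template <typename Fn>
void forEachInterleaved(const std::vector<Entry> &a,
                        const std::vector<Entry> &b, Fn visit) {
  size_t i = 0, j = 0;
  // Always emit whichever array currently holds the lower address.
  while (i < a.size() && j < b.size()) {
    if (a[i].addr <= b[j].addr)
      visit(a[i++]);
    else
      visit(b[j++]);
  }
  // One of the arrays is exhausted; drain the other.
  while (i < a.size())
    visit(a[i++]);
  while (j < b.size())
    visit(b[j++]);
}
```

Extra memory use is constant regardless of how many sections or thunks are visited, which is the property the commit is after.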
|
|
This patch extends the MachO linker's map file generation to include
branch extension thunk symbols. Previously, thunks were omitted from the
map file, making it difficult to understand the final layout of the
binary, especially when debugging issues related to long branch thunks.
This change ensures thunks are included and correctly interleaved with
other symbols based on their address, providing an accurate
representation of the linked output.
|
|
Currently, our `safe` ICF mode only merges non-address-significant code,
leaving duplicate address-significant functions in the output. This
patch introduces `safe_thunks` ICF mode, which keeps a single master
copy of each function and replaces address-significant duplicates with
thunks that branch to the master copy.
Currently `--icf=safe_thunks` is only supported for `arm64`
architectures.
**Perf stats for a large binary:**
| ICF Option | Total Size | __text Size | __unwind_info Size | Total size reduction |
|---------------------|------------|-------------|--------------------|----------------------|
| `--icf=none` | 91.738 MB | 55.220 MB | 1.424 MB | 0% |
| `--icf=safe` | 85.042 MB | 49.572 MB | 1.168 MB | 7.30% |
| `--icf=safe_thunks` | 84.650 MB | 49.219 MB | 1.143 MB | 7.72% |
| `--icf=all` | 82.060 MB | 48.726 MB | 1.111 MB | 10.55% |
So overall we can expect a `~0.45%` binary size reduction for a typical
large binary compared to the `--icf=safe` option.
**Runtime:**
Linking the above binary took ~10 seconds. Comparing the link
performance of --icf=safe_thunks vs --icf=safe, a ~2% slowdown was
observed.
|
|
Currently, when moving symbols from one `InputSection` to another (like
in ICF) we directly update the symbol's `isec`, `unwindEntry` and
`size`. By doing this we lose the original information. This information
will be needed in a future change. Since when moving symbols we always
set the symbol's `wasCoalesced` and `isec->replacement`, we can just
use this info to conditionally get the information we need at access
time.
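A rough sketch of the access-time approach, under simplifying assumptions: `Section`, `Sym`, and `originalIsec` below are hypothetical stand-ins rather than lld's real classes, and only the section lookup is shown (the same idea would apply to `unwindEntry` and `size`).

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for an input section.
struct Section {
  uint64_t addr = 0;
  Section *replacement = nullptr; // set when this section is folded into another
};

// Hypothetical, simplified stand-in for a defined symbol.
struct Sym {
  Section *originalIsec = nullptr; // never overwritten when the symbol moves
  bool wasCoalesced = false;       // set when the symbol is moved

  // Resolve the effective section on access instead of rewriting the field,
  // so the original section stays available.
  Section *isec() const {
    if (wasCoalesced && originalIsec && originalIsec->replacement)
      return originalIsec->replacement;
    return originalIsec;
  }
};
```

Callers that want the post-ICF location call `isec()`, while code that needs the pre-move information can still read `originalIsec` directly.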
|
|
The MachO format supports relative offsets for ObjC method lists. This
support is present already in ld64. With this change we implement this
support in lld also.
Relative method lists can be identified by a specific flag (0x80000000)
in the method list header. When this flag is present, the method list
will contain 32-bit relative offsets to the current Program Counter
(PC), instead of absolute pointers.
Additionally, when relative method lists are used, the offset to the
selector name will now be relative and point to the selector reference
(selref) instead of the name itself.
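To illustrate the flag check and the offset arithmetic, here is a small sketch. The flag value comes from the description above; the helper names are hypothetical, and the sketch assumes each offset is measured from the address of the field that stores it.

```cpp
#include <cstdint>

// Flag value as given in the commit message.
constexpr uint32_t kRelativeMethodListFlag = 0x80000000;

// Hypothetical helper: does this method list header word mark a
// relative method list?
bool isRelativeMethodList(uint32_t headerWord) {
  return (headerWord & kRelativeMethodListFlag) != 0;
}

// Hypothetical helper: turn a signed 32-bit relative offset into an
// address. Assumption: the offset is relative to the address of the
// field that holds it (fieldAddr).
uint64_t resolveRelative(uint64_t fieldAddr, int32_t offset) {
  return fieldAddr + static_cast<int64_t>(offset);
}
```

Because the offset is signed, the target may lie before or after the field, and each field needs only 4 bytes instead of an 8-byte absolute pointer.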
|
|
This is also what ld64 does. This will make it easier to compare their
respective map files.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D145654
|
|
If a symbol is pulled in from an archive, we should include the archive
name in the map file output. This is what ld64 does.
Note that we aren't using `toString(InputFile*)` here because it
includes the install name for dylibs in its output, and ld64's map file
does not contain those.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D145623
|
|
We now handle the GOT, TLV, and stubs/lazy pointer sections.
Reviewed By: #lld-macho, thevinster, thakis
Differential Revision: https://reviews.llvm.org/D139762
|
|
This reverts commit ac3096e1dd77a2687797d38976d5f8c93f7353e5.
The buildbot failure from the earlier patch set has been fixed by 7c7e39db7a.
Differential Revision: https://reviews.llvm.org/D137369
|
|
The test added in https://reviews.llvm.org/D137368 has been failing
on our 32 bit arm bots:
https://lab.llvm.org/buildbot/#/builders/178/builds/3460
You get this for the strings:
<<dead>> 0x883255000000003 [ 10] literal string: Hello, it's me
Instead of the expected:
<<dead>> 0x0000000F [ 3] literal string: Hello, it's me
This is because unlike symbols whose size is a uint64_t, strings
use a StringRef whose size is size_t. size_t changes size between
32 and 64 bit platforms.
This fixes the test by using %z to print the size of the strings,
this works for 32 and 64 bit.
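As a standalone illustration (not the actual patch), the `z` length modifier tells `printf`-style functions that the argument is a `size_t`, whatever its width on the host. `formatSize` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format a size_t as zero-padded hex. The 'z' length modifier matches
// size_t on both 32-bit and 64-bit hosts, so the varargs are read with
// the correct width.
std::string formatSize(size_t size) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "0x%08zX", size);
  return buf;
}
```

For a 15-byte string this yields `0x0000000F`, the value the test expects on every platform.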
|
|
This reverts commit 38d6202a425462ce5923d038bc54532115a80a1f.
Differential Revision: https://reviews.llvm.org/D137368
|
|
This reverts commit 213dbdbef0bad835abca0753f9e59b17dc2bcde2.
This patch series breaks lld:map-file.s on arm v7 linux buildbots.
e.g https://lab.llvm.org/buildbot/#/builders/178/builds/3190
|
|
This reverts commit 7f0779967f0690482c2cef70fc49e1381d32af1e.
This patch series breaks lld:map-file.s on arm v7 linux buildbots.
e.g https://lab.llvm.org/buildbot/#/builders/178/builds/3190
|
|
Just like ld64 does.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D137369
|
|
The previous map file code was modeled after LLD-ELF's
implementation. However, ld64's map file differs quite a bit from
LLD-ELF's. I've revamped our map file implementation so it is better
able to emit ld64-style map files.
Notable differences:
* ld64 doesn't demangle symbols in map files, regardless of whether
`-demangle` is passed. So we don't have to bother with
`getSymbolStrings()`.
* ld64 doesn't emit symbols in cstring sections; it emits just the
literal values. Moreover, it emits these literal values regardless of
whether they are labeled with a symbol.
* ld64 emits map file entries for things that are not strictly symbols,
such as unwind info, GOT entries, etc. That isn't handled in this
diff, but this redesign makes them easy to implement.
Additionally, the previous implementation sorted the symbols so as to
emit them in address order. This was slow and unnecessary -- the symbols
can already be traversed in address order by walking the list of
OutputSections. This brings significant speedups. Here are the numbers
from the chromium_framework_less_dwarf benchmark on my Mac Pro, with the
`-map` argument added to the response file:
base diff difference (95% CI)
sys_time 2.922 ± 0.059 2.950 ± 0.085 [ -0.7% .. +2.5%]
user_time 11.464 ± 0.191 8.290 ± 0.123 [ -28.7% .. -26.7%]
wall_time 11.235 ± 0.175 9.184 ± 0.169 [ -19.3% .. -17.2%]
samples 16 23
(It's worth noting that map files are written in parallel with the
output binary, but they often took longer to write than the binary
itself.)
Finally, I did further cleanups to the map-file.s test -- there was no
real need to have a custom-named section. There were also alt_entry
symbol declarations that had no corresponding definition. Either way,
neither custom-named sections nor alt_entry symbols trigger special code
paths in our map file implementation.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D137368
|
|
ld64 emits them in address order but not in alphabetical order. This
sorting is particularly expensive for dead-stripped symbols (which don't
need to be sorted at all, unlike live symbols that need to be sorted by
address).
Timings for chromium_framework_less_dwarf (with the `-map` flag added to
the response file) on my 16-core Mac Pro:
base diff difference (95% CI)
sys_time 1.997 ± 0.038 2.004 ± 0.028 [ -0.6% .. +1.3%]
user_time 8.698 ± 0.085 8.167 ± 0.070 [ -6.6% .. -5.6%]
wall_time 7.965 ± 0.114 7.715 ± 0.347 [ -5.1% .. -1.2%]
samples 25 23
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D136536
|
|
... instead of mapping them to the intermediate object file.
This matches ld64.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D136380
|
|
Include symbol sizes (present after {D135883}) as well as an example of
a dead-stripped symbol.
|
|
- remove unused/duplicate includes
- reformatting/whitespaces
Differential Revision: https://reviews.llvm.org/D136266
|
|
This matches ld64's behavior.
Additionally, I edited the "Dead Stripped Symbols" header to omit "Address" --
this also matches ld64.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D135883
|
|
So @keith observed
[here](https://reviews.llvm.org/D128108#inline-1263900) that the
StringRefs we were returning from `CStringInputSection::getStringRef()`
included the null terminator in their total length, but regular
StringRefs do not. Let's fix that so these StringRefs are less confusing
to use.
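A minimal sketch of the intended behavior, using `std::string` as a stand-in for `StringRef` and a hypothetical `getStringAt` helper instead of the real `CStringInputSection` API:

```cpp
#include <cstddef>
#include <string>

// Return the string literal that starts at `offset` inside the raw
// section data, excluding its NUL terminator. The terminator stays in
// the section data; it is just not counted in the returned length.
std::string getStringAt(const std::string &section, size_t offset) {
  size_t end = section.find('\0', offset);
  if (end == std::string::npos)
    end = section.size(); // unterminated tail: take the rest
  return section.substr(offset, end - offset);
}
```

With this convention the returned length agrees with `strlen`, so callers that need the on-disk size must add one for the terminator themselves.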
Reviewed By: #lld-macho, keith, Roger
Differential Revision: https://reviews.llvm.org/D133728
|
|
No behavior change.
Differential Revision: https://reviews.llvm.org/D131355
|
|
Patch created by running:
rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'
No behavior change.
Differential Revision: https://reviews.llvm.org/D128140
|
|
This diff prints C-string literals in the map file's symbol table, as ld64 does.
Here is what ld64's mapfile looks like with C-string literals:
```
# Path: out
# Arch: x86_64
# Object files:
[ 0] linker synthesized
[ 1] foo.o
# Sections:
# Address Size Segment Section
0x100003F7D 0x0000001D __TEXT __text
0x100003F9A 0x0000001E __TEXT __cstring
0x100003FB8 0x00000048 __TEXT __unwind_info
# Symbols:
# Address Size File Name
0x100003F7D 0x0000001D [ 1] _main
0x100003F9A 0x0000000E [ 1] literal string: Hello world!\n
0x100003FA8 0x00000010 [ 1] literal string: Hello, it's me\n
0x100003FB8 0x00000048 [ 0] compact unwind info
```
Here is what the new lld's Mach-O mapfile looks like:
```
# Path: /Users/rgr/local/llvm-project/build/Debug/tools/lld/test/MachO/Output/map-file.s.tmp/c-string-literal-out
# Arch: x86_64
# Object files:
[ 0] linker synthesized
[ 1] /Users/rgr/local/llvm-project/build/Debug/tools/lld/test/MachO/Output/map-file.s.tmp/c-string-literal.o
# Sections:
# Address Size Segment Section
0x1000002E0 0x0000001D __TEXT __text
0x1000002FD 0x0000001D __TEXT __cstring
# Symbols:
# Address File Name
0x1000002E0 [ 1] _main
0x1000002FD [ 1] literal string: Hello world!\n
0x10000030B [ 1] literal string: Hello, it's me\n
```
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D118077
|
|
ld64 outputs dead-stripped symbols when using the `-dead_strip` flag. This change mimics that behavior for lld.
ld64's -dead_strip flag outputs:
```
$ ld -map map basics.o -o out -dead_strip -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem
$ cat map
# Path: out
# Arch: x86_64
# Object files:
[ 0] linker synthesized
[ 1] basics.o
# Sections:
# Address Size Segment Section
0x100003F97 0x00000021 __TEXT __text
0x100003FB8 0x00000048 __TEXT __unwind_info
0x100004000 0x00000008 __DATA_CONST __got
0x100008000 0x00000010 __DATA __ref_section
0x100008010 0x00000001 __DATA __common
# Symbols:
# Address Size File Name
0x100003F97 0x00000006 [ 1] _ref_local
0x100003F9D 0x00000001 [ 1] _ref_private_extern
0x100003F9E 0x0000000C [ 1] _main
0x100003FAA 0x00000006 [ 1] _no_dead_strip_globl
0x100003FB0 0x00000001 [ 1] _ref_from_no_dead_strip_globl
0x100003FB1 0x00000006 [ 1] _no_dead_strip_local
0x100003FB7 0x00000001 [ 1] _ref_from_no_dead_strip_local
0x100003FB8 0x00000048 [ 0] compact unwind info
0x100004000 0x00000008 [ 0] non-lazy-pointer-to-local: _ref_com
0x100008000 0x00000008 [ 1] _ref_data
0x100008008 0x00000008 [ 1] l_ref_data
0x100008010 0x00000001 [ 1] _ref_com
# Dead Stripped Symbols:
# Size File Name
<<dead>> 0x00000006 [ 1] _unref_extern
<<dead>> 0x00000001 [ 1] _unref_local
<<dead>> 0x00000007 [ 1] _unref_private_extern
<<dead>> 0x00000001 [ 1] _ref_private_extern_u
<<dead>> 0x00000008 [ 1] _unref_data
<<dead>> 0x00000008 [ 1] l_unref_data
<<dead>> 0x00000001 [ 1] _unref_com
```
Reviewed By: int3, #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D114737
|
|
As per [Bug 50689](https://bugs.llvm.org/show_bug.cgi?id=50689),
```
2. getSectionSyms() puts all the symbols into a map of section -> symbols, but this seems unnecessary. This was likely copied from the ELF port, which prints a section header before the list of symbols it contains. But the Mach-O map file doesn't print these headers.
```
This diff removes `getSectionSyms()` and keeps all symbols in a flat vector.
What does ld64's mapfile look like?
```
$ llvm-mc -filetype=obj -triple=x86_64-apple-darwin test.s -o test.o
$ llvm-mc -filetype=obj -triple=x86_64-apple-darwin foo.s -o foo.o
$ ld -map map test.o foo.o -o out -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem
```
```
[ 0] linker synthesized
[ 1] test.o
[ 2] foo.o
0x100003FB7 0x00000001 __TEXT __text
0x100003FB8 0x00000000 __TEXT obj
0x100003FB8 0x00000048 __TEXT __unwind_info
0x100004000 0x00000001 __DATA __common
0x100003FB7 0x00000001 [ 1] _main
0x100003FB8 0x00000000 [ 2] _foo
0x100003FB8 0x00000048 [ 0] compact unwind info
0x100004000 0x00000001 [ 1] _number
```
Perf numbers when linking chromium framework on a 16-Core Intel Xeon W Mac Pro:
```
base diff difference (95% CI)
sys_time 1.406 ± 0.020 1.388 ± 0.019 [ -1.9% .. -0.6%]
user_time 5.557 ± 0.023 5.914 ± 0.020 [ +6.2% .. +6.6%]
wall_time 4.455 ± 0.041 4.436 ± 0.035 [ -0.8% .. -0.0%]
samples 35 35
```
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D114735
|
|
source: https://bugs.llvm.org/show_bug.cgi?id=50689
When writing a map file, sort symbols in parallel using parallelSort.
Use the symbol name to break ties if two symbols have the same address.
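The ordering can be sketched as a comparator. This is an illustration only: it uses `std::sort` in place of LLVM's `parallelSort`, assumes the tie-break is on the symbol name, and `Sym` and `sortForMapFile` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol that appears in the map file.
struct Sym {
  uint64_t addr;
  std::string name;
};

// Sort by address, using the name as a tie-breaker so the output is
// deterministic when several symbols share an address.
void sortForMapFile(std::vector<Sym> &syms) {
  std::sort(syms.begin(), syms.end(), [](const Sym &a, const Sym &b) {
    if (a.addr != b.addr)
      return a.addr < b.addr;
    return a.name < b.name;
  });
}
```

Without the tie-break, symbols at the same address could come out in a different order from run to run, since neither `std::sort` nor a parallel sort is guaranteed to be stable.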
Reviewed By: thakis, int3
Differential Revision: https://reviews.llvm.org/D104346
|
|
I removed them in rG5de7467e982 but @thakis pointed out that
they were useful to keep, so here they are again. I've also converted
the `!isCoalescedWeak()` asserts into `!shouldOmitFromOutput()` asserts,
since the latter check subsumes the former.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104169
|
|
|
|
D103977 broke a bunch of stuff as I had only tested the release build
which eliminated asserts.
I've retained the asserts where possible, but I also removed a bunch
instead of adding a whole lot of verbose ConcatInputSection casts.
|
|
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.
Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236MB stripped with ld64).
Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:
% ministat lld_*
x lld_nostrip.txt
+ lld_strip.txt
N Min Max Median Avg Stddev
x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794
+ 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651
Difference at 95.0% confidence
-0.144711 +/- 0.0336749
-3.60967% +/- 0.839989%
(Student's t, pooled s = 0.0358398)
This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols
It's possible it interacts with more features I didn't think of,
of course.
I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests
Implementation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.
Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
(`.no_dead_strip` in asm)
Fixes PR49276.
Differential Revision: https://reviews.llvm.org/D103324
|
|
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function to the output. With this
patch, it only writes one copy.
Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:
N Min Max Median Avg Stddev
x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097
+ 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012
Difference at 95.0% confidence
-0.172162 +/- 0.083847
-4.14165% +/- 2.01709%
(Student's t, pooled s = 0.0892373)
Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)
Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
(that is, catch block unwind information) and Personality Routines
associated with weak functions are still not stripped. This is wasteful,
but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
correctness and not just for size)
- This nopes out on InputSections that are referenced from more than
one symbol (eg from .alt_entry) for now
Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
found it a bit more explicit)
- exports
Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).
This patch is useful in itself, but it's also likely a useful foundation
for dead_strip.
I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.
Differential Revision: https://reviews.llvm.org/D102076
|
|
As discussed here: https://reviews.llvm.org/D100523#inline-951543
Reviewed By: #lld-macho, thakis, alexshap
Differential Revision: https://reviews.llvm.org/D100978
|
|
This diff adds initial support for the legacy LC_VERSION_MIN_* load commands.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D100523
|
|
references to Symbol
Within `lld/macho/`, only `InputFiles.cpp` and `Symbols.h` require the `macho::` namespace qualifier to disambiguate references to `class Symbol`.
Add braces to outer `for` of a 5-level single-line `if`/`for` nest.
Differential Revision: https://reviews.llvm.org/D99555
|
|
|
|
I added just enough to allow us to see a top-level breakdown of time taken. This
is the result of loading the time-trace output into `chrome://tracing`:
https://gist.githubusercontent.com/int3/236c723cbb4b6fa3b2d340bb6395c797/raw/ef5e8234f3fdf609bf93b50f54f4e0d9bd439403/tracing.png
Reviewed By: oontvoo
Differential Revision: https://reviews.llvm.org/D99311
|
|
Implement command-line options -map
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D98323
|