<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Transforms/IPO/FunctionImport.cpp, branch users/chapuni/cov/single/condop</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[nfc][thinlto] remove unnecessary return from `renameModuleForThinLTO` (#121851)</title>
<updated>2025-01-06T23:19:09+00:00</updated>
<author>
<name>Mircea Trofin</name>
<email>mtrofin@google.com</email>
</author>
<published>2025-01-06T23:19:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4312075efa02ad861db0a19a0db8e6003aa06965'/>
<id>4312075efa02ad861db0a19a0db8e6003aa06965</id>
<content type='text'>
Same goes for `FunctionImportGlobalProcessing::run`.

The return value was used, but it was always `false`.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Same goes for `FunctionImportGlobalProcessing::run`.

The return value was used, but it was always `false`.</pre>
</div>
</content>
</entry>
<entry>
<title>[ThinLTO]Supports declaration import for global variables in distributed ThinLTO (#117616)</title>
<updated>2024-12-03T00:15:52+00:00</updated>
<author>
<name>Mingming Liu</name>
<email>mingmingl@google.com</email>
</author>
<published>2024-12-03T00:15:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=6faf17b7626bfdeea977a7a333c6e20ed677615d'/>
<id>6faf17b7626bfdeea977a7a333c6e20ed677615d</id>
<content type='text'>
When `-import-declaration` option is enabled, declaration import is
supported for functions. https://github.com/llvm/llvm-project/pull/88024
has the context for this option.

This patch supports declaration import for global variables in
distributed ThinLTO. The motivating use case is to propagate `dso_local`
attribute of global variables across modules, to optimize global
variable access when a binary is built with
`-fno-direct-access-external-data`.
* With `-fdirect-access-external-data`, non thread-local global
variables will [have `dso_local`
attributes](https://github.com/llvm/llvm-project/blob/fe3c23b439b9a2d00442d9bc6a4ca86f73066a3d/clang/lib/CodeGen/CodeGenModule.cpp#L1730-L1746).
This optimizes the global variable access as shown by
https://gcc.godbolt.org/z/vMzWcKdh3</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When `-import-declaration` option is enabled, declaration import is
supported for functions. https://github.com/llvm/llvm-project/pull/88024
has the context for this option.

This patch supports declaration import for global variables in
distributed ThinLTO. The motivating use case is to propagate `dso_local`
attribute of global variables across modules, to optimize global
variable access when a binary is built with
`-fno-direct-access-external-data`.
* With `-fdirect-access-external-data`, non thread-local global
variables will [have `dso_local`
attributes](https://github.com/llvm/llvm-project/blob/fe3c23b439b9a2d00442d9bc6a4ca86f73066a3d/clang/lib/CodeGen/CodeGenModule.cpp#L1730-L1746).
This optimizes the global variable access as shown by
https://gcc.godbolt.org/z/vMzWcKdh3</pre>
</div>
</content>
</entry>
<entry>
<title>[LTO] Use .at instead of .lookup to avoid copies. (NFC) (#117888)</title>
<updated>2024-11-27T17:41:29+00:00</updated>
<author>
<name>Krzysztof Pszeniczny</name>
<email>kpszeniczny@google.com</email>
</author>
<published>2024-11-27T17:41:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=991154d0fbc951e2b999589a95dabc7deff7acd1'/>
<id>991154d0fbc951e2b999589a95dabc7deff7acd1</id>
<content type='text'>
`DenseMap::lookup` returns by value (because it default-creates the
returned value if the key isn't present in the map), which means that we
do a lot of copying here. Since we assert that something is present in
the returned value two lines below this call, it's safe to use `.at`
here instead.

Copying and then destroying dense maps here is responsible for 60% of
the time spent in LTO indexing in a large internal build.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
`DenseMap::lookup` returns by value (because it default-creates the
returned value if the key isn't present in the map), which means that we
do a lot of copying here. Since we assert that something is present in
the returned value two lines below this call, it's safe to use `.at`
here instead.

Copying and then destroying dense maps here is responsible for 60% of
the time spent in LTO indexing in a large internal build.</pre>
</div>
</content>
</entry>
<entry>
<title>Make WriteIndexesThinBackend multi threaded (#109847)</title>
<updated>2024-10-07T15:16:46+00:00</updated>
<author>
<name>Nuri Amari</name>
<email>nuri.amari99@gmail.com</email>
</author>
<published>2024-10-07T15:16:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=2edd897a4227e481af33e8e43090ab088cd9d953'/>
<id>2edd897a4227e481af33e8e43090ab088cd9d953</id>
<content type='text'>
We've noticed that for large builds executing thin-link can take on the
order of 10s of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.

In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.

---------

Co-authored-by: Nuri Amari &lt;nuriamari@fb.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We've noticed that for large builds executing thin-link can take on the
order of 10s of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.

In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.

---------

Co-authored-by: Nuri Amari &lt;nuriamari@fb.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[nfc][ctx_prof] Change some internal "set" types</title>
<updated>2024-09-12T17:34:53+00:00</updated>
<author>
<name>Mircea Trofin</name>
<email>mtrofin@google.com</email>
</author>
<published>2024-09-12T17:31:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=885ac29910a23db923292fe3fc09d0ec105186dc'/>
<id>885ac29910a23db923292fe3fc09d0ec105186dc</id>
<content type='text'>
- the set used for targets under a callsite is simpler to use if iterators
  are stable (it gets manipulated during updates)
- the set used to fetch the transitive closure of GUIDs under a node can
  be left as a choice to the user.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- the set used for targets under a callsite is simpler to use if iterators
  are stable (it gets manipulated during updates)
- the set used to fetch the transitive closure of GUIDs under a node can
  be left as a choice to the user.
</pre>
</div>
</content>
</entry>
<entry>
<title>[LTO] Remove unused includes (NFC) (#108110)</title>
<updated>2024-09-11T02:36:04+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2024-09-11T02:36:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3dad29b677e427bf69c035605a16efd065576829'/>
<id>3dad29b677e427bf69c035605a16efd065576829</id>
<content type='text'>
clangd reports these as unused headers.  My manual inspection agrees
with the findings.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
clangd reports these as unused headers.  My manual inspection agrees
with the findings.</pre>
</div>
</content>
</entry>
<entry>
<title>[LTO] Reduce memory usage for import lists (#106772)</title>
<updated>2024-09-01T15:36:06+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2024-09-01T15:36:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5c0d61e318a77434487fcec8361d8110fb06e59d'/>
<id>5c0d61e318a77434487fcec8361d8110fb06e59d</id>
<content type='text'>
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.

With this patch, an import list for a given destination module is
basically DenseSet&lt;uint32_t&gt; with each element indexing into the
deduplication table containing tuples of:

  {SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.

This patch addresses several sources of space inefficiency associated
with std::unordered_map:

- std::unordered_map&lt;GUID, ImportKind&gt; takes up 16 bytes because of
  padding even though ImportKind only carries one bit of information.

- std::unordered_map uses pointers to elements, both in the hash table
  proper and for collision chains.

- We allocate an instance of std::unordered_map for each
  {Destination Module, Source Module} pair for which we have at least
  one import.  Most import lists have less than 10 imports, so the
  metadata like the size of std::unordered_map and the pointer to the
  hash table costs a lot relative to the actual contents.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.

With this patch, an import list for a given destination module is
basically DenseSet&lt;uint32_t&gt; with each element indexing into the
deduplication table containing tuples of:

  {SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.

This patch addresses several sources of space inefficiency associated
with std::unordered_map:

- std::unordered_map&lt;GUID, ImportKind&gt; takes up 16 bytes because of
  padding even though ImportKind only carries one bit of information.

- std::unordered_map uses pointers to elements, both in the hash table
  proper and for collision chains.

- We allocate an instance of std::unordered_map for each
  {Destination Module, Source Module} pair for which we have at least
  one import.  Most import lists have less than 10 imports, so the
  metadata like the size of std::unordered_map and the pointer to the
  hash table costs a lot relative to the actual contents.</pre>
</div>
</content>
</entry>
<entry>
<title>[LTO] Make getImportType a proper function (NFC) (#106450)</title>
<updated>2024-08-28T20:53:07+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2024-08-28T20:53:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=eb9c49c900f43aa79811f80847c97c6596197430'/>
<id>eb9c49c900f43aa79811f80847c97c6596197430</id>
<content type='text'>
I'm planning to reduce the memory footprint of ThinLTO indexing by
changing ImportMapTy.  A look-up of the import type will involve data
private to ImportMapTy, so it must be done by a member function of
ImportMapTy.  This patch turns getImportType into a member function so
that a subsequent "real" change will just have to update the
implementation of the function in place.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I'm planning to reduce the memory footprint of ThinLTO indexing by
changing ImportMapTy.  A look-up of the import type will involve data
private to ImportMapTy, so it must be done by a member function of
ImportMapTy.  This patch turns getImportType into a member function so
that a subsequent "real" change will just have to update the
implementation of the function in place.</pre>
</div>
</content>
</entry>
<entry>
<title>[LTO] Introduce new type alias ImportListsTy (NFC) (#106420)</title>
<updated>2024-08-28T17:42:12+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2024-08-28T17:42:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4f15039cf2d59dbb889903aff8ac32ff266c8c48'/>
<id>4f15039cf2d59dbb889903aff8ac32ff266c8c48</id>
<content type='text'>
The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  Once this patch lands, I'm
planning to change the type slightly.  The new type alias allows us to
update the type without touching many places.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  Once this patch lands, I'm
planning to change the type slightly.  The new type alias allows us to
update the type without touching many places.</pre>
</div>
</content>
</entry>
<entry>
<title>[LTO] Introduce a helper lambda in gatherImportedSummariesForModule (NFC) (#106251)</title>
<updated>2024-08-27T19:43:07+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2024-08-27T19:43:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=29bb523b7c3502b47737ced2a8b20678809ec9de'/>
<id>29bb523b7c3502b47737ced2a8b20678809ec9de</id>
<content type='text'>
This patch forward ports the heterogeneous std::map::operator[]() from
C++26 so that we can look up the map without allocating an instance of
std::string when the key-value pair exists in the map.

The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  The new list will be a hash set of
tuples (SourceModule, GUID, ImportType) represented in a space
efficient manner.  That means that as we iterate over the hash set, we
encounter SourceModule as many times as GUID.  We don't want to create
a temporary instance of std::string every time we look up
ModuleToSummariesForIndex like:

auto &amp;SummariesForIndex =
ModuleToSummariesForIndex[std::string(ILI.first)];

This patch removes the need to create the temporaries by enabling the
hetegeneous lookup with std::set&lt;K, V, std::less&lt;&gt;&gt; and forward
porting std::map::operator[]() from C++26.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch forward ports the heterogeneous std::map::operator[]() from
C++26 so that we can look up the map without allocating an instance of
std::string when the key-value pair exists in the map.

The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  The new list will be a hash set of
tuples (SourceModule, GUID, ImportType) represented in a space
efficient manner.  That means that as we iterate over the hash set, we
encounter SourceModule as many times as GUID.  We don't want to create
a temporary instance of std::string every time we look up
ModuleToSummariesForIndex like:

auto &amp;SummariesForIndex =
ModuleToSummariesForIndex[std::string(ILI.first)];

This patch removes the need to create the temporaries by enabling the
hetegeneous lookup with std::set&lt;K, V, std::less&lt;&gt;&gt; and forward
porting std::map::operator[]() from C++26.</pre>
</div>
</content>
</entry>
</feed>
