<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp, branch users/meinersbur/flang_runtime_split-headers</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AMDGPU] Fix module split's assumption on kernels</title>
<updated>2024-12-06T22:56:31+00:00</updated>
<author>
<name>Siu Chi Chan</name>
<email>siuchi.chan@amd.com</email>
</author>
<published>2024-11-14T01:01:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=83ad90d851f9e32a51d56193125ab596cc3636b6'/>
<id>83ad90d851f9e32a51d56193125ab596cc3636b6</id>
<content type='text'>
Module split assumes that a kernel function must have an external
linkage; however, that isn't the case.  For example, a static kernel
function will have a weak_odr linkage

Change-Id: I1e5dee0de1fd866b365f4090a574e1b2961f8dca
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Module split assumes that a kernel function must have an external
linkage; however, that isn't the case.  For example, a static kernel
function will have a weak_odr linkage

Change-Id: I1e5dee0de1fd866b365f4090a574e1b2961f8dca
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][SplitModule] Fix unintentional integer division (#117586)</title>
<updated>2024-11-27T10:18:52+00:00</updated>
<author>
<name>Fraser Cormack</name>
<email>fraser@codeplay.com</email>
</author>
<published>2024-11-27T10:18:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=345b3319c8544a0124aed9602c31f8793228ab63'/>
<id>345b3319c8544a0124aed9602c31f8793228ab63</id>
<content type='text'>
A static analysis tool warned that a division was always being performed
in integer division, so was either 0.0 or 1.0.

This doesn't seem intentional, so has been fixed to return a true ratio
using floating-point division. This in turn showed a bug where a
comparison against this ratio was incorrect.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A static analysis tool warned that a division was always being performed
in integer division, so was either 0.0 or 1.0.

This doesn't seem intentional, so has been fixed to return a true ratio
using floating-point division. This in turn showed a bug where a
comparison against this ratio was incorrect.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][SplitModule] Fix potential divide by zero (#117602)</title>
<updated>2024-11-26T10:05:09+00:00</updated>
<author>
<name>Fraser Cormack</name>
<email>fraser@codeplay.com</email>
</author>
<published>2024-11-26T10:05:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3414993eaffcfa2cb4ff723c8468434d56dff213'/>
<id>3414993eaffcfa2cb4ff723c8468434d56dff213</id>
<content type='text'>
A static analysis tool found that ModuleCost could be zero, so would
perform divide by zero when being printed. Perhaps this is unreachable
in practice, but the fix is straightforward enough and unlikely to be a
performance concern.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A static analysis tool found that ModuleCost could be zero, so would
perform divide by zero when being printed. Perhaps this is unreachable
in practice, but the fix is straightforward enough and unlikely to be a
performance concern.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Remove unused includes (NFC) (#116154)</title>
<updated>2024-11-14T05:10:03+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2024-11-14T05:10:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=be187369a03bf2df8bdbc76ecd381377b3bb6074'/>
<id>be187369a03bf2df8bdbc76ecd381377b3bb6074</id>
<content type='text'>
Identified with misc-include-cleaner.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Identified with misc-include-cleaner.</pre>
</div>
</content>
</entry>
<entry>
<title>(reland) [AMDGPU][SplitModule] Handle !callees metadata (#108802)</title>
<updated>2024-10-15T05:16:57+00:00</updated>
<author>
<name>Pierre van Houtryve</name>
<email>pierre.vanhoutryve@amd.com</email>
</author>
<published>2024-10-14T06:55:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b3a8400afa460a8372016e1abe5729cd4949b7d2'/>
<id>b3a8400afa460a8372016e1abe5729cd4949b7d2</id>
<content type='text'>
(reland with fixed sed command for macos)

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(reland with fixed sed command for macos)

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[AMDGPU][SplitModule] Handle !callees metadata (#108802)"</title>
<updated>2024-10-14T21:26:15+00:00</updated>
<author>
<name>Nico Weber</name>
<email>thakis@chromium.org</email>
</author>
<published>2024-10-14T21:26:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=140cbca83d2cf9ebb1718671fdd251fef5bc63b3'/>
<id>140cbca83d2cf9ebb1718671fdd251fef5bc63b3</id>
<content type='text'>
This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934.
Breaks tests, see comments on
https://github.com/llvm/llvm-project/pull/108802
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934.
Breaks tests, see comments on
https://github.com/llvm/llvm-project/pull/108802
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][SplitModule] Handle !callees metadata (#108802)</title>
<updated>2024-10-14T06:55:12+00:00</updated>
<author>
<name>Pierre van Houtryve</name>
<email>pierre.vanhoutryve@amd.com</email>
</author>
<published>2024-10-14T06:55:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4a0dc3ef36ceff20787ff277a1fb6a1b513c4934'/>
<id>4a0dc3ef36ceff20787ff277a1fb6a1b513c4934</id>
<content type='text'>
See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][SplitModule] Cleanup CallsExternal Handling (#106528)</title>
<updated>2024-10-11T06:37:20+00:00</updated>
<author>
<name>Pierre van Houtryve</name>
<email>pierre.vanhoutryve@amd.com</email>
</author>
<published>2024-10-11T06:37:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=d656b2063262d59c3565e63095104c01d1f6a5a3'/>
<id>d656b2063262d59c3565e63095104c01d1f6a5a3</id>
<content type='text'>
- Don't treat inline ASM as indirect calls
- Remove call to alias testing, which was broken (only working by pure
luck right now) and isn't needed anyway. GlobalOpt should take care of
them for us.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Don't treat inline ASM as indirect calls
- Remove call to alias testing, which was broken (only working by pure
luck right now) and isn't needed anyway. GlobalOpt should take care of
them for us.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Remove unused SplitGraph::Node::getFullCost</title>
<updated>2024-09-09T11:50:48+00:00</updated>
<author>
<name>pvanhout</name>
<email>pierre.vanhoutryve@amd.com</email>
</author>
<published>2024-09-09T11:50:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=959d84044a70da08923fe221f999f4e406094ee9'/>
<id>959d84044a70da08923fe221f999f4e406094ee9</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)</title>
<updated>2024-09-09T07:06:34+00:00</updated>
<author>
<name>Pierre van Houtryve</name>
<email>pierre.vanhoutryve@amd.com</email>
</author>
<published>2024-09-09T07:06:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9347b66cfcd9acf84dbbd500b6344041c587f6a9'/>
<id>9347b66cfcd9acf84dbbd500b6344041c587f6a9</id>
<content type='text'>
Relands #104763 with
- Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
failing if the input is shuffled first)
 - Fix for broken proposal selection
 - c3cb27370af40e491446164840766478d3258429 included

Original commit description below
---

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
and LLVM_DEBUG, like any other pass in LLVM.
- The SML system started from good intentions, but it was too flawed and
messy to be of any real use. It was also a real pain to use and made the
code more annoying to maintain.
 - Graph-based module representation with DOTGraph printing support
- The graph represents the module accurately, with bidirectional, typed
edges between nodes (a node usually represents one function).
- Nodes are assigned IDs starting from 0, which allows us to represent a
set of nodes as a BitVector. This makes comparing 2 sets of nodes to
find common dependencies a trivial task. Merging two clusters of nodes
together is also really trivial.
 - No more defaulting to "P0" for external calls
- Roots that can reach non-copyable dependencies (such as external
calls) are now grouped together in a single "cluster" that can go into
any partition.
 - No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
and compared.
- Graph-search algorithm that can explore multiple branches/assignments
for a cluster of functions, up to a maximum depth.
- With the default max depth of 8, we can create up to 256 propositions
to try and find the best one.
- We can still fall back to a greedy approach upon reaching max depth.
That greedy approach uses almost identical heuristics to the previous
version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other type of relations
between nodes, etc.

I also designed the graph representation &amp; the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being &gt;1% of the runtime.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Relands #104763 with
- Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
failing if the input is shuffled first)
 - Fix for broken proposal selection
 - c3cb27370af40e491446164840766478d3258429 included

Original commit description below
---

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
and LLVM_DEBUG, like any other pass in LLVM.
- The SML system started from good intentions, but it was too flawed and
messy to be of any real use. It was also a real pain to use and made the
code more annoying to maintain.
 - Graph-based module representation with DOTGraph printing support
- The graph represents the module accurately, with bidirectional, typed
edges between nodes (a node usually represents one function).
- Nodes are assigned IDs starting from 0, which allows us to represent a
set of nodes as a BitVector. This makes comparing 2 sets of nodes to
find common dependencies a trivial task. Merging two clusters of nodes
together is also really trivial.
 - No more defaulting to "P0" for external calls
- Roots that can reach non-copyable dependencies (such as external
calls) are now grouped together in a single "cluster" that can go into
any partition.
 - No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
and compared.
- Graph-search algorithm that can explore multiple branches/assignments
for a cluster of functions, up to a maximum depth.
- With the default max depth of 8, we can create up to 256 propositions
to try and find the best one.
- We can still fall back to a greedy approach upon reaching max depth.
That greedy approach uses almost identical heuristics to the previous
version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other type of relations
between nodes, etc.

I also designed the graph representation &amp; the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being &gt;1% of the runtime.</pre>
</div>
</content>
</entry>
</feed>
