<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Transforms/Utils/LoopUnroll.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>Remove unused &lt;type_traits&gt; inclusion</title>
<updated>2025-11-11T12:33:38+00:00</updated>
<author>
<name>serge-sans-paille</name>
<email>sguelton@mozilla.com</email>
</author>
<published>2025-11-10T14:06:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a6cd4003307b6ada73a3a8a0fb24ea1102164af0'/>
<id>a6cd4003307b6ada73a3a8a0fb24ea1102164af0</id>
<content type='text'>
Per https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible
this improves compilation time, while not being too intrusive on the
codebase.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Per https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible
this improves compilation time, while not being too intrusive on the
codebase.
</pre>
</div>
</content>
</entry>
<entry>
<title>[LoopUnroll] Fix block frequencies for epilogue (#159163)</title>
<updated>2025-10-31T15:01:42+00:00</updated>
<author>
<name>Joel E. Denny</name>
<email>jdenny.ornl@gmail.com</email>
</author>
<published>2025-10-31T15:01:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=cc8ff73fbab875e33071b23ff6e4b512d5adf64e'/>
<id>cc8ff73fbab875e33071b23ff6e4b512d5adf64e</id>
<content type='text'>
As another step in issue #135812, this patch fixes block frequencies for
partial loop unrolling with an epilogue remainder loop. It does not
fully handle the case when the epilogue loop itself is unrolled. That
will be handled in the next patch.

For the guard and latch of each of the unrolled loop and epilogue loop,
this patch sets branch weights derived directly from the original loop
latch branch weights. The total frequency of the original loop body,
summed across all its occurrences in the unrolled loop and epilogue
loop, is the same as in the original loop. This patch also sets
`llvm.loop.estimated_trip_count` for the epilogue loop instead of
relying on the epilogue's latch branch weights to imply it.

This patch fixes branch weights in tests that PR #157754 adversely
affected.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As another step in issue #135812, this patch fixes block frequencies for
partial loop unrolling with an epilogue remainder loop. It does not
fully handle the case when the epilogue loop itself is unrolled. That
will be handled in the next patch.

For the guard and latch of each of the unrolled loop and epilogue loop,
this patch sets branch weights derived directly from the original loop
latch branch weights. The total frequency of the original loop body,
summed across all its occurrences in the unrolled loop and epilogue
loop, is the same as in the original loop. This patch also sets
`llvm.loop.estimated_trip_count` for the epilogue loop instead of
relying on the epilogue's latch branch weights to imply it.

This patch fixes branch weights in tests that PR #157754 adversely
affected.</pre>
</div>
</content>
</entry>
<entry>
<title>[LoopUnroll] Fix block frequencies when no runtime (#157754)</title>
<updated>2025-10-31T14:44:27+00:00</updated>
<author>
<name>Joel E. Denny</name>
<email>jdenny.ornl@gmail.com</email>
</author>
<published>2025-10-31T14:44:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=24557cce40b7d90c289b4a537ed95eaf6522f53c'/>
<id>24557cce40b7d90c289b4a537ed95eaf6522f53c</id>
<content type='text'>
This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block

Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a remainder
loop, this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the
sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
`llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
original estimated trip count (e.g., 10) divided by the unroll count
(e.g., 4) leaves a remainder (e.g., 2).

There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a remainder loop and complete unrolling, and
there are two associated tests whose branch weights this patch adversely
affects. They will be addressed in future patches that should land with
this patch.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block

Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a remainder
loop, this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the
sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
`llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
original estimated trip count (e.g., 10) divided by the unroll count
(e.g., 4) leaves a remainder (e.g., 2).

There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a remainder loop and complete unrolling, and
there are two associated tests whose branch weights this patch adversely
affects. They will be addressed in future patches that should land with
this patch.</pre>
</div>
</content>
</entry>
<entry>
<title>[llvm] Remove redundant control flow statements (NFC) (#163509)</title>
<updated>2025-10-15T13:54:30+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2025-10-15T13:54:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b82baf0f5bce0aa787ac280f2a43943d211b9cc5'/>
<id>b82baf0f5bce0aa787ac280f2a43943d211b9cc5</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64] Enable RT and partial unrolling with reductions for Apple CPUs. (#149699)</title>
<updated>2025-09-09T13:23:30+00:00</updated>
<author>
<name>Florian Hahn</name>
<email>flo@fhahn.com</email>
</author>
<published>2025-09-09T13:23:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3ea089ba197681aa3ccbb7325872c58e2eb542e1'/>
<id>3ea089ba197681aa3ccbb7325872c58e2eb542e1</id>
<content type='text'>
Update unrolling preferences for Apple Silicon CPUs to enable partial
unrolling and runtime unrolling for small loops with reductions.

This builds on top of unroller changes to introduce parallel reduction
phis, if possible: https://github.com/llvm/llvm-project/pull/149470.

PR: https://github.com/llvm/llvm-project/pull/149699</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update unrolling preferences for Apple Silicon CPUs to enable partial
unrolling and runtime unrolling for small loops with reductions.

This builds on top of unroller changes to introduce parallel reduction
phis, if possible: https://github.com/llvm/llvm-project/pull/149470.

PR: https://github.com/llvm/llvm-project/pull/149699</pre>
</div>
</content>
</entry>
<entry>
<title>[LoopUnroll] Introduce parallel reduction phis when unrolling. (#149470)</title>
<updated>2025-09-04T19:54:09+00:00</updated>
<author>
<name>Florian Hahn</name>
<email>flo@fhahn.com</email>
</author>
<published>2025-09-04T19:54:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=2d9e452ab0f3ae864f587e2c9313541e499422e0'/>
<id>2d9e452ab0f3ae864f587e2c9313541e499422e0</id>
<content type='text'>
When partially or runtime unrolling loops with reductions, currently the
reductions are performed in-order in the loop, negating most benefits
from unrolling such loops.

This patch extends unrolling code-gen to keep a parallel reduction phi
per unrolled iteration and combining the final result after the loop.
For out-of-order CPUs, this allows executing mutliple reduction chains
in parallel.

For now, the initial transformation is restricted to cases where we
unroll a small number of iterations (hard-coded to 4, but should maybe
be capped by TTI depending on the execution units), to avoid introducing
an excessive amount of parallel phis.

It also requires single block loops for now, where the unrolled
iterations are known to not exit the loop (either due to runtime
unrolling or partial unrolling). This ensures that the unrolled loop
will have a single basic block, with a single exit block where we can
place the final reduction value computation.

The initial implementation also only supports parallelizing loops with a
single reduction and only integer reductions. Those restrictions are
just to keep the initial implementation simpler, and can easily be
lifted as follow-ups.

With corresponding TTI to the AArch64 unrolling preferences which I will
also share soon, this triggers in ~300 loops across a wide range of
workloads, including LLVM itself, ffmgep, av1aom, sqlite, blender,
brotli, zstd and more.

PR: https://github.com/llvm/llvm-project/pull/149470</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When partially or runtime unrolling loops with reductions, currently the
reductions are performed in-order in the loop, negating most benefits
from unrolling such loops.

This patch extends unrolling code-gen to keep a parallel reduction phi
per unrolled iteration and combining the final result after the loop.
For out-of-order CPUs, this allows executing mutliple reduction chains
in parallel.

For now, the initial transformation is restricted to cases where we
unroll a small number of iterations (hard-coded to 4, but should maybe
be capped by TTI depending on the execution units), to avoid introducing
an excessive amount of parallel phis.

It also requires single block loops for now, where the unrolled
iterations are known to not exit the loop (either due to runtime
unrolling or partial unrolling). This ensures that the unrolled loop
will have a single basic block, with a single exit block where we can
place the final reduction value computation.

The initial implementation also only supports parallelizing loops with a
single reduction and only integer reductions. Those restrictions are
just to keep the initial implementation simpler, and can easily be
lifted as follow-ups.

With corresponding TTI to the AArch64 unrolling preferences which I will
also share soon, this triggers in ~300 loops across a wide range of
workloads, including LLVM itself, ffmgep, av1aom, sqlite, blender,
brotli, zstd and more.

PR: https://github.com/llvm/llvm-project/pull/149470</pre>
</div>
</content>
</entry>
<entry>
<title>[Transforms] Remove unused includes (NFC) (#141357)</title>
<updated>2025-05-24T16:37:43+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2025-05-24T16:37:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0ef8ef66cc7a89c55ddeb739b0e3b807b4d2947c'/>
<id>0ef8ef66cc7a89c55ddeb739b0e3b807b4d2947c</id>
<content type='text'>
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.</pre>
</div>
</content>
</entry>
<entry>
<title>[KeyInstr][LoopUnroll] Remap atoms while unrolling (#133489)</title>
<updated>2025-05-09T12:47:16+00:00</updated>
<author>
<name>Orlando Cazalet-Hyams</name>
<email>orlando.hyams@sony.com</email>
</author>
<published>2025-05-09T12:47:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=c453da7b7c2d964352941484b9f06047d2cc919e'/>
<id>c453da7b7c2d964352941484b9f06047d2cc919e</id>
<content type='text'>
RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668</pre>
</div>
</content>
</entry>
<entry>
<title>[llvm] Use llvm::replace (NFC) (#137481)</title>
<updated>2025-04-27T01:18:09+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2025-04-27T01:18:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=8210cdd764cb0b334f2bc205b316e94480c47c88'/>
<id>8210cdd764cb0b334f2bc205b316e94480c47c88</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[Transforms] Use *Set::insert_range (NFC) (#132652)</title>
<updated>2025-03-24T02:42:53+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2025-03-24T02:42:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=73dc2afd2c334aac735caba94292052d63014c9d'/>
<id>73dc2afd2c334aac735caba94292052d63014c9d</id>
<content type='text'>
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);

In some cases, we can further fold that into the set declaration.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);

In some cases, we can further fold that into the set declaration.</pre>
</div>
</content>
</entry>
</feed>
