<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/test/CodeGen/AArch64/vecreduce-add.ll, branch users/mingmingl-llvm/samplefdo-profile-format</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AArch64] Lower zero cycle FPR zeroing (#156261)</title>
<updated>2025-09-10T05:32:51+00:00</updated>
<author>
<name>Tomer Shafir</name>
<email>tomer.shafir8@gmail.com</email>
</author>
<published>2025-09-10T05:32:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f059d2bac034acca39ad60a1b13aaec6afa0a3d6'/>
<id>f059d2bac034acca39ad60a1b13aaec6afa0a3d6</id>
<content type='text'>
Lower FPR64, FPR32, FPR16 from `fmov` zeroing into NEON zeroing if the
target supports zero cycle zeroing of NEON registers but not for the
narrower classes.

It handles two cases: one in `AsmPrinter`, where an FP zeroing from an
immediate has been captured by pattern matching during instruction
selection, and a second post-RA in `AArch64InstrInfo::copyPhysReg`, for
uncaptured or later-generated WZR/XZR fmovs.

Adds a subtarget feature called FeatureZCZeroingFPR128 that makes it
possible to query whether the target supports zero cycle zeroing for
FPR128 NEON registers, and updates the appropriate processors.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lower FPR64, FPR32, FPR16 from `fmov` zeroing into NEON zeroing if the
target supports zero cycle zeroing of NEON registers but not for the
narrower classes.

It handles two cases: one in `AsmPrinter`, where an FP zeroing from an
immediate has been captured by pattern matching during instruction
selection, and a second post-RA in `AArch64InstrInfo::copyPhysReg`, for
uncaptured or later-generated WZR/XZR fmovs.

Adds a subtarget feature called FeatureZCZeroingFPR128 that makes it
possible to query whether the target supports zero cycle zeroing for
FPR128 NEON registers, and updates the appropriate processors.</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64] Transform add(x, abs(y)) -&gt; saba(x, y, 0) (#156615)</title>
<updated>2025-09-08T13:14:24+00:00</updated>
<author>
<name>Hari Limaye</name>
<email>hari.limaye@arm.com</email>
</author>
<published>2025-09-08T13:14:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=e38392b19b3989222b1d2b248069f3fb36bfea7a'/>
<id>e38392b19b3989222b1d2b248069f3fb36bfea7a</id>
<content type='text'>
Add a DAGCombine to perform the following transformations: 
- add(x, abs(y)) -&gt; saba(x, y, 0)
- add(x, zext(abs(y))) -&gt; sabal(x, y, 0)

As well as being a useful generic transformation, this also fixes an
issue where LLVM de-optimises [US]ABA NEON ACLE intrinsics into separate
ABD+ADD instructions when one of the operands is a zero vector.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a DAGCombine to perform the following transformations: 
- add(x, abs(y)) -&gt; saba(x, y, 0)
- add(x, zext(abs(y))) -&gt; sabal(x, y, 0)

As well as being a useful generic transformation, this also fixes an
issue where LLVM de-optimises [US]ABA NEON ACLE intrinsics into separate
ABD+ADD instructions when one of the operands is a zero vector.</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64][GlobalISel] Add push_mul_through_s/zext (#141551)</title>
<updated>2025-07-31T06:38:11+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-07-31T06:38:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3313cf4a832ca73cb0c10c797ffddf84040fd36d'/>
<id>3313cf4a832ca73cb0c10c797ffddf84040fd36d</id>
<content type='text'>
This extends the existing push_add_through_zext to handle mul, similar
to performVectorExtCombine in SDAG. This allows muls to be pushed up the
tree of extends, operating on smaller vector types whilst keeping the
result the same (providing there are &gt; 2x bits in the output).

matchExtAddvToUdotAddv needs to be adjusted to make sure it keeps
generating dot instructions from add(ext(mul(ext, ext))).</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This extends the existing push_add_through_zext to handle mul, similar
to performVectorExtCombine in SDAG. This allows muls to be pushed up the
tree of extends, operating on smaller vector types whilst keeping the
result the same (providing there are &gt; 2x bits in the output).

matchExtAddvToUdotAddv needs to be adjusted to make sure it keeps
generating dot instructions from add(ext(mul(ext, ext))).</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64][GlobalISel] Ensure we have a insert-subreg v4i32 GPR pattern (#142724)</title>
<updated>2025-06-06T16:44:33+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-06-06T16:44:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=645c0d509c43ef95b62503552c51e57c6e49f0e0'/>
<id>645c0d509c43ef95b62503552c51e57c6e49f0e0</id>
<content type='text'>
This is the GISel equivalent of scalar_to_vector, making sure that when
we insert into undef we use an fmov that avoids the artificial dependency
on the previous register. This also adds v2i32 and v2i64 patterns for
similar reasons.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the GISel equivalent of scalar_to_vector, making sure that when
we insert into undef we use an fmov that avoids the artificial dependency
on the previous register. This also adds v2i32 and v2i64 patterns for
similar reasons.</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64] Add patterns for addv(sext) and addv(zext)</title>
<updated>2025-02-15T17:04:32+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-02-15T17:04:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=bfdf30e9b3d0b49344a651a5c7cd87be31d255c4'/>
<id>bfdf30e9b3d0b49344a651a5c7cd87be31d255c4</id>
<content type='text'>
This adds patterns for v8i8-&gt;i16 vaddlv and v4i16-&gt;i32 vaddlv, for both signed
and unsigned extends.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds patterns for v8i8-&gt;i16 vaddlv and v4i16-&gt;i32 vaddlv, for both signed
and unsigned extends.
</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64][GlobalISel] Legalize more G_VECREDUCE_ADD operations. (#123392)</title>
<updated>2025-01-30T22:17:34+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-01-30T22:17:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=ac7c199a63ddb7ba675e9da76dd07ffdbf07153a'/>
<id>ac7c199a63ddb7ba675e9da76dd07ffdbf07153a</id>
<content type='text'>
Non-power-of-2 vectors will now be padded with zero elements. Smaller
vectors will be widened using anyext, which I believe will be better in
many situations than padding with zeros, although some small types may
prefer being scalarized depending on the code. Padding with zeros may
not be best for all sizes (v5i8 being the worst); we can hopefully
improve that in the future, but they no longer fall back. We scalarize
other types like i128.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Non-power-of-2 vectors will now be padded with zero elements. Smaller
vectors will be widened using anyext, which I believe will be better in
many situations than padding with zeros, although some small types may
prefer being scalarized depending on the code. Padding with zeros may
not be best for all sizes (v5i8 being the worst); we can hopefully
improve that in the future, but they no longer fall back. We scalarize
other types like i128.</pre>
</div>
</content>
</entry>
<entry>
<title>[llvm][aarch64] fix copypaste typo (#120725)</title>
<updated>2025-01-02T15:18:20+00:00</updated>
<author>
<name>klensy</name>
<email>klensy@users.noreply.github.com</email>
</author>
<published>2025-01-02T15:18:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4a890c2c605640f48ecbaefebda8f3a42043ff3d'/>
<id>4a890c2c605640f48ecbaefebda8f3a42043ff3d</id>
<content type='text'>
moved from #119881</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
moved from #119881</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[AArch64] Enable subreg liveness tracking by default."</title>
<updated>2024-12-12T17:22:15+00:00</updated>
<author>
<name>Sander de Smalen</name>
<email>sander.desmalen@arm.com</email>
</author>
<published>2024-12-12T17:19:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=61510b51c33464a6bc15e4cf5b1ee07e2e0ec1c9'/>
<id>61510b51c33464a6bc15e4cf5b1ee07e2e0ec1c9</id>
<content type='text'>
This reverts commit 9c319d5bb40785c969d2af76535ca62448dfafa7.

Some issues were discovered with the bootstrap builds, which
seem like they were caused by this commit. I'm reverting to investigate.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 9c319d5bb40785c969d2af76535ca62448dfafa7.

Some issues were discovered with the bootstrap builds, which
seem like they were caused by this commit. I'm reverting to investigate.
</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64] Enable subreg liveness tracking by default.</title>
<updated>2024-12-12T16:05:49+00:00</updated>
<author>
<name>Sander de Smalen</name>
<email>sander.desmalen@arm.com</email>
</author>
<published>2024-10-15T12:54:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9c319d5bb40785c969d2af76535ca62448dfafa7'/>
<id>9c319d5bb40785c969d2af76535ca62448dfafa7</id>
<content type='text'>
Internal testing didn't flag up any functional or performance regressions.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Internal testing didn't flag up any functional or performance regressions.
</pre>
</div>
</content>
</entry>
<entry>
<title>[AArch64] Add tablegen patterns for concat(extract-high, extract-high) (#118286)</title>
<updated>2024-12-03T22:13:40+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2024-12-03T22:13:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1e7171f692d0fad37aad0674c6b7c904540a9a0c'/>
<id>1e7171f692d0fad37aad0674c6b7c904540a9a0c</id>
<content type='text'>
A `concat(extract-high(x), extract-high(y))` is the top half of x
inserted into the bottom half of y. This patch adds a tablegen pattern
to make sure that we generate a single i64 lane insert.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A `concat(extract-high(x), extract-high(y))` is the top half of x
inserted into the bottom half of y. This patch adds a tablegen pattern
to make sure that we generate a single i64 lane insert.</pre>
</div>
</content>
</entry>
</feed>
