llvm-project.git/llvm/test/CodeGen/AArch64/vecreduce-add.ll, branch users/mingmingl-llvm/samplefdo-profile-format

[AArch64] Lower zero cycle FPR zeroing (#156261)

2025-09-10T05:32:51+00:00

Lower FPR64, FPR32, and FPR16 zeroing from `fmov` to NEON zeroing if the
target supports zero-cycle zeroing of NEON registers but not of the
narrower register classes.

It handles two cases: one in `AsmPrinter`, where an FP zeroing from an
immediate has been captured by pattern matching during instruction
selection, and a second post-RA in `AArch64InstrInfo::copyPhysReg`, for
uncaptured or later-generated WZR/XZR fmovs.

Adds a subtarget feature called FeatureZCZeroingFPR128 that makes it
possible to query whether the target supports zero-cycle zeroing for
FPR128 NEON registers, and updates the appropriate processors.

[AArch64] Transform add(x, abs(y)) -> saba(x, y, 0) (#156615)

2025-09-08T13:14:24+00:00

Add a DAGCombine to perform the following transformations:
- add(x, abs(y)) -> saba(x, y, 0)
- add(x, zext(abs(y))) -> sabal(x, y, 0)

As well as being a useful generic transformation, this also fixes an
issue where LLVM de-optimises [US]ABA NEON ACLE intrinsics into separate
ABD+ADD instructions when one of the operands is a zero vector.
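
The first identity can be checked with a small lane-wise model. This is a sketch only, not the DAGCombine itself: it assumes 32-bit lanes held as unsigned bit patterns, wrapping arithmetic, and a hypothetical `saba(acc, a, b)` helper that computes `acc + |a - b|` per lane.

```python
BITS = 32
MASK = (1 << BITS) - 1

def as_signed(v):
    # Interpret a BITS-wide bit pattern as a signed integer.
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v

def add_abs(x, y):
    # add(x, abs(y)), lane by lane, wrapping to BITS bits.
    return [(a + abs(as_signed(b))) & MASK for a, b in zip(x, y)]

def saba(acc, a, b):
    # Hypothetical model of SABA: acc + |a - b|, lane by lane, wrapping.
    return [(c + abs(as_signed(p) - as_signed(q))) & MASK
            for c, p, q in zip(acc, a, b)]

x = [1, 2, 3, MASK]
y = [5, (-7) & MASK, 0, 1 << (BITS - 1)]  # includes the most negative lane value
zero = [0] * len(x)

assert add_abs(x, y) == saba(x, y, zero)
```

With the third operand fixed to zero the absolute difference degenerates to a plain absolute value, which is why the add can be folded into the accumulate form.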

[AArch64][GlobalISel] Add push_mul_through_s/zext (#141551)

2025-07-31T06:38:11+00:00

This extends the existing push_add_through_zext to handle mul, similar
to performVectorExtCombine in SDAG. This allows muls to be pushed up the
tree of extends, operating on smaller vector types whilst keeping the
result the same (providing there are > 2x bits in the output).

matchExtAddvToUdotAddv needs to be adjusted to make sure it keeps
generating dot instructions from add(ext(mul(ext, ext))).
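
Why the multiply can move above part of the extend can be sketched with scalar bit patterns. The example below is an illustration under assumptions, not the combine's code: it takes i8 inputs and an i32 result, splits the extend through i16, and uses hypothetical helpers `zext`, `sext`, and `mul`.

```python
def zext(v, from_bits, to_bits):
    # Zero-extend: keep the low from_bits bits, high bits are zero.
    return v & ((1 << from_bits) - 1)

def sext(v, from_bits, to_bits):
    # Sign-extend a from_bits pattern to a to_bits pattern.
    v &= (1 << from_bits) - 1
    if v >> (from_bits - 1):
        v -= 1 << from_bits
    return v & ((1 << to_bits) - 1)

def mul(a, b, bits):
    # Wrapping multiply at the given width.
    return (a * b) & ((1 << bits) - 1)

def wide_mul(ext, a, b):
    # mul(ext(a), ext(b)) performed entirely at i32.
    return mul(ext(a, 8, 32), ext(b, 8, 32), 32)

def narrow_mul(ext, a, b):
    # ext(mul(ext(a), ext(b))): multiply at i16, then extend to i32.
    return ext(mul(ext(a, 8, 16), ext(b, 8, 16), 16), 16, 32)

# Exhaustive check over all i8 pairs, for both extend kinds.
for ext in (zext, sext):
    assert all(wide_mul(ext, a, b) == narrow_mul(ext, a, b)
               for a in range(256) for b in range(256))
```

The product of two i8 values always fits in i16, so extending the narrow product gives the same i32 value; multiplying at i8 directly would lose bits (for example 16 * 16).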

[AArch64][GlobalISel] Ensure we have a insert-subreg v4i32 GPR pattern (#142724)

2025-06-06T16:44:33+00:00

This is the GISel equivalent of scalar_to_vector, making sure that when
we insert into undef we use an fmov that avoids the artificial dependency
on the previous register. This adds v2i32 and v2i64 patterns too for
similar reasons.

[AArch64] Add patterns for addv(sext) and addv(zext)

2025-02-15T17:04:32+00:00

This adds patterns for v8i8->i16 vaddlv and v4i16->i32 vaddlv, for both signed
and unsigned extends.
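
The arithmetic behind the v8i8->i16 case can be modelled as follows. This is a minimal sketch assuming lanes are given as 8-bit patterns; `addv_of_ext` and `addlv` are hypothetical names for "extend then reduce" and "widening reduce", not LLVM or ISA identifiers.

```python
def ext_lane(v, signed):
    # Extend one i8 bit pattern to a Python integer (sign- or zero-extended).
    v &= 0xFF
    return v - 0x100 if signed and v & 0x80 else v

def addv_of_ext(lanes, signed):
    # Extend each i8 lane to i16 first, then reduce with wrapping i16 adds.
    acc = 0
    for v in lanes:
        acc = (acc + (ext_lane(v, signed) & 0xFFFF)) & 0xFFFF
    return acc

def addlv(lanes, signed):
    # Widening reduction: sum the lanes exactly, then keep the i16 result.
    return sum(ext_lane(v, signed) for v in lanes) & 0xFFFF

lanes = [0xFF] * 8
assert addv_of_ext(lanes, signed=False) == addlv(lanes, signed=False)
assert addv_of_ext(lanes, signed=True) == addlv(lanes, signed=True)
```

Eight i8 lanes can never overflow an i16 accumulator, so the two forms agree for every input.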

[AArch64][GlobalISel] Legalize more G_VECREDUCE_ADD operations. (#123392)

2025-01-30T22:17:34+00:00

Non-power-of-2 vectors will now be padded with zero elements. Smaller
vectors will be widened using anyext, which I believe will be better in
many situations than padding with zeros, although some small types may
prefer being scalarized depending on the code. Padding with zeros may
not be best for all sizes (v5i8 being the worst); we can hopefully
improve that in the future, but they no longer fall back. We scalarize
other types like i128.
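
Both strategies preserve the reduction result, which a small model can confirm. This is a sketch under assumptions rather than the legalizer's logic: `pad_to_pow2` and `anyext` are hypothetical helpers, and anyext is modelled by filling the new high bits with arbitrary values.

```python
import random

def reduce_add(lanes, bits):
    # Wrapping add reduction at the given lane width.
    mask = (1 << bits) - 1
    acc = 0
    for v in lanes:
        acc = (acc + v) & mask
    return acc

def pad_to_pow2(lanes):
    # Append zero lanes until the lane count is a power of two.
    n = 1
    while n < len(lanes):
        n *= 2
    return lanes + [0] * (n - len(lanes))

def anyext(v, from_bits, to_bits, rng):
    # Any-extend: low bits are preserved, high bits are unspecified.
    high = rng.getrandbits(to_bits - from_bits)
    return (high << from_bits) | (v & ((1 << from_bits) - 1))

v5i8 = [200, 100, 50, 25, 7]
assert reduce_add(pad_to_pow2(v5i8), 8) == reduce_add(v5i8, 8)

rng = random.Random(0)
v4i8 = [200, 100, 50, 25]
widened = [anyext(v, 8, 16, rng) for v in v4i8]
assert reduce_add(widened, 16) & 0xFF == reduce_add(v4i8, 8)
```

Zero lanes are the identity for addition, and the low bits of a sum depend only on the low bits of its inputs, so the unspecified high bits from anyext disappear once the result is truncated.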

[llvm][aarch64] fix copypaste typo (#120725)

2025-01-02T15:18:20+00:00

moved from #119881

Revert "[AArch64] Enable subreg liveness tracking by default."

2024-12-12T17:22:15+00:00

This reverts commit 9c319d5bb40785c969d2af76535ca62448dfafa7.

Some issues were discovered with the bootstrap builds, which
seem like they were caused by this commit. I'm reverting to investigate.

[AArch64] Enable subreg liveness tracking by default.

2024-12-12T16:05:49+00:00

Internal testing didn't flag up any functional or performance regressions.

[AArch64] Add tablegen patterns for concat(extract-high, extract-high) (#118286)

2024-12-03T22:13:40+00:00

A `concat(extract-high(x), extract-high(y))` is the top half of x
inserted into the bottom half of y. This patch adds a tablegen pattern
to make sure that we generate a single i64 lane insert.
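
The equivalence stated above can be illustrated with vectors modelled as Python lists, lane 0 first. This is a sketch with hypothetical helper names; it assumes "high" means the upper half of the lanes.

```python
def extract_high(v):
    # Upper half of the lanes.
    return v[len(v) // 2:]

def concat(lo, hi):
    # Concatenate two half-width vectors; `lo` supplies the low lanes.
    return lo + hi

def insert_low_half(dst, half):
    # Overwrite the bottom half of dst with `half`, keeping the top half.
    return half + dst[len(half):]

x = [0, 1, 2, 3]
y = [10, 11, 12, 13]

assert concat(extract_high(x), extract_high(y)) == [2, 3, 12, 13]
assert insert_low_half(y, extract_high(x)) == [2, 3, 12, 13]
```

For a 128-bit vector each half is 64 bits wide, so the overwrite corresponds to one i64 lane insert.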