llvm-project.git/llvm/test/CodeGen/RISCV/rvv/vxrm-insert.ll, branch users/mingmingl-llvm/samplefdo-profile-format

[RISCV] Replace undef with poison, NFC (#157396)

2025-09-09T07:16:16+00:00

Since undef is deprecated now, reuse of some tests case would cause CI
failure, this pr replaces most undef with poison

[RISCV] Use a precise size for MMO on scalable spill and fill (#133171)

2025-03-27T01:25:59+00:00

The primary effect of this is that we get proper scalable sizes printed
by the assembler, but this may also enable proper aliasing analysis. I
don't see any test changes resulting from the later.

Getting the size is slightly tricky as we store the scalable size as a
non-scalable quantity in the object size field for the frame index. We
really should remove that hack at some point...

For the synthetic tuple spills and fills, I dropped the size from the
split loads and stores to avoid incorrect (overly large) sizes. We could
also divide by the NF factor if we felt like writing the code to do so.

Revert "[RISCV] Default to MicroOpBufferSize = 1 for scheduling purposes (#126608)" and follow up commit.

2025-02-13T17:57:33+00:00

This reverts commit 9cc8442a2b438962883bbbfd8ff62ad4b1a2b95d.
This reverts commit 859c871184bdfdebb47b5c7ec5e59348e0534e0b.

A performance regression was reported on the original review.  There appears
to have been an unexpected interaction here.  Reverting during investigation.

[RISCV] Default to MicroOpBufferSize = 1 for scheduling purposes (#126608)

2025-02-12T20:31:39+00:00

This change introduces a default schedule model for the RISCV target
which leaves everything unchanged except the MicroOpBufferSize. The
default value of this flag in NoSched is 0. Both configurations
represent in order cores (i.e. no reorder window), the difference
between them comes down to whether heuristics other than latency are
allowed to apply. (Implementation details below)

I left the processor models which explicitly set MicroOpBufferSize=0
unchanged in this patch, but strongly suspect we should change those
too. Honestly, I think the LLVM wide default for this flag should be
changed, but don't have the energy to manage the updates for all
targets.

Implementation wise, the effect of this change is that schedule units
which are ready to run *except that* one of their predecessors may not
have completed yet are added to the Available list, not the Pending one.
The result of this is that it becomes possible to chose to schedule a
node before it's ready cycle if the heuristics prefer. This is
essentially chosing to insert a resource stall instead of e.g.
increasing register pressure.

Note that I was initially concerned there might be a correctness aspect
(as in some kind of exposed pipeline design), but the generic scheduler
doesn't seem to know how to insert noop instructions. Without that, a
program wouldn't be guaranteed to schedule on an exposed pipeline
depending on the program and schedule model in question.

The effect of this is that we sometimes prefer register pressure in
codegen results. This is mostly churn (or small wins) on scalar because
we have many more registers, but is of major importance on vector -
particularly high LMUL - because we effectively have many fewer
registers and the relative cost of spilling is much higher. This is a
significant improvement on high LMUL code quality for default rva23u
configurations - or any non -mcpu vector configuration for that matter.

Fixes #107532

[RISCV] Take known minimum vlen into account when calculating alignment padding in assignRVVStackObjectOffsets. (#110312)

2024-09-30T18:44:23+00:00

If we know vlen is a multiple of 16, we don't need any alignment
padding.

I wrote the code so that it would generate the minimum amount of padding
if the stack align was 32 or larger or if RVVBitsPerBlock was smaller
than half the stack alignment.

[RISCV] Support postRA vsetvl insertion pass (#70549)

2024-05-21T06:42:55+00:00

This patch try to get rid of vsetvl implict vl/vtype def-use chain and
improve the register allocation quality by moving the vsetvl insertion
pass after RVV register allocation

It will gain the benefit for the following optimization from

1. unblock scheduler's constraints by removing vl/vtype def-use chain
2. Support RVV re-materialization
3. Support partial spill

This patch add a new option `-riscv-vsetvl-after-rvv-regalloc=<1|0>` to
control this feature and default set as disable.

[RISCV] Move RISCVInsertVSETVLI after CSR/VXRM passes (#91701)

2024-05-10T06:31:43+00:00

This further splits off #91440 to inch RISCVInsertVSETVLI closer to post
vector regalloc.

As noted in #91440, most of the diffs are from moving vsetvli insertion
after the vxrm/csr insertion passes, but these are getting conflated
with the changes from moving to LiveIntervals.

One idea was that we could try and remove some of these diffs by
manually moving back the vsetvlis past the vxrm/csr instructions. But
this meant having to touch up the LiveIntervals again which seemed to
lead to even more diffs.

This instead just moves RISCVInsertVSETVLI after RISCVInsertReadWriteCSR
and RISCVInsertWriteVXRM so we can isolate those changes.

[CodeGen] Convert tests to opaque pointers (NFC)

2024-02-05T13:07:09+00:00

[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)

2024-01-07T20:09:44+00:00

R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and
R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530
`call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not
useful and can be removed now (matching AArch64 and PowerPC).

GNU assembler assembles `call foo` to RISCV_CALL_PLT since 2022-09
(70f35d72ef04cd23771875c1661c9975044a749c).

Without this patch, unconditionally changing MO_CALL to MO_PLT could
create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler
and GNU assembler.

[RISCV] Implement cross basic block VXRM write insertion. (#70382)

2023-11-02T21:09:27+00:00

This adds a new pass to insert VXRM writes for vector instructions. With
the goal of avoiding redundant writes.

The pass does 2 dataflow algorithms. The first is a forward data flow to
calculate where a VXRM value is available. The second is a backwards
dataflow to determine where a VXRM value is anticipated.

Finally, we use the results of these two dataflows to insert VXRM writes
where a value is anticipated, but not available.

The pass does not split critical edges so we aren't always able to
eliminate all redundancy.

The pass will only insert vxrm writes on paths that always require it.