llvm-project.git/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp, branch main

[llvm][RISCV] Implement Zilsd load/store pair optimization (#158640)

2025-11-21T05:04:05+00:00

This commit implements a complete load/store optimization pass for the
RISC-V Zilsd extension, which combines pairs of 32-bit load/store
instructions into single 64-bit LD/SD instructions when possible.
Default alignment is 8, it also provide zilsd-4byte-align feature for
looser condition.

Related work: https://reviews.llvm.org/D144002

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[RISCV][NewPM] Port RISCVCodeGenPrepare to the new pass manager (#168381)

2025-11-19T06:51:18+00:00

As suggested in the review for #160536 it would be good to follow up and
port the RISC-V passes to the new pass manager. This PR starts that
task. It provides the bare minimum necessary to run RISCVCodeGenPrepare
with opt -passes=riscv-codegenprepare. The approach used is modeled on
my observations of the AMDGPU backend and the recent work to port the
X86 passes.

The testing approach is to add a `-passes=riscv-foo` RUN line to at
least one test, if an appropriate test exists.

[RISCV] Add an option to enable CFIInstrInserter. (#164477)

2025-11-18T07:46:43+00:00

[RISCV] Introduce pass to promote double constants to a global array (#160536)

2025-11-05T14:31:17+00:00

As discussed in #153402, we have inefficiences in handling constant pool
access that are difficult to address. Using an IR pass to promote double
constants to a global allows a higher degree of control of code
generation for these accesses, resulting in improved performance on
benchmarks that might otherwise have high register pressure due to
accessing constant pool values separately rather than via a common base.

Directly promoting double constants to separate global values and
relying on the global merger to do a sensible thing would be one
potential avenue to explore, but it is _not_ done in this version of the
patch because:
* The global merger pass needs fixes. For instance it claims to be a
function pass, yet all of the work is done in initialisation. This means
that attempts by backends to schedule it after a given module pass don't
actually work as expected.
* The heuristics used can impact codegen unexpectedly, so I worry that
tweaking it to get the behaviour desired for promoted constants may lead
to other issues. This may be completely tractable though.

Now that #159352 has landed, the impact on terms if dynamically executed
instructions is slightly smaller (as we are starting from a better
baseline), but still worthwhile in lbm and nab from SPEC. Results below
are for rva22u64:

```
Benchmark                  Baseline         This PR   Diff (%)
============================================================
============================================================
500.perlbench_r         180668945687    180666122417     -0.00%
502.gcc_r               221274522161    221277565086      0.00%
505.mcf_r               134656204033    134656204066      0.00%
508.namd_r              217646645332    216699783858     -0.44%
510.parest_r            291731988950    291916190776      0.06%
511.povray_r             30983594866     31107718817      0.40%
519.lbm_r                91217999812     87405361395     -4.18%
520.omnetpp_r           137699867177    137674535853     -0.02%
523.xalancbmk_r         284730719514    284734023366      0.00%
525.x264_r              379107521547    379100250568     -0.00%
526.blender_r           659391437610    659447919505      0.01%
531.deepsjeng_r         350038121654    350038121656      0.00%
538.imagick_r           238568674979    238560772162     -0.00%
541.leela_r             405660852855    405654701346     -0.00%
544.nab_r               398215801848    391352111262     -1.72%
557.xz_r                129832192047    129832192055      0.00%

```

---
Notes for reviewers:
* As discussed at the sync-up meeting, the suggestion is to try to land
an incremental improvement to the status quo even if there is more work
to be done around the general issue of constant pool handling. We can
discuss here if that is actually the best next step or not, but I just
wanted to clarify that's why this is being posted with a somewhat narrow
scope.
* I've disabled transformations both for RV32 and on systems without D
as both cases saw some regressions.

[llvm] Make getEffectiveRelocModel helper consistent across targets. NFC (#165121)

2025-10-26T04:20:20+00:00

- On targets that don't require the Triple, don't pass it.
- Use `.value_or` to where possible.

[RISCV] Enabled debug entry support by default (#157703)

2025-09-12T08:49:33+00:00

This patch enables support for debug entry values. This improves quality
of debug info for RISC-V

[RISCV] Move MachineCombiner to addILPOpts() (#158071)

2025-09-12T07:23:38+00:00

So that it runs before `MachineCSE` and other passes.

Fixes https://github.com/llvm/llvm-project/issues/158063.

[llvm] Move data layout string computation to TargetParser (#157612)

2025-09-11T18:05:29+00:00

Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.

Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.

By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.

This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.

The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32" "n64" etc is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.

---------

Co-authored-by: Matt Arsenault 
Co-authored-by: Sergei Barannikov

Recommit "[RISCV] Don't run loop-idiom-vectorize pass in the O0 pipeline. (#156798)"

2025-09-04T16:50:12+00:00

With a dependency on the Passes library added this time.

Revert "[RISCV] Don't run loop-idiom-vectorize pass in the O0 pipeline. (#156798)"

2025-09-04T16:44:58+00:00

This reverts commit c51db9f6f3653859a05a073dab53274510edacff.

Getting build bot failures about undefined symbol: llvm::OptimizationLevel::O0.