llvm-project.git/llvm/test/Transforms/LoopDataPrefetch, branch main

[RISCV] Enable prefetching writes (#130561)

2025-03-11T03:39:21+00:00

We should prefetch writes since `Zicbop` has `prefetch.w`.
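
As a rough illustration of what a write prefetch looks like at the source level, here is a minimal sketch using the GCC/Clang `__builtin_prefetch` builtin with `rw = 1`. The helper name `fillWithWritePrefetch` and the distance of 16 elements are hypothetical and not part of the patch. Whether the hint becomes `prefetch.w` depends on the target and enabled extensions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the patch: fills `dst` with `value` and
// issues a write prefetch a fixed number of elements ahead of each store.
inline void fillWithWritePrefetch(std::vector<int> &dst, int value,
                                  std::size_t distance = 16) {
  for (std::size_t i = 0; i < dst.size(); ++i) {
    if (i + distance < dst.size()) {
      // rw = 1 asks for a prefetch in anticipation of a write;
      // locality = 3 asks to keep the line in cache as long as possible.
      // The builtin is only a hint and never changes program results.
      __builtin_prefetch(&dst[i + distance], /*rw=*/1, /*locality=*/3);
    }
    dst[i] = value;
  }
}
```

The LoopDataPrefetch pass inserts comparable hints automatically for strided stores once the target reports that write prefetching is enabled.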

[IR] Convert from nocapture to captures(none) (#123181)

2025-01-29T15:56:47+00:00

This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.

[LoongArch] Impl TTI hooks for LoongArch to support LoopDataPrefetch pass (#118437)

2025-01-20T08:20:15+00:00

Inspired by https://reviews.llvm.org/D146600, this commit adds
some TTI hooks for LoongArch so that the LoopDataPrefetch pass
actually takes effect:

- `getCacheLineSize()`: 64 for loongarch64.
- `getPrefetchDistance()`: In SPEC CPU 2017 testing, the gains from
prefetching are most visible with PrefetchDistance set to 200 (results
shown below), although the best value differs between benchmarks.
- `enableWritePrefetching()`: LoongArch supports store prefetch, so
WritePrefetching is set to true by default.
- `getMinPrefetchStride()` and `getMaxPrefetchIterationsAhead()` keep
their default values (1 and UINT_MAX), so they are not overridden.
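
The hook values listed above can be sketched as a small standalone model. This is not the LoongArch TTI implementation: `PrefetchTuning` and `iterationsAhead` are hypothetical names, the hook signatures are simplified (the real hooks take more context), and the iterations-ahead division is an assumed simplification of how the pass uses the prefetch distance.

```cpp
#include <algorithm>
#include <climits>

// Hypothetical stand-in for a target's TTI answers, using the values
// named in this commit message. Signatures are simplified on purpose.
struct PrefetchTuning {
  unsigned getCacheLineSize() const { return 64; }
  unsigned getPrefetchDistance() const { return 200; }
  bool enableWritePrefetching() const { return true; }
  unsigned getMinPrefetchStride() const { return 1; }
  unsigned getMaxPrefetchIterationsAhead() const { return UINT_MAX; }
};

// Assumed simplification: the prefetch distance is counted in
// instructions, so dividing it by the loop size gives the number of
// iterations to prefetch ahead. Returns 0 when prefetching is skipped.
inline unsigned iterationsAhead(const PrefetchTuning &T, unsigned LoopSize) {
  if (LoopSize == 0)
    return 0;
  unsigned Iters = std::max(1u, T.getPrefetchDistance() / LoopSize);
  return Iters > T.getMaxPrefetchIterationsAhead() ? 0 : Iters;
}
```

Under this model a 10-instruction loop body would prefetch 20 iterations ahead, while a 300-instruction body would prefetch a single iteration ahead.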

After this commit, the test added by https://reviews.llvm.org/D146600
can generate llvm.prefetch intrinsic IR correctly.

Results of spec2017rate benchmarks (testing data: ref, copies: 1):
- For all C/C++ benchmarks, compared to O3+novec/lsx/lasx, prefetch
changes performance by about -1.58%/0.31%/0.07% for int benchmarks
and improves it by 3.26%/3.73%/3.78% for floating point benchmarks.
(Only O3+novec+prefetch regresses when testing intrate.)
- However, prefetch reduces performance for almost every Fortran
benchmark compiled by flang. Across all C/C++/Fortran benchmarks,
performance with prefetch decreases by about 1% to 5%.

FIXME: Keep the `loongarch-enable-loop-data-prefetch` option defaulting
to false for now because of the negative effect on Fortran.

[test][LoongArch] Add -mattr=+d option. NFC

2024-05-14T12:23:04+00:00

Because most tests assume target-abi=`lp64d`, adding the
corresponding feature is reasonable.

rg -l loongarch -g '!*.s' | xargs sed -i '/mtriple=loongarch/ {/-mattr=/!{/target-abi/! s/mtriple=loongarch.. /&-mattr=+d /}}'

[RISCV] Enable LoopDataPrefetch pass (#66201)

2023-11-10T07:39:58+00:00

So that we can benefit from data prefetch when `Zicbop` extension is
supported.

Tuning information for data prefetching is added to `RISCVTuneInfo`.

BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads

2023-10-25T03:27:39+00:00

BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64-bit integers (`BlockFrequency`). This improves the factors picked for this conversion to:

* Avoid big numbers close to UINT64_MAX so that users do not overflow/saturate when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
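
The three goals above can be sketched with a simplified conversion. This is not the BlockFrequencyInfoImpl code: it uses `double` in place of Scaled64, the function name `toIntegerFrequencies` is hypothetical, and mapping the hottest block exactly to 2^54 is an assumption made to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical simplification: relative frequencies are doubles here,
// whereas the real implementation works on Scaled64 values.
inline std::vector<std::uint64_t>
toIntegerFrequencies(const std::vector<double> &Freqs) {
  std::vector<std::uint64_t> Out;
  if (Freqs.empty())
    return Out;
  // Leave the topmost 10 bits of the 64-bit range unused.
  const double MaxInt = std::ldexp(1.0, 54); // 2^54
  double Max = *std::max_element(Freqs.begin(), Freqs.end());
  assert(Max > 0.0 && "expected at least one positive frequency");
  // Map the hottest block to MaxInt so the spread is as wide as possible.
  double Scale = MaxInt / Max;
  for (double F : Freqs) {
    double Scaled = F * Scale;
    // Blocks too cold to represent collapse to 1: precision is lost at
    // the lower end while hot blocks stay distinguishable.
    Out.push_back(Scaled < 1.0 ? 1 : static_cast<std::uint64_t>(Scaled));
  }
  return Out;
}
```

With this sketch, summing up to 1024 of the hottest values still fits in 64 bits, which is the kind of headroom the unused top bits are meant to provide.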

[SCEV] Fix incorrect nsw inference for multiply of addrec (#66500)

2023-09-18T06:23:10+00:00

SCEV currently preserves the nsw flag when performing an nsw multiply of
an nsw addrec. While this is legal for nuw, this is not generally the
case for nsw.

This is because nsw mul does not distribute over nsw add:
https://alive2.llvm.org/ce/z/mergCt

Instead, we need either both nuw and nsw to be set
(https://alive2.llvm.org/ce/z/7wpgGc) or explicitly prove that the
distributed multiplications are also nsw
(https://alive2.llvm.org/ce/z/wef9su).
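
The distribution claim can be checked by brute force on 8-bit values. The sketch below is not SCEV code and all helper names are hypothetical. It models each operation in wide `int` arithmetic and asks whether `(A + B) * C` can be overflow-free while some step of `A*C + B*C` overflows.

```cpp
#include <cstdint>

// Hypothetical helpers for an 8-bit model; values are computed in `int`
// and then checked against the i8 signed or unsigned range.
inline bool fitsSigned(int V) { return V >= INT8_MIN && V <= INT8_MAX; }
inline bool fitsUnsigned(int V) { return V >= 0 && V <= UINT8_MAX; }

// True when (A + B) * C is overflow-free but the distributed form
// A*C + B*C overflows in at least one step, under the given range check.
template <typename Fits>
bool breaksDistribution(int A, int B, int C, Fits fits) {
  bool Original = fits(A + B) && fits((A + B) * C);
  bool Distributed = fits(A * C) && fits(B * C) && fits(A * C + B * C);
  return Original && !Distributed;
}

// Counts signed (nsw-style) counterexamples over all i8 triples.
inline long countSignedCounterexamples() {
  long N = 0;
  for (int A = INT8_MIN; A <= INT8_MAX; ++A)
    for (int B = INT8_MIN; B <= INT8_MAX; ++B)
      for (int C = INT8_MIN; C <= INT8_MAX; ++C)
        N += breaksDistribution(A, B, C, fitsSigned);
  return N;
}

// Counts unsigned (nuw-style) counterexamples over all u8 triples.
inline long countUnsignedCounterexamples() {
  long N = 0;
  for (int A = 0; A <= UINT8_MAX; ++A)
    for (int B = 0; B <= UINT8_MAX; ++B)
      for (int C = 0; C <= UINT8_MAX; ++C)
        N += breaksDistribution(A, B, C, fitsUnsigned);
  return N;
}
```

One signed counterexample is A = 127, B = -127, C = 2: the sum is 0 and 0 * 2 does not overflow, yet 127 * 2 = 254 is outside the i8 signed range. The unsigned search finds none, because each distributed term is bounded by the original product.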

Fixes https://github.com/llvm/llvm-project/issues/66066.

[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm

2023-05-17T15:03:15+00:00

This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours  and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762

[LoongArch] Enable LoopDataPrefetch pass

2023-03-24T03:09:18+00:00

Keep `EnableLoopDataPrefetch` option off for now because
we need a few more TTIs and ISels.

This patch is inspired by http://reviews.llvm.org/D17943.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D146600

[Transforms] Convert some tests to opaque pointers (NFC)

2023-01-05T11:43:45+00:00

These are all tests where conversion worked automatically, and
required no manual fixup.