llvm-project.git/llvm/test/Transforms/LoadStoreVectorizer, branch main

Re-land [Transform][LoadStoreVectorizer] allow redundant in Chain (#168135)

2025-11-20T01:39:10+00:00

This is the fixed version of
https://github.com/llvm/llvm-project/pull/163019

Revert "[Transform][LoadStoreVectorizer] allow redundant in Chain (#163019)" (#168105)

2025-11-14T19:49:09+00:00

This reverts commit 92e5608ffa6ff39ac3707f29418cc9482471f5d9.

[Transform][LoadStoreVectorizer] allow redundant in Chain (#163019)

2025-11-13T20:19:29+00:00

This allows redundant loads to be absorbed into a chain when forming a
vector load. It can be used to fix the situation created by
VectorCombine. See:
https://discourse.llvm.org/t/what-is-the-purpose-of-vectorizeloadinsert-in-the-vectorcombine-pass/88532
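A simplified, hypothetical model of absorbing redundant loads may help here. It is not the LoadStoreVectorizer's code. Loads are reduced to byte offsets from a common base with a single element size, and a load whose offset repeats another's reuses that lane of the single vector load instead of breaking up the chain. `ScalarLoad`, `VectorPlan`, and `planChain` are names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical scalar load: only its byte offset from a shared base pointer.
struct ScalarLoad {
  std::int64_t offset;
};

// Hypothetical plan for replacing a chain with one vector load.
struct VectorPlan {
  std::int64_t baseOffset;      // byte offset of lane 0
  unsigned numLanes;            // number of elements in the vector load
  std::vector<unsigned> lanes;  // lane that replaces each input load, in order
};

// Plans one vector load covering every load in `chain` (each `elemBytes` wide).
// Loads at the same offset are redundant: they share a lane instead of
// forcing the chain to be split. Returns std::nullopt for misaligned offsets
// or gaps between lanes.
std::optional<VectorPlan> planChain(const std::vector<ScalarLoad>& chain,
                                    std::int64_t elemBytes) {
  assert(elemBytes > 0 && "element size must be positive");
  if (chain.empty())
    return std::nullopt;
  auto [minIt, maxIt] = std::minmax_element(
      chain.begin(), chain.end(),
      [](const ScalarLoad& a, const ScalarLoad& b) { return a.offset < b.offset; });
  const std::int64_t base = minIt->offset;
  const std::int64_t span = maxIt->offset - base;
  if (span % elemBytes != 0)
    return std::nullopt;

  VectorPlan plan{base, static_cast<unsigned>(span / elemBytes) + 1, {}};
  std::vector<bool> covered(plan.numLanes, false);
  for (const ScalarLoad& ld : chain) {
    const std::int64_t delta = ld.offset - base;
    if (delta % elemBytes != 0)
      return std::nullopt;  // load would straddle two lanes
    const auto lane = static_cast<unsigned>(delta / elemBytes);
    covered[lane] = true;   // a redundant load simply reuses this lane
    plan.lanes.push_back(lane);
  }
  // Keep the sketch simple: every lane must be loaded at least once.
  if (std::find(covered.begin(), covered.end(), false) != covered.end())
    return std::nullopt;
  return plan;
}
```

For example, offsets 0, 4, 4, 8, 12 with 4-byte elements yield one 4-lane load, and the inputs map to lanes 0, 1, 1, 2, 3. A real pass would then rewrite each original load as an extract from the vector; requiring every lane to be covered is a simplification of this sketch.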

[LoadStoreVectorizer] Batch alias analysis results to improve compile time (#147555)

2025-07-10T16:23:33+00:00

This should benefit many LSV cases in general, but the attached test
demonstrates a specific compile-time issue that appears when the
`CaptureTracking` default max uses limit is raised.

Without batched alias analysis, this test takes 6 seconds to compile in
a release build; with it, less than a second. This is because the
mechanism that proves `NoAlias` in this case (`CaptureTracking.cpp`) is
very expensive, and caching its result reduces the number of calls to
that mechanism from ~300,000 to 2 (run with -stats to see the
difference).
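The compile-time effect of batching can be sketched with plain memoization. LLVM's batched alias analysis (`BatchAAResults`) also caches query results within a batch; the code below only mimics that idea and is not its implementation. `ExpensiveOracle` and `BatchedAlias` are hypothetical: the oracle stands in for the costly `CaptureTracking`-based proof and only counts how often it runs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-in for an expensive NoAlias proof (such as one that has
// to walk many uses of a pointer). It only counts how often it is invoked.
struct ExpensiveOracle {
  unsigned calls = 0;
  bool noAlias(std::uintptr_t a, std::uintptr_t b) {
    ++calls;
    return a != b;  // placeholder answer, good enough for the sketch
  }
};

// Memoizes answers per unordered pointer pair. Assumes the IR is not modified
// while a batch is alive, so cached answers stay valid.
class BatchedAlias {
public:
  explicit BatchedAlias(ExpensiveOracle& oracle) : oracle_(oracle) {}

  bool noAlias(std::uintptr_t a, std::uintptr_t b) {
    assert(a != 0 && b != 0 && "null pointers are not modelled");
    if (b < a)
      std::swap(a, b);  // aliasing is symmetric, so normalize the key
    const auto key = std::make_pair(a, b);
    if (auto it = cache_.find(key); it != cache_.end())
      return it->second;
    const bool result = oracle_.noAlias(a, b);
    cache_.emplace(key, result);
    return result;
  }

private:
  ExpensiveOracle& oracle_;
  std::map<std::pair<std::uintptr_t, std::uintptr_t>, bool> cache_;
};
```

Repeating queries over two distinct pointer pairs, in either argument order, reaches the oracle only twice; the counts are illustrative and do not reproduce the attached test.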

This test only demonstrates the compile-time issue if
`capture-tracking-max-uses-to-explore` is set to at least 1024, because
with the default value of 100, the `CaptureTracking` analysis is not
run, `NoAlias` is not proven, and the vectorizer gives up early.

[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

2025-05-13T20:36:09+00:00

PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain
conditions. This change extends the backend to lower these loads/stores.
It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the
LoadStoreVectorizer to create these wider vector operations.
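As a rough illustration of what the hook override enables: the vectorizer bounds the accesses it forms by the register width the target reports for an address space, so raising that width from 128 to 256 bits doubles the number of 32-bit elements per access. `maxVectorBits` and `maxElements` are hypothetical helpers. The conditions modelled (global address space 1, `sm_100`+, PTX 8.8+, PTX versions encoded as `major * 10 + minor`, and a 128-bit fallback) are assumptions, and the real NVPTX override may check more.

```cpp
#include <cassert>

// Hypothetical model; NVPTX numbers the global address space 1.
constexpr unsigned kGlobalAddrSpace = 1;

// Assumed rule: 256-bit accesses only for global memory on sm_100+ with
// PTX 8.8+ (PTX version encoded as major * 10 + minor, e.g. 88).
// Everything else falls back to an assumed 128-bit width.
unsigned maxVectorBits(unsigned addrSpace, unsigned smVersion,
                       unsigned ptxVersion) {
  const bool wide =
      addrSpace == kGlobalAddrSpace && smVersion >= 100 && ptxVersion >= 88;
  return wide ? 256 : 128;
}

// Number of `elemBits`-wide elements that fit in one vector access.
unsigned maxElements(unsigned vecBits, unsigned elemBits) {
  assert(elemBits > 0 && vecBits % elemBits == 0);
  return vecBits / elemBits;
}
```

Under these assumptions, a global `float` access on `sm_100` with PTX 8.8 may cover 8 elements instead of 4.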

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st

[NFC] Precommit tests for an LSV patch (#138167)

2025-05-01T16:50:31+00:00

Autogenerate checks for merge-vectors.ll and introduce
merge-vectors-complex.ll with mismatched types.
Related PR: https://github.com/llvm/llvm-project/pull/134436

This is a reland of https://github.com/llvm/llvm-project/pull/138155,
which was reverted due to missed nits.

Revert "[NFC] Precommit: Autogenerate checks for an LSV test" (#138161)

2025-05-01T16:09:51+00:00

Reverts llvm/llvm-project#138155

[NFC] Precommit: Autogenerate checks for an LSV test (#138155)

2025-05-01T16:00:43+00:00

Related PR: https://github.com/llvm/llvm-project/pull/134436

[LoadStoreVectorizer] Remove more unnecessary data layouts from tests

2025-04-30T17:58:33+00:00

The tests in this directory all depend on the AMDGPU target being
present, so we can let opt infer the data layout.

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137924

[AMDGPU] Fix edge case of buffer OOB handling (#115479)

2025-03-07T07:56:44+00:00

Strengthen out-of-bounds guarantees for buffer accesses by disallowing
buffer accesses with alignment lower than natural alignment.

This is needed to specifically address the edge case where an access
starts out-of-bounds and then enters in-bounds, as the hardware would
treat the entire access as being out-of-bounds. This is not needed by
most users, but at least one graphics device extension
(VK_EXT_robustness2) has very strict requirements: in-bounds accesses
must return the correct value, and out-of-bounds accesses must return zero.

The direct consequence of the patch is that a buffer access at a negative
address is not merged by the load-store-vectorizer with one at a positive
address, which fixes a CTS test.
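The failure mode can be reproduced with a toy buffer model. Beyond the rule stated above (an access that starts out-of-bounds is treated as entirely out-of-bounds), the byte-wise check for accesses that start in bounds is an assumption of this sketch, and `bufferLoad` is a hypothetical helper. Two 4-byte loads at offsets -4 and 0 each behave as the extension requires, but a merged 8-byte load starting at -4 also returns zero for its in-bounds half.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a bounds-checked buffer load. Assumptions, not hardware specs:
//  - if the access *starts* out of bounds, every byte reads as zero;
//  - otherwise each byte is checked against the buffer size individually.
std::vector<std::uint8_t> bufferLoad(const std::vector<std::uint8_t>& buf,
                                     std::int64_t offset, std::int64_t size) {
  assert(size > 0 && "access size must be positive");
  std::vector<std::uint8_t> out(static_cast<std::size_t>(size), 0);
  const auto bufSize = static_cast<std::int64_t>(buf.size());
  if (offset < 0 || offset >= bufSize)
    return out;  // whole access treated as out-of-bounds
  for (std::int64_t i = 0; i < size && offset + i < bufSize; ++i)
    out[static_cast<std::size_t>(i)] = buf[static_cast<std::size_t>(offset + i)];
  return out;
}
```

With a 4-byte buffer holding 1, 2, 3, 4, the separate loads return all zeros and 1, 2, 3, 4 respectively, while the upper half of the merged load comes back as zeros.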

Targets that do not care about the new behavior are advised to use the
new target feature relaxed-buffer-oob-mode, which maintains the behavior
from before the patch.
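A simple argument shows why requiring natural alignment is enough to exclude this case. A naturally aligned access of size S starts at a multiple of S (measured from the buffer start, assumed here). Because 0 is itself a multiple of S, no such access can begin below 0 and end above it. The predicates below are hypothetical. Natural alignment is taken to equal the access size, which holds for power-of-two sizes. Modelling `relaxed-buffer-oob-mode` as a boolean parameter is also a simplification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical legality check: under the stricter default, a buffer access is
// only formed if its alignment is at least its natural alignment (its size,
// for power-of-two sizes). `relaxedOOBMode` models relaxed-buffer-oob-mode.
bool bufferAccessAllowed(std::uint64_t alignBytes, std::uint64_t sizeBytes,
                         bool relaxedOOBMode) {
  assert(alignBytes > 0 && sizeBytes > 0);
  return relaxedOOBMode || alignBytes >= sizeBytes;
}

// True if the byte range [offset, offset + size) begins before 0 and ends
// after it, i.e. the access starts out-of-bounds and enters in-bounds.
bool straddlesZero(std::int64_t offset, std::int64_t size) {
  return offset < 0 && offset + size > 0;
}
```

For instance, an 8-byte access at offset -4 is only 4-byte aligned, so it is rejected unless the relaxed mode is enabled.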