<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/test/Transforms/LoadStoreVectorizer, branch main</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>Re-land [Transform][LoadStoreVectorizer] allow redundant in Chain (#168135)</title>
<updated>2025-11-20T01:39:10+00:00</updated>
<author>
<name>Gang Chen</name>
<email>gangc@amd.com</email>
</author>
<published>2025-11-20T01:39:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9e9fe08b16ea2c4d9867fb4974edf2a3776d6ece'/>
<id>9e9fe08b16ea2c4d9867fb4974edf2a3776d6ece</id>
<content type='text'>
This is the fixed version of
https://github.com/llvm/llvm-project/pull/163019</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the fixed version of
https://github.com/llvm/llvm-project/pull/163019</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[Transform][LoadStoreVectorizer] allow redundant in Chain (#1… (#168105)</title>
<updated>2025-11-14T19:49:09+00:00</updated>
<author>
<name>Gang Chen</name>
<email>gangc@amd.com</email>
</author>
<published>2025-11-14T19:49:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a407d02752f9d28fe01dd2fe5cdc12344ab38753'/>
<id>a407d02752f9d28fe01dd2fe5cdc12344ab38753</id>
<content type='text'>
…63019)"

This reverts commit 92e5608ffa6ff39ac3707f29418cc9482471f5d9.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
…63019)"

This reverts commit 92e5608ffa6ff39ac3707f29418cc9482471f5d9.</pre>
</div>
</content>
</entry>
<entry>
<title>[Transform][LoadStoreVectorizer] allow redundant in Chain (#163019)</title>
<updated>2025-11-13T20:19:29+00:00</updated>
<author>
<name>Gang Chen</name>
<email>gangc@amd.com</email>
</author>
<published>2025-11-13T20:19:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=92e5608ffa6ff39ac3707f29418cc9482471f5d9'/>
<id>92e5608ffa6ff39ac3707f29418cc9482471f5d9</id>
<content type='text'>
This can absorb redundant loads when forming a vector load. It can be
used to fix the situation created by VectorCombine. See:
https://discourse.llvm.org/t/what-is-the-purpose-of-vectorizeloadinsert-in-the-vectorcombine-pass/88532</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This can absorb redundant loads when forming a vector load. It can be
used to fix the situation created by VectorCombine. See:
https://discourse.llvm.org/t/what-is-the-purpose-of-vectorizeloadinsert-in-the-vectorcombine-pass/88532</pre>
</div>
</content>
</entry>
<entry>
<title>[LoadStoreVectorizer] Batch alias analysis results to improve compile time (#147555)</title>
<updated>2025-07-10T16:23:33+00:00</updated>
<author>
<name>Drew Kersnar</name>
<email>dkersnar@nvidia.com</email>
</author>
<published>2025-07-10T16:23:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=8e7461e29a7c9f1721758b30eb99b0ccab45a7cd'/>
<id>8e7461e29a7c9f1721758b30eb99b0ccab45a7cd</id>
<content type='text'>
This should be generally good for a lot of LSV cases, but the attached
test demonstrates a specific compile-time issue that appears when the
`CaptureTracking` default max uses is raised.

Without batching alias analysis results, this test takes 6 seconds to
compile in a release build. With it, less than a second. This is because
the mechanism that proves `NoAlias` in this case is very expensive
(`CaptureTracking.cpp`), and caching the result leads to 2 calls to that
mechanism instead of ~300,000 (run with -stats to see the difference).

This test only demonstrates the compile time issue if
`capture-tracking-max-uses-to-explore` is set to at least 1024, because
with the default value of 100, the `CaptureTracking` analysis is not
run, `NoAlias` is not proven, and the vectorizer gives up early.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This should be generally good for a lot of LSV cases, but the attached
test demonstrates a specific compile-time issue that appears when the
`CaptureTracking` default max uses is raised.

Without batching alias analysis results, this test takes 6 seconds to
compile in a release build. With it, less than a second. This is because
the mechanism that proves `NoAlias` in this case is very expensive
(`CaptureTracking.cpp`), and caching the result leads to 2 calls to that
mechanism instead of ~300,000 (run with -stats to see the difference).

This test only demonstrates the compile time issue if
`capture-tracking-max-uses-to-explore` is set to at least 1024, because
with the default value of 100, the `CaptureTracking` analysis is not
run, `NoAlias` is not proven, and the vectorizer gives up early.</pre>
</div>
</content>
</entry>
<entry>
<title>[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)</title>
<updated>2025-05-13T20:36:09+00:00</updated>
<author>
<name>Drew Kersnar</name>
<email>dkersnar@nvidia.com</email>
</author>
<published>2025-05-13T20:36:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a1e1a84d2c8bb627945cda2a991539d39b034269'/>
<id>a1e1a84d2c8bb627945cda2a991539d39b034269</id>
<content type='text'>
PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain
conditions. This change extends the backend to lower these loads/stores.
It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the
LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain
conditions. This change extends the backend to lower these loads/stores.
It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the
LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st</pre>
</div>
</content>
</entry>
<entry>
<title>[NFC] Precommit tests for an LSV patch (#138167)</title>
<updated>2025-05-01T16:50:31+00:00</updated>
<author>
<name>Anshil Gandhi</name>
<email>95053726+gandhi56@users.noreply.github.com</email>
</author>
<published>2025-05-01T16:50:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=dadd91e793a7622e0ca34ad9c3993a01a437b651'/>
<id>dadd91e793a7622e0ca34ad9c3993a01a437b651</id>
<content type='text'>
Autogenerate checks for merge-vectors.ll and introduce
merge-vectors-complex.ll with mismatched types.
Related PR: https://github.com/llvm/llvm-project/pull/134436

This is a reland of https://github.com/llvm/llvm-project/pull/138155,
which was reverted due to missed nits.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Autogenerate checks for merge-vectors.ll and introduce
merge-vectors-complex.ll with mismatched types.
Related PR: https://github.com/llvm/llvm-project/pull/134436

This is a reland of https://github.com/llvm/llvm-project/pull/138155,
which was reverted due to missed nits.</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[NFC] Precommit: Autogenerate checks for an LSV test" (#138161)</title>
<updated>2025-05-01T16:09:51+00:00</updated>
<author>
<name>Anshil Gandhi</name>
<email>95053726+gandhi56@users.noreply.github.com</email>
</author>
<published>2025-05-01T16:09:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a7aca819d44b8d67f2cffd452e6b63741c83cd62'/>
<id>a7aca819d44b8d67f2cffd452e6b63741c83cd62</id>
<content type='text'>
Reverts llvm/llvm-project#138155</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reverts llvm/llvm-project#138155</pre>
</div>
</content>
</entry>
<entry>
<title>[NFC] Precommit: Autogenerate checks for an LSV test (#138155)</title>
<updated>2025-05-01T16:00:43+00:00</updated>
<author>
<name>Anshil Gandhi</name>
<email>95053726+gandhi56@users.noreply.github.com</email>
</author>
<published>2025-05-01T16:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0e9740ea1783ceaf8686b13ab7bf9278f34aef6a'/>
<id>0e9740ea1783ceaf8686b13ab7bf9278f34aef6a</id>
<content type='text'>
Related PR: https://github.com/llvm/llvm-project/pull/134436</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Related PR: https://github.com/llvm/llvm-project/pull/134436</pre>
</div>
</content>
</entry>
<entry>
<title>[LoadStoreVectorizer] Remove more unnecessary data layouts from tests</title>
<updated>2025-04-30T17:58:33+00:00</updated>
<author>
<name>Alexander Richardson</name>
<email>alexrichardson@google.com</email>
</author>
<published>2025-04-30T17:58:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a57847232f3b6d23c2d4d183be1964607140bf7b'/>
<id>a57847232f3b6d23c2d4d183be1964607140bf7b</id>
<content type='text'>
The tests in this directory all depend on the AMDGPU target being
present so we can let opt infer the data layout.

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137924
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The tests in this directory all depend on the AMDGPU target being
present so we can let opt infer the data layout.

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137924
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Fix edge case of buffer OOB handling (#115479)</title>
<updated>2025-03-07T07:56:44+00:00</updated>
<author>
<name>Piotr Sobczak</name>
<email>piotr.sobczak@amd.com</email>
</author>
<published>2025-03-07T07:56:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=170c0dac4488f9cfbc67e9593ebe6ad01cfa8f32'/>
<id>170c0dac4488f9cfbc67e9593ebe6ad01cfa8f32</id>
<content type='text'>
Strengthen out-of-bounds guarantees for buffer accesses by disallowing
buffer accesses with alignment lower than natural alignment.

This is needed to specifically address the edge case where an access
starts out-of-bounds and then enters in-bounds, as the hardware would
treat the entire access as being out-of-bounds. This is normally not
needed for most users, but at least one graphics device extension
(VK_EXT_robustness2) has very strict requirements: in-bounds accesses
must return the correct value, and out-of-bounds accesses must return
zero.

The direct consequence of the patch is that a buffer access at a negative
address is not merged by the load-store-vectorizer with one at a positive
address, which fixes a CTS test.

Targets that do not care about the new behavior are advised to use the
new target feature relaxed-buffer-oob-mode that maintains the state from
before the patch.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Strengthen out-of-bounds guarantees for buffer accesses by disallowing
buffer accesses with alignment lower than natural alignment.

This is needed to specifically address the edge case where an access
starts out-of-bounds and then enters in-bounds, as the hardware would
treat the entire access as being out-of-bounds. This is normally not
needed for most users, but at least one graphics device extension
(VK_EXT_robustness2) has very strict requirements: in-bounds accesses
must return the correct value, and out-of-bounds accesses must return
zero.

The direct consequence of the patch is that a buffer access at a negative
address is not merged by the load-store-vectorizer with one at a positive
address, which fixes a CTS test.

Targets that do not care about the new behavior are advised to use the
new target feature relaxed-buffer-oob-mode that maintains the state from
before the patch.</pre>
</div>
</content>
</entry>
</feed>
