<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)</title>
<updated>2025-11-14T11:00:57+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2025-11-14T11:00:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=72c69aefbae8bfb087622e642acbd0cba7578747'/>
<id>72c69aefbae8bfb087622e642acbd0cba7578747</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPUPromoteAlloca][NFC] Avoid unnecessary APInt/int64_t conversions (#157864)</title>
<updated>2025-09-12T07:51:55+00:00</updated>
<author>
<name>Fabian Ritter</name>
<email>fabian.ritter@amd.com</email>
</author>
<published>2025-09-12T07:51:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=7982980e078481fb1c52360691206f10160b1e5a'/>
<id>7982980e078481fb1c52360691206f10160b1e5a</id>
<content type='text'>
Follow-up to #157682</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Follow-up to #157682</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (#157810)</title>
<updated>2025-09-10T12:46:46+00:00</updated>
<author>
<name>Fabian Ritter</name>
<email>fabian.ritter@amd.com</email>
</author>
<published>2025-09-10T12:46:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5b81367960e71d40b948f433664790ec8a19f224'/>
<id>5b81367960e71d40b948f433664790ec8a19f224</id>
<content type='text'>
When we know that one operand of an addition is a constant, we might as
well put it on the right-hand side and avoid the work to canonicalize it
in a later pass.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we know that one operand of an addition is a constant, we might as
well put it on the right-hand side and avoid the work to canonicalize it
in a later pass.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Treat GEP offsets as signed in AMDGPUPromoteAlloca (#157682)</title>
<updated>2025-09-10T09:32:14+00:00</updated>
<author>
<name>Fabian Ritter</name>
<email>fabian.ritter@amd.com</email>
</author>
<published>2025-09-10T09:32:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b965f265388abd6abba7d553927ba7c154026af2'/>
<id>b965f265388abd6abba7d553927ba7c154026af2</id>
<content type='text'>
AMDGPUPromoteAlloca can transform i32 GEP offsets that operate on
allocas into i64 extractelement indices. Before this patch, negative GEP
offsets would be zero-extended, leading to wrong extractelement indices
with values around (2**32-1).

This fixes failing LlvmLibcCharacterConverterUTF32To8Test tests for
AMDGPU.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
AMDGPUPromoteAlloca can transform i32 GEP offsets that operate on
allocas into i64 extractelement indices. Before this patch, negative GEP
offsets would be zero-extended, leading to wrong extractelement indices
with values around (2**32-1).

This fixes failing LlvmLibcCharacterConverterUTF32To8Test tests for
AMDGPU.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] AMDGPUPromoteAlloca: increase default max-regs to 32 (#155076)</title>
<updated>2025-08-26T00:30:16+00:00</updated>
<author>
<name>Carl Ritson</name>
<email>carl.ritson@amd.com</email>
</author>
<published>2025-08-26T00:30:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1f6648ccaaa6a578339ccddc6c1c70aa61b66b06'/>
<id>1f6648ccaaa6a578339ccddc6c1c70aa61b66b06</id>
<content type='text'>
Increase promote-alloca-to-vector-max-regs to 32 from 16.
This restores default promotion of 16 x double which was disabled by
#127973.

Fixes SWDEV-525817.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Increase promote-alloca-to-vector-max-regs to 32 from 16.
This restores default promotion of 16 x double which was disabled by
#127973.

Fixes SWDEV-525817.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Replace dynamic VGPR feature with attribute (#133444)</title>
<updated>2025-06-24T09:09:36+00:00</updated>
<author>
<name>Diana Picus</name>
<email>Diana-Magda.Picus@amd.com</email>
</author>
<published>2025-06-24T09:09:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a201f8872a63aa336e4f79a40e196b6c20c9001e'/>
<id>a201f8872a63aa336e4f79a40e196b6c20c9001e</id>
<content type='text'>
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Remove legacy PM version of AMDGPUPromoteAllocaToVector (#144986)</title>
<updated>2025-06-20T07:43:39+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-06-20T07:43:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1cae21da47b1f53c3946534b12507a035fb283d2'/>
<id>1cae21da47b1f53c3946534b12507a035fb283d2</id>
<content type='text'>
This is only run in the middle end with the new pass manager now,
so garbage collect the old PM version.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is only run in the middle end with the new pass manager now,
so garbage collect the old PM version.</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[AMDGPU] Extended vector promotion to aggregate types." (#144366)</title>
<updated>2025-06-16T15:06:18+00:00</updated>
<author>
<name>zGoldthorpe</name>
<email>Zach.Goldthorpe@amd.com</email>
</author>
<published>2025-06-16T15:06:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4692f0d3448e32381a2b21c7359c7daed07a8850'/>
<id>4692f0d3448e32381a2b21c7359c7daed07a8850</id>
<content type='text'>
Reverts llvm/llvm-project#143784

Patch fails some internal tests. Will investigate more thoroughly before
attempting to remerge.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reverts llvm/llvm-project#143784

Patch fails some internal tests. Will investigate more thoroughly before
attempting to remerge.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Extended vector promotion to aggregate types. (#143784)</title>
<updated>2025-06-13T18:22:21+00:00</updated>
<author>
<name>zGoldthorpe</name>
<email>Zach.Goldthorpe@amd.com</email>
</author>
<published>2025-06-13T18:22:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=79e06bf1ae9961c5045134288fd8acc9173f6be2'/>
<id>79e06bf1ae9961c5045134288fd8acc9173f6be2</id>
<content type='text'>
Extends the `amdgpu-promote-alloca-to-vector` pass to also promote
aggregate types whose elements are all the same type to vector
registers.

The motivation for this extension was to account for IR generated by the
frontend containing several singleton struct types containing vectors or
vector-like elements, though the implementation is strictly more
general.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extends the `amdgpu-promote-alloca-to-vector` pass to also promote
aggregate types whose elements are all the same type to vector
registers.

The motivation for this extension was to account for IR generated by the
frontend containing several singleton struct types containing vectors or
vector-like elements, though the implementation is strictly more
general.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Promote nestedGEP allocas to vectors (#141199)</title>
<updated>2025-06-02T08:20:14+00:00</updated>
<author>
<name>Harrison Hao</name>
<email>57025411+harrisonGPU@users.noreply.github.com</email>
</author>
<published>2025-06-02T08:20:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1a7f5f58332d91f88a4305399d7f79aba046e19a'/>
<id>1a7f5f58332d91f88a4305399d7f79aba046e19a</id>
<content type='text'>
Supports the `nestedGEP` pattern that appears when an alloca is first
indexed as an array element and then shifted with a byte-offset GEP:

```llvm
  %SortedFragments = alloca [10 x &lt;2 x i32&gt;], addrspace(5), align 8
  %row  = getelementptr [10 x &lt;2 x i32&gt;], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1
```

The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in a VGPR:

```llvm
  %vec  = freeze &lt;20 x i32&gt; poison              ; alloca promoted to &lt;20 x i32&gt;
  %idx0 = mul i32 %j, 2                         ; j * 2
  %idx  = add i32 %idx0, 1                      ; j * 2 + 1
  %val  = extractelement &lt;20 x i32&gt; %vec, i32 %idx
```

This eliminates the scratch read.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Supports the `nestedGEP` pattern that appears when an alloca is first
indexed as an array element and then shifted with a byte-offset GEP:

```llvm
  %SortedFragments = alloca [10 x &lt;2 x i32&gt;], addrspace(5), align 8
  %row  = getelementptr [10 x &lt;2 x i32&gt;], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1
```

The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in a VGPR:

```llvm
  %vec  = freeze &lt;20 x i32&gt; poison              ; alloca promoted to &lt;20 x i32&gt;
  %idx0 = mul i32 %j, 2                         ; j * 2
  %idx  = add i32 %idx0, 1                      ; j * 2 + 1
  %val  = extractelement &lt;20 x i32&gt; %vec, i32 %idx
```

This eliminates the scratch read.</pre>
</div>
</content>
</entry>
</feed>
