<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[NFC] Check operand type instead of opcode (#168641)</title>
<updated>2025-11-19T02:37:56+00:00</updated>
<author>
<name>Shilei Tian</name>
<email>i@tianshilei.me</email>
</author>
<published>2025-11-19T02:37:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b4aa3d3ae334fea392f62df9693fab07142443ae'/>
<id>b4aa3d3ae334fea392f62df9693fab07142443ae</id>
<content type='text'>
A folow-up of #168458.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A folow-up of #168458.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit (#168458)</title>
<updated>2025-11-18T22:11:10+00:00</updated>
<author>
<name>Shilei Tian</name>
<email>i@tianshilei.me</email>
</author>
<published>2025-11-18T22:11:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=6665642ce40c70b65624a5aa67566725c5a87da5'/>
<id>6665642ce40c70b65624a5aa67566725c5a87da5</id>
<content type='text'>
On some targets, a packed f32 instruction can only read 32 bits from a
scalar operand (SGPR or literal) and replicates the bits to both
channels. In this case, we should not fold an immediate value if it
can't be replicated from its lower 32-bit.

Fixes SWDEV-567139.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On some targets, a packed f32 instruction can only read 32 bits from a
scalar operand (SGPR or literal) and replicates the bits to both
channels. In this case, we should not fold an immediate value if it
can't be replicated from its lower 32-bit.

Fixes SWDEV-567139.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] When shrinking and/or to bitset*, remove implicit scc def (#168128)</title>
<updated>2025-11-15T15:21:43+00:00</updated>
<author>
<name>LU-JOHN</name>
<email>John.Lu@amd.com</email>
</author>
<published>2025-11-15T15:21:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9fa15ef91601a2b254ee3c52dab03c1c33eb48d5'/>
<id>9fa15ef91601a2b254ee3c52dab03c1c33eb48d5</id>
<content type='text'>
When shrinking and/or to bitset* remove leftover implicit scc def.
bitset* instructions do not set scc.

Signed-off-by: John Lu &lt;John.Lu@amd.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When shrinking and/or to bitset* remove leftover implicit scc def.
bitset* instructions do not set scc.

Signed-off-by: John Lu &lt;John.Lu@amd.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[TableGen] Split *GenRegisterInfo.inc. (#167700)</title>
<updated>2025-11-14T16:30:51+00:00</updated>
<author>
<name>Ivan Kosarev</name>
<email>ivan.kosarev@amd.com</email>
</author>
<published>2025-11-14T16:30:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=71eaf14094f5750bf01bb7f26a5dd61afbbc74a5'/>
<id>71eaf14094f5750bf01bb7f26a5dd61afbbc74a5</id>
<content type='text'>
Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.

AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.

Splitting .inc files also helps avoiding extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.

It is thought that rather than building on top of the current
single-output-file design of TableGen, e.g., using `split-file`,
it would be more preferable to recognise the need for multi-file
outputs and give it a proper first-class support directly in
TableGen.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.

AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.

Splitting .inc files also helps avoiding extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.

It is thought that rather than building on top of the current
single-output-file design of TableGen, e.g., using `split-file`,
it would be more preferable to recognise the need for multi-file
outputs and give it a proper first-class support directly in
TableGen.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)</title>
<updated>2025-11-14T11:00:57+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2025-11-14T11:00:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=72c69aefbae8bfb087622e642acbd0cba7578747'/>
<id>72c69aefbae8bfb087622e642acbd0cba7578747</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Remove wrapper around TRI::getRegClass (#159885)</title>
<updated>2025-11-11T23:31:52+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-11-11T23:31:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=e3a9ac5e24d08cef3160fe3e242a4afe1b6d95a4'/>
<id>e3a9ac5e24d08cef3160fe3e242a4afe1b6d95a4</id>
<content type='text'>
This shadows the member in the base class, but differs slightly
in behavior. The base method doesn't check for the invalid case.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This shadows the member in the base class, but differs slightly
in behavior. The base method doesn't check for the invalid case.</pre>
</div>
</content>
</entry>
<entry>
<title>CodeGen: Remove TRI argument from getRegClass (#158225)</title>
<updated>2025-11-10T23:43:55+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-11-10T23:43:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=55422e804b3bd2fcb1a330673af40240e359540f'/>
<id>55422e804b3bd2fcb1a330673af40240e359540f</id>
<content type='text'>
TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][MachineVerifier] test failures in SIFoldOperands (#166600)</title>
<updated>2025-11-08T05:12:19+00:00</updated>
<author>
<name>Abhay Kanhere</name>
<email>abhay@kanhere.net</email>
</author>
<published>2025-11-08T05:12:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b4b57adb89b731953338073815c52e98d906aceb'/>
<id>b4b57adb89b731953338073815c52e98d906aceb</id>
<content type='text'>
After PR:https://github.com/llvm/llvm-project/pull/151421 merged
following fails in SIFoldOperands showed up.

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In Folding code, if folded operand is register ensure earlyClobber is
set.

---------

Co-authored-by: Matt Arsenault &lt;arsenm2@gmail.com&gt;
Co-authored-by: Shilei Tian &lt;i@tianshilei.me&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After PR:https://github.com/llvm/llvm-project/pull/151421 merged
following fails in SIFoldOperands showed up.

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In Folding code, if folded operand is register ensure earlyClobber is
set.

---------

Co-authored-by: Matt Arsenault &lt;arsenm2@gmail.com&gt;
Co-authored-by: Shilei Tian &lt;i@tianshilei.me&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Delete redundant recursive copy handling code (#157032)</title>
<updated>2025-11-06T02:01:12+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-11-06T02:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=67b6fd04dd464554bf82c52f7cac8160e7219c0f'/>
<id>67b6fd04dd464554bf82c52f7cac8160e7219c0f</id>
<content type='text'>
This fixes a regression exposed after
445415219708f9539801018e03282049ca33e0e2.
This introduces a few small regressions for true16. There are more cases
where the value can propagate through subregister extracts which need
new handling. They're also small enough that perhaps there's a way to
avoid needing to deal with this case in the first place.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes a regression exposed after
445415219708f9539801018e03282049ca33e0e2.
This introduces a few small regressions for true16. There are more cases
where the value can propagate through subregister extracts which need
new handling. They're also small enough that perhaps there's a way to
avoid needing to deal with this case in the first place.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272)</title>
<updated>2025-10-08T02:19:54+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-10-08T02:19:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1a5494ca4a7d2e6884e17c064e5215b34fbe4b40'/>
<id>1a5494ca4a7d2e6884e17c064e5215b34fbe4b40</id>
<content type='text'>
This removes special case processing in TargetInstrInfo::getRegClass to
fixup register operands which depending on the subtarget support AGPRs,
or require even aligned registers.

This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This removes special case processing in TargetInstrInfo::getRegClass to
fixup register operands which depending on the subtarget support AGPRs,
or require even aligned registers.

This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.</pre>
</div>
</content>
</entry>
</feed>
