<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)</title>
<updated>2025-11-22T22:03:14+00:00</updated>
<author>
<name>tyb0807</name>
<email>sontuan.vu119@gmail.com</email>
</author>
<published>2025-11-22T22:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=29d1e1857d445ca9a6e60c69fe2e1e5b30767e62'/>
<id>29d1e1857d445ca9a6e60c69fe2e1e5b30767e62</id>
<content type='text'>
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable reconstructing correctly MIR with preload kernarg
SGPRs.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable reconstructing correctly MIR with preload kernarg
SGPRs.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Track minNumAGPRs in MFI instead of mayUseAGPRs (#161996)</title>
<updated>2025-10-06T23:48:45+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-10-06T23:48:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=476a6ea9575f1a8498e080b4ace0700c3b91f9c1'/>
<id>476a6ea9575f1a8498e080b4ace0700c3b91f9c1</id>
<content type='text'>
Fix mfma agpr allocation failures with -O0. Previously we were getting
lucky
on cases that can use AV registers with the normal optimization
pipeline.

This logic needs to be consistent with getMaxNumVectorRegs,
as that is what getReservedRegs to determine the AGPR budget. In the
future we should directly check the minimum AGPR budget, and individual
selection patterns need to know the minimum budget required for them.

Start accounting for the number of AGPRs required to perform the
allocation. Refine the selection predicates to check this number is
available, and default to selecting the VGPR case if there aren't
enough. This also avoids register allocation failures for the largest
MFMAs with the default register budget.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix mfma agpr allocation failures with -O0. Previously we were getting
lucky
on cases that can use AV registers with the normal optimization
pipeline.

This logic needs to be consistent with getMaxNumVectorRegs,
as that is what getReservedRegs to determine the AGPR budget. In the
future we should directly check the minimum AGPR budget, and individual
selection patterns need to know the minimum budget required for them.

Start accounting for the number of AGPRs required to perform the
allocation. Refine the selection predicates to check this number is
available, and default to selecting the VGPR case if there aren't
enough. This also avoids register allocation failures for the largest
MFMAs with the default register budget.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Set TGID_EN_X/Y/Z when cluster ID intrinsics are used (#159120)</title>
<updated>2025-09-16T19:37:01+00:00</updated>
<author>
<name>Shilei Tian</name>
<email>i@tianshilei.me</email>
</author>
<published>2025-09-16T19:37:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=8122ccdca9dd38d15927ba35d2c13fec1160320e'/>
<id>8122ccdca9dd38d15927ba35d2c13fec1160320e</id>
<content type='text'>
Hardware initializes a single value in ttmp9 which is either the
workgroup ID X or cluster ID X. Most of this patch is a refactoring to
use a single `PreloadedValue` enumerator for this value, instead of two
enumerators `WORKGROUP_ID_X` and `CLUSTER_ID_X` referring to the same
value.

This makes it simpler to have a single attribute
`amdgpu-no-workgroup-id-x` indicating that this value is not used, which
in turns sets the TGID_EN_X bit appropriately to tell the hardware
whether to initialize it.

All of the above applies to Y and Z similarly.

Fixes: LWPSCGFX13-568

Co-authored-by: Jay Foad &lt;jay.foad@amd.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hardware initializes a single value in ttmp9 which is either the
workgroup ID X or cluster ID X. Most of this patch is a refactoring to
use a single `PreloadedValue` enumerator for this value, instead of two
enumerators `WORKGROUP_ID_X` and `CLUSTER_ID_X` referring to the same
value.

This makes it simpler to have a single attribute
`amdgpu-no-workgroup-id-x` indicating that this value is not used, which
in turns sets the TGID_EN_X bit appropriately to tell the hardware
whether to initialize it.

All of the above applies to Y and Z similarly.

Fixes: LWPSCGFX13-568

Co-authored-by: Jay Foad &lt;jay.foad@amd.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Support lowering of cluster related instrinsics (#157978)</title>
<updated>2025-09-13T01:11:17+00:00</updated>
<author>
<name>Shilei Tian</name>
<email>i@tianshilei.me</email>
</author>
<published>2025-09-13T01:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1180c2ced008e33b0a4b2b91b3cb24724f06147c'/>
<id>1180c2ced008e33b0a4b2b91b3cb24724f06147c</id>
<content type='text'>
Since many code are connected, this also changes how workgroup id is lowered.

Co-authored-by: Jay Foad &lt;jay.foad@amd.com&gt;
Co-authored-by: Ivan Kosarev &lt;ivan.kosarev@amd.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since many code are connected, this also changes how workgroup id is lowered.

Co-authored-by: Jay Foad &lt;jay.foad@amd.com&gt;
Co-authored-by: Ivan Kosarev &lt;ivan.kosarev@amd.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Use subtarget call to determine number of VGPRs (#157927)</title>
<updated>2025-09-11T07:39:56+00:00</updated>
<author>
<name>Stanislav Mekhanoshin</name>
<email>Stanislav.Mekhanoshin@amd.com</email>
</author>
<published>2025-09-11T07:39:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=d267fac3bcd35d726e45ebfa7b716ef9832e254f'/>
<id>d267fac3bcd35d726e45ebfa7b716ef9832e254f</id>
<content type='text'>
Since the register file was increased that is no longer valid to
call VGPR_32RegClass.getNumregs() to get a total number of arch
registers available on a subtarget.

Fixes: SWDEV-550425</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the register file was increased that is no longer valid to
call VGPR_32RegClass.getNumregs() to get a total number of arch
registers available on a subtarget.

Fixes: SWDEV-550425</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Add missing static to cl::opt (#152747)</title>
<updated>2025-08-08T23:33:09+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-08-08T23:33:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0a0f077b9474f6b746418f38360ab28db6f5a1e1'/>
<id>0a0f077b9474f6b746418f38360ab28db6f5a1e1</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] AsmPrinter: Unify arg handling (#151672)</title>
<updated>2025-08-08T10:00:37+00:00</updated>
<author>
<name>Diana Picus</name>
<email>Diana-Magda.Picus@amd.com</email>
</author>
<published>2025-08-08T10:00:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a910a6a8b5f96aa247f079016c9553a0aa445ba4'/>
<id>a910a6a8b5f96aa247f079016c9553a0aa445ba4</id>
<content type='text'>
When computing the number of registers required by entry functions, the
`AMDGPUAsmPrinter` needs to take into account both the register usage
computed by the `AMDGPUResourceUsageAnalysis` pass, and the number
of registers initialized by the hardware. At the moment, the way it
computes the latter is different for graphics vs compute, due to differences in
the implementation. For kernels, all the information needed is available in
the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over
the `Function`  arguments in the `AMDGPUAsmPrinter`. This pretty much 
repeats some of the logic from instruction selection.

This patch introduces 2 new members to `SIMachineFunctionInfo`, one
for SGPRs and one for VGPRs. Both will be computed during instruction
selection and then used during `AMDGPUAsmPrinter`, removing the need
to refer to the `Function` when printing assembly.

This patch is NFC except for the fact that we now add the extra SGPRs
(VCC, XNACK etc) to the number of SGPRs computed for graphics entry points.
I'm not sure why these weren't included before. It would be nice if
someone could confirm if that was just an oversight or if we have some docs
somewhere that I haven't managed to find. Only one test is affected (its SGPR
usage increases because we now take into account the XNACK registers).</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When computing the number of registers required by entry functions, the
`AMDGPUAsmPrinter` needs to take into account both the register usage
computed by the `AMDGPUResourceUsageAnalysis` pass, and the number
of registers initialized by the hardware. At the moment, the way it
computes the latter is different for graphics vs compute, due to differences in
the implementation. For kernels, all the information needed is available in
the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over
the `Function`  arguments in the `AMDGPUAsmPrinter`. This pretty much 
repeats some of the logic from instruction selection.

This patch introduces 2 new members to `SIMachineFunctionInfo`, one
for SGPRs and one for VGPRs. Both will be computed during instruction
selection and then used during `AMDGPUAsmPrinter`, removing the need
to refer to the `Function` when printing assembly.

This patch is NFC except for the fact that we now add the extra SGPRs
(VCC, XNACK etc) to the number of SGPRs computed for graphics entry points.
I'm not sure why these weren't included before. It would be nice if
someone could confirm if that was just an oversight or if we have some docs
somewhere that I haven't managed to find. Only one test is affected (its SGPR
usage increases because we now take into account the XNACK registers).</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Fix -amdgpu-mfma-vgpr-form flag on gfx908 (#150599)</title>
<updated>2025-07-25T10:49:56+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-07-25T10:49:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=2b1ce25e21765a07f69c48196bd15239d98cae92'/>
<id>2b1ce25e21765a07f69c48196bd15239d98cae92</id>
<content type='text'>
This should be ignored since there are no VGPR forms. This
makes it possible to flip the default for the flag to true.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This should be ignored since there are no VGPR forms. This
makes it possible to flip the default for the flag to true.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] ISel &amp; PEI for whole wave functions (#145858)</title>
<updated>2025-07-21T08:39:09+00:00</updated>
<author>
<name>Diana Picus</name>
<email>Diana-Magda.Picus@amd.com</email>
</author>
<published>2025-07-21T08:39:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=20d8398825a799008ae508d8463dbb9b11df81e7'/>
<id>20d8398825a799008ae508d8463dbb9b11df81e7</id>
<content type='text'>
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault &lt;Matthew.Arsenault@amd.com&gt;
Co-authored-by: Shilei Tian &lt;i@tianshilei.me&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault &lt;Matthew.Arsenault@amd.com&gt;
Co-authored-by: Shilei Tian &lt;i@tianshilei.me&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Provide control to force VGPR MFMA form (#148079)</title>
<updated>2025-07-18T20:53:17+00:00</updated>
<author>
<name>Jeffrey Byrnes</name>
<email>jeffrey.byrnes@amd.com</email>
</author>
<published>2025-07-18T20:53:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=695660cdfd1ca65cd6e02e6950d10c990dfa0036'/>
<id>695660cdfd1ca65cd6e02e6950d10c990dfa0036</id>
<content type='text'>
This gives an override to the user to force select VGPR form of MFMA.
Eventually we will drop this in favor of compiler making better
decisions, but this provides a mechanism for users to address the cases
where MayNeedAGPRs favors the AGPR form and performance is degraded due
to poor RA.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This gives an override to the user to force select VGPR form of MFMA.
Eventually we will drop this in favor of compiler making better
decisions, but this provides a mechanism for users to address the cases
where MayNeedAGPRs favors the AGPR form and performance is degraded due
to poor RA.</pre>
</div>
</content>
</entry>
</feed>
