<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp, branch main</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)</title>
<updated>2025-11-14T11:00:57+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2025-11-14T11:00:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=72c69aefbae8bfb087622e642acbd0cba7578747'/>
<id>72c69aefbae8bfb087622e642acbd0cba7578747</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (#165035)</title>
<updated>2025-10-31T19:21:59+00:00</updated>
<author>
<name>Stanislav Mekhanoshin</name>
<email>Stanislav.Mekhanoshin@amd.com</email>
</author>
<published>2025-10-31T19:21:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=be2ae264dd6a590f7f4ba96949af3b4d220a1fad'/>
<id>be2ae264dd6a590f7f4ba96949af3b4d220a1fad</id>
<content type='text'>
Fixes: SWDEV-562450</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes: SWDEV-562450</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][True16][CodeGen] true16 isel pattern for fma_mix_f16/bf16 (#159648)</title>
<updated>2025-09-24T15:27:26+00:00</updated>
<author>
<name>Brox Chen</name>
<email>guochen2@amd.com</email>
</author>
<published>2025-09-24T15:27:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=934f802731887f5ed183d13f1b538414c518b004'/>
<id>934f802731887f5ed183d13f1b538414c518b004</id>
<content type='text'>
This patch includes:
1. The fma_mix inst takes an fp16 type as input, but places the operand in
vgpr32. Update the selector to insert vgpr32 for true16 mode if necessary.
2. The fma_mix inst returns an fp16 type as output, but places the vdst in
vgpr32. Create a fma_mix_t16 pseudo inst for the isel pattern, and lower it
to mix_lo/hi in the mc lowering pass.

These stop isel from emitting illegal `vgpr32 = COPY vgpr16` and improve
code quality</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch includes:
1. The fma_mix inst takes an fp16 type as input, but places the operand in
vgpr32. Update the selector to insert vgpr32 for true16 mode if necessary.
2. The fma_mix inst returns an fp16 type as output, but places the vdst in
vgpr32. Create a fma_mix_t16 pseudo inst for the isel pattern, and lower it
to mix_lo/hi in the mc lowering pass.

These stop isel from emitting illegal `vgpr32 = COPY vgpr16` and improve
code quality</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] High VGPR lowering on gfx1250 (#156965)</title>
<updated>2025-09-04T23:20:47+00:00</updated>
<author>
<name>Stanislav Mekhanoshin</name>
<email>Stanislav.Mekhanoshin@amd.com</email>
</author>
<published>2025-09-04T23:20:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=1f0f3473e60a7f0ce13ce30994d8ca66cdb02326'/>
<id>1f0f3473e60a7f0ce13ce30994d8ca66cdb02326</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] gfx1250 64-bit relocations and fixups (#148951)</title>
<updated>2025-07-16T00:13:42+00:00</updated>
<author>
<name>Stanislav Mekhanoshin</name>
<email>rampitec@users.noreply.github.com</email>
</author>
<published>2025-07-16T00:13:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=2d6534b7daa0483f11f84d218fa1dc65eee44a93'/>
<id>2d6534b7daa0483f11f84d218fa1dc65eee44a93</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Reland [AMDGPU] Support block load/store for CSR #130013 (#137169)</title>
<updated>2025-04-25T09:29:27+00:00</updated>
<author>
<name>Diana Picus</name>
<email>Diana-Magda.Picus@amd.com</email>
</author>
<published>2025-04-25T09:29:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5bad5d84a15a4f36b8ed5b33dde924683bcc9ea1'/>
<id>5bad5d84a15a4f36b8ed5b33dde924683bcc9ea1</id>
<content type='text'>
Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.

Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.

In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.

One peculiarity is that both the SI_BLOCK_SPILL_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).

This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.

Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.

In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.

One peculiarity is that both the SI_BLOCK_SPILL_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).

This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[AMDGPU] Support block load/store for CSR" (#136846)</title>
<updated>2025-04-23T12:01:00+00:00</updated>
<author>
<name>Diana Picus</name>
<email>Diana-Magda.Picus@amd.com</email>
</author>
<published>2025-04-23T12:01:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=6bb2f90557fb2b4b216299cc2beb4afb641476aa'/>
<id>6bb2f90557fb2b4b216299cc2beb4afb641476aa</id>
<content type='text'>
Reverts llvm/llvm-project#130013 due to failures with expensive checks
on.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reverts llvm/llvm-project#130013 due to failures with expensive checks
on.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Support block load/store for CSR (#130013)</title>
<updated>2025-04-23T08:33:36+00:00</updated>
<author>
<name>Diana Picus</name>
<email>Diana-Magda.Picus@amd.com</email>
</author>
<published>2025-04-23T08:33:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4a58071d87265dfccba72134b25cf4d1595d98c5'/>
<id>4a58071d87265dfccba72134b25cf4d1595d98c5</id>
<content type='text'>
Add support for using the existing `SCRATCH_STORE_BLOCK` and
`SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`.
It does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.

Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
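
As a rough illustration of the trimming rule above, here is a hypothetical
Python sketch. It is not code from this patch: the helper name is made up,
and the 4-bytes-per-register figure is an assumption inferred from 32
registers sharing a 128-byte slot.

```python
# Hypothetical helper, not LLVM code: models the trimming rule described above.
BLOCK_REGS = 32      # registers covered by one block load/store
BYTES_PER_REG = 4    # assumption: 128-byte slot / 32 registers


def block_slot_bytes(transfer_mask):
    """Return the stack bytes needed for one block, given a 32-bit mask
    where bit i is set if register i of the block must be transferred."""
    if transfer_mask not in range(2 ** BLOCK_REGS):
        raise ValueError("mask must fit in 32 bits")
    # Gaps below the highest set bit are not compacted; only the registers
    # above it are trimmed off the end of the slot.
    return transfer_mask.bit_length() * BYTES_PER_REG
```

Under these assumptions, a mask of 0x00FF00FF (two groups of 8 with a gap
between them) still needs 96 bytes because the gap is kept, while 0x000000FF
needs only 32 bytes.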

In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in `SIFrameLowering`. We also add new pseudos,
`SI_BLOCK_SPILL_V1024_SAVE/RESTORE`.

One peculiarity is that both the SI_BLOCK_SPILL_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support for using the existing `SCRATCH_STORE_BLOCK` and
`SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`.
It does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.

Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.

In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in `SIFrameLowering`. We also add new pseudos,
`SI_BLOCK_SPILL_V1024_SAVE/RESTORE`.

One peculiarity is that both the SI_BLOCK_SPILL_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).</pre>
</div>
</content>
</entry>
<entry>
<title>Move relocation specifiers to AMDGPUMCExpr::Specifier</title>
<updated>2025-03-30T19:12:38+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>i@maskray.me</email>
</author>
<published>2025-03-30T19:12:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5a3d4036cff159e32aa4ab1b11fd6a25a50a456c'/>
<id>5a3d4036cff159e32aa4ab1b11fd6a25a50a456c</id>
<content type='text'>
Similar to the previous migration done for all other ELF targets.
Switch from the confusing `VariantKind` to `Specifier`, which aligns
with Arm and IBM AIX's documentation.

Moving forward, relocation specifiers should be integrated into
AMDGPUMCExpr rather than MCSymbolRefExpr::SubclassData.

(Note: the term AMDGPUMCExpr::VariantKind is for expressions
without relocation specifiers:
https://github.com/llvm/llvm-project/pull/82022

It's up to AMDGPU maintainers to integrate these constants into Specifier.
)

Pull Request: https://github.com/llvm/llvm-project/pull/133608
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similar to the previous migration done for all other ELF targets.
Switch from the confusing `VariantKind` to `Specifier`, which aligns
with Arm and IBM AIX's documentation.

Moving forward, relocation specifiers should be integrated into
AMDGPUMCExpr rather than MCSymbolRefExpr::SubclassData.

(Note: the term AMDGPUMCExpr::VariantKind is for expressions
without relocation specifiers:
https://github.com/llvm/llvm-project/pull/82022

It's up to AMDGPU maintainers to integrate these constants into Specifier.
)

Pull Request: https://github.com/llvm/llvm-project/pull/133608
</pre>
</div>
</content>
</entry>
<entry>
<title>[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrel</title>
<updated>2025-03-29T18:08:13+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>i@maskray.me</email>
</author>
<published>2025-03-29T18:08:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=fe6fb910df9d1b9a9e2e7a6e8228d020668e0129'/>
<id>fe6fb910df9d1b9a9e2e7a6e8228d020668e0129</id>
<content type='text'>
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
human-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).

The `@plt` syntax was selected simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c

RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.

This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.

* MCValue::SymA can no longer have a SymbolVariant. Add an assert
  similar to that of AArch64ELFObjectWriter.cpp before
  https://reviews.llvm.org/D81446 (see my analysis at
  https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
  if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.

Pull Request: https://github.com/llvm/llvm-project/pull/132569
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
human-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).

The `@plt` syntax was selected simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c

RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.

This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.

* MCValue::SymA can no longer have a SymbolVariant. Add an assert
  similar to that of AArch64ELFObjectWriter.cpp before
  https://reviews.llvm.org/D81446 (see my analysis at
  https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
  if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.

Pull Request: https://github.com/llvm/llvm-project/pull/132569
</pre>
</div>
</content>
</entry>
</feed>
