<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp, branch users/chapuni/cov/single/loop</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)</title>
<updated>2025-01-03T19:49:51+00:00</updated>
<author>
<name>Jun Wang</name>
<email>jwang86@yahoo.com</email>
</author>
<published>2025-01-03T19:49:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b2adeae8650fb720873ad7fa39153beaa8194afc'/>
<id>b2adeae8650fb720873ad7fa39153beaa8194afc</id>
<content type='text'>
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)</title>
<updated>2024-11-26T03:51:01+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2024-11-26T03:51:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=716364ebd6649aeca8658680ebb8b0424d028006'/>
<id>716364ebd6649aeca8658680ebb8b0424d028006</id>
<content type='text'>
The v_dot2c_f32_bf16 opcode has the same encoding as v_mac_f32 in gfx90a;
both belong to the gfx9 series. This requires a new decoderNameSpace, GFX950_DOT.

Co-authored-by: Sirish Pande &lt;Sirish.Pande@amd.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The v_dot2c_f32_bf16 opcode has the same encoding as v_mac_f32 in gfx90a;
both belong to the gfx9 series. This requires a new decoderNameSpace, GFX950_DOT.

Co-authored-by: Sirish Pande &lt;Sirish.Pande@amd.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)</title>
<updated>2024-11-26T03:27:01+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2024-11-26T03:27:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=22503a9df16e8bf320c81ffbd3b4c70de45f8053'/>
<id>22503a9df16e8bf320c81ffbd3b4c70de45f8053</id>
<content type='text'>
Co-authored-by: Pravin Jagtap &lt;Pravin.Jagtap@amd.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-authored-by: Pravin Jagtap &lt;Pravin.Jagtap@amd.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590)</title>
<updated>2024-11-26T03:20:51+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2024-11-26T03:20:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5dd48c4901c60f2a38aa4e78160cc72eafcbbc5b'/>
<id>5dd48c4901c60f2a38aa4e78160cc72eafcbbc5b</id>
<content type='text'>
Co-authored-by: Pravin Jagtap &lt;Pravin.Jagtap@amd.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-authored-by: Pravin Jagtap &lt;Pravin.Jagtap@amd.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410)</title>
<updated>2024-11-23T17:27:47+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2024-11-23T17:27:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=cd20fc07720520856c385a12b0daa26b9d8a8e44'/>
<id>cd20fc07720520856c385a12b0daa26b9d8a8e44</id>
<content type='text'>
This is a refinement for the existing hack. With this,
the default target will have neither wavefrontsize feature
present, unless it was explicitly specified. That is,
getWavefrontSize() == 64 no longer implies +wavefrontsize64.
getWavefrontSize() == 32 does imply +wavefrontsize32.

Continue to assume the value is 64 with no wavesize feature.
This maintains the codegenable property without any code
that directly cares about the wavesize needing to worry about it.

Introduce an isWaveSizeKnown helper to check if we know the
wavesize is accurate based on having one of the features explicitly
set, or a known target-cpu.

I'm not sure what's going on in wave_any.s. It's testing what
happens when both wavesizes are enabled, but this is treated
as an error in codegen. We now treat wave32 as the winning
case, so some cases that were previously printed as vcc are now
vcc_lo.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a refinement for the existing hack. With this,
the default target will have neither wavefrontsize feature
present, unless it was explicitly specified. That is,
getWavefrontSize() == 64 no longer implies +wavefrontsize64.
getWavefrontSize() == 32 does imply +wavefrontsize32.

Continue to assume the value is 64 with no wavesize feature.
This maintains the codegenable property without any code
that directly cares about the wavesize needing to worry about it.

Introduce an isWaveSizeKnown helper to check if we know the
wavesize is accurate based on having one of the features explicitly
set, or a known target-cpu.

I'm not sure what's going on in wave_any.s. It's testing what
happens when both wavesizes are enabled, but this is treated
as an error in codegen. We now treat wave32 as the winning
case, so some cases that were previously printed as vcc are now
vcc_lo.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Move default wavesize hack for disassembler (#117422)</title>
<updated>2024-11-23T17:24:44+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2024-11-23T17:24:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=8b087d64222276090702fb70dec5939eb22a017c'/>
<id>8b087d64222276090702fb70dec5939eb22a017c</id>
<content type='text'>
You cannot adjust the disassembler's subtarget. llvm-mc passes
the originally constructed MCSubtargetInfo around, rather than
querying the pointer in the disassembler instance.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
You cannot adjust the disassembler's subtarget. llvm-mc passes
the originally constructed MCSubtargetInfo around, rather than
querying the pointer in the disassembler instance.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)</title>
<updated>2024-11-21T16:51:58+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2024-11-21T16:51:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=01c9a14ccf98dba257bb36d9e9242b0bf5cdcaf2'/>
<id>01c9a14ccf98dba257bb36d9e9242b0bf5cdcaf2</id>
<content type='text'>
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233)</title>
<updated>2024-11-20T16:33:04+00:00</updated>
<author>
<name>Brox Chen</name>
<email>guochen2@amd.com</email>
</author>
<published>2024-11-20T16:33:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9fb01fcd9fd5ccffa2421096e5e058156b86aa84'/>
<id>9fb01fcd9fd5ccffa2421096e5e058156b86aa84</id>
<content type='text'>
Support true16 format for VOP2 instructions in MC

This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/dasm tests:
v_fmac_f16
v_fmamk_f16
v_fmaak_f16

It seems the vop2_t16_promote.s files were not updated with the true16 flag
in the previous batch update. They will be updated separately.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support true16 format for VOP2 instructions in MC

This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/dasm tests:
v_fmac_f16
v_fmamk_f16
v_fmaak_f16

It seems the vop2_t16_promote.s files were not updated with the true16 flag
in the previous batch update. They will be updated separately.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634)</title>
<updated>2024-11-14T23:22:37+00:00</updated>
<author>
<name>Brox Chen</name>
<email>guochen2@amd.com</email>
</author>
<published>2024-11-14T23:22:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=abff8fe2a940212b1c43af2d86a68fc92849f019'/>
<id>abff8fe2a940212b1c43af2d86a68fc92849f019</id>
<content type='text'>
Update VInterp instructions with true16 and fake16 formats.

This patch includes instructions:
v_interp_p10_f16_f32
v_interp_p2_f16_f32
v_interp_p10_rtz_f16_f32
v_interp_p2_rtz_f16_f32

The dasm test vinterp-fake16.txt is removed and its test lines are merged into
vinterp.txt, which handles both the true16 and fake16 cases.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update VInterp instructions with true16 and fake16 formats.

This patch includes instructions:
v_interp_p10_f16_f32
v_interp_p2_f16_f32
v_interp_p10_rtz_f16_f32
v_interp_p2_rtz_f16_f32

The dasm test vinterp-fake16.txt is removed and its test lines are merged into
vinterp.txt, which handles both the true16 and fake16 cases.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Qualify auto. NFC. (#110878)</title>
<updated>2024-10-03T12:07:54+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2024-10-03T12:07:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=8d13e7b8c382499c1cf0c2a3184b483e760f266b'/>
<id>8d13e7b8c382499c1cf0c2a3184b483e760f266b</id>
<content type='text'>
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)</pre>
</div>
</content>
</entry>
</feed>
