llvm-project.git/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp, branch users/chapuni/cov/single/loop

[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)

2025-01-03T19:49:51+00:00

For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
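The rule this commit describes can be sketched as a small predicate that decides when null would be accepted in place of a register operand. Everything here is hypothetical: the `OperandKind` enum, the `nullRegAllowed` function, and the `AfterPatch` toggle are invented for illustration. It also assumes, as the message implies, that narrower destinations already accepted null before this change. It is not the real AsmParser or disassembler logic.

```cpp
// Hypothetical model of where the null register may replace a register operand.
enum class OperandKind {
  Dst,         // plain destination register, e.g. the result of s_load_dwordx4
  SamplerDesc, // S#
  ImageDesc,   // T#
  BufferDesc,  // V#
};

// Returns true if `null` would be accepted for an operand of the given kind and
// width on a target with the given major GFX version. `AfterPatch` switches
// between the behavior described before and after this commit.
bool nullRegAllowed(OperandKind Kind, unsigned WidthBits, unsigned GfxMajor,
                    bool AfterPatch) {
  if (GfxMajor < 10)
    return false; // the commit only concerns GFX10+
  if (Kind != OperandKind::Dst)
    return false; // null must never be accepted as an S#, T#, or V#
  if (!AfterPatch && WidthBits >= 128)
    return false; // previously rejected for 128-bit and wider destinations
  return true;
}
```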

AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)

2024-11-26T03:51:01+00:00

The encoding of the v_dot2c_f32_bf16 opcode is the same as that of v_mac_f32
in gfx90a; both come from the gfx9 series. This required a new decoder
namespace, GFX950_DOT.
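One way to picture why a separate decoder namespace is needed: when two instructions share an encoding, the decoder can model them as an ordered list of lookup tables, with the gfx950-only table consulted before the generic one. The sketch below assumes that ordering. `DecoderTable`, `tryDecode`, `decodeFor`, and the `kSharedEncoding` value are made up for illustration and do not reflect the real generated tables or the real opcode bits.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated decoder table: encoding -> mnemonic.
using DecoderTable = std::map<uint32_t, std::string>;

// Placeholder encoding shared by both instructions (not the real opcode bits).
constexpr uint32_t kSharedEncoding = 0x1;

const DecoderTable GFX950DotTable = {{kSharedEncoding, "v_dot2c_f32_bf16"}};
const DecoderTable GFX9Table = {{kSharedEncoding, "v_mac_f32"}};

// Try each table in order and return the first match.
std::optional<std::string>
tryDecode(const std::vector<const DecoderTable *> &Tables, uint32_t Encoding) {
  for (const DecoderTable *Table : Tables) {
    auto It = Table->find(Encoding);
    if (It != Table->end())
      return It->second;
  }
  return std::nullopt;
}

// On gfx950 the GFX950_DOT-style table is consulted first, so it wins.
std::optional<std::string> decodeFor(bool IsGFX950, uint32_t Encoding) {
  std::vector<const DecoderTable *> Tables;
  if (IsGFX950)
    Tables.push_back(&GFX950DotTable);
  Tables.push_back(&GFX9Table);
  return tryDecode(Tables, Encoding);
}
```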

Co-authored-by: Sirish Pande

AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)

2024-11-26T03:27:01+00:00

Co-authored-by: Pravin Jagtap

AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590)

2024-11-26T03:20:51+00:00

Co-authored-by: Pravin Jagtap

AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410)

2024-11-23T17:27:47+00:00

This refines the existing hack. With this change, the default
target has neither wavefrontsize feature present unless one was
explicitly specified. That is, getWavefrontSize() == 64 no longer
implies +wavefrontsize64, while getWavefrontSize() == 32 still
implies +wavefrontsize32.

Continue to assume a value of 64 when no wavesize feature is present.
This maintains the codegenable property without any code
that directly cares about the wavesize needing to worry about it.

Introduce an isWaveSizeKnown helper to check whether the wavesize
is known to be accurate, based on having one of the features
explicitly set or a known target-cpu.

I'm not sure what's going on in wave_any.s. It tests what happens
when both wavesizes are enabled, but this is treated as an error in
codegen. We now treat wave32 as the winning case, so some operands
that were previously printed as vcc are now printed as vcc_lo.
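The wavesize rules in this message can be condensed into a small model. `getWavefrontSize` and `isWaveSizeKnown` borrow their names from the message but are free functions over a hypothetical `WaveFeatures` struct here, not the real subtarget methods. `vccName` is a hypothetical helper that only shows, roughly, why the wave_any.s output changes once wave32 wins.

```cpp
#include <string>

// Hypothetical summary of the subtarget state relevant to the wavesize.
struct WaveFeatures {
  bool Wave32 = false;   // +wavefrontsize32 present
  bool Wave64 = false;   // +wavefrontsize64 present
  bool KnownCPU = false; // a known target-cpu was specified
};

// 64 is assumed when no wavesize feature is set, so a result of 64 does not
// imply +wavefrontsize64. A result of 32 does imply +wavefrontsize32, and
// wave32 wins when both features are present.
unsigned getWavefrontSize(const WaveFeatures &F) { return F.Wave32 ? 32 : 64; }

// The wavesize is only trustworthy if a feature was set or the CPU is known.
bool isWaveSizeKnown(const WaveFeatures &F) {
  return F.Wave32 || F.Wave64 || F.KnownCPU;
}

// Condition-register spelling implied by the resolved wavesize.
std::string vccName(const WaveFeatures &F) {
  return getWavefrontSize(F) == 32 ? "vcc_lo" : "vcc";
}
```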

AMDGPU: Move default wavesize hack for disassembler (#117422)

2024-11-23T17:24:44+00:00

You cannot adjust the disassembler's subtarget. llvm-mc passes
the originally constructed MCSubtargetInfo around, rather than
querying the pointer in the disassembler instance.
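The ownership problem can be modeled with plain structs. The sketch assumes, per the message, that the tool keeps using the subtarget it created rather than asking the disassembler for one. As one possible fix, it also assumes the default is applied when the subtarget is first constructed. This does not claim to be where the real hack ended up. `SubtargetInfo`, `Disassembler`, `toolSeesWave64`, and `createSubtargetWithDefault` are all hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for MCSubtargetInfo: just a set of feature names.
struct SubtargetInfo {
  std::set<std::string> Features;
};

bool hasWaveSizeFeature(const SubtargetInfo &STI) {
  return STI.Features.count("wavefrontsize32") != 0 ||
         STI.Features.count("wavefrontsize64") != 0;
}

// A disassembler that adjusts only its own private copy of the subtarget.
struct Disassembler {
  SubtargetInfo OwnSTI;
  explicit Disassembler(const SubtargetInfo &STI) : OwnSTI(STI) {
    if (!hasWaveSizeFeature(OwnSTI))
      OwnSTI.Features.insert("wavefrontsize64"); // invisible to the caller
  }
};

// The tool only ever looks at the subtarget it constructed itself.
bool toolSeesWave64(const SubtargetInfo &ToolSTI) {
  return ToolSTI.Features.count("wavefrontsize64") != 0;
}

// Applying the default at construction time means every user sees it.
SubtargetInfo createSubtargetWithDefault(std::set<std::string> Features) {
  SubtargetInfo STI{std::move(Features)};
  if (!hasWaveSizeFeature(STI))
    STI.Features.insert("wavefrontsize64");
  return STI;
}
```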

AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)

2024-11-21T16:51:58+00:00

These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.
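Here is a concrete version of that suggestion. If the intrinsic took the raw 32-bit scale operand, a selector could match an 8-bit extract of it and derive a byte index from the shift amount. The sketch assumes the extract has already been normalized to a (source, shift, mask) triple. `ByteExtract`, `MatchedScale`, and `matchScaleByte` are hypothetical. How the byte index would map onto the actual op_sel/op_sel_hi bits is deliberately left open.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, already-normalized form of "(Src >> Shift) & Mask".
struct ByteExtract {
  unsigned SrcReg;
  unsigned Shift;
  uint32_t Mask;
};

// Result of the match: which register and which byte of it to use.
struct MatchedScale {
  unsigned SrcReg;
  unsigned ByteIndex; // 0..3; the encoding into op_sel is left unspecified
};

std::optional<MatchedScale> matchScaleByte(const ByteExtract &E) {
  if (E.Mask != 0xFFu)
    return std::nullopt; // not an 8-bit extract
  if (E.Shift % 8 != 0 || E.Shift > 24)
    return std::nullopt; // not a byte-aligned position in a 32-bit register
  return MatchedScale{E.SrcReg, E.Shift / 8};
}
```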

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.

[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233)

2024-11-20T16:33:04+00:00

Support the true16 format for VOP2 instructions in MC.

This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/dasm tests:
v_fmac_f16
v_fmamk_f16
v_fmaak_f16
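Some background on the terminology: in true16 mode, a 16-bit operand names one half of a 32-bit VGPR (written like `v1.l` or `v1.h` in assembly). In LLVM's fake16 variants, the operand names a whole VGPR and the 16-bit value sits in its low half. The helpers below model only that register-half addressing. `VGPR16`, `readTrue16`, `writeTrue16`, and `readFake16` are hypothetical and say nothing about how the MC layer actually represents these operands.

```cpp
#include <cstdint>

// Hypothetical true16 operand: a VGPR index plus which 16-bit half it names.
struct VGPR16 {
  unsigned Reg;
  bool Hi; // false -> vN.l, true -> vN.h
};

uint16_t readTrue16(const uint32_t *VGPRs, VGPR16 Op) {
  uint32_t V = VGPRs[Op.Reg];
  return Op.Hi ? static_cast<uint16_t>(V >> 16)
               : static_cast<uint16_t>(V & 0xFFFFu);
}

// Writing one half leaves the other half of the 32-bit register untouched.
void writeTrue16(uint32_t *VGPRs, VGPR16 Op, uint16_t Val) {
  uint32_t &V = VGPRs[Op.Reg];
  if (Op.Hi)
    V = (V & 0x0000FFFFu) | (static_cast<uint32_t>(Val) << 16);
  else
    V = (V & 0xFFFF0000u) | Val;
}

// fake16: the operand is a whole 32-bit VGPR; the 16-bit value is its low half.
uint16_t readFake16(const uint32_t *VGPRs, unsigned Reg) {
  return static_cast<uint16_t>(VGPRs[Reg] & 0xFFFFu);
}
```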

The vop2_t16_promote.s files appear not to have been updated with the
true16 flag in the previous batch update. They will be updated separately.

[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634)

2024-11-14T23:22:37+00:00

Update VInterp instructions with true16 and fake16 formats.

This patch includes the following instructions:
v_interp_p10_f16_f32
v_interp_p2_f16_f32
v_interp_p10_rtz_f16_f32
v_interp_p2_rtz_f16_f32

The dasm test vinterp-fake16.txt is removed, and its test lines are merged
into vinterp.txt, which handles both the true16 and fake16 cases.

[AMDGPU] Qualify auto. NFC. (#110878)

2024-10-03T12:07:54+00:00

Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
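The llvm-qualified-auto check targets `auto` declarations that deduce a pointer type and spells the pointer out. This matches the LLVM Coding Standards' advice to write `auto *` for pointers, and `const auto *` where the pointee is const. The function below is an invented example written in that style, with the unqualified form it replaces shown in comments. It is not taken from the AMDGPU sources.

```cpp
#include <cstddef>
#include <vector>

// Sums a read-only buffer, then increments the first element of a mutable one.
int sumThenBump(const std::vector<int> &In, std::vector<int> &Out) {
  // Unqualified form: auto Src = In.data();   (deduces const int *)
  const auto *Src = In.data();
  // Unqualified form: auto Dst = Out.data();  (deduces int *)
  auto *Dst = Out.data();
  int Total = 0;
  for (std::size_t I = 0; I < In.size(); ++I)
    Total += Src[I];
  if (!Out.empty())
    Dst[0] += 1; // writing through the non-const pointer is still allowed
  return Total;
}
```

The qualified spellings do not change the deduced types; they only make the pointer-ness and const-ness visible at the declaration.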