llvm-project.git/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp, branch users/chapuni/cov/single/unify

[AMDGPU] Do not fold into v_accvgpr_mov/write/read (#120475)

2025-01-07T15:25:01+00:00

In SIFoldOperands, leave copies for moving between agpr and vgpr
registers. The register coalescer is able to handle the copies
more efficiently than v_accvgpr_mov, v_accvgpr_write, and
v_accvgpr_read. Otherwise, the compiler generates unnecessary
instructions such as v_accvgpr_mov a0, a0.
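
As an illustration of the rule described above, here is a minimal toy
sketch, not the code from SIFoldOperands.cpp. The names Bank, ToyCopy,
shouldLeaveAsCopy, and accvgprMnemonic are hypothetical, and the model
only tracks which register bank each side of a copy lives in. It assumes
that an agpr->agpr, vgpr->agpr, or agpr->vgpr copy is the kind that
should stay a plain COPY for the register coalescer.

```cpp
#include <cassert>
#include <string>

// Toy register banks (illustrative only, not LLVM register classes).
enum class Bank { SGPR, VGPR, AGPR };

// Hypothetical stand-in for a COPY: just the banks of its two operands.
struct ToyCopy {
  Bank Dst;
  Bank Src;
};

// True when the copy moves data between AGPR and VGPR (either direction)
// or from AGPR to AGPR. In this sketch, such copies are left untouched so
// a later coalescing step can still delete or merge them.
bool shouldLeaveAsCopy(const ToyCopy &C) {
  auto IsVectorBank = [](Bank B) { return B != Bank::SGPR; };
  bool TouchesAGPR = C.Dst == Bank::AGPR || C.Src == Bank::AGPR;
  return IsVectorBank(C.Dst) && IsVectorBank(C.Src) && TouchesAGPR;
}

// Mnemonic an eager rewrite would have chosen instead of keeping the COPY.
// Precondition: shouldLeaveAsCopy(C) holds.
std::string accvgprMnemonic(const ToyCopy &C) {
  assert(shouldLeaveAsCopy(C) && "only AGPR/VGPR moves map to v_accvgpr_*");
  if (C.Dst == Bank::AGPR && C.Src == Bank::AGPR)
    return "v_accvgpr_mov";
  if (C.Dst == Bank::AGPR)
    return "v_accvgpr_write";
  return "v_accvgpr_read";
}
```

In this model, an agpr->agpr ToyCopy whose two sides later receive the
same physical register can simply be dropped as an identity copy, whereas
an already-emitted v_accvgpr_mov a0, a0 is a real instruction that would
need separate cleanup.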

[AMDGPU][True16][MC] true16 for v_fma_f16 (#119477)

2025-01-06T20:02:04+00:00

Support true16 format for v_fma_f16 in MC.

Since we are replacing v_fma_f16 with v_fma_f16_t16/v_fma_f16_fake16
post-GFX11, we have to update the CodeGen pattern for v_fma_f16_fake16 to
keep the CodeGen tests passing. No pattern is modified or created; the
existing v_fma_f16 is simply replaced with the fake16 format.

AMDGPU: Do not fold copy to physreg from operation on frame index (#115977)

2024-11-13T05:35:51+00:00

AMDGPU: Fold more scalar operations on frame index to VALU (#115059)

2024-11-08T03:02:20+00:00

Further extend workaround for the lack of proper regbankselect
for frame indexes.

AMDGPU: Fold copy of scalar add of frame index (#115058)

2024-11-06T17:10:58+00:00

This is a pre-optimization to avoid a regression in a future
commit. Currently we almost always emit frame index with
a v_mov_b32 and use vector adds for the pointer operations. We
need to consider the users of the frame index (or rather, the
transitive users of derived pointer operations) to know whether
the value will be used in a vector or scalar context. This saves
an sgpr->vgpr copy.

This optimization could be more general for any opcode that's
trivially convertible from a scalar to vector form (although this
is a workaround for a proper regbankselect).
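
The idea above can be pictured with a small toy model. This is a sketch
under stated assumptions, not the actual pass: ToyScalarDef,
vectorFormOf, and foldCopyOfScalarOp are hypothetical names, and the
single s_add_i32 -> v_add_u32 table entry is assumed for illustration. A
scalar add on a frame index whose only user is a copy into a vgpr is
rewritten to its vector form, so the sgpr->vgpr copy is no longer needed.

```cpp
#include <string>

// Hypothetical description of a scalar instruction that defines a value.
struct ToyScalarDef {
  std::string Opcode;        // e.g. "s_add_i32"
  bool UsesFrameIndex;       // one operand is a frame index
  bool OnlyUserIsCopyToVGPR; // the result only feeds an sgpr->vgpr copy
};

// Assumed lookup from a scalar opcode to a trivially equivalent vector
// opcode. Returns an empty string when this toy table has no entry.
std::string vectorFormOf(const std::string &ScalarOpcode) {
  if (ScalarOpcode == "s_add_i32")
    return "v_add_u32";
  return "";
}

// Returns the vector opcode that replaces "scalar op + copy to vgpr", or
// an empty string when the fold does not apply in this sketch.
std::string foldCopyOfScalarOp(const ToyScalarDef &Def) {
  if (!Def.UsesFrameIndex || !Def.OnlyUserIsCopyToVGPR)
    return "";
  return vectorFormOf(Def.Opcode);
}
```

A more general version, as the message suggests, would extend the lookup
table to any opcode that converts trivially from scalar to vector form.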

[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)

2024-11-05T21:12:49+00:00

Some old "t16" VOP2 instructions are actually in fake16 format. Correct
them and update the test file.

[AMDGPU] Fix machine verification failure after SIFoldOperandsImpl::tryFoldOMod (#113544)

2024-10-29T14:59:37+00:00

Fixes #54201

AMDGPU: Handle folding frame indexes into add with immediate (#110738)

2024-10-19T19:33:03+00:00

AMDGPU/NewPM: Port SIFoldOperands to new pass manager (#105801)

2024-08-29T06:04:54+00:00

[AMDGPU][True16][CodeGen] support v_mov_b16 and v_swap_b16 in true16 format (#102198)

2024-08-08T20:52:59+00:00

Support v_swap_b16 in true16 format.
Update the TableGen pattern and folding for v_mov_b16.

---------

Co-authored-by: guochen2