llvm-project.git/llvm/test/CodeGen/AMDGPU/function-returns.ll, branch main

[AMDGPU] Ensure divergence for v_alignbit (#129159)

2025-09-26T17:55:56+00:00

Selecting a VGPR for the uniform version of this pattern may lead to
unnecessary VGPR usage and waterfall loops.

[RFC][NFC][AMDGPU] Remove `-verify-machineinstrs` from `llvm/test/CodeGen/AMDGPU/*.ll` (#150024)

2025-07-23T17:42:46+00:00

Recent upstream trends have moved away from explicitly using `-verify-machineinstrs`, as it's already covered by the expensive checks. This PR removes almost all `-verify-machineinstrs` from tests in `llvm/test/CodeGen/AMDGPU/*.ll`, leaving only those tests where its removal currently causes failures.
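As an illustration of the kind of edit this PR applies to each RUN line, here is a minimal Python sketch. The helper name `strip_verify_flag` and the sample RUN line are hypothetical; the PR does not say which tool was actually used for the bulk removal.

```python
import re

# Hypothetical helper, not taken from the PR: match the flag plus the
# whitespace before it, but only when the flag is a whole token.
FLAG = re.compile(r"\s+-verify-machineinstrs(?!\S)")

def strip_verify_flag(line: str) -> str:
    """Drop every standalone -verify-machineinstrs from one line."""
    return FLAG.sub("", line)

# Made-up RUN line in the usual style of llvm/test/CodeGen/AMDGPU tests.
run_line = ("; RUN: llc -mtriple=amdgcn -mcpu=gfx900 "
            "-verify-machineinstrs < %s | FileCheck %s")
print(strip_verify_flag(run_line))
```

The negative lookahead keeps the sketch from touching a token that merely starts with the flag name, such as a `-verify-machineinstrs=...` spelling.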

[AMDGPU][True16][CodeGen] stop emitting sgpr_lo16 from isel (#144819)

2025-07-09T20:17:14+00:00

When true16 is enabled, isel starts to emit the sgpr_lo16 register when a
trunc/sext i16/i32 is generated, or when a salu32 is used by a vgpr16 or
vice versa. This causes a problem because sgpr_lo16 is not fully supported
in the pipeline.

True16 mode works fine at -O3 because the folding pass removes sgpr_lo16
from the pipeline. However, it hits a problem at -O0 because the folding
pass is skipped.

This patch:
1. stops emitting sgpr_lo16 from isel
2. updates the codegen patterns to split the uniform/divergent patterns
for i16/i32 conversion
3. updates the fix-sgpr-copy pass to address the legalization requirement
in true16 mode, and updates the fix-sgpr-copies-f16-true16.mir test to
include all possible combinations

This patch was tested with the CTS and a downstream repo with -O0 testing.

[AMDGPU][True16][CodeGen] update GFX11Plus codegen test with true16 flag (#135078)

2025-04-23T17:06:52+00:00

This is an NFC patch.

This patch runs a bulk update on the CodeGen tests that are impacted by
the true16 feature. This patch:
1. duplicates the GFX11plus runlines and applies them with
"-mattr=+real-true16" and "-mattr=-real-true16"
2. updates the tests with the update script

For some GISEL runlines, the current CodeGen does not fully support the
true16 version. These runlines are still updated, but the failing ones
are commented out and a "FIXME-TRUE16" comment is added to the test for
easier tracking. These tests will be fixed in follow-up patches.

This is a transition state in which we support both "+real-true16" and
"-real-true16" in our code base. We plan to make "+real-true16" the
default, and finally remove the "-real-true16" mode and its test lines.
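The runline duplication described in step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `duplicate_for_true16`, the sample RUN line, and the `-TRUE16`/`-FAKE16` check-prefix suffixes are assumptions made for the example, not details given in this commit message.

```python
import re

def duplicate_for_true16(run_line: str, prefix: str) -> list[str]:
    """Return two copies of a RUN line, one per real-true16 setting.

    Hypothetical helper: the prefix suffixes below are an assumed naming
    scheme chosen for illustration.
    """
    variants = []
    for sign, tag in (("+", "TRUE16"), ("-", "FAKE16")):
        # Append the feature flag right after the first -mcpu=... token.
        line = re.sub(r"(-mcpu=\S+)", rf"\1 -mattr={sign}real-true16",
                      run_line, count=1)
        # Give each copy its own check prefix in addition to the shared one.
        line = line.replace(f"-check-prefixes={prefix}",
                            f"-check-prefixes={prefix},{prefix}-{tag}")
        variants.append(line)
    return variants

# Made-up GFX11 RUN line used only to exercise the helper.
original = ("; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 < %s "
            "| FileCheck -check-prefixes=GFX11 %s")
for line in duplicate_for_true16(original, "GFX11"):
    print(line)
```

Keeping a shared prefix alongside a per-mode prefix lets checks that are identical in both modes stay merged, while only the differing output is split.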

AMDGPU: Replace ptr addrspace(4) undef uses with poison in tests (#131095)

2025-03-14T02:47:54+00:00

AMDGPU: Replace ptr addrspace(3) undef in tests with poison (#131049)

2025-03-13T06:28:55+00:00

AMDGPU: Replace ptr addrspace(1) undefs with poison (#130900)

2025-03-13T01:25:02+00:00

Many tests use store to undef as a placeholder use, so just replace
all of these with poison.

AMDGPU: Replace insertelement poison with insertelement undef (#130896)

2025-03-12T13:33:33+00:00

This is the bulk update done with perl; cases that require additional
updates are left for later.

AMDGPU: Replace undef with poison in tests using insertvalue (#130895)

2025-03-12T09:11:11+00:00

perl -p -i -e 's/insertvalue (.*) undef/insertvalue \1 poison/g'
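To make the effect of this substitution concrete, here is a small Python sketch of the same regular expression applied to single lines. The function name `undef_to_poison` and the sample IR lines are illustrative assumptions, not part of the commit.

```python
import re

# Same pattern as the perl one-liner; (.*) is greedy, so it extends to the
# last " undef" on the line.
PATTERN = re.compile(r"insertvalue (.*) undef")

def undef_to_poison(line: str) -> str:
    """Apply the insertvalue undef -> poison rewrite to one line."""
    return PATTERN.sub(r"insertvalue \1 poison", line)

# Made-up IR lines for illustration.
print(undef_to_poison("%r = insertvalue { i32, i32 } undef, i32 %x, 0"))
print(undef_to_poison("%s = insertvalue { i32, i32 } undef, i32 undef, 0"))
```

Because the capture group is greedy, a line containing several `undef` operands has only the last one rewritten in a single pass; such lines would need a second pass or a manual edit.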

[AMDGPU] Occupancy w.r.t. workgroup size range is also a range (#123748)

2025-01-23T15:07:57+00:00

Occupancy (i.e., the number of waves per EU) depends, in addition to
register usage, on per-workgroup LDS usage as well as on the range of
possible workgroup sizes. Mirroring the latter, occupancy should
therefore be expressed as a range since different group sizes generally
yield different achievable occupancies.

`getOccupancyWithLocalMemSize` currently returns a scalar occupancy
based on the maximum workgroup size and LDS usage. With respect to the
workgroup size range, this scalar can be the minimum of the range of
achievable occupancies, the maximum, or neither. This commit
fixes the function by making it compute and return the range of
achievable occupancies w.r.t. workgroup size and LDS usage; it also
renames it to `getOccupancyWithWorkGroupSizes` since it is the range of
workgroup sizes that produces the range of achievable occupancies.

Computing the achievable occupancy range is surprisingly involved.
Minimum/maximum workgroup sizes do not necessarily yield maximum/minimum
occupancies, i.e., sometimes workgroup sizes inside the range yield the
occupancy bounds. The implementation finds these sizes in constant time;
heavy documentation explains the rationale behind the sometimes
relatively obscure calculations.

As a justifying example, consider a target with 10 waves/EU, 4 EUs/CU,
64-wide waves. Also consider a function with no LDS usage and a flat
workgroup size range of [513,1024].

- A group of 513 items requires 9 waves per group. Only 4 groups made up
of 9 waves each can fit fully on a CU at any given time, for a total of
36 waves on the CU, or 9 per EU. However, filling as much as possible
the remaining 40-36=4 wave slots without decreasing the number of groups
reveals that a larger group of 640 items yields 40 waves on the CU, or
10 per EU.
- Similarly, a group of 1024 items requires 16 waves per group. Only 2
groups made up of 16 waves each can fit fully on a CU at any given time,
for a total of 32 waves on the CU, or 8 per EU. However, removing as
many waves as possible from the groups without being able to fit another
equal-sized group on the CU reveals that a smaller group of 896 items
yields 28 waves on the CU, or 7 per EU.

Therefore the achievable occupancy range for this function is not [8,9]
as the group size bounds directly yield, but [7,10].
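The arithmetic in this example can be checked with a brute-force Python sketch. It is not the constant-time algorithm this commit implements; it simply tries every workgroup size in the range. The function names, the rounding up of waves per EU, and the omission of LDS and any other per-CU limits are assumptions made for the illustration.

```python
# Target parameters taken from the example above.
WAVES_PER_EU = 10
EUS_PER_CU = 4
WAVE_SIZE = 64

def occupancy(group_size: int) -> int:
    """Waves per EU when the CU is filled with groups of this size.

    Assumes no LDS usage and no other per-CU limits.
    """
    waves_per_group = -(-group_size // WAVE_SIZE)      # ceiling division
    groups_per_cu = (WAVES_PER_EU * EUS_PER_CU) // waves_per_group
    waves_on_cu = groups_per_cu * waves_per_group
    # Assumption: a partially filled EU still counts, so round up.
    return -(-waves_on_cu // EUS_PER_CU)

def occupancy_range(min_size: int, max_size: int) -> tuple[int, int]:
    """Brute-force (min, max) occupancy over all sizes in the range."""
    values = [occupancy(s) for s in range(min_size, max_size + 1)]
    return min(values), max(values)

print(occupancy(513), occupancy(1024))   # the two range bounds
print(occupancy(640), occupancy(896))    # the interior sizes from the text
print(occupancy_range(513, 1024))
```

Under these assumptions the two bounds give 9 and 8 waves per EU, the interior sizes 640 and 896 give 10 and 7, and the brute-force range is (7, 10), matching the figures worked out above.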

Naturally this change causes a lot of test churn as instruction
scheduling is driven by achievable occupancy estimates. In most unit
tests the flat workgroup size range is the default [1,1024] which,
ignoring potential LDS limitations, would previously produce a scalar
occupancy of 8 (derived from 1024) on a lot of targets, whereas we now
consider the maximum occupancy to be 10 in such cases. Most tests are
updated automatically and checked manually for sanity. I also manually
changed some non-automatically generated assertions when necessary.

Fixes #118220.