llvm-project.git/llvm/test/CodeGen/AMDGPU/srem64.ll, branch users/mingmingl-llvm/samplefdo-profile-format

AMDGPU: Allow folding multiple uses of some immediates into copies (#154757)

2025-09-05T23:22:09+00:00

In some cases this will require an avoidable re-defining of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.

We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.

[RFC][NFC][AMDGPU] Remove `-verify-machineinstrs` from `llvm/test/CodeGen/AMDGPU/*.ll` (#150024)

2025-07-23T17:42:46+00:00

Recent upstream trends have moved away from explicitly using `-verify-machineinstrs`, as it's already covered by the expensive checks. This PR removes almost all `-verify-machineinstrs` from tests in `llvm/test/CodeGen/AMDGPU/*.ll`, leaving only those tests where its removal currently causes failures.

[AMDGPU] Add freeze for LowerSELECT (#148796)

2025-07-18T05:29:33+00:00

Trying to solve https://github.com/llvm/llvm-project/issues/147635

Add freeze for legalizer when breaking i64 select to 2 i32 select.

Several tests changed, still need to investigate why.

---------

Co-authored-by: Shilei Tian

[PHIElimination] Revert #131837 #146320 #146337 (#146850)

2025-07-03T11:48:08+00:00

Reverting because mis-compiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337

[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)

2025-06-29T18:28:42+00:00

The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.

This change affects many architectures but the amount of total
instructions in the test cases seems too be slightly lower.

AMDGPU: Try constant fold after folding immediate (#141862)

2025-06-10T02:44:44+00:00

This helps avoid some regressions in a future patch. The or 0
pattern appears in the division tests because the reduce 64-bit
bit operation to a 32-bit one with half identity value is only
implemented for constants. We could fix that by using computeKnownBits.
Additionally the pattern disappears if I optimize the IR division
expansion, so that IR should probably be emitted more optimally in
the first place.

[AMDGPU] Extend SRA i64 simplification for shift amts in range [33:62] (#138913)

2025-05-30T14:21:38+00:00

Extend sra i64 simplification to shift constants in range [33:62]. Shift
amounts 32 and 63 were already handled.
New testing for shift amts 33 and 62 added in sra.ll. Changes to other
test files were to adapt previous test results to this extension.

---------

Signed-off-by: John Lu

Reapply [AMDGPU] SIFixSgprCopies should not process twice VGPR to SGPR copies inserted by PHI preprocessing. (#135243)

2025-04-10T20:18:21+00:00

LIT tests which were incorrectly merged are corrected.

Revert "[AMDGPU] SIFixSgprCopies should not process twice VGPR to SGPR copies inserted by PHI preprocessing. (#134153)"

2025-04-10T17:20:01+00:00

This reverts commit 0563569978fee1e780a560494c89869074933f58.

Breaks tests, see comments on https://github.com/llvm/llvm-project/pull/134153

[AMDGPU] SIFixSgprCopies should not process twice VGPR to SGPR copies inserted by PHI preprocessing. (#134153)

2025-04-10T16:30:52+00:00

PHI operands and results must belong to the same register class.
If a PHI node produces an SGPR, but one of its operands is a VGPR, we
insert a VGPR-to-SGPR copy in the operand’s source block. The PHI
operand is then updated to use the destination register of the inserted
copy.

These inserted copies are processed immediately when they are created.
Therefore, we should avoid reprocessing them when handling their parent
block later.

---------

Co-authored-by: Matt Arsenault