llvm-project.git/llvm/lib/Target, branch main

[RISCV] Support zilsd-4byte-align for i64 load/store in SelectionDAG. (#169182)

2025-11-23T07:16:31+00:00

I think we need to keep the SelectionDAG code for volatile load/store, so
we should support 4-byte alignment when possible.
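
As a rough illustration of the alignment rule involved, the sketch below
models when an i64 access could be selected as a single paired load/store.
The function name, its parameters, and the 8-byte default are assumptions
made for illustration; this is not the SelectionDAG code from the patch.

```cpp
#include <cstdint>

// Hypothetical model, not the LLVM implementation: decide whether an i64
// access with the given alignment may use a paired (Zilsd) load/store.
// `has4ByteAlignFeature` stands in for the zilsd-4byte-align feature.
bool canUsePairedAccess(uint64_t alignBytes, bool has4ByteAlignFeature) {
  // Assumed rule: 8-byte alignment is always acceptable; 4-byte alignment
  // is acceptable only when the feature is enabled.
  const uint64_t required = has4ByteAlignFeature ? 4 : 8;
  return alignBytes >= required;
}
```

Under this model, a 4-byte-aligned i64 access qualifies only when the
feature flag is set, and a 2-byte-aligned access never qualifies.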

[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)

2025-11-22T22:03:14+00:00

- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correctly reconstructing MIR with preload kernarg
SGPRs.

AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)

2025-11-21T19:33:13+00:00

These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.
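
The "how many v_perm_b32s per result register" view can be sketched as
follows. This is a simplified, hypothetical cost model, not the code from
the patch: the function name, the mask encoding, and the assumptions that a
plain register copy is free and that each extra source register costs one
more v_perm_b32 are all illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: count the v_perm_b32 operations needed to build a
// shuffle result out of 32-bit registers.
// mask[i] is the source element feeding result element i, or -1 if undefined.
// eltsPerReg is 4 for 8-bit elements and 2 for 16-bit elements.
int countPerms(const std::vector<int> &mask, int eltsPerReg) {
  int cost = 0;
  for (std::size_t base = 0; base < mask.size(); base += eltsPerReg) {
    std::set<int> srcRegs; // distinct source registers read by this result register
    bool identity = true;  // every defined lane keeps its position in the register
    const std::size_t end = std::min(mask.size(), base + eltsPerReg);
    for (std::size_t i = base; i < end; ++i) {
      if (mask[i] < 0)
        continue; // undefined lanes impose no constraint
      srcRegs.insert(mask[i] / eltsPerReg);
      if (mask[i] % eltsPerReg != static_cast<int>(i - base))
        identity = false;
    }
    // Assumption: an undefined register or a whole-register copy is free.
    if (srcRegs.empty() || (srcRegs.size() == 1 && identity))
      continue;
    // Assumption: one perm merges two sources; each further source adds one.
    cost += std::max(1, static_cast<int>(srcRegs.size()) - 1);
  }
  return cost;
}
```

For example, under these assumptions a byte reversal within one register
costs one perm, while gathering one byte from each of four source registers
costs three.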

AMDGPU: Handle invariant when lowering global loads (#168914)

2025-11-21T18:58:35+00:00

Global loads marked invariant should be treated identically to
constant loads.

[HLSL] Add Load overload with status (#166449)

2025-11-21T18:11:38+00:00

This PR adds a Load method for resources that takes an additional
by-reference parameter, status. It sets status to 1 or 0, depending on
whether or not the resource access was mapped.
CheckAccessFullyMapped is also added as an intrinsic and is called to
produce this status bit.
Addresses only the DXIL part of the issue below:
https://github.com/llvm/llvm-project/issues/138910
Also addresses only the DXIL variant of the issue below:
https://github.com/llvm/llvm-project/issues/99204

Revert "[AMDGPU] Remove leftover implicit operands from SI_SPILL/SI_RESTORE." (#169068)

2025-11-21T17:52:08+00:00

The PR causes build failures with expensive checks enabled.

Reverts llvm/llvm-project#168546

[RISCV] Incorporate scalar addends to extend vector multiply accumulate chains (#168660)

2025-11-21T17:49:15+00:00

Previously, the following:
      %mul0 = mul nsw <8 x i32> %m00, %m01
      %mul1 = mul nsw <8 x i32> %m10, %m11
      %add0 = add <8 x i32> %mul0, splat (i32 32)
      %add1 = add <8 x i32> %add0, %mul1

lowered to:
      vsetivli zero, 8, e32, m2, ta, ma
      vmul.vv v8, v8, v9
      vmacc.vv v8, v11, v10
      li a0, 32
      vadd.vx v8, v8, a0

After this patch, it lowers to:
      li a0, 32
      vsetivli zero, 8, e32, m2, ta, ma
      vmv.v.x v12, a0
      vmadd.vv v8, v9, v12
      vmacc.vv v8, v11, v10

Modeled on 0cc981e0 from the AArch64 backend.

C code for the example case (`clang -O3 -S -mcpu=sifive-x280`):
```
void madd_fail(int a, int b, int * restrict src, int * restrict dst, int loop_bound) {
  for (int i = 0; i < loop_bound; i += 2) {
    dst[i] = src[i] * a + src[i + 1] * b + 32;
  }
}
```
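
To check that the two instruction sequences above compute the same value,
here is a small per-element scalar model. The function names and the
register-to-parameter mapping are illustrative assumptions; unsigned
arithmetic is used only so that the model has no overflow issues.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec = std::array<std::uint32_t, 8>;

// Per-element semantics assumed here:
//   vmul.vv  vd, vs2, vs1 : vd = vs2 * vs1
//   vmacc.vv vd, vs1, vs2 : vd = vd + vs1 * vs2
//   vmadd.vv vd, vs1, vs2 : vd = vd * vs1 + vs2
//   vadd.vx  vd, vs2, rs1 : vd = vs2 + rs1

// Old lowering: multiply, accumulate, then add the scalar at the end.
Vec oldSequence(Vec v8, const Vec &v9, const Vec &v10, const Vec &v11,
                std::uint32_t a0) {
  for (std::size_t i = 0; i < v8.size(); ++i) {
    v8[i] = v8[i] * v9[i];    // vmul.vv  v8, v8, v9
    v8[i] += v11[i] * v10[i]; // vmacc.vv v8, v11, v10
    v8[i] += a0;              // vadd.vx  v8, v8, a0
  }
  return v8;
}

// New lowering: splat the scalar and fold it into the first multiply-add.
Vec newSequence(Vec v8, const Vec &v9, const Vec &v10, const Vec &v11,
                std::uint32_t a0) {
  Vec v12;
  v12.fill(a0);                     // vmv.v.x  v12, a0
  for (std::size_t i = 0; i < v8.size(); ++i) {
    v8[i] = v8[i] * v9[i] + v12[i]; // vmadd.vv v8, v9, v12
    v8[i] += v11[i] * v10[i];       // vmacc.vv v8, v11, v10
  }
  return v8;
}
```

Both functions return the same vector for any inputs, since the addend is
only moved from the end of the chain to its start.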

[ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948)

2025-11-21T16:16:41+00:00

ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).

Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.

After the patch, TableGen guesses it has no side effects because the
added pattern uses only the `arm_wlssetup` node, which has no side effects.

Add `SDNPSideEffect` to the node so that TableGen guesses the property
correctly, and also add `hasSideEffects = 1` to the instruction in case
ARM ever sets `guessInstructionProperties` to false.

[AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038)

2025-11-21T15:17:55+00:00

Fix a problem exposed by #166483, which uses AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.

Fixes: #168761
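
The difference between the two predicates can be pictured with a toy model.
The enum and the `...Model` function names below are hypothetical and do not
correspond to LLVM's register-info API; they only restate the description
above.

```cpp
// Toy classification of register classes (hypothetical, for illustration).
enum class RegClassKind { SGPR, VGPR, AGPR, AV };

// Models the described behaviour of isVectorRegister: pure VGPR or AGPR
// classes only.
bool isVectorRegisterModel(RegClassKind kind) {
  return kind == RegClassKind::VGPR || kind == RegClassKind::AGPR;
}

// Models the described behaviour of hasVectorRegisters: additionally
// accepts the combined AV classes.
bool hasVectorRegistersModel(RegClassKind kind) {
  return isVectorRegisterModel(kind) || kind == RegClassKind::AV;
}
```

In this model, an AV-class register is rejected by the first predicate and
accepted by the second, which is the case the fix is concerned with.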

AMDGPU: Stop implementing shouldCoalesce (#168988)

2025-11-21T15:10:35+00:00

Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern is whether introducing wider registers is more
likely to push register usage up to the next occupancy tier.