<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target, branch main</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[RISCV] Support zilsd-4byte-align for i64 load/store in SelectionDAG. (#169182)</title>
<updated>2025-11-23T07:16:31+00:00</updated>
<author>
<name>Craig Topper</name>
<email>craig.topper@sifive.com</email>
</author>
<published>2025-11-23T07:16:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b9107bfc1faa8aa74e736169626e0cf7eb0925ba'/>
<id>b9107bfc1faa8aa74e736169626e0cf7eb0925ba</id>
<content type='text'>
I think we need to keep the SelectionDAG code for volatile load/store so
we should support 4 byte alignment when possible.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I think we need to keep the SelectionDAG code for volatile load/store so
we should support 4 byte alignment when possible.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)</title>
<updated>2025-11-22T22:03:14+00:00</updated>
<author>
<name>tyb0807</name>
<email>sontuan.vu119@gmail.com</email>
</author>
<published>2025-11-22T22:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=29d1e1857d445ca9a6e60c69fe2e1e5b30767e62'/>
<id>29d1e1857d445ca9a6e60c69fe2e1e5b30767e62</id>
<content type='text'>
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correctly reconstructing MIR with preload kernarg
SGPRs.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correctly reconstructing MIR with preload kernarg
SGPRs.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)</title>
<updated>2025-11-21T19:33:13+00:00</updated>
<author>
<name>Nicolai Hähnle</name>
<email>nicolai.haehnle@amd.com</email>
</author>
<published>2025-11-21T19:33:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=69589dd2c0b34a664c24f7ffbb084d2eea848ab6'/>
<id>69589dd2c0b34a664c24f7ffbb084d2eea848ab6</id>
<content type='text'>
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Handle invariant when lowering global loads (#168914)</title>
<updated>2025-11-21T18:58:35+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-11-21T18:58:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f8bbb21fb9d25c18cad6e197eb29378f060975aa'/>
<id>f8bbb21fb9d25c18cad6e197eb29378f060975aa</id>
<content type='text'>
Global with invariant should be treated identically to
constant.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Global with invariant should be treated identically to
constant.</pre>
</div>
</content>
</entry>
<entry>
<title>[HLSL] Add Load overload with status (#166449)</title>
<updated>2025-11-21T18:11:38+00:00</updated>
<author>
<name>Joshua Batista</name>
<email>jbatista@microsoft.com</email>
</author>
<published>2025-11-21T18:11:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=fea070b610e0dc08447be60db7f13c150b2892d5'/>
<id>fea070b610e0dc08447be60db7f13c150b2892d5</id>
<content type='text'>
This PR adds a Load method for resources, which takes an additional
parameter by reference, status. It fills the status parameter with a 1
or 0, depending on whether or not the resource access was mapped.
CheckAccessFullyMapped is also added as an intrinsic, and called in the
production of this status bit.
Only addresses DXIL for the below issue:
https://github.com/llvm/llvm-project/issues/138910
Also only addresses the DXIL variant for the below issue:
https://github.com/llvm/llvm-project/issues/99204</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This PR adds a Load method for resources, which takes an additional
parameter by reference, status. It fills the status parameter with a 1
or 0, depending on whether or not the resource access was mapped.
CheckAccessFullyMapped is also added as an intrinsic, and called in the
production of this status bit.
Only addresses DXIL for the below issue:
https://github.com/llvm/llvm-project/issues/138910
Also only addresses the DXIL variant for the below issue:
https://github.com/llvm/llvm-project/issues/99204</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "[AMDGPU] Remove leftover implicit operands from SI_SPILL/SI_RESTORE." (#169068)</title>
<updated>2025-11-21T17:52:08+00:00</updated>
<author>
<name>Nathan Corbyn</name>
<email>n_corbyn@apple.com</email>
</author>
<published>2025-11-21T17:52:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4511c355c35153c6b8f5fd3d0b75f77c126fe8e6'/>
<id>4511c355c35153c6b8f5fd3d0b75f77c126fe8e6</id>
<content type='text'>
PR causes build failures with expensive checks enabled

Reverts llvm/llvm-project#168546</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PR causes build failures with expensive checks enabled

Reverts llvm/llvm-project#168546</pre>
</div>
</content>
</entry>
<entry>
<title>[RISCV] Incorporate scalar addends to extend vector multiply accumulate chains (#168660)</title>
<updated>2025-11-21T17:49:15+00:00</updated>
<author>
<name>Ryan Buchner</name>
<email>buchner.ryan@gmail.com</email>
</author>
<published>2025-11-21T17:49:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=39d4dfbe55cbea6ca7d506b8acd8455ed0443bf9'/>
<id>39d4dfbe55cbea6ca7d506b8acd8455ed0443bf9</id>
<content type='text'>
Previously, the following:
      %mul0 = mul nsw &lt;8 x i32&gt; %m00, %m01
      %mul1 = mul nsw &lt;8 x i32&gt; %m10, %m11
      %add0 = add &lt;8 x i32&gt; %mul0, splat (i32 32)
      %add1 = add &lt;8 x i32&gt; %add0, %mul1

    lowered to:
      vsetivli zero, 8, e32, m2, ta, ma
      vmul.vv v8, v8, v9
      vmacc.vv v8, v11, v10
      li a0, 32
      vadd.vx v8, v8, a0

    After this patch, it now lowers to:
      li a0, 32
      vsetivli zero, 8, e32, m2, ta, ma
      vmv.v.x v12, a0
      vmadd.vv v8, v9, v12
      vmacc.vv v8, v11, v10

Modeled on 0cc981e0 from the AArch64 backend.

C code for the example case (`clang -O3 -S -mcpu=sifive-x280`):
```
void madd_fail(int a, int b, int * restrict src, int * restrict dst, int loop_bound) {
  for (int i = 0; i &lt; loop_bound; i += 2) {
    dst[i] = src[i] * a + src[i + 1] * b + 32;
  }
}
```</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, the following:
      %mul0 = mul nsw &lt;8 x i32&gt; %m00, %m01
      %mul1 = mul nsw &lt;8 x i32&gt; %m10, %m11
      %add0 = add &lt;8 x i32&gt; %mul0, splat (i32 32)
      %add1 = add &lt;8 x i32&gt; %add0, %mul1

    lowered to:
      vsetivli zero, 8, e32, m2, ta, ma
      vmul.vv v8, v8, v9
      vmacc.vv v8, v11, v10
      li a0, 32
      vadd.vx v8, v8, a0

    After this patch, it now lowers to:
      li a0, 32
      vsetivli zero, 8, e32, m2, ta, ma
      vmv.v.x v12, a0
      vmadd.vv v8, v9, v12
      vmacc.vv v8, v11, v10

Modeled on 0cc981e0 from the AArch64 backend.

C code for the example case (`clang -O3 -S -mcpu=sifive-x280`):
```
void madd_fail(int a, int b, int * restrict src, int * restrict dst, int loop_bound) {
  for (int i = 0; i &lt; loop_bound; i += 2) {
    dst[i] = src[i] * a + src[i + 1] * b + 32;
  }
}
```</pre>
</div>
</content>
</entry>
<entry>
<title>[ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948)</title>
<updated>2025-11-21T16:16:41+00:00</updated>
<author>
<name>Sergei Barannikov</name>
<email>barannikov88@gmail.com</email>
</author>
<published>2025-11-21T16:16:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f56ddde410cd143792ce165958bc42fdcc9f7bb5'/>
<id>f56ddde410cd143792ce165958bc42fdcc9f7bb5</id>
<content type='text'>
ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).

Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.

After the patch, TableGen guesses it has no side effects because the
added pattern uses only `arm_wlssetup` node, which has no side effects.

Add `SDNPSideEffect` to the node so that TableGen guesses the property
correctly, and also add `hasSideEffects = 1` to the instruction in case ARM
ever sets `guessInstructionProperties` to false.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).

Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.

After the patch, TableGen guesses it has no side effects because the
added pattern uses only `arm_wlssetup` node, which has no side effects.

Add `SDNPSideEffect` to the node so that TableGen guesses the property
correctly, and also add `hasSideEffects = 1` to the instruction in case ARM
ever sets `guessInstructionProperties` to false.</pre>
</div>
</content>
</entry>
<entry>
<title>[AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038)</title>
<updated>2025-11-21T15:17:55+00:00</updated>
<author>
<name>Jay Foad</name>
<email>jay.foad@amd.com</email>
</author>
<published>2025-11-21T15:17:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0b6db777ba9821bc17b969ddf6fefee54519c4f4'/>
<id>0b6db777ba9821bc17b969ddf6fefee54519c4f4</id>
<content type='text'>
Fix a problem exposed by #166483 using AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.

Fixes: #168761</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a problem exposed by #166483 using AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.

Fixes: #168761</pre>
</div>
</content>
</entry>
<entry>
<title>AMDGPU: Stop implementing shouldCoalesce (#168988)</title>
<updated>2025-11-21T15:10:35+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-11-21T15:10:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=bc323b609bd54747b8acda45d91a19f7a343a91b'/>
<id>bc323b609bd54747b8acda45d91a19f7a343a91b</id>
<content type='text'>
Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.</pre>
</div>
</content>
</entry>
</feed>
