<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, branch users/fmayer/spr/main.selectiondag-allow-kcfi-on-invoke</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[𝘀𝗽𝗿] changes to main this commit is based on</title>
<updated>2025-07-14T22:12:01+00:00</updated>
<author>
<name>Florian Mayer</name>
<email>fmayer@google.com</email>
</author>
<published>2025-07-14T22:12:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=eb39f2414e4cb4b9ada7998df3f73969141ccff4'/>
<id>eb39f2414e4cb4b9ada7998df3f73969141ccff4</id>
<content type='text'>
Created using spr 1.3.4

[skip ci]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Created using spr 1.3.4

[skip ci]
</pre>
</div>
</content>
</entry>
<entry>
<title>[SDAG] Prefer scalar for prefix of vector GEP expansion (#146719)</title>
<updated>2025-07-03T01:16:27+00:00</updated>
<author>
<name>Philip Reames</name>
<email>preames@rivosinc.com</email>
</author>
<published>2025-07-03T01:16:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=220a00239696257a02fe625a4819fcd038e9dd07'/>
<id>220a00239696257a02fe625a4819fcd038e9dd07</id>
<content type='text'>
When generating SDAG for a getelementptr with a vector result, we were
previously generating splats for each scalar operand. This essentially
has the effect of aggressively vectorizing the sequence, and leaving it
later combines to scalarize if profitable.

Instead, we can keep the accumulating address as a scalar for as long as
the prefix of operands allows before lazily converting to vector on the
first vector operand. This both better fits hardware which frequently
has a scalar base on the scatter/gather instructions, and reduces the
addressing cost even when not as otherwise we end up with a scalar to
vector domain crossing for each scalar operand.

Note that constant splat offsets are treated as scalar for the above,
and only variable offsets can force a conversion to vector.

---------

Co-authored-by: Craig Topper &lt;craig.topper@sifive.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When generating SDAG for a getelementptr with a vector result, we were
previously generating splats for each scalar operand. This essentially
has the effect of aggressively vectorizing the sequence, and leaving it
later combines to scalarize if profitable.

Instead, we can keep the accumulating address as a scalar for as long as
the prefix of operands allows before lazily converting to vector on the
first vector operand. This both better fits hardware which frequently
has a scalar base on the scatter/gather instructions, and reduces the
addressing cost even when not as otherwise we end up with a scalar to
vector domain crossing for each scalar operand.

Note that constant splat offsets are treated as scalar for the above,
and only variable offsets can force a conversion to vector.

---------

Co-authored-by: Craig Topper &lt;craig.topper@sifive.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>DAG: Move get_dynamic_area_offset type check to IR verifier (#145268)</title>
<updated>2025-06-24T02:11:52+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-06-24T02:11:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f0d898f36bceff94d40e2022814978fa5fd0cc8f'/>
<id>f0d898f36bceff94d40e2022814978fa5fd0cc8f</id>
<content type='text'>
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than the default address
space.

The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than the default address
space.

The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.</pre>
</div>
</content>
</entry>
<entry>
<title>[RemoveDIs][NFC] Remove dbg intrinsic handling code from SelectionDAG ISel (#144702)</title>
<updated>2025-06-18T15:04:18+00:00</updated>
<author>
<name>Orlando Cazalet-Hyams</name>
<email>orlando.hyams@sony.com</email>
</author>
<published>2025-06-18T15:04:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=36038a1048b2aab87ed18f982e960c044ad97670'/>
<id>36038a1048b2aab87ed18f982e960c044ad97670</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)</title>
<updated>2025-06-17T14:55:14+00:00</updated>
<author>
<name>Jeremy Morse</name>
<email>jeremy.morse@sony.com</email>
</author>
<published>2025-06-17T14:55:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9eb0020555fc643582b2802abb8c1bc92059c248'/>
<id>9eb0020555fc643582b2802abb8c1bc92059c248</id>
<content type='text'>
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov &lt;github@npopov.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov &lt;github@npopov.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[LLVM][CodeGen] Lower ConstantInt vectors like shufflevector base splats. (#144395)</title>
<updated>2025-06-17T10:09:22+00:00</updated>
<author>
<name>Paul Walker</name>
<email>paul.walker@arm.com</email>
</author>
<published>2025-06-17T10:09:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=465e3ce9f10019db071dc7794ae9ab22f9fc76f7'/>
<id>465e3ce9f10019db071dc7794ae9ab22f9fc76f7</id>
<content type='text'>
ConstantInt vectors utilise DAG.getConstant() when constructing the
initial DAG. This can have the effect of legalising the constant before
the DAG combiner is run, significant altering the generated code. To
mitigate this (hopefully as a temporary measure) we instead try to
construct the DAG in the same way as shufflevector based splats.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ConstantInt vectors utilise DAG.getConstant() when constructing the
initial DAG. This can have the effect of legalising the constant before
the DAG combiner is run, significant altering the generated code. To
mitigate this (hopefully as a temporary measure) we instead try to
construct the DAG in the same way as shufflevector based splats.</pre>
</div>
</content>
</entry>
<entry>
<title>[CodeGen] Inline stack guard check on Windows (#136290)</title>
<updated>2025-06-12T14:38:42+00:00</updated>
<author>
<name>Omair Javaid</name>
<email>omair.javaid@linaro.org</email>
</author>
<published>2025-06-12T14:38:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=e1e1836bbd70e4f30bd0be97b9d81eabfd6b45c8'/>
<id>e1e1836bbd70e4f30bd0be97b9d81eabfd6b45c8</id>
<content type='text'>
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call 
for every function return.

Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new 
approach. Also we have abandoned the AArch64 specific implementation 
of same pass in PR #121938 in favor of this more general solution.

The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation

The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables optimization at -Oz.

Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call 
for every function return.

Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new 
approach. Also we have abandoned the AArch64 specific implementation 
of same pass in PR #121938 in favor of this more general solution.

The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation

The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables optimization at -Oz.

Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.</pre>
</div>
</content>
</entry>
<entry>
<title>[SelectionDAG][X86] Handle `llvm.type.test` in DAGBuilder (#142939)</title>
<updated>2025-06-09T05:14:55+00:00</updated>
<author>
<name>Abhishek Kaushik</name>
<email>abhishek.kaushik@intel.com</email>
</author>
<published>2025-06-09T05:14:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=d6ecd6a658103adc76cf67de64e0b5b399e6fcef'/>
<id>d6ecd6a658103adc76cf67de64e0b5b399e6fcef</id>
<content type='text'>
Closes #142937</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Closes #142937</pre>
</div>
</content>
</entry>
<entry>
<title>[SelectionDAG] Use reportFatalUsageError() for invalid operand bundles (#142613)</title>
<updated>2025-06-04T07:33:05+00:00</updated>
<author>
<name>Nikita Popov</name>
<email>npopov@redhat.com</email>
</author>
<published>2025-06-04T07:33:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b3ce9883f32e3b5b16e2b5fa54c3d98e85b66869'/>
<id>b3ce9883f32e3b5b16e2b5fa54c3d98e85b66869</id>
<content type='text'>
Replace the asserts with reportFatalUsageError(), as these can be
reached with invalid user-provided IR.

Fixes https://github.com/llvm/llvm-project/issues/142531.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace the asserts with reportFatalUsageError(), as these can be
reached with invalid user-provided IR.

Fixes https://github.com/llvm/llvm-project/issues/142531.</pre>
</div>
</content>
</entry>
<entry>
<title>[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562)</title>
<updated>2025-06-03T07:44:13+00:00</updated>
<author>
<name>Simon Tatham</name>
<email>simon.tatham@arm.com</email>
</author>
<published>2025-06-03T07:44:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=56acb06bc6a1bbde6b1f1a3aa3633d049f1821dc'/>
<id>56acb06bc6a1bbde6b1f1a3aa3633d049f1821dc</id>
<content type='text'>
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.</pre>
</div>
</content>
</entry>
</feed>
