<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/X86/X86ISelLoweringCall.cpp, branch users/mingmingl-llvm/samplefdo-profile-format</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034)</title>
<updated>2025-09-05T14:08:59+00:00</updated>
<author>
<name>Phoebe Wang</name>
<email>phoebe.wang@intel.com</email>
</author>
<published>2025-09-05T14:08:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=94b164c21814ab74201aba39e060f3a876fcb335'/>
<id>94b164c21814ab74201aba39e060f3a876fcb335</id>
<content type='text'>
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343

We have warned these options in LLVM21 through #132542. This patch
removes underlying implementations in LLVM22.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343

We have warned these options in LLVM21 through #132542. This patch
removes underlying implementations in LLVM22.</pre>
</div>
</content>
</entry>
<entry>
<title>RuntimeLibcalls: Add entries for stackprotector globals (#154930)</title>
<updated>2025-08-23T01:21:00+00:00</updated>
<author>
<name>Matt Arsenault</name>
<email>Matthew.Arsenault@amd.com</email>
</author>
<published>2025-08-23T01:21:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=65d12622faa2a16f7e5d1fc6b101a670dbf26391'/>
<id>65d12622faa2a16f7e5d1fc6b101a670dbf26391</id>
<content type='text'>
Add entries for_stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard are used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add entries for_stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard are used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.</pre>
</div>
</content>
</entry>
<entry>
<title>[X86] Elect to tail call when `sret` ptr is passed to the callee</title>
<updated>2025-08-05T07:50:52+00:00</updated>
<author>
<name>Antonio Frighetto</name>
<email>me@antoniofrighetto.com</email>
</author>
<published>2025-08-05T07:06:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=00a648ab110583ed45cc931a0c1aabf4bfcd17e3'/>
<id>00a648ab110583ed45cc931a0c1aabf4bfcd17e3</id>
<content type='text'>
We may be able to allow the callee to be tail-called when the caller
expects a `sret` pointer argument, as long as this pointer is forwarded
to the callee.

Fixes: https://github.com/llvm/llvm-project/issues/146303.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We may be able to allow the callee to be tail-called when the caller
expects a `sret` pointer argument, as long as this pointer is forwarded
to the callee.

Fixes: https://github.com/llvm/llvm-project/issues/146303.
</pre>
</div>
</content>
</entry>
<entry>
<title>[llvm] Extract and propagate callee_type metadata</title>
<updated>2025-07-30T21:56:39+00:00</updated>
<author>
<name>Prabhu Rajasekaran</name>
<email>prabhukr@google.com</email>
</author>
<published>2025-07-30T21:56:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=17ccb849f349e246e7e7a02726ecb4210a26676f'/>
<id>17ccb849f349e246e7e7a02726ecb4210a26676f</id>
<content type='text'>
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.

Reviewers: nikic, ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87575
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.

Reviewers: nikic, ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87575
</pre>
</div>
</content>
</entry>
<entry>
<title>[X86] Align f128 and i128 to 16 bytes when passing on x86-32 (#138092)</title>
<updated>2025-07-17T09:30:36+00:00</updated>
<author>
<name>Trevor Gross</name>
<email>tgross@intrepidcs.com</email>
</author>
<published>2025-07-17T09:30:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a78a0f8d204393a0cce367b63395bad90311c1b8'/>
<id>a78a0f8d204393a0cce367b63395bad90311c1b8</id>
<content type='text'>
The i386 psABI specifies that `__float128` has 16 byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128` so this is changed as well. There also
isn't a good way to distinguish whether a set of four registers came
from an integer or a float.

The main test demonstrating this change is `store_perturbed` in
`llvm/test/CodeGen/X86/i128-fp128-abi.ll`.

Referenced ABI:
https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: https://github.com/llvm/llvm-project/issues/77401</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The i386 psABI specifies that `__float128` has 16 byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128` so this is changed as well. There also
isn't a good way to distinguish whether a set of four registers came
from an integer or a float.

The main test demonstrating this change is `store_perturbed` in
`llvm/test/CodeGen/X86/i128-fp128-abi.ll`.

Referenced ABI:
https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: https://github.com/llvm/llvm-project/issues/77401</pre>
</div>
</content>
</entry>
<entry>
<title>[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664)</title>
<updated>2025-07-10T03:11:09+00:00</updated>
<author>
<name>Boyao Wang</name>
<email>wangboyao@bytedance.com</email>
</author>
<published>2025-07-10T03:11:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=697beb3f174c530de2af7124cfe2dcedea11c487'/>
<id>697beb3f174c530de2af7124cfe2dcedea11c487</id>
<content type='text'>
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).</pre>
</div>
</content>
</entry>
<entry>
<title>[Target] Prevent copying in loop variables (NFC)</title>
<updated>2025-06-28T11:29:00+00:00</updated>
<author>
<name>Jie Fu</name>
<email>jiefu@tencent.com</email>
</author>
<published>2025-06-28T11:29:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=feb61f5b0529a18ce819e9be91d8510bbdd737f7'/>
<id>feb61f5b0529a18ce819e9be91d8510bbdd737f7</id>
<content type='text'>
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2769:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2769:8: note: use reference type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const &amp;' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
                  &amp;
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2954:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass)
                  ^
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2954:8: note: use reference type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const &amp;' to prevent copying
  for (const auto [Reg, N] : RegsToPass)
       ^~~~~~~~~~~~~~~~~~~~~
                  &amp;
2 errors generated.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2769:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass) {
                  ^
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2769:8: note: use reference type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const &amp;' to prevent copying
  for (const auto [Reg, N] : RegsToPass) {
       ^~~~~~~~~~~~~~~~~~~~~
                  &amp;
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2954:19: error: loop variable '[Reg, N]' creates a copy from type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const' [-Werror,-Wrange-loop-construct]
  for (const auto [Reg, N] : RegsToPass)
                  ^
/llvm-project/llvm/lib/Target/ARM/ARMISelLowering.cpp:2954:8: note: use reference type 'std::pair&lt;unsigned int, llvm::SDValue&gt; const &amp;' to prevent copying
  for (const auto [Reg, N] : RegsToPass)
       ^~~~~~~~~~~~~~~~~~~~~
                  &amp;
2 errors generated.
</pre>
</div>
</content>
</entry>
<entry>
<title>X86: Rename X86MCExpr::VK_ to X86::S_</title>
<updated>2025-06-28T05:22:43+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>i@maskray.me</email>
</author>
<published>2025-06-28T05:22:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=e121f72c945f30b2f18f02317fe970b7fa2efdeb'/>
<id>e121f72c945f30b2f18f02317fe970b7fa2efdeb</id>
<content type='text'>
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).

Move constants to X86MCAsmInfo.h, with the goal of eventually removing
X86MCExpr.h.

Similar to #144633 for AArch64.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).

Move constants to X86MCAsmInfo.h, with the goal of eventually removing
X86MCExpr.h.

Similar to #144633 for AArch64.
</pre>
</div>
</content>
</entry>
<entry>
<title>[Target] Use range-based for loops (NFC) (#146198)</title>
<updated>2025-06-28T05:07:58+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2025-06-28T05:07:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=094a7087b83bb37544c1a7db4cc1841ee463140c'/>
<id>094a7087b83bb37544c1a7db4cc1841ee463140c</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[x64][win] Add compiler support for x64 import call optimization (equivalent to MSVC /d2guardretpoline) (#126631)</title>
<updated>2025-05-20T21:48:41+00:00</updated>
<author>
<name>Daniel Paoliello</name>
<email>danpao@microsoft.com</email>
</author>
<published>2025-05-20T21:48:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a414877a7a5f000d01370acb1162eb1dea87f48c'/>
<id>a414877a7a5f000d01370acb1162eb1dea87f48c</id>
<content type='text'>
This is the x64 equivalent of #121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the x64 equivalent of #121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.</pre>
</div>
</content>
</entry>
</feed>
