llvm-project.git

Age	Commit message	Author
2025-11-20	TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)	Matt Arsenault
	Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.
2025-11-19	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)	Matt Arsenault
	Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.
2025-11-14	opt: Fix bad merge of #167996 (#168110)	Matt Arsenault
	After the base branch was moved to main, this somehow ended up adding a second definition of RTLCI, instead of modifying the existing one. Also fix other build error with gcc bots.
2025-11-14	RuntimeLibcalls: Move VectorLibrary handling into TargetOptions (#167996)	Matt Arsenault
	This fixes the -fveclib flag getting lost on its way to the backend. Previously this was its own cl::opt with a random boolean. Move the flag handling into CommandFlags with other backend ABI-ish options, and have clang directly set it, rather than forcing it to go through command line parsing. Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector function. Clang has special handling for TargetLibraryInfo, where it would directly construct one with the vector library in the pass pipeline. RuntimeLibcallsInfo currently is not used as an analysis in codegen, and needs to know the vector library when constructed. RuntimeLibraryAnalysis could follow the same trick that TargetLibraryInfo is using in the future, but a lot more boilerplate changes are needed to thread that analysis through codegen. Ideally this would come from an IR module flag, and nothing would be in TargetOptions. For now, it's better for all of these sorts of controls to be consistent.
2025-11-11	DAG: Use modf vector libcalls through RuntimeLibcalls (#166986)	Matt Arsenault
	Copy new process from sincos/sincospi
2025-11-11	DAG: Use sincos vector libcalls through RuntimeLibcalls (#166984)	Matt Arsenault
	Copy new process from sincospi.
2025-11-11	Remove unused <iterator> inclusion	serge-sans-paille
	Per https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible this improves compilation time, while not being too intrusive on the codebase.
2025-11-10	RuntimeLibcalls: Add entries for vector sincospi functions (#166981)	Matt Arsenault
	Add libcall entries for sleef and armpl sincospi implementations. This is the start of adding the vector library functions; eventually they should all be tracked here. I'm starting with this case because this is a prerequisite to fix reporting sincospi calls which do not exist on any common targets without regressing vector codegen when these libraries are available.
2025-11-05	RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (#164987)	Matt Arsenault
	Introduce a new class for the TargetLowering usage. This tracks the subtarget-specific lowering decisions for which libcall to use. RuntimeLibcallsInfo is a module-level property, which may have multiple implementations of a particular libcall available. This attempts to be a minimum-boilerplate patch to introduce the new concept. In the future we should have a tablegen way of selecting which implementations should be used for a subtarget. Currently we do have some conflicting implementations added; it just happens to work out that the default case to prefer is alphabetically first (plus some of these are still using manual overrides in TargetLowering constructors).
2025-10-28	DAG: Consider __sincos_stret when deciding to form fsincos (#165169)	Matt Arsenault

2025-10-28	[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910)	Shimin Cui
	Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the number of total comparisons (`NumCmps`) satisfy: `(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) || (NumDests == 3 && NumCmps >= 6)`. However, in some cases on PowerPC, for example when NumDests is 3 and each destination has exactly 2 comparisons, it is not profitable to lower the switch to a bit test. This adds an option to set the minimum of the largest number of comparisons required to use a bit test for switch lowering. Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>
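The condition quoted in this commit message can be written out as a small predicate. This is only a sketch: `qualifiesForBitTest` and `qualifiesWithMinimum` are hypothetical names, not LLVM functions, and the second helper encodes one assumed reading of the new option (the largest per-destination comparison count must reach a configurable minimum).

```cpp
#include <cassert>

// Hypothetical helper, not LLVM's actual code: the cluster-shape condition
// quoted in the commit message above.
constexpr bool qualifiesForBitTest(unsigned NumDests, unsigned NumCmps) {
  return (NumDests == 1 && NumCmps >= 3) ||
         (NumDests == 2 && NumCmps >= 5) ||
         (NumDests == 3 && NumCmps >= 6);
}

// Assumed interpretation of the new option: on top of the shape condition,
// the destination with the most comparisons must have at least MinMaxCmps.
constexpr bool qualifiesWithMinimum(unsigned NumDests, unsigned NumCmps,
                                    unsigned MaxCmpsPerDest,
                                    unsigned MinMaxCmps) {
  assert(MaxCmpsPerDest <= NumCmps && "per-destination count exceeds total");
  return qualifiesForBitTest(NumDests, NumCmps) &&
         MaxCmpsPerDest >= MinMaxCmps;
}
```

Under this reading, the PowerPC case from the message (three destinations with two comparisons each) passes the original condition but is rejected once the minimum is set to 3.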
2025-10-24	CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044)	Matt Arsenault
	All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.
2025-10-13	Wasm fmuladd relaxed (#163177)	Sam Parker
	Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.
2025-10-13	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)	Sam Parker
	Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.
2025-10-13	[WebAssembly] Lower fmuladd to madd and nmadd (#161355)	Sam Parker
	Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.
2025-09-02	[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850)	Daniel Paoliello
	As noted in #153256, TableGen is generating reserved names for RuntimeLibcalls, which resulted in a build failure for Arm64EC since `vcruntime.h` defines `__security_check_cookie` as a macro. To avoid using reserved names, all impl names will now be prefixed with `Impl_`. `NumLibcallImpls` was lifted out as a `constexpr size_t` instead of being an enum field. While I was churning the dependent code, I also removed the TODO to move the impl enum into its own namespace and use an `enum class`: I experimented with using an `enum class` and adding a namespace, but we decided it was too verbose so it was dropped.
2025-09-02	[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007)	Sam Tebbs
	It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects whether the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.
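To make the write-after-read case concrete, here is a scalar model of such a mask. It is a sketch under assumptions, not the intrinsic's definition: `warMaskSketch` is a hypothetical name, addresses are modelled as plain integers, and the rule used (a lane is safe when the store address does not lie ahead of the load address, or when the lane index is below the element distance) is derived from reasoning about a load-then-store loop body rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a write-after-read lane mask; not LLVM's code.
// loadAddr/storeAddr are byte addresses, eltSize is the element size in bytes.
std::vector<bool> warMaskSketch(std::int64_t loadAddr, std::int64_t storeAddr,
                                std::int64_t eltSize, unsigned numLanes) {
  assert(eltSize > 0 && "element size must be positive");
  // Distance from the load to the store, measured in elements.
  const std::int64_t diff = (storeAddr - loadAddr) / eltSize;
  std::vector<bool> mask(numLanes);
  for (unsigned lane = 0; lane < numLanes; ++lane) {
    // If the store is at or behind the load, loading every lane before any
    // store is safe. Otherwise only lanes below the distance are safe,
    // because lane `diff` would need a value that lane 0 has not stored yet.
    mask[lane] = diff <= 0 || static_cast<std::int64_t>(lane) < diff;
  }
  return mask;
}
```

For example, with 4-byte elements, four lanes, and the store pointer 8 bytes ahead of the load pointer, only the first two lanes stay enabled in this model.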
2025-08-27	[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. (#153033)	daniel-trujillo-bsc
	This PR adds support for VP intrinsics to be aware of the nontemporal metadata information.
2025-08-23	RuntimeLibcalls: Add entries for stackprotector globals (#154930)	Matt Arsenault
	Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on Linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.
2025-08-13	[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414)	Nikita Popov
	https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.
2025-08-12	PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744)	Stephen Long
	Similar to ab976a1, but for llvm.log.
2025-08-11	[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)	Nikita Popov
	It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from an i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet; this just does the mechanical changes to thread through the new argument.
2025-08-08	[IR] Introduce the `ptrtoaddr` instruction	Alexander Richardson
	This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-07	[CodeGen] Remove an unnecessary cast (NFC) (#152441)	Kazu Hirata
	getActiveBits() already returns unsigned.
2025-08-07	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)	Nikita Popov
	The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).
2025-08-05	[LLVM][CGP] Allow finer control for sinking compares. (#151366)	Paul Walker
	Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse grained by not taking into account the differences between scalar and vector compares. This PR extends the interface to take an EVT to allow finer control. The new interface is used by AArch64 to disable sinking of scalable vector compares, but with isProfitableToSinkOperands updated to maintain the cases that are specifically tested.
2025-08-04	[DAG] Combine `store + vselect` to `masked_store` (#145176)	Abhishek Kaushik
	Add a new combine to replace
	```
	(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
	```
	with
	```
	(mstore ch truevec ptr offset cond)
	```
	This saves a blend operation on targets that support conditional stores.
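The equivalence behind this combine can be checked with a scalar model. The sketch below is illustrative only: `storeOfSelect` and `maskedStore` are hypothetical names, and vectors are modelled as `std::vector` rather than SelectionDAG nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Form before the combine: load the old contents, blend with the new values
// (vselect), then store every lane back.
void storeOfSelect(std::vector<int> &mem, const std::vector<bool> &cond,
                   const std::vector<int> &trueVec) {
  assert(mem.size() == cond.size() && mem.size() == trueVec.size());
  const std::vector<int> loaded = mem; // models (load ch ptr offset)
  for (std::size_t i = 0; i < mem.size(); ++i)
    mem[i] = cond[i] ? trueVec[i] : loaded[i]; // blend, then full store
}

// Form after the combine: write only the lanes whose condition is set.
void maskedStore(std::vector<int> &mem, const std::vector<bool> &cond,
                 const std::vector<int> &trueVec) {
  assert(mem.size() == cond.size() && mem.size() == trueVec.size());
  for (std::size_t i = 0; i < mem.size(); ++i)
    if (cond[i])
      mem[i] = trueVec[i];
}
```

Both functions leave memory in the same state; the second never reads the old contents and needs no blend, which is the saving the commit message describes.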
2025-07-29	[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638)	jeremyd2019
	Cygwin and MinGW share the auto import behavior that could result in __stack_chk_guard being non-dso-local. Allow windres to assume a Cygwin target as well as a MinGW one, so defines like _WIN32 would not be present on Cygwin.
2025-07-28	[CodeGen] More consistently expand float ops by default (#150597)	Nikita Popov
	These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.
2025-07-15	SafeStack: Check if __safestack_pointer_address is available (#147917)	Matt Arsenault
	Start using RuntimeLibcalls in the base implementation of getSafeStackPointerLocation instead of hardcoding the function names.
2025-07-10	TargetLowering: Avoid a use of PointerType::getUnqual (#147884)	Matt Arsenault
	Use the default globals address space
2025-07-09	RuntimeLibcalls: Remove table of soft float compare cond codes (#146082)	Matt Arsenault
	Previously we had a table of entries for every Libcall for the comparison to use against an integer 0 if it was a soft float compare function. This was only relevant to a handful of opcodes, so it was wasteful. Now that we can distinguish the abstract libcall for the compare with the concrete implementation, we can just directly hardcode the comparison against the libcall impl without this configuration system.
2025-07-09	DAG: Fall back to separate sin and cos when softening sincos (#147468)	Matt Arsenault
	Fix asserting in the error case.
2025-07-08	[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105)	Dominik Steenken
	This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data types. Canonicalization is implemented here as multiplication by `1.0`, as suggested in [the docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
2025-07-07	DAG: Add RTLIB::getPOW helper (#147274)	Matt Arsenault
	Co-authored-by: Paul Walker <paul.walker@arm.com>
2025-07-04	[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)	Austin
	Use llvm::fill instead of std::fill
2025-06-23	RuntimeLibcalls: Pass in ABI name from MCOptions (#144894)	Matt Arsenault
	ARM needs this to compute the available libcalls.
2025-06-19	RuntimeLibcalls: Pass in exception handling type (#144696)	Matt Arsenault
	All of the ABI options that influence libcall decisions need to be passed in.
2025-06-19	RuntimeLibcalls: Pass in FloatABI and EABI type (#144691)	Matt Arsenault
	We need the full set of ABI options to accurately compute the full set of libcalls. This partially resolves missing information required to compute the set of ARM calls.
2025-06-16	[TargetLowering][RISCV] Allow scalable non-simple EVTs to be split even if the element type isn't a legal scalar type. (#144007)	Craig Topper
	This fixes an inconsistency in i64 vector handling between RV32 and RV64. Even if i64 isn't legal as a scalar, we should still be able to split a large i64 vector to get down to a legal vector type. We only need to give up if we need to split a vscale x 1 vector.
2025-05-27	IR: Make Module::getOrInsertGlobal() return a GlobalVariable.	Peter Collingbourne
	After pointer element types were removed this function can only return a GlobalVariable, so reflect that in the type and comments and clean up callers. Reviewers: nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/141323
2025-04-23	[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933)	Nicholas Guy
	Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. Only happens when the combine has been performed on the ISD node. Also adds in check to only do the DAG combine when the node can then eventually be lowered, so changes neon tests too. --------- Co-authored-by: James Chesterman <james.chesterman@arm.com>
2025-04-14	[CodeGen] Prune headers and move code out of line for build efficiency, NFC (#135622)	Reid Kleckner
	I noticed these destructors taking time with -ftime-trace and moved some of them for minor build efficiency improvements. The main impact of moving destructors out of line is that the types held in container fields no longer need to be complete, i.e. one can have uptr<T> or vector<T> as a field with an incomplete type T, and that means we can reduce transitive includes, as with LegalizerInfo.h. Move expensive getDebugOperandsForReg template out-of-line. The std::function instantiation shows up in time trace even if you don't use the function.
2025-03-31	Fix crash lowering stack guard on OpenBSD/aarch64. (#125416)	3405691582
	TargetLoweringBase::getIRStackGuard refers to a platform-specific guard variable. Before this change, TargetLoweringBase::getSDagStackGuard only referred to a different variable. This means that SelectionDAGBuilder's getLoadStackGuard does not get memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes that the passed MachineInstr has nonzero memoperands, causing a segfault. We have two possible options here: either disabling the LOAD_STACK_GUARD node entirely in AArch64TargetLowering::useLoadStackGuardNode or just making the platform-specific values match across TargetLoweringBase. Here, we try the latter.
2025-03-07	[RISCV][LibCall] Add libcall for i64 -> bf16 (#130024)	Jim Lin
	Add support for lowering i64 -> bf16 with libcall.
2025-02-18	[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207)	James Chesterman
	Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.
2025-02-11	[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705)	Benjamin Maxwell
	This makes the name more consistent with the other helpers.
2025-02-11	[IR] Add llvm.sincospi intrinsic (#125873)	Benjamin Maxwell
	This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values.
	```
	declare { float, float } @llvm.sincospi.f32(float %Val)
	declare { double, double } @llvm.sincospi.f64(double %Val)
	declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
	declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
	declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
	declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)
	```
	Currently, the default lowering of this intrinsic relies on the `sincospi[f|l]` functions being available in the target's runtime (e.g. libc).
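The accuracy claim in this message can be illustrated in plain C++. The sketch below is not how any libm or LLVM lowering implements `sincospi`; `sincospiSketch` and `SinCos` are hypothetical names. It shows one way to avoid the rounding error of multiplying a large input by pi: reduce the argument modulo 2 first (`std::fmod` is exact), then multiply the small remainder.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type mirroring the { sin, cos } pair of the intrinsic.
struct SinCos {
  double Sin;
  double Cos;
};

// Illustrative only: sin(pi*x) and cos(pi*x) both have period 2 in x, so
// reduce x modulo 2 exactly before scaling by pi.
SinCos sincospiSketch(double x) {
  assert(std::isfinite(x) && "sketch does not model infinities or NaN");
  const double Pi = 3.14159265358979323846;
  const double r = std::fmod(x, 2.0); // exact, lies in (-2, 2)
  return {std::sin(Pi * r), std::cos(Pi * r)};
}
```

With an input such as `1e15 + 0.5`, the naive `std::sin(Pi * x)` first rounds a product near 3e15, where adjacent doubles are 0.5 apart, so the angle can already be off by a noticeable fraction of a radian. The reduced form evaluates the sine at roughly pi/2 instead.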
2025-02-07	[IR] Add `llvm.modf` intrinsic (#121948)	Benjamin Maxwell
	This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct).
	```
	declare { float, float } @llvm.modf.f32(float %Val)
	declare { double, double } @llvm.modf.f64(double %Val)
	declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val)
	declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val)
	declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val)
	declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val)
	```
	This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.
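The difference between the two interfaces can be sketched in C++ by wrapping `std::modf`. `ModfResult` and `modfStruct` are hypothetical names, and the field names are chosen for readability; they are not meant to reflect the field order of the intrinsic's result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical struct-returning form, analogous in spirit to llvm.modf.
struct ModfResult {
  double Integral;
  double Fractional;
};

// Wraps the libm-style interface, which writes the integral part through an
// output pointer, into a function that returns both parts by value.
ModfResult modfStruct(double value) {
  double integral = 0.0;
  const double fractional = std::modf(value, &integral);
  return {integral, fractional};
}
```

A function that returns both parts by value has no memory side effects, so applying it lane by lane is straightforward, whereas the pointer-based form requires reasoning about where each lane's output is written.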
2025-01-24	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568)	Stephen Long