path: root/llvm/lib
AgeCommit messageAuthor
2025-11-22[RISCV] Support zilsd-4byte-align for i64 load/store in SelectionDAG. (#169182)Craig Topper
I think we need to keep the SelectionDAG code for volatile load/store, so we should support 4-byte alignment when possible.
2025-11-23Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg ↵Aiden Grossman
(#168661)" This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc. This caused a couple of test failures, likely due to a mid-air collision. Reverting for now to get the tree back to green and allow the original author to run UTC/friends and verify the output.
2025-11-23[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)hstk30-hw
This may be a bug introduced by commit 6749ae36b4a33769e7a77cf812d7cd0a908ae3b9, which has been present ever since. In this case, `OtherReg` always overlaps with `DstReg` because both come from the `Copy`.
2025-11-22[llvm] Use llvm::equal (NFC) (#169173)Kazu Hirata
While I am at it, this patch uses const l-value references for std::shared_ptr. We don't need to increment the reference count by passing std::shared_ptr by value. Identified with llvm-use-ranges.
2025-11-22[llvm] Remove unused local variables (NFC) (#169171)Kazu Hirata
Identified with bugprone-unused-local-non-trivial-variable.
2025-11-22[VPlan] Share PreservesUniformity logic between isSingleScalar and ↵Florian Hahn
isUniformAcrossVFsAndUFs Extract the PreservesUniformity logic from isSingleScalar into a shared static helper function. Update isUniformAcrossVFsAndUFs to use this logic for VPWidenRecipe and VPInstruction, so that any opcode that preserves uniformity is considered uniform-across-vf-and-uf if its operands are. This unifies the uniformity checking logic and makes it easier to extend in the future. This should effectively be NFC currently.
2025-11-22[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374)tyb0807
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated
Together they enable reconstructing MIR with preload kernarg SGPRs correctly.
2025-11-22[VPlan] Create resume phis in scalar preheader early. (NFC) (#166099)Florian Hahn
Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction. PR: https://github.com/llvm/llvm-project/pull/166099
2025-11-22[DFAJumpThreading] Try harder to avoid cycles in paths. (#169151)Usman Nadeem
If a threading path has cycles within it then the transformation is not correct. This patch fixes a couple of cases that create such cycles. Fixes https://github.com/llvm/llvm-project/issues/166868
2025-11-22[InstCombine] Generalize trunc-shift-icmp fold from (1 << Y) to (Pow2 << Y) ↵Pedro Lobo
(#169163) Extends the `icmp(trunc(shl))` fold to handle any power of 2 constant as the shift base, not just 1. This generalizes the following patterns by adjusting the comparison offsets by `log2(Pow2)`.
```llvm
(trunc (1 << Y) to iN) == 0    --> Y u>= N
(trunc (1 << Y) to iN) != 0    --> Y u<  N
(trunc (1 << Y) to iN) == 2**C --> Y ==  C
(trunc (1 << Y) to iN) != 2**C --> Y !=  C
; to
(trunc (Pow2 << Y) to iN) == 0    --> Y u>= N - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 0    --> Y u<  N - log2(Pow2)
(trunc (Pow2 << Y) to iN) == 2**C --> Y ==  C - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 2**C --> Y !=  C - log2(Pow2)
```
Proof: https://alive2.llvm.org/ce/z/2zwTkp
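The generalized patterns can be sanity-checked by brute force on small bit widths. The following is a minimal Python sketch, not LLVM code: `trunc_shl` and `check_fold` are hypothetical helper names, `W` is the source width, `N` the truncated width, and `Pow2 = 2**K`. It assumes `Y < W` (a larger shift amount is poison in LLVM IR), `K < N`, and `K <= C < N`, so that the right-hand sides stay non-negative; the actual fold may carry different preconditions.

```python
def trunc_shl(pow2, y, w, n):
    """Model `trunc (pow2 << y) to iN` for a W-bit source value."""
    shifted = (pow2 << y) & ((1 << w) - 1)   # shl wraps at W bits
    return shifted & ((1 << n) - 1)          # trunc keeps the low N bits


def check_fold(w, n):
    """Exhaustively compare both sides of the four patterns (assumes n < w)."""
    for k in range(n):                       # Pow2 = 2**k, so log2(Pow2) = k
        pow2 = 1 << k
        for y in range(w):                   # only non-poison shift amounts
            t = trunc_shl(pow2, y, w, n)
            # == 0  -->  Y u>= N - log2(Pow2)
            assert (t == 0) == (y >= n - k)
            # != 0  -->  Y u<  N - log2(Pow2)
            assert (t != 0) == (y < n - k)
            for c in range(k, n):
                # == 2**C  -->  Y == C - log2(Pow2)
                assert (t == (1 << c)) == (y == c - k)
                # != 2**C  -->  Y != C - log2(Pow2)
                assert (t != (1 << c)) == (y != c - k)
    return True


check_fold(8, 4)
check_fold(16, 8)
```

The Alive2 proof linked above remains the authoritative check; this sketch only illustrates why the offset shifts by `log2(Pow2)`.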
2025-11-22Add new llvm.dbg.declare_value intrinsic. (#168132)Shubham Sandeep Rastogi
For Swift async code, we need a debug intrinsic that behaves like llvm.dbg.declare but can take any location type rather than just a pointer or integer. To solve this, a new debug intrinsic called llvm.dbg.declare_value has been created, which behaves exactly like llvm.dbg.declare but can also take location types other than pointer and integer. More information here: https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269 This is the first patch in a stack of patches; the one succeeding it is: https://github.com/llvm/llvm-project/pull/168134
2025-11-22[CallBrPrepare] Prefer Function &F over Function &FnAiden Grossman
Function &F is the more standard abbreviation (~4000 uses in llvm versus ~300 uses).
2025-11-21[SCEVExp] Remove early exit, rely on InstSimplifyFolder (NFCI).Florian Hahn
Remove the SCEV-based check refined in https://github.com/llvm/llvm-project/pull/156910, as InstSimplifyFolder manages to simplify the generated code to false directly as well.
2025-11-21[DA] remove Constraints class (#168963)Sebastian Pop
Remove the Constraints class from dependence analysis as it is now unused.
2025-11-22[DAGCombiner] Don't optimize insert_vector_elt into shuffle if implicit ↵Hongyu Chen
truncation exists (#169022) Fixes #169017
2025-11-21AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)Nicolai Hähnle
These shuffles can always be implemented using v_perm_b32, and so this rewrites the analysis from the perspective of "how many v_perm_b32s does it take to assemble each register of the result?" The test changes in Transforms/SLPVectorizer/reduction.ll are reasonable: VI (gfx8) has native f16 math, but not packed math.
2025-11-21[profcheck] Propagate profile metadata to Wrapper function in optimize mode ↵Jin Huang
of ExpandVariadic. (#168161) This PR fixes the issue where profile metadata (`!prof`) is dropped from the `VariadicWrapper` when `ExpandVariadics` runs in `--expand-variadics-override=optimize` mode.
In optimize mode, the pass splits the original variadic function into two parts:
- A **VariadicWrapper** (retaining the original name) that handles the `va_list` setup.
- A **FixedArityReplacement** (new function) that contains the original core logic.
During this process, the basic blocks and associated metadata are spliced into the `FixedArityReplacement`. Consequently, the `VariadicWrapper`, which serves as the entry point for callers, is left without function entry count metadata.
This change explicitly copies the `MD_prof` metadata from the `FixedArityReplacement` back to the `VariadicWrapper` after the split is defined.
Co-authored-by: Jin Huang <jingold@google.com>
2025-11-21AMDGPU: Handle invariant when lowering global loads (#168914)Matt Arsenault
Global with invariant should be treated identically to constant.
2025-11-21Revert "[ORC] Tailor ELF debugger support plugin to load-address patching ↵Stefan Gränitz
only" (#169073) Reverts llvm/llvm-project#168518
2025-11-21[HLSL] Add Load overload with status (#166449)Joshua Batista
This PR adds a Load method for resources, which takes an additional parameter by reference, status. It fills the status parameter with a 1 or 0, depending on whether or not the resource access was mapped. CheckAccessFullyMapped is also added as an intrinsic, and called in the production of this status bit. Only addresses DXIL for the below issue: https://github.com/llvm/llvm-project/issues/138910 Also only addresses the DXIL variant for the below issue: https://github.com/llvm/llvm-project/issues/99204
2025-11-21[Support] Use range-based for loops (NFC) (#169001)Kazu Hirata
Identified with modernize-loop-convert.
2025-11-21Revert "[AMDGPU] Remove leftover implicit operands from ↵Nathan Corbyn
SI_SPILL/SI_RESTORE." (#169068) The PR causes build failures with expensive checks enabled. Reverts llvm/llvm-project#168546
2025-11-21[RISCV] Incorporate scalar addends to extend vector multiply accumulate ↵Ryan Buchner
chains (#168660) Previously, the following:

    %mul0 = mul nsw <8 x i32> %m00, %m01
    %mul1 = mul nsw <8 x i32> %m10, %m11
    %add0 = add <8 x i32> %mul0, splat (i32 32)
    %add1 = add <8 x i32> %add0, %mul1

lowered to:

    vsetivli zero, 8, e32, m2, ta, ma
    vmul.vv v8, v8, v9
    vmacc.vv v8, v11, v10
    li a0, 32
    vadd.vx v8, v8, a0

After this patch, now lowers to:

    li a0, 32
    vsetivli zero, 8, e32, m2, ta, ma
    vmv.v.x v12, a0
    vmadd.vv v8, v9, v12
    vmacc.vv v8, v11, v10

Modeled on 0cc981e0 from the AArch64 backend.

C-code for the example case (`clang -O3 -S -mcpu=sifive-x280`):
```
int madd_fail(int a, int b, int * restrict src, int * restrict dst, int loop_bound) {
  for (int i = 0; i < loop_bound; i += 2) {
    dst[i] = src[i] * a + src[i + 1] * b + 32;
  }
}
```
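To see that the two instruction sequences compute the same lanes, here is a small Python model of the e32 arithmetic. It is an illustrative sketch only: the helper names are hypothetical, lanes are plain Python lists, and the semantics assumed are `vmacc.vv vd, vs1, vs2` as `vd = vs1 * vs2 + vd` and `vmadd.vv vd, vs1, vs2` as `vd = vs1 * vd + vs2`, both wrapping modulo 2**32.

```python
MASK = 0xFFFFFFFF  # e32 lanes wrap modulo 2**32


def vmul_vv(vs2, vs1):
    # vmul.vv vd, vs2, vs1: vd = vs2 * vs1
    return [(a * b) & MASK for a, b in zip(vs2, vs1)]


def vmacc_vv(vd, vs1, vs2):
    # vmacc.vv vd, vs1, vs2: vd = vs1 * vs2 + vd
    return [(a * b + d) & MASK for d, a, b in zip(vd, vs1, vs2)]


def vmadd_vv(vd, vs1, vs2):
    # vmadd.vv vd, vs1, vs2: vd = vs1 * vd + vs2
    return [(a * d + b) & MASK for d, a, b in zip(vd, vs1, vs2)]


def before(v8, v9, v10, v11, a0):
    """Old lowering: multiply, accumulate, then add the scalar last."""
    v8 = vmul_vv(v8, v9)
    v8 = vmacc_vv(v8, v11, v10)
    return [(x + a0) & MASK for x in v8]      # vadd.vx v8, v8, a0


def after(v8, v9, v10, v11, a0):
    """New lowering: splat the scalar and fold it into the first multiply-add."""
    v12 = [a0 & MASK] * len(v8)               # vmv.v.x v12, a0
    v8 = vmadd_vv(v8, v9, v12)
    return vmacc_vv(v8, v11, v10)


lanes = ([1, 2, 0xFFFFFFFF], [3, 0x80000000, 0xFFFFFFFF],
         [4, 5, 0x80000000], [5, 6, 3])
assert before(*lanes, 32) == after(*lanes, 32)
```

Both sequences evaluate `v8*v9 + v11*v10 + 32` per lane; the new one replaces the separate multiply and trailing `vadd.vx` with a splat feeding `vmadd.vv`.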
2025-11-21[profdata] Skip probes with missing counter and function pointers (#163254)Ellis Hoag
2025-11-21[LTO] Use a range-based for loop (NFC) (#169000)Kazu Hirata
Identified with modernize-loop-convert.
2025-11-21[DA] remove getSplitIteration (#167698)Sebastian Pop
Remove getSplitIteration. A follow-up patch will also remove DVEntry::Splitable and Dependence::isSplitable.
2025-11-21[ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948)Sergei Barannikov
ARM relies on deprecated TableGen behavior of guessing instruction properties from patterns (`def ARM : Target` doesn't have `guessInstructionProperties` set to false). Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup` has side effects because the instruction wasn't matched by any pattern. After the patch, TableGen guesses it has no side effects because the added pattern uses only `arm_wlssetup` node, which has no side effects. Add `SDNPSideEffect` to the node so that TableGen guesses the property right, and also `hasSideEffects = 1` to the instruction in case ARM ever sets `guessInstructionProperties` to false.
2025-11-21[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608)Nick Sarnie
Some targets have a specific calling convention that should be used for generated calls to runtime functions. Pass that down and use it. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-11-21[AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038)Jay Foad
Fix a problem exposed by #166483 using AV classes in more places. `isVectorRegister` only accepts registers of VGPR or AGPR classes. `hasVectorRegisters` additionally accepts the combined AV classes. Fixes: #168761
2025-11-21AMDGPU: Stop implementing shouldCoalesce (#168988)Matt Arsenault
Use the default, which freely coalesces anything it can. This mostly shows improvements, with a handful of regressions. The main concern would be if introducing wider registers is more likely to push the register usage up to the next occupancy tier.
2025-11-21[TySan][Clang] Add clang flag to use tysan outlined instrumentation and update docs ↵Matthew Nagy
(#166170)
2025-11-21[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028)Ramkumar Ramachandra
This allows us to strip an unnecessary TypeSwitch.
2025-11-21[VPlan] Only apply forced cost to recipes with underlying values. (#168372)Florian Hahn
Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: https://github.com/llvm/llvm-project/pull/168372
2025-11-21[AMDGPU] Enable multi-group xnack replay in hardware (GFX1250) (#169016)Christudasan Devadasan
This patch enables the multi-group xnack replay mode by configuring the hardware MODE register at kernel entry. This aligns the hardware behavior with the compiler's existing multi-group s_wait_xcnt insertion logic.
2025-11-21[LoopCacheAnalysis] Replace delinearization for fixed size array (#164798)Ryotaro Kasuga
This patch replaces the delinearization function used in LoopCacheAnalysis, switching from one that depends on type information in GEPs to one that does not. Once this patch and https://github.com/llvm/llvm-project/pull/161822 have landed, we can delete `tryDelinearizeFixedSize` from Delinearization, which is an optimization heuristic guided by GEP type information. After Polly eliminates its use of `getIndexExpressionsFromGEP`, we will be able to completely delete GEP-driven heuristics from Delinearization.
2025-11-21[ORC] Tailor ELF debugger support plugin to load-address patching only (#168518)Stefan Gränitz
In 4 years the ELF debugger support plugin wasn't adapted to other object formats or debugging approaches. After the renaming NFC in https://github.com/llvm/llvm-project/pull/168343, this patch tailors the plugin to ELF and section load-address patching. It allows removal of abstractions and consolidates processing steps with the newly enabled AllocActions from https://github.com/llvm/llvm-project/pull/168343.
The key change is to process debug sections in one place in a post-allocation pass. Since we can now handle the endianness of the ELF file in the single `visitSectionLoadAddresses()` visitor function, we don't need to track debug objects and sections in template classes anymore. We keep using the `DebugObject` class and drop `DebugObjectSection`, `ELFDebugObjectSection<ELFT>` and `ELFDebugObject`.
Furthermore, we now use the allocation's working memory for load-address fixups directly. We can drop the `WritableMemoryBuffer` from the debug object and most of the `finalizeWorkingMemory()` step, which saves one copy of the entire debug object buffer. Inlining `finalizeAsync()` into the pre-fixup pass simplifies quite a bit of logic.
We still track `RegisteredObjs` here, because we want to free memory once the corresponding code is freed. There will be a follow-up patch that turns it into a dealloc action.
2025-11-21[RISCV] Update SpacemiT-X60 vector mask instructions latencies (#150644)Mikhail R. Gadelha
This PR adds hardware-measured latencies for all instructions defined in Section 15 of the RVV specification: "Vector Mask Instructions" to the SpacemiT-X60 scheduling model.
2025-11-21[OpenMP] Introduce "loop sequence" as directive association (#168934)Krzysztof Parzyszek
OpenMP 6.0 introduced a `fuse` directive, and with it a "loop sequence" as the associated code. What used to be "loop association" has become "loop-nest association". Rename Association::Loop to LoopNest, add Association::LoopSeq to represent the "loop sequence" association. Change the association of fuse from "block" to "loop sequence".
2025-11-21[AArch64] Avoid introducing illegal types in LowerVECTOR_COMPRESS (NFC) ↵Benjamin Maxwell
(#168520) This does not seem to be an issue currently, but when using VECTOR_COMPRESS as part of another lowering, I found these BITCASTs would result in "Unexpected illegal type!" errors. For example, this would convert the legal nxv2f32 type into the illegal nxv2i32 type. This patch avoids this by using no-op casts for unpacked types.
2025-11-21[NVPTX] Support for dense and sparse MMA intrinsics with block scaling. ↵Kirill Vedernikov
(#163561) This change adds dense and sparse MMA intrinsics with block scaling. The implementation is based on [PTX ISA version 9.0](https://docs.nvidia.com/cuda/parallel-thread-execution/). Tests for new intrinsics are added for PTX 8.7 and SM 120a and are generated by `llvm/test/CodeGen/NVPTX/wmma-ptx87-sm120a.py`. The tests have been verified with ptxas from CUDA-13.0 release. Dense MMA intrinsics with block scaling were supported by @schwarzschild-radius.
2025-11-21[VPlan] Drop poison-generating flags on induction trunc (#168922)Ramkumar Ramachandra
After truncating an integer-induction, neither nuw nor nsw hold. Fixes #168902. Co-authored-by: Florian Hahn <flo@fhahn.com>
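A short worked example shows why the flags need not carry over. This is a Python sketch with hypothetical helper names, assuming an i64 induction that is truncated to i32: an add that does not wrap in 64 bits can still wrap once only the low 32 bits are kept.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def wraps_unsigned(a, b, bits):
    """True if a + b overflows as unsigned `bits`-wide integers (nuw violated)."""
    mask = (1 << bits) - 1
    return (a & mask) + (b & mask) > mask


def wraps_signed(a, b, bits):
    """True if a + b overflows as signed `bits`-wide integers (nsw violated)."""
    total = to_signed(a, bits) + to_signed(b, bits)
    return not (-(1 << (bits - 1)) <= total < (1 << (bits - 1)))


step = 1
# nuw holds for the i64 add but not for the truncated i32 add.
assert not wraps_unsigned(0xFFFFFFFF, step, 64)
assert wraps_unsigned(0xFFFFFFFF, step, 32)
# nsw holds for the i64 add but not for the truncated i32 add.
assert not wraps_signed(0x7FFFFFFF, step, 64)
assert wraps_signed(0x7FFFFFFF, step, 32)
```

Keeping `nuw` or `nsw` on the narrowed add in such a case would make it produce poison where the wide induction was well defined, which is why the flags are dropped.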
2025-11-21[PowerPC] Replace vspltisw+vadduwm instructions with xxleqv+vsubuwm for ↵Himadhith
adding the vector {1, 1, 1, 1} (#160882) This patch optimizes vector addition operations involving vectors whose elements are all 1 by generating a vector of -1s (using `xxleqv`), which is cheaper than generating a vector of 1s (using `vspltisw`). These are the respective vector types:
- `v2i64`: `A + vector {1, 1}`
- `v4i32`: `A + vector {1, 1, 1, 1}`
- `v8i16`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1}`
- `v16i8`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}`
The optimized version replaces `vspltisw` (4 cycles) with `xxleqv` (2 cycles) using the following identity: `A - (-1) = A + 1`.
Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
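The identity can be checked on 32-bit lanes with a small Python model. This is a sketch only: the function names mirror the instruction mnemonics but are plain Python helpers, and it assumes `xxleqv` computes the bitwise equivalence `~(a ^ b)`, so applying it to a register and itself yields all-ones bits, which is -1 in every lane.

```python
MASK = 0xFFFFFFFF  # word lanes wrap modulo 2**32


def xxleqv(a, b):
    # Bitwise equivalence (XNOR) per lane: ~(a ^ b)
    return [~(x ^ y) & MASK for x, y in zip(a, b)]


def vadduwm(a, b):
    # Modulo add of unsigned words
    return [(x + y) & MASK for x, y in zip(a, b)]


def vsubuwm(a, b):
    # Modulo subtract of unsigned words
    return [(x - y) & MASK for x, y in zip(a, b)]


a = [0, 1, 0xFFFFFFFF, 0x7FFFFFFF]
ones = [1, 1, 1, 1]              # what the vspltisw splat of 1 would hold
minus_ones = xxleqv(a, a)        # every lane is 0xFFFFFFFF, i.e. -1

# A - (-1) == A + 1 in every lane, including the wrap-around cases.
assert vsubuwm(a, minus_ones) == vadduwm(a, ones)
```

Because the arithmetic is modular, the identity holds for every lane value, so no range check on `A` is needed.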
2025-11-21[NVPTX] Fix PTX and SM conditions for narrow FP conversions (#168680)Srinivasa Ravi
This change fixes the PTX and SM conditions for narrow FP conversion intrinsics and adds support for family-conditionals.
2025-11-21[PowerPC] Fix Wparentheses warningJim Lin
PPCISelLowering.cpp:15567:27: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
15567 |          CC == ISD::SETEQ && "CC mus be ISD::SETNE or ISD::SETEQ");
      |          ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-11-21[llvm][RISCV] Implement Zilsd load/store pair optimization (#158640)Brandon Wu
This commit implements a complete load/store optimization pass for the RISC-V Zilsd extension, which combines pairs of 32-bit load/store instructions into single 64-bit LD/SD instructions when possible. The default required alignment is 8 bytes; the zilsd-4byte-align feature relaxes this to 4 bytes. Related work: https://reviews.llvm.org/D144002
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-20TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)Matt Arsenault
Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.
2025-11-20AMDGPU: Don't duplicate implicit operands in 3-address conversion (#168426)Nicolai Hähnle
We previously got a duplicate implicit $exec operand. It didn't really hurt anything (other than being a slight drag on compile-time performance). Still, let's keep things clean.
2025-11-20[RISCV] Use SDT_RISCVIntUnaryOpW for RISCVISD::ABSW type profile. NFC (#168932)Craig Topper
This removes an unnecessary isel pattern for the RV32 HwMode.
2025-11-20[RISCV] Only add v2i32 to GPR regclass in the RV64 hardware mode. (#168930)Craig Topper
Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc
2025-11-20[DA] remove constraint propagation (#160924)Sebastian Pop
Remove all constraint propagation functions in Dependence Analysis.