path: root/llvm
Age | Commit message | Author
2025-11-23 | [TableGen] Remove unnecessary use of MVT::SimpleTy. NFC (HEAD, main) | Craig Topper
This was missed in 0ef522ff68fff4266bf85e7b7a507a16a8fd34ee
2025-11-22 | [RISCV] Support zilsd-4byte-align for i64 load/store in SelectionDAG. (#169182) | Craig Topper
I think we need to keep the SelectionDAG code for volatile load/store, so we should support 4-byte alignment when possible.
2025-11-23 | Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)" | Aiden Grossman
This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc. This caused a couple test failures, likely due to a mid-air collision. Reverting for now to get the tree back to green and allow the original author to run UTC/friends and verify the output.
2025-11-23 | [RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661) | hstk30-hw
This may be a bug that was introduced by commit 6749ae36b4a33769e7a77cf812d7cd0a908ae3b9 and has been present ever since. In this case, `OtherReg` always overlaps with `DstReg` because they both come from the `Copy`.
2025-11-22 | [TableGen] Use MVT instead of MVT::SimpleValueType. NFC (#169180) | Craig Topper
This improves type safety and is less verbose. Use SimpleTy only where an integer is needed like switches or emitting a VBR. --------- Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-11-23 | [TableGen] Constify CodeGenInstruction where possible (NFC) (#169193) | Sergei Barannikov
2025-11-22 | [llvm] Use llvm::equal (NFC) (#169173) | Kazu Hirata
While I am at it, this patch uses const l-value references for std::shared_ptr. We don't need to increment the reference count by passing std::shared_ptr by value. Identified with llvm-use-ranges.
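The point about `std::shared_ptr` can be illustrated with a minimal standalone sketch. The helper names `countByValue`, `countByConstRef` and `sameElements` are hypothetical and do not come from the patch. `std::equal` is used here so that the example builds without LLVM headers; `llvm::equal` is the range-based spelling of the same comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical helper: taking the pointer by value copies it, which
// increments the reference count for the duration of the call.
long countByValue(std::shared_ptr<int> P) { return P.use_count(); }

// Hypothetical helper: a const l-value reference leaves the count untouched.
long countByConstRef(const std::shared_ptr<int> &P) { return P.use_count(); }

// Iterator-based comparison of two ranges. The four-iterator overload of
// std::equal also checks that both ranges have the same length.
bool sameElements(const std::vector<int> &A, const std::vector<int> &B) {
  return std::equal(A.begin(), A.end(), B.begin(), B.end());
}
```

With a single owner in the caller, the by-value helper observes a count of 2 while the by-reference helper observes 1, which is the extra increment the patch avoids.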
2025-11-22 | [llvm] Remove unused local variables (NFC) (#169171) | Kazu Hirata
Identified with bugprone-unused-local-non-trivial-variable.
2025-11-22 | [VPlan] Share PreservesUniformity logic between isSingleScalar and isUniformAcrossVFsAndUFs | Florian Hahn
Extract the PreservesUniformity logic from isSingleScalar into a shared static helper function. Update isUniformAcrossVFsAndUFs to use this logic for VPWidenRecipe and VPInstruction, so that any opcode that preserves uniformity is considered uniform-across-vf-and-uf if its operands are. This unifies the uniformity checking logic and makes it easier to extend in the future. This should effectively be NFC currently.
2025-11-22 | [AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374) | tyb0807
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated

Together they enable correctly reconstructing MIR with preload kernarg SGPRs.
2025-11-22 | ELF: Use index 0 for unversioned undefined symbols (#168189) | Fangrui Song
The GNU documentation is ambiguous about the version index for unversioned undefined symbols. The current specification at https://sourceware.org/gnu-gabi/program-loading-and-dynamic-linking.txt defines VER_NDX_LOCAL (0) as "The symbol is private, and is not available outside this object." However, this naming is misleading for undefined symbols. As suggested in discussions, VER_NDX_LOCAL should conceptually be VER_NDX_NONE and apply to unversioned undefined symbols as well. GNU ld has used index 0 for unversioned undefined symbols both before version 2.35 (see https://sourceware.org/PR26002) and in the upcoming 2.46 release (see https://sourceware.org/PR33577). This change aligns with GNU ld's behavior by switching from index 1 to index 0. While here, add a test to dso-undef-extract-lazy.s that undefined symbols of index 0 in DSO are treated as unversioned symbols.
2025-11-22 | [VPlan] Create resume phis in scalar preheader early. (NFC) (#166099) | Florian Hahn
Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction. PR: https://github.com/llvm/llvm-project/pull/166099
2025-11-22 | [DFAJumpThreading] Try harder to avoid cycles in paths. (#169151) | Usman Nadeem
If a threading path has cycles within it then the transformation is not correct. This patch fixes a couple of cases that create such cycles. Fixes https://github.com/llvm/llvm-project/issues/166868
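The property the patch enforces, that a threading path visits no block twice, can be sketched as a simple check. This is an illustrative model only: `BlockId` and `pathHasCycle` are hypothetical names, and the pass itself works on LLVM basic blocks rather than integers.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy stand-in for a basic block: just an integer id (assumption).
using BlockId = int;

// Returns true if any block appears more than once in the path, meaning the
// path loops back on itself.
bool pathHasCycle(const std::vector<BlockId> &Path) {
  std::unordered_set<BlockId> Seen;
  for (BlockId BB : Path)
    if (!Seen.insert(BB).second) // insert() reports whether the id was new
      return true;
  return false;
}
```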
2025-11-22 | [InstCombine] Generalize trunc-shift-icmp fold from (1 << Y) to (Pow2 << Y) (#169163) | Pedro Lobo
Extends the `icmp(trunc(shl))` fold to handle any power of 2 constant as the shift base, not just 1. This generalizes the following patterns by adjusting the comparison offsets by `log2(Pow2)`.

```llvm
(trunc (1 << Y) to iN) == 0    --> Y u>= N
(trunc (1 << Y) to iN) != 0    --> Y u<  N
(trunc (1 << Y) to iN) == 2**C --> Y ==  C
(trunc (1 << Y) to iN) != 2**C --> Y !=  C
; to
(trunc (Pow2 << Y) to iN) == 0    --> Y u>= N - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 0    --> Y u<  N - log2(Pow2)
(trunc (Pow2 << Y) to iN) == 2**C --> Y ==  C - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 2**C --> Y !=  C - log2(Pow2)
```

Proof: https://alive2.llvm.org/ce/z/2zwTkp
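As a sanity check alongside the Alive2 proof, the generalized equalities can be brute-forced for one concrete pair of widths. The sketch below assumes a 32-bit source truncated to N = 8 bits and restricts `log2(Pow2)` to values below N and C to values at or above `log2(Pow2)`, so that the subtractions never wrap. The function name `foldHoldsForI32ToI8` is hypothetical, and the code is not part of InstCombine.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks the generalized fold for a 32-bit source type
// truncated to N = 8 bits. K plays the role of log2(Pow2).
bool foldHoldsForI32ToI8() {
  const unsigned N = 8;
  for (unsigned K = 0; K < N; ++K) {
    const uint32_t Pow2 = uint32_t(1) << K;
    // Shift amounts of 32 or more are poison in LLVM IR, so stop at 31.
    for (unsigned Y = 0; Y < 32; ++Y) {
      const uint8_t Trunc = static_cast<uint8_t>(Pow2 << Y);
      // (trunc (Pow2 << Y) to iN) == 0  <=>  Y u>= N - log2(Pow2)
      if ((Trunc == 0) != (Y >= N - K))
        return false;
      // (trunc (Pow2 << Y) to iN) == 2**C  <=>  Y == C - log2(Pow2)
      for (unsigned C = K; C < N; ++C)
        if ((Trunc == (1u << C)) != (Y == C - K))
          return false;
    }
  }
  return true;
}
```

The `!=` forms follow by negating both sides of each equivalence, so they are not checked separately.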
2025-11-22 | Add new llvm.dbg.declare_value intrinsic. (#168132) | Shubham Sandeep Rastogi
For swift async code, we need to use a debug intrinsic that behaves like an llvm.dbg.declare but can take any location type rather than just a pointer or integer. To solve this, a new debug intrinsic called llvm.dbg.declare_value has been created, which behaves exactly like an llvm.dbg.declare but can also take location types other than pointers and integers. More information here: https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269 This is the first patch as part of a stack of patches, with the one succeeding it being: https://github.com/llvm/llvm-project/pull/168134
2025-11-22 | [CallBrPrepare] Prefer Function &F over Function &Fn | Aiden Grossman
Function &F is the more standard abbreviation (~4000 uses in llvm versus ~300 uses).
2025-11-21 | [NFC][LLDB] Make it possible to detect if the compiler used in tests supports -fbounds-safety (#169112) | Dan Liew
This patch makes it possible to detect in LLDB shell and API tests if `-fbounds-safety` is supported by the compiler used for testing. The motivation behind this is to allow upstreaming https://github.com/swiftlang/llvm-project/pull/11835 but with the tests disabled in upstream because the full implementation of -fbounds-safety isn't available in Clang yet. For shell tests, when -fbounds-safety is available, the `clang-bounds-safety` feature is available, which means tests can be annotated with `# REQUIRES: clang-bounds-safety`. API tests that need -fbounds-safety support in the compiler can use the new `@skipUnlessBoundsSafety` decorator. rdar://165225507
2025-11-21 | DAG: Handle poison in m_Undef (#168288) | Matt Arsenault
2025-11-21 | [test][Support] Disable SignalsTest.PrintsSymbolizerMarkup (#168974) | Jessica Clarke
This test checks that DSOMarkupPrinter::printDSOMarkup prints the module and segment mappings, but that is only done if we can determine the GNU build ID for the given object, and in many environments that is not enabled by default (e.g. the FreeBSD Clang driver never enables it, and various other Clang drivers only do so when ENABLE_LINKER_BUILD_ID is opted into at configure time). GCC tends to enable it by default, and many distributions enable it for Clang, so this has gone unnoticed for a while, but this test has been failing on FreeBSD since its creation. Fixes: 22b9404f09dc ("Optionally print symbolizer markup backtraces.") See: https://github.com/llvm/llvm-project/issues/168891
2025-11-21 | [SCEVExp] Remove early exit, rely on InstSimplifyFolder (NFCI). | Florian Hahn
Remove the SCEV-based check refined in https://github.com/llvm/llvm-project/pull/156910, as InstSimplifyFolder manages to simplify the generated code to false directly as well.
2025-11-21 | [unroll-and-jam] Document dependencies_multidims.ll and fix loop bounds (NFC) (#156578) | Sebastian Pop
Add detailed comments explaining why each function should/shouldn't be unroll-and-jammed based on memory access patterns and dependencies. Fix loop bounds to ensure array accesses are within array bounds:
* sub_sub_less: j starts from 1 (not 0) to ensure j-1 >= 0
* sub_sub_less_3d: k starts from 1 (not 0) to ensure k-1 >= 0
* sub_sub_outer_scalar: j starts from 1 (not 0) to ensure j-1 >= 0
2025-11-21 | [DA] remove Constraints class (#168963) | Sebastian Pop
Remove the Constraints class from dependence analysis as it is now unused.
2025-11-22 | [DAGCombiner] Don't optimize insert_vector_elt into shuffle if implicit truncation exists (#169022) | Hongyu Chen
Fixes #169017
2025-11-21 | AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818) | Nicolai Hähnle
These shuffles can always be implemented using v_perm_b32, and so this rewrites the analysis from the perspective of "how many v_perm_b32s does it take to assemble each register of the result?" The test changes in Transforms/SLPVectorizer/reduction.ll are reasonable: VI (gfx8) has native f16 math, but not packed math.
2025-11-21 | [profcheck] Propagate profile metadata to Wrapper function in optimize mode of ExpandVariadic. (#168161) | Jin Huang
This PR fixes the issue where profile metadata (`!prof`) is dropped from the `VariadicWrapper` when `ExpandVariadics` runs in `--expand-variadics-override=optimize` mode.

In optimize mode, the pass splits the original variadic function into two parts:
- A **VariadicWrapper** (retaining the original name) that handles the `va_list` setup.
- A **FixedArityReplacement** (new function) that contains the original core logic.

During this process, the basic blocks and associated metadata are spliced into the `FixedArityReplacement`. Consequently, the `VariadicWrapper`, which serves as the entry point for callers, is left without function entry count metadata. This change explicitly copies the `MD_prof` metadata from the `FixedArityReplacement` back to the `VariadicWrapper` after the split is defined.

Co-authored-by: Jin Huang <jingold@google.com>
2025-11-21 | AMDGPU: Handle invariant when lowering global loads (#168914) | Matt Arsenault
Global with invariant should be treated identically to constant.
2025-11-21 | AMDGPU: Add baseline test for split/widen invariant loads (#168913) | Matt Arsenault
This works fine on main, but broke after a future patch.
2025-11-21 | Revert "[ORC] Tailor ELF debugger support plugin to load-address patching only" (#169073) | Stefan Gränitz
Reverts llvm/llvm-project#168518
2025-11-21 | [HLSL] Add Load overload with status (#166449) | Joshua Batista
This PR adds a Load method for resources, which takes an additional parameter by reference, status. It fills the status parameter with a 1 or 0, depending on whether or not the resource access was mapped. CheckAccessFullyMapped is also added as an intrinsic, and called in the production of this status bit. Only addresses DXIL for the below issue: https://github.com/llvm/llvm-project/issues/138910 Also only addresses the DXIL variant for the below issue: https://github.com/llvm/llvm-project/issues/99204
2025-11-21 | [Support] Use range-based for loops (NFC) (#169001) | Kazu Hirata
Identified with modernize-loop-convert.
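For readers unfamiliar with the check, the kind of rewrite that modernize-loop-convert suggests looks roughly like the following. Both functions are hypothetical examples written for this note and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index-based form, the shape that the check flags.
int sumIndexed(const std::vector<int> &Values) {
  int Sum = 0;
  for (std::size_t I = 0; I < Values.size(); ++I)
    Sum += Values[I];
  return Sum;
}

// Equivalent range-based form after conversion.
int sumRanged(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```

The two loops visit the same elements in the same order, which is why such conversions are marked NFC.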
2025-11-21 | Revert "[AMDGPU] Remove leftover implicit operands from SI_SPILL/SI_RESTORE." (#169068) | Nathan Corbyn
PR causes build failures with expensive checks enabled. Reverts llvm/llvm-project#168546
2025-11-21 | [RISCV] Incorporate scalar addends to extend vector multiply accumulate chains (#168660) | Ryan Buchner
Previously, the following:

    %mul0 = mul nsw <8 x i32> %m00, %m01
    %mul1 = mul nsw <8 x i32> %m10, %m11
    %add0 = add <8 x i32> %mul0, splat (i32 32)
    %add1 = add <8 x i32> %add0, %mul1

lowered to:

    vsetivli zero, 8, e32, m2, ta, ma
    vmul.vv v8, v8, v9
    vmacc.vv v8, v11, v10
    li a0, 32
    vadd.vx v8, v8, a0

After this patch, now lowers to:

    li a0, 32
    vsetivli zero, 8, e32, m2, ta, ma
    vmv.v.x v12, a0
    vmadd.vv v8, v9, v12
    vmacc.vv v8, v11, v10

Modeled on 0cc981e0 from the AArch64 backend.

C-code for the example case (`clang -O3 -S -mcpu=sifive-x280`):

```
int madd_fail(int a, int b, int * restrict src, int * restrict dst, int loop_bound) {
  for (int i = 0; i < loop_bound; i += 2) {
    dst[i] = src[i] * a + src[i + 1] * b + 32;
  }
}
```
2025-11-21 | llvm: Disable copy for SingleThreadExecutor (#168782) | Fabrice de Gans
This is a workaround for the MSVC compiler, which attempts to generate a copy assignment operator implementation for classes marked as `__declspec(dllexport)`. Explicitly marking the copy assignment operator as deleted works around the problem. DevCom ticket: https://developercommunity.microsoft.com/t/Classes-marked-with-__declspecdllexport/11003192
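A minimal sketch of the workaround pattern follows. It assumes a hypothetical export macro `EXAMPLE_ABI` and a hypothetical class `Worker`; neither is the macro or the class used in LLVM. The sketch deletes the copy constructor as well as the copy assignment operator, which goes slightly beyond what the commit message describes.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: EXAMPLE_ABI is a made-up export macro for this sketch.
#if defined(_MSC_VER)
#define EXAMPLE_ABI __declspec(dllexport)
#else
#define EXAMPLE_ABI
#endif

// Hypothetical stand-in for an exported class that must not be copied.
class EXAMPLE_ABI Worker {
public:
  Worker() = default;
  // Explicitly deleted copy operations: there is nothing left for the
  // compiler to generate when the class is exported.
  Worker(const Worker &) = delete;
  Worker &operator=(const Worker &) = delete;
};

static_assert(!std::is_copy_assignable<Worker>::value,
              "copy assignment must be unavailable");
```

The `static_assert` documents the intent at compile time, so an accidental reintroduction of the copy assignment operator fails the build.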
2025-11-21 | [profdata] Skip probes with missing counter and function pointers (#163254) | Ellis Hoag
2025-11-21 | [CAS] Remove redundant casts (NFC) (#169002) | Kazu Hirata
FileOffset::get already returns uint64_t. Identified with readability-redundant-casting.
2025-11-21 | [LTO] Use a range-based for loop (NFC) (#169000) | Kazu Hirata
Identified with modernize-loop-convert.
2025-11-21 | [DA] remove getSplitIteration (#167698) | Sebastian Pop
Remove getSplitIteration. A follow-up patch will also remove DVEntry::Splitable and Dependence::isSplitable.
2025-11-21 | [ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948) | Sergei Barannikov
ARM relies on deprecated TableGen behavior of guessing instruction properties from patterns (`def ARM : Target` doesn't have `guessInstructionProperties` set to false). Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup` has side effects because the instruction wasn't matched by any pattern. After the patch, TableGen guesses it has no side effects because the added pattern uses only `arm_wlssetup` node, which has no side effects. Add `SDNPSideEffect` to the node so that TableGen guesses the property right, and also `hasSideEffects = 1` to the instruction in case ARM ever sets `guessInstructionProperties` to false.
2025-11-21 | [OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608) | Nick Sarnie
Some targets have a specific calling convention that should be used for generated calls to runtime functions. Pass that down and use it. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-11-21 | [AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038) | Jay Foad
Fix a problem exposed by #166483 using AV classes in more places. `isVectorRegister` only accepts registers of VGPR or AGPR classes. `hasVectorRegisters` additionally accepts the combined AV classes. Fixes: #168761
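The difference between the two predicates can be pictured with a toy model. The enum and both functions below are hypothetical stand-ins that only mirror the behaviour described in the message; they are not the real AMDGPU register info API.

```cpp
#include <cassert>

// Toy model only: a register class is reduced to one of four kinds.
enum class RegClassKind { SGPR, VGPR, AGPR, AV };

// Mirrors the stricter predicate: only pure VGPR or AGPR classes qualify.
bool isPureVector(RegClassKind K) {
  return K == RegClassKind::VGPR || K == RegClassKind::AGPR;
}

// Mirrors the broader predicate: the combined AV class qualifies as well.
bool hasVectorRegs(RegClassKind K) {
  return isPureVector(K) || K == RegClassKind::AV;
}
```

In this model an AV-class PHI operand is rejected by the stricter check but accepted by the broader one, which is the gap the fix closes.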
2025-11-21 | AMDGPU: Stop implementing shouldCoalesce (#168988) | Matt Arsenault
Use the default, which freely coalesces anything it can. This mostly shows improvements, with a handful of regressions. The main concern would be if introducing wider registers is more likely to push the register usage up to the next occupancy tier.
2025-11-21 | [TySan][Clang] Add clang flag to use tysan outlined instrumentation a… (#166170) | Matthew Nagy
…nd update docs
2025-11-21 | Fix test from #168609 (#169041) | Walter Lee
2025-11-21 | [VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028) | Ramkumar Ramachandra
This allows us to strip an unnecessary TypeSwitch.
2025-11-21 | [VPlan] Only apply forced cost to recipes with underlying values. (#168372) | Florian Hahn
Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: https://github.com/llvm/llvm-project/pull/168372
2025-11-21 | [AMDGPU] Enable multi-group xnack replay in hardware (GFX1250) (#169016) | Christudasan Devadasan
This patch enables the multi-group xnack replay mode by configuring the hardware MODE register at kernel entry. This aligns the hardware behavior with the compiler's existing multi-group s_wait_xcnt insertion logic.
2025-11-21 | [LoopCacheAnalysis] Replace delinearization for fixed size array (#164798) | Ryotaro Kasuga
This patch replaces the delinearization function used in LoopCacheAnalysis, switching from one that depends on type information in GEPs to one that does not. Once this patch and https://github.com/llvm/llvm-project/pull/161822 are landed, we can delete `tryDelinearizeFixedSize` from Delinearization, which is an optimization heuristic guided by GEP type information. After Polly eliminates its use of `getIndexExpressionsFromGEP`, we will be able to completely delete GEP-driven heuristics from Delinearization.
2025-11-21 | [ORC] Tailor ELF debugger support plugin to load-address patching only (#168518) | Stefan Gränitz
In 4 years the ELF debugger support plugin wasn't adapted to other object formats or debugging approaches. After the renaming NFC in https://github.com/llvm/llvm-project/pull/168343, this patch tailors the plugin to ELF and section load-address patching. It allows removal of abstractions and consolidates processing steps with the newly enabled AllocActions from https://github.com/llvm/llvm-project/pull/168343. The key change is to process debug sections in one place in a post-allocation pass. Since we can now handle the endianness of the ELF file in the single `visitSectionLoadAddresses()` visitor function, we don't need to track debug objects and sections in template classes anymore. We keep using the `DebugObject` class and drop `DebugObjectSection`, `ELFDebugObjectSection<ELFT>` and `ELFDebugObject`. Furthermore, we now use the allocation's working memory for load-address fixups directly. We can drop the `WritableMemoryBuffer` from the debug object and most of the `finalizeWorkingMemory()` step, which saves one copy of the entire debug object buffer. Inlining `finalizeAsync()` into the pre-fixup pass simplifies quite a bit of logic. We still track `RegisteredObjs` here, because we want to free memory once the corresponding code is freed. There will be a follow-up patch that turns it into a dealloc action.
2025-11-21 | [RISCV] Update SpacemiT-X60 vector mask instructions latencies (#150644) | Mikhail R. Gadelha
This PR adds hardware-measured latencies for all instructions defined in Section 15 of the RVV specification: "Vector Mask Instructions" to the SpacemiT-X60 scheduling model.
2025-11-21 | [OpenMP] Introduce "loop sequence" as directive association (#168934) | Krzysztof Parzyszek
OpenMP 6.0 introduced a `fuse` directive, and with it a "loop sequence" as the associated code. What used to be "loop association" has become "loop-nest association". Rename Association::Loop to LoopNest, add Association::LoopSeq to represent the "loop sequence" association. Change the association of fuse from "block" to "loop sequence".