Age | Commit message | Author
2025-11-11 | RuntimeLibcalls: Add small_printf functions to emscripten [users/arsenm/runtime-libcalls/add-small-printf-emscripten] | Matt Arsenault
2025-11-11 | RuntimeLibcalls: Add macos unlocked IO functions to systems [users/arsenm/runtime-libcalls/add-unlocked-io-funcs-macos] | Matt Arsenault
This is another of the easier-to-understand conditions from TargetLibraryInfo.
2025-11-11 | RuntimeLibcalls: Add memset_pattern* calls to darwin systems [users/arsenm/runtime-libcalls/add-memset-pattern-darwin] | Matt Arsenault
This is one of the easier cases to comprehend in TargetLibraryInfo's setup.
2025-11-11 | RuntimeLibcalls: Add more function entries from TargetLibraryInfo [users/arsenm/runtime-libcalls/add-more-entries-from-targetlibraryinfo] | Matt Arsenault
Script-scraped dump of most functions in TargetLibraryInfo.def, with existing entries and a few special cases removed. This only adds the definitions and does not enable them for any system yet. Adding them in the correct places is the hard part, since TargetLibraryInfo is written as opt-out, with manually written exemptions.
2025-11-11 | RuntimeLibcalls: Add malloc and free entries [users/arsenm/runtime-libcalls/add-malloc-calloc-free] | Matt Arsenault
calloc was already present, but malloc and free were not. This also adds manual type information.
2025-11-11 | RuntimeLibcalls: Add mustprogress to common function attributes [users/arsenm/runtime-libcalls/add-mustprogress-common-fn-attrs] | Matt Arsenault
2025-11-11 | RuntimeLibcalls: Add __memcpy_chk, __memmove_chk, __memset_chk [users/arsenm/runtime-libcalls/add-memcpy-memmove-memset-chk-functions] | Matt Arsenault
These were in TargetLibraryInfo but missing from RuntimeLibcalls. This only adds the cases whose non-chk variants are already present. The enabled-by-default logic is copied from TargetLibraryInfo, which is probably overly permissive: only isPS opts out.
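For readers unfamiliar with the `_chk` family: in glibc-style fortified builds, `__memcpy_chk(dst, src, len, dstlen)` behaves like `memcpy` but aborts when `len` exceeds the known destination object size `dstlen`. The sketch below models only that check. `memcpy_chk_sketch` is a hypothetical name, and it returns `nullptr` instead of aborting so the failure is easy to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of a fortified memcpy. The real __memcpy_chk aborts the
// program (via __chk_fail in glibc) when len exceeds the destination size;
// this sketch returns nullptr instead so callers can observe the failure.
void *memcpy_chk_sketch(void *dst, const void *src, std::size_t len,
                        std::size_t dstlen) {
  assert(dst != nullptr && src != nullptr && "null pointers are not modeled");
  if (len > dstlen)
    return nullptr; // Copy would overflow the destination object.
  return std::memcpy(dst, src, len);
}
```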
2025-11-11 | RuntimeLibcalls: Add a few libm entries from TargetLibraryInfo [users/arsenm/runtime-libcalls/add-libm-functions-from-targetlibraryinfo] | Matt Arsenault
These are floating-point functions recorded in TargetLibraryInfo, but missing from RuntimeLibcalls.
2025-11-11 | RuntimeLibcalls: Add definitions for vector math functions [users/arsenm/runtime-libcalls/add-vector-library-functions] | Matt Arsenault
This is mostly the output of a vibe-coded script run on VecFuncs.def, with many manual cleanups and fixes where the vibes were off. It is not yet wired up to anything (except for the handful of calls that are already manually enabled). In the future, the SystemLibrary mechanism needs to be generalized to allow plugging these sets in based on the vector library flag. One annoying issue is name conflicts across the libraries: some of the libmvec functions collide with sleef functions. I solved this by adding a prefix to the libmvec functions. It would probably be a good idea to add a prefix to every group, but that gets ugly, particularly since some of the sleef functions started using a Sleef_ prefix while most do not.
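The collision problem can be pictured with a small model. The sketch assumes each vector library contributes a list of exported symbol names and that the generated identifier is built as prefix plus name; `VecLibrary`, `makeUniqueIds`, and the `libmvec_` prefix are hypothetical illustrations, not the actual RuntimeLibcalls TableGen scheme. The name `_ZGVnN2v_sin` stands in for any vector-function-ABI-style symbol exported by more than one library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one vector math library's exports.
struct VecLibrary {
  std::string Prefix;             // Identifier prefix for this group (may be empty).
  std::vector<std::string> Names; // Exported symbol names.
};

// Builds an identifier Prefix + Name for every exported symbol and maps it
// back to the symbol name. Returns an empty map if two entries collide.
std::map<std::string, std::string>
makeUniqueIds(const std::vector<VecLibrary> &Libs) {
  std::map<std::string, std::string> IdToSymbol;
  for (const VecLibrary &Lib : Libs)
    for (const std::string &Name : Lib.Names)
      if (!IdToSymbol.emplace(Lib.Prefix + Name, Name).second)
        return {}; // Identifier already taken by another library.
  return IdToSymbol;
}
```

Prefixing only one group, as the commit message describes for libmvec, is enough to resolve a pairwise clash; prefixing every group would make the scheme uniform at the cost of longer identifiers.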
2025-11-11 | DAG: Move expandMultipleResultFPLibCall to TargetLowering (NFC) | Matt Arsenault
This kind of helper is higher-level and not general enough to live directly in SelectionDAG; most similar utilities are in TargetLowering.
2025-11-12 | [NFC] [C++20] [Modules] Test that we can avoid adding more specializations in reduced BMI | Chuanqi Xu
2025-11-11 | [MLIR][Python] Add region_op wrappers for linalg (#167616) | Asher Mancinelli
Makes linalg.reduce and linalg.map region_ops so they can be constructed from functions and be called as decorators.
2025-11-12 | [libc++] Implement P2988R12: `std::optional<T&>` (#155202) | William Tran-Viet
Resolves #148131
- Unlock `std::optional<T&>` implementation
- Allow instantiations of `optional<T(&)(...)>` and `optional<T(&)[]>`, but disable `value_or()`, `optional::iterator`, and all `iterator`-related functions
- Update documentation
- Update tests
2025-11-12 | DAG: Stop using TargetLibraryInfo for multi-result FP intrinsic codegen (#166987) | Matt Arsenault
Only use RuntimeLibcallsInfo. Remove the helper functions used during the transition.
2025-11-11 | [libunwind] Fix execution flow imbalance when using C++ Exceptions (#165066) | Med Ismail Bennani
2025-11-11 | [WebAssembly] Use MCRegister::id(). NFC (#167609) | Craig Topper
2025-11-11 | [llvm][llvm-dis] Fix 'llvm-dis' with '--materialize-metadata --show-annotations' crashes (#167487) | Timur Baydyusenov
Added handling for the case of a non-materialized module; also, printInfoComment is no longer called for immaterializable values.
2025-11-11 | InferAddressSpaces: Add more baseline tests for assume handling (#167611) | Matt Arsenault
2025-11-11 | DAG: Use modf vector libcalls through RuntimeLibcalls (#166986) | Matt Arsenault
Copy new process from sincos/sincospi
2025-11-12 | RuntimeLibcalls: Add libcall entries for sleef and armpl modf functions (#166985) | Matt Arsenault
2025-11-11 | [NFC] Generalize the arithmetic type for `getDisjunctionWeights` (#167593) | Mircea Trofin
2025-11-11 | [llvm-offload-wrapper] Fix Triple and OpenMP handling (#167580) | Joseph Huber
Summary: The OpenMP handling using an offload binary should be optional, since it is only used for extra metadata for llvm-objdump. Also, the triple was completely wrong: it did not allow anyone to correctly choose between ELF and COFF handling.
2025-11-12 | Reland "[LoongArch] Add `isSafeToMove` hook to prevent unsafe instruction motion" (#167465) | hev
This patch introduces a new virtual method `TargetInstrInfo::isSafeToMove()` to allow backends to control whether a machine instruction can be safely moved by optimization passes. The `BranchFolder` pass now respects this hook when hoisting common code. By default, all instructions are considered safe to move. For LoongArch, `isSafeToMove()` is overridden to prevent relocation-related instruction sequences (e.g. PC-relative addressing and calls) from being broken by instruction motion. Correspondingly, `isSchedulingBoundary()` is updated to reuse this logic for consistency. Relands #163725.
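The hook pattern described above can be sketched with toy types. `Instr`, `InstrInfo`, `LoongArchLikeInfo`, and `canHoist` are hypothetical stand-ins, not LLVM's `MachineInstr`/`TargetInstrInfo` API. The sketch only shows a virtual query that defaults to true, a target override that pins relocation-related sequences, and a scheduling-boundary check that reuses the same logic (simplified here to a non-virtual helper).

```cpp
#include <cassert>
#include <string>

// Toy instruction: an opcode name plus a flag marking membership in a
// relocation-related sequence (e.g. a PC-relative address computation).
struct Instr {
  std::string Opcode;
  bool PartOfRelocSequence = false;
};

struct InstrInfo {
  virtual ~InstrInfo() = default;
  // Default: every instruction may be moved by optimization passes.
  virtual bool isSafeToMove(const Instr &) const { return true; }
  // Simplification: treat anything that must not move as a boundary.
  bool isSchedulingBoundary(const Instr &I) const { return !isSafeToMove(I); }
};

// Hypothetical target override: never split relocation-related sequences.
struct LoongArchLikeInfo : InstrInfo {
  bool isSafeToMove(const Instr &I) const override {
    return !I.PartOfRelocSequence;
  }
};

// A BranchFolder-like client hoists common code only if the hook allows it.
bool canHoist(const InstrInfo &TII, const Instr &I) {
  return TII.isSafeToMove(I);
}
```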
2025-11-11 | [AMDGPU][GISel] Add RegBankLegalize support for G_AMDGPU_WAVE_ADDRESS (#167456) | Chinmay Deshpande
2025-11-12 | Fix lldb-dap non-leaf frame source resolution issue (#165944) | jeffreytan81
Summary
-------
While dogfooding lldb-dap, I observed that VSCode frequently displays certain stack frames as greyed out. Although these frames have valid debug information, double-clicking them shows disassembly instead of source code. However, running `bt` from the LLDB command line correctly displays source file and line information for these same frames, indicating an lldb-dap-specific issue.

Root Cause
----------
Investigation revealed that `DAP::ResolveSource()` incorrectly uses a frame's PC address directly to determine whether valid source line information exists. This works for leaf frames but fails for non-leaf (caller) frames, where the PC is the return address immediately after a call instruction. That return address may fall into compiler-generated code with no associated line information, even though the actual call site has valid source location data. The correct approach is to use the symbol context's line entry, which LLDB resolves by effectively checking PC-1 for non-leaf frames, thereby finding the line information for the call instruction rather than for the return address.

Testing
-------
Manually tested with VSCode debugging sessions on production workloads. Verified that non-leaf frames now correctly display source code instead of the disassembly view.

[Screenshot: symptom before the change]
[Screenshot: after the fix]

Co-authored-by: Jeffrey Tan <jeffreytan@fb.com>
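The root cause can be illustrated with a toy line table. `LineEntry`, `lookupLine`, and `resolveFrameLine` are hypothetical; the sketch mirrors what the description attributes to LLDB's symbol context: for a caller frame, look up `pc - 1` so the address falls inside the call instruction rather than at the return address.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One row of a toy line table: addresses in [Begin, End) map to Line.
struct LineEntry {
  uint64_t Begin, End;
  unsigned Line;
};

std::optional<unsigned> lookupLine(const std::vector<LineEntry> &Table,
                                   uint64_t Addr) {
  for (const LineEntry &E : Table)
    if (Addr >= E.Begin && Addr < E.End)
      return E.Line;
  return std::nullopt; // No line info, e.g. compiler-generated code.
}

// For non-leaf frames the PC is a return address, so look up PC - 1,
// which lies inside the call instruction itself.
std::optional<unsigned> resolveFrameLine(const std::vector<LineEntry> &Table,
                                         uint64_t PC, bool IsLeafFrame) {
  return lookupLine(Table, IsLeafFrame ? PC : PC - 1);
}
```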
2025-11-12 | [gn build] Port 5c3323a59fd2 | LLVM GN Syncbot
2025-11-11 | AMDGPU: Remove override of TargetInstrInfo::getRegClass (#159886) | Matt Arsenault
This should not be overridable, and the special-case hacks have been replaced with RegClassByHwMode.
2025-11-11 | [MachO] Fix test failure. (#167598) | Prabhu Rajasekaran
Add a `REQUIRES` line so that the `invalid-section-index.s` test does not run in environments without AArch64 support.
2025-11-11 | [MLIR][Python] Add wrappers for scf.index_switch (#167458) | Asher Mancinelli
The C++ index switch op has utilities `getCaseBlock(int i)` and `getDefaultBlock()`, so wrappers for these have been added. Optional body-builder arguments have also been added: one for the default case and one for the switch cases.
2025-11-12 | [JITLINK] Fix large offset issue (#167600) | anoopkg6
Removed the large-offset test, which caused problems on 32-bit ARM because of the large offset.

Co-authored-by: anoopkg6 <anoopkg6@github.com>
2025-11-12 | [mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices (#166569) | Hanumanth
I hit another runtime verification issue (similar to https://github.com/llvm/llvm-project/pull/164878) while working with TFLite models. The verifier incorrectly rejects `tensor.extract_slice` operations that extract an empty slice (size=0) starting exactly at the tensor boundary.

The current runtime verification unconditionally enforces `offset < dim_size`. This makes sense for non-empty slices, but it is too strict for empty slices, causing false positives that lead to spurious runtime assertions.

**Simple example that demonstrates the issue:**

```mlir
func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] : tensor<?xf32> to tensor<?xf32>
  return
}
```

For the above example, the check evaluates `10 < 10`, which is false, so verification fails. However, I believe this operation should be valid: we're extracting zero elements, so there is no actual out-of-bounds access.

**Real-world repro from the TensorFlow Lite models:**

This issue manifests while lowering TFLite models, and many of our system tests fail because of it. Here is a simplified version showing the problematic pattern. In this code, `%extracted_slice_0` becomes an empty tensor when SSA value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification.
```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index

  %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
  %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
  %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
  %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
  %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
  %c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>

  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
    %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
    %extracted = tensor.extract %7[] : tensor<i1>
    scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
  } do {
  ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
    %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>

    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32>

    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    %extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index // size = 10 - offset
    %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
    %17 = index.castu %extracted_2 : i32 to index
    %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
    %18 = index.castu %extracted_3 : i32 to index

    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32>

    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
    scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
  }
  return %6#1 : tensor<10x4x1xf32>
}
```

**The fix:** Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 <= offset <= dim_size`
- Non-empty slice (size > 0): require `0 <= offset < dim_size`

**Question for reviewers:** Should we also relax the static verifier to allow this edge case? Currently, the static verifier rejects the following IR:

```mlir
%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>
```

Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change; this PR focuses only on the runtime verification fix for dynamic shapes.

P.S. We have a similar issue with `memref.subview`. I will send a separate patch for the issue.

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
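The offset rule from the fix can be written as a small predicate. This is a sketch of the arithmetic only, under a hypothetical name `isSliceOffsetInBounds`; MLIR's runtime verification generates equivalent checks as IR rather than calling a C++ helper, and the sketch covers only the offset condition, not checks on the end of the slice.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a slice starting at Offset with Size elements has a valid
// offset along a dimension of extent DimSize.
bool isSliceOffsetInBounds(int64_t Offset, int64_t Size, int64_t DimSize) {
  assert(Size >= 0 && DimSize >= 0 && "sizes are non-negative");
  if (Offset < 0)
    return false;
  if (Size == 0)
    return Offset <= DimSize; // Empty slice may start exactly at the boundary.
  return Offset < DimSize;    // Non-empty slice must start inside the dimension.
}
```

Applied to the first example above, `isSliceOffsetInBounds(10, 0, 10)` holds, whereas the old unconditional check `10 < 10` rejected it.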
2025-11-12 | [mlir][memref] Fix runtime verification for memref.subview for empty memref subviews (#166581) | Hanumanth
This PR applies the same fix from #166569 to `memref.subview`. That PR fixed the issue for `tensor.extract_slice`, and this one addresses the identical problem for `memref.subview`.

The runtime verification for `memref.subview` incorrectly rejects valid empty subviews (size=0) starting at the memref boundary.

**Example that demonstrates the issue:**

```mlir
func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>,
                                    %dim_0: index,
                                    %dim_1: index,
                                    %dim_2: index,
                                    %offset: index) {
  // When called with: offset=10, dim_0=0, dim_1=4, dim_2=1
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %subview = memref.subview %memref[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] :
    memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to
    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
  return
}
```

When `%offset=10` and `%dim_0=0`, we're creating an empty subview (zero elements along dimension 0) starting at the boundary. The current verification enforces `offset < dim_size`, which evaluates to `10 < 10` and fails. I feel this should be valid since no memory is accessed.

**The fix:** Same as #166569. Make the offset check conditional on subview size:
- Empty subview (size == 0): allow `0 <= offset <= dim_size`
- Non-empty subview (size > 0): require `0 <= offset < dim_size`

Please see #166569 for motivation and rationale.

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
2025-11-11 | AMDGPU: Remove wrapper around TRI::getRegClass (#159885) | Matt Arsenault
This shadows the member in the base class, but differs slightly in behavior. The base method doesn't check for the invalid case.
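Shadowing a non-virtual member means the result depends on the static type used at the call site. The hypothetical `BaseRegInfo`/`TargetRegInfo` pair below (with `ID * 10` standing in for a real register-class table lookup and `-1` as an assumed "invalid" sentinel) shows how the same object can give different answers for the invalid case, which is presumably the kind of inconsistency that removing the wrapper avoids.

```cpp
#include <cassert>

constexpr int InvalidID = -1; // Hypothetical "no register class" sentinel.

struct BaseRegInfo {
  // No check for InvalidID: callers must filter it out themselves.
  int getRegClass(int ID) const { return ID * 10; }
};

struct TargetRegInfo : BaseRegInfo {
  // Same name and signature, not virtual: this hides BaseRegInfo::getRegClass
  // and additionally maps InvalidID to InvalidID.
  int getRegClass(int ID) const {
    return ID == InvalidID ? InvalidID : BaseRegInfo::getRegClass(ID);
  }
};
```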
2025-11-11 | AMDGPU: Update register class numbers in test (#167601) | Matt Arsenault
2025-11-11 | AMDGPU: Start using RegClassByHwMode for wavesize operands (#159884) | Matt Arsenault
This eliminates the pseudo register classes that were used to hack the wave register class; they are now replaced with RegClassByHwMode, so most of the diff comes from register class ID renumbering.
2025-11-11 | workflows/libclang-abi-tests: Use new container (#167459) | Tom Stellard
2025-11-11 | [VPlan] Add tests for hoisting predicated loads. | Florian Hahn
Adds test coverage for loops where the same loads are executed under complementary predicates and can be hoisted, together with a set of negative test cases.
2025-11-11 | AArch64: Use TargetConstant for intrinsic IDs (#166661) | Matt Arsenault
These should always use TargetConstant
2025-11-11 | [compiler-rt][asan] Fix a test on Windows (#167591) | Alan Zhao
Windows doesn't support `pthread_attr`, which #165198 introduced into asan_test.cpp, so this change `#ifdef`s out the changes made in that PR. Originally reported by Chrome as https://crbug.com/459880605.
2025-11-11 | [lld][macho] Fix segfault while processing malformed object file. (#167025) | Prabhu Rajasekaran
We ran into a case where linking a MachO object file containing a section symbol with no associated section caused a segfault. This patch handles such cases gracefully and prevents the linker from crashing.

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
2025-11-11 | [MachO] Report error when there are too many sections (#167418) | Prabhu Rajasekaran
When there are more than 255 sections, the MachO object writer allows the creation of potentially malformed object files. Assertions in the object writer code currently prevent this, but in distributions built with assertions disabled, malformed object files are still produced. This change turns those assertions into explicit errors.
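In MachO, each `nlist` symbol entry stores its section index in a single byte (`n_sect`), with 0 reserved for `NO_SECT`, which is where the 255-section ceiling comes from. A minimal sketch of reporting this as an error rather than asserting, using a hypothetical `checkSectionCount` helper (not the actual MCMachObjectWriter code), might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// nlist::n_sect is one byte and 0 means NO_SECT, so sections are numbered 1..255.
constexpr std::size_t MaxMachOSections = 255;

// Returns an error message instead of asserting, so builds without
// assertions still refuse to write a malformed object file.
std::optional<std::string> checkSectionCount(std::size_t NumSections) {
  if (NumSections > MaxMachOSections)
    return "too many sections: " + std::to_string(NumSections) +
           " (MachO supports at most 255)";
  return std::nullopt;
}
```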
2025-11-11 | AMDGPU: Regenerate test checks after bbde79278 (#167590) | Matt Arsenault
Merge chasing latest versions of bulk test updates
2025-11-12 | [CodeGen] Use MCRegUnit in more places (NFC) (#167578) | Sergei Barannikov
2025-11-11 | [VPlan] Remove unneeded getDefiningRecipe with isa/cast/dyn_cast. (NFC) | Florian Hahn
`classof` for most recipes directly accepts a VPValue, so there is no need to call getDefiningRecipe when using isa/cast/dyn_cast.
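LLVM-style `isa`/`dyn_cast` dispatch to a static `classof`; if a recipe's `classof` also has an overload taking the value type, callers can skip the `getDefiningRecipe()` hop. The toy model below uses hypothetical `Value`, `Recipe`, and `LoadRecipe` classes and a minimal `isa` template; it is not VPlan's real class hierarchy.

```cpp
#include <cassert>

struct Recipe;

// A value optionally defined by a recipe (null for live-ins).
struct Value {
  Recipe *Def = nullptr;
  Recipe *getDefiningRecipe() const { return Def; }
};

struct Recipe {
  enum Kind { LoadKind, StoreKind };
  Kind K;
  explicit Recipe(Kind K) : K(K) {}
};

struct LoadRecipe : Recipe {
  LoadRecipe() : Recipe(LoadKind) {}
  static bool classof(const Recipe *R) { return R->K == LoadKind; }
  // Overload that accepts a Value directly, so callers need not call
  // getDefiningRecipe() before isa<>.
  static bool classof(const Value *V) {
    const Recipe *R = V->getDefiningRecipe();
    return R && classof(R);
  }
};

// Minimal isa<>: forwards to the target type's classof overload set.
template <typename To, typename From> bool isa(const From *F) {
  return To::classof(F);
}
```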
2025-11-11 | [SPIRV] Use MCRegister instead of unsigned. NFC (#167585) | Craig Topper
2025-11-11 | [libc++] Remove <stdbool.h> (#164595) | Nikolas Klauser
`<stdbool.h>` is provided by the compiler, and both Clang and GCC provide C++-aware versions of this header, making our own wrapper header entirely unnecessary.
2025-11-11 | AMDGPU: Relax shouldCoalesce to allow more register tuple widening (#166475) | Matt Arsenault
Allow widening up to 128-bit registers, or if the new register class is at least as large as one of the existing register classes. The previous limit was artificial. In particular, it did the wrong thing with sequences involving copies between VGPRs and AV registers. Nearly all test changes are improvements.

The coalescer does not just widen registers out of nowhere. If it is trying to "widen" a register, it is generally packing a register into an existing register tuple, or it is in a situation where the constraints imply the wider class anyway. 067a11015 addressed the allocation-failure concern by rejecting coalescing if there are no available registers. The original change in a4e63ead4b did not include a realistic test case to judge whether this is harmful for register pressure. I would expect any issues from this to be garden-variety subregister handling issues. We could use more dynamic state information here if it really is an issue.

I get the best results by removing this override completely; this is a smaller step for patch-splitting purposes.
2025-11-11 | PPC: Disable type checking in xfailed sincospi test (#167563) | Matt Arsenault
This hangs in expensive_checks
2025-11-11 | AArch64: align pair-wise spills on WoS to 16-byte (#166902) | Saleem Abdulrasool
Adjust the frame setup code for Windows ARM64 to try to align pair-wise spills to 16-byte boundaries. This lets us properly emit the spills for custom Clang calling conventions such as preserve_most, which spills r9-r15, registers that are normally volatile. Even when using the ARM64EC opcodes for unwinding, we cannot represent the spill if it is unaligned.
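The alignment step itself is ordinary round-up arithmetic. The sketch below assumes a save area laid out from offset 0 in which each register pair occupies 16 bytes and a single register 8 bytes; `alignTo` and `layoutPairSpills` are hypothetical helpers that ignore unwind-info encoding and everything else the real frame lowering handles.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds Value up to the next multiple of Align (Align must be a power of two).
uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Assigns offsets to a sequence of spills: true marks a register pair
// (16 bytes), false a single register (8 bytes). Pairs are padded up to a
// 16-byte boundary; TotalSize receives the 16-byte-aligned area size.
std::vector<uint64_t> layoutPairSpills(const std::vector<bool> &IsPair,
                                       uint64_t &TotalSize) {
  std::vector<uint64_t> Offsets;
  uint64_t Offset = 0;
  for (bool Pair : IsPair) {
    if (Pair)
      Offset = alignTo(Offset, 16); // Insert padding before an unaligned pair.
    Offsets.push_back(Offset);
    Offset += Pair ? 16 : 8;
  }
  TotalSize = alignTo(Offset, 16);
  return Offsets;
}
```

For example, a single spill followed by two pairs places the pairs at offsets 16 and 32 (8 bytes of padding after the single register) for a 48-byte area.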
2025-11-11 | [NFCI][lldb][test] Avoid GNU extension for specifying mangling (#167221) | Raul Tambre
`asm()` on function declarations is used to specify the mangled name, but that specific spelling is a GNU extension, unlike `__asm()`. Found by building with `-std=c2y` set in the config file for Clang's C frontend.