| Age | Commit message (Collapse) | Author |
|
|
|
This is another of the easier to understand conditions from
TargetLibraryInfo
|
|
This is one of the easier cases to comprehend in TargetLibraryInfo's
setup.
|
|
Script scraped dump of most functions in TargetLibraryInfo.def,
with existing entries and a few special cases removed. This only
adds the definitions, and doesn't add them to any system yet.
Adding them in the correct places is the hard part, since it's
all written as opt-out with manually written exemptions in
TargetLibraryInfo.
|
|
Calloc was already here, but not the others. Also add
manual type information.
|
|
|
|
These were in TargetLibraryInfo, but missing from RuntimeLibcalls.
This only adds the cases that already have the non-chk variants
already. Copies the enabled-by-default logic from TargetLibraryInfo,
which is probably overly permissive. Only isPS opts-out.
|
|
These are floating-point functions recorded in TargetLibraryInfo,
but missing from RuntimeLibcalls.
|
|
This is mostly the output of a vibe coded script running on
VecFuncs.def, with a lot of manual cleanups and fixing where the
vibes were off. This is not yet wired up to anything (except for the
handful of calls which are already manually enabled). In the future
the SystemLibrary mechanism needs to be generalized to allow plugging
these sets in based on the flag.
One annoying piece is there are some name conflicts across the libraries.
Some of the libmvec functions have name collisions with some sleef functions.
I solved this by just adding a prefix to the libmvec functions. It would
probably be a good idea to add a prefix to every group. It gets ugly,
particularly since some of the sleef functions started to use a Sleef_ prefix,
but mostly do not.
|
|
This kind of helper is higher level and not general enough to go directly
in SelectionDAG. Most similar utilities are in TargetLowering.
|
|
in reduced BMI
|
|
Makes linalg.reduce and linalg.map region_ops so they can be constructed
from functions and be called as decorators.
|
|
Resolves #148131
- Unlock `std::optional<T&>` implementation
- Allow instantiations of `optional<T(&)(...)>` and `optional<T(&)[]>`
but disables `value_or()` and `optional::iterator` + all `iterator`
related functions
- Update documentation
- Update tests
|
|
(#166987)
Only use RuntimeLibcallsInfo. Remove the helper functions used to
transition.
|
|
|
|
|
|
--show-annotations' crashes (#167487)
Added handling the case of a non-materialized module, also don't call
printInfoComment for immaterializable values
|
|
|
|
Copy new process from sincos/sincospi
|
|
(#166985)
|
|
|
|
Summary:
The OpenMP handling using an offload binary should be optional, it's
only used for extra metadata for llvm-objdump. Also the triple was
completely wrong, it didn't let anyone correctly choose between ELF
and COFF handling.
|
|
motion" (#167465)
This patch introduces a new virtual method
`TargetInstrInfo::isSafeToMove()` to allow backends to control whether a
machine instruction can be safely moved by optimization passes.
The `BranchFolder` pass now respects this hook when hoisting common
code. By default, all instructions are considered safe to to move.
For LoongArch, `isSafeToMove()` is overridden to prevent
relocation-related instruction sequences (e.g. PC-relative addressing
and calls) from being broken by instruction motion. Correspondingly,
`isSchedulingBoundary()` is updated to reuse this logic for consistency.
Relands #163725
|
|
|
|
Summary
-------
While dogfooding lldb-dap, I observed that VSCode frequently displays
certain stack frames as greyed out. Although these frames have valid
debug information, double-clicking them shows disassembly instead of
source code. However, running `bt` from the LLDB command line correctly
displays source file and line information for these same frames,
indicating this is an lldb-dap specific issue.
Root Cause
----------
Investigation revealed that `DAP::ResolveSource()` incorrectly uses a
frame's PC address directly to determine whether valid source line
information exists. This approach works for leaf frames, but fails for
non-leaf (caller) frames where the PC points to the return address
immediately after a call instruction. This return address may fall into
compiler-generated code with no associated line information, even though
the actual call site has valid source location data.
The correct approach is to use the symbol context's line entry, which
LLDB resolves by effectively checking PC-1 for non-leaf frames, properly
identifying the line information for the call instruction rather than
the return address.
Testing
-------
Manually tested with VSCode debugging sessions on production workloads.
Verified that non-leaf frames now correctly display source code instead
of disassembly view.
Before the change symptom:
<img width="1013" height="216" alt="image"
src="https://github.com/user-attachments/assets/9487fbc0-f438-4892-a8d2-1437dc25399b"
/>
And here is after the fix:
<img width="1068" height="198" alt="image"
src="https://github.com/user-attachments/assets/0d2ebaa7-cca6-4983-a1d1-1a26ae62c86f"
/>
---------
Co-authored-by: Jeffrey Tan <jeffreytan@fb.com>
|
|
|
|
This should not be overridable and the special case hacks
have been replaced with RegClassByHwMode
|
|
Add requires to not run `invalid-section-index.s` test in non aarch64
supported environments.
|
|
The C++ index switch op has utilities for `getCaseBlock(int i)` and
`getDefaultBlock()`, so these have been added.
Optional body builder args have been added: one for the default case and
one for the switch cases.
|
|
Removed large offset test. It caused issue with ARM 32-bit because of
large offset.
Co-authored-by: anoopkg6 <anoopkg6@github.com>
|
|
tensor slices (#166569)
I hit another runtime verification issue (similar to
https://github.com/llvm/llvm-project/pull/164878) while working with
TFLite models. The verifier is incorrectly rejecting
`tensor.extract_slice` operations when extracting an empty slice
(size=0) that starts exactly at the tensor boundary.
The current runtime verification unconditionally enforces `offset <
dim_size`. This makes sense for non-empty slices, but it's too strict
for empty slices, causing false positives that lead to spurious runtime
assertions.
**Simple example that demonstrates the issue:**
```mlir
func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
// When called with: tensor size=10, offset=10, size=0
// Runtime verification fails: "offset 0 is out-of-bounds"
%slice = tensor.extract_slice %tensor[%offset] [%size] [1]
: tensor<?xf32> to tensor<?xf32>
return
}
```
For the above example, the check evaluates `10 < 10` which is false, so
verification fails. However, I believe this operation should be valid -
we're extracting zero elements, so there's no actual out-of-bounds
access.
**Real-world repro from the TensorFlow Lite models:**
This issue manifests while lowering TFLite models and a lot of our
system tests are failing due to this. Here's a simplified version
showing the problematic pattern:
In this code, `%extracted_slice_0` becomes an empty tensor when SSA
value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`.
The operation extracts zero elements along dimension 0, which is
semantically valid but fails runtime verification.
```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%c10 = arith.constant 10 : index
%c-1 = arith.constant -1 : index
%0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
%1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
%2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
%3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
%4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
%5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
%c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
%6:2 = scf.while (%arg1 = %0, %arg2 = %arg0)
: (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
%7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
%extracted = tensor.extract %7[] : tensor<i1>
scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
} do {
^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
%7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
// First slice
%8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
%9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
%extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
%10 = index.casts %extracted_0 : i32 to index
%11 = arith.cmpi eq, %10, %c-1 : index
%12 = arith.select %11, %c10, %10 : index
%extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1]
: tensor<10x4x1xf32> to tensor<?x4x1xf32>
// Second slice - this is where the failure occurs
%13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
%14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
%extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
%15 = index.castu %extracted_1 : i32 to index
%16 = arith.subi %c10, %15 : index // size = 10 - offset
%extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
%17 = index.castu %extracted_2 : i32 to index
%extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
%18 = index.castu %extracted_3 : i32 to index
// On the last loop iteration: %15=10, %16=0
// %extracted_slice_0 becomes an empty tensor
// Runtime verification fails: "offset 0 is out-of-bounds"
%extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1]
: tensor<10x4x1xf32> to tensor<?x4x1xf32>
%19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32}
: (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
}
return %6#1 : tensor<10x4x1xf32>
}
```
**The fix:**
Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 <= offset <= dim_size`
- Non-empty slice (size > 0): require `0 <= offset < dim_size`
**Question for reviewers:**
Should we also relax the static verifier to allow this edge case?
Currently, the static verifier rejects the following IR:
```mlir
%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>
```
Since we're allowing it at runtime for dynamic shapes, it seems
inconsistent to reject it statically. However, I wanted to get feedback
before making that change - this PR focuses only on the runtime
verification fix for dynamic shapes.
P.S. We have a similar issue with `memref.subview`. I will send a
separate patch for the issue.
Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
|
|
subviews (#166581)
This PR applies the same fix from #166569 to `memref.subview`. That PR
fixed the issue for `tensor.extract_slice`, and this one addresses the
identical problem for `memref.subview`.
The runtime verification for `memref.subview` incorrectly rejects valid
empty subviews (size=0) starting at the memref boundary.
**Example that demonstrates the issue:**
```mlir
func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>,
%dim_0: index,
%dim_1: index,
%dim_2: index,
%offset: index) {
// When called with: offset=10, dim_0=0, dim_1=4, dim_2=1
// Runtime verification fails: "offset 0 is out-of-bounds"
%subview = memref.subview %memref[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] :
memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to
memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
return
}
```
When `%offset=10` and `%dim_0=0`, we're creating an empty subview (zero
elements along dimension 0) starting at the boundary. The current
verification enforces `offset < dim_size`, which evaluates to `10 < 10`
and fails. I feel this should be valid since no memory is accessed.
**The fix:**
Same as #166569 - make the offset check conditional on subview size:
- Empty subview (size == 0): allow `0 <= offset <= dim_size`
- Non-empty subview (size > 0): require `0 <= offset < dim_size`
Please see #166569 for motivation and rationale.
---
Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
|
|
This shadows the member in the base class, but differs slightly
in behavior. The base method doesn't check for the invalid case.
|
|
|
|
(#159884)
This eliminates the pseudo registerclasses used to hack the
wave register class, which are now replaced with RegClassByHwMode,
so most of the diff is from register class ID renumbering.
|
|
|
|
Adds test coverage with loops where the same loads get executed under
complementary predicates and can be hoisted, together with a set of
negative test cases.
|
|
These should always use TargetConstant
|
|
Windows doesn't support `pthread_attr`, which was introduced to
asan_test.cpp in #165198, so this change `#ifdef`s out the changes made
in that PR.
Originally reported by Chrome as https://crbug.com/459880605.
|
|
Ran into a use case where we had a MachO object file with a section
symbol which did not have a section associated with it segfaults during
linking. This patch aims to handle such cases gracefully and avoid the
linker from crashing.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
|
|
When there are more than 255 sections, MachO object writer allows
creation of object files which are potentially malformed. Currently,
there are assertions in object writer code that prevents this behavior.
But for distributions where assertions are turned off this still results
in
creation of malformed object files. Turning assertions into explicit
errors.
|
|
Merge chasing latest versions of bulk test updates
|
|
|
|
Classof for most recipes directly supports VPValue, so there is no need
to call getDefiningRecipe when using isa/cast/dyn_cast.
|
|
|
|
`<stdbool.h>` is provided by the compiler and both Clang and GCC provide
C++-aware versions of these headers, making our own wrapper header
entirely unnecessary.
|
|
Allow widening up to 128-bit registers or if the new register class
is at least as large as one of the existing register classes.
This was artificially limiting. In particular this was doing the wrong
thing with sequences involving copies between VGPRs and AV registers.
Nearly all test changes are improvements.
The coalescer does not just widen registers out of nowhere. If it's
trying
to "widen" a register, it's generally packing a register into an
existing
register tuple, or in a situation where the constraints imply the wider
class anyway. 067a11015 addressed the allocation failure concern by
rejecting coalescing if there are no available registers. The original
change in a4e63ead4b didn't include a realistic testcase to judge if
this is harmful for pressure. I would expect any issues from this to
be of garden variety subreg handling issue. We could use more dynamic
state information here if it really is an issue.
I get the best results by removing this override completely. This is
a smaller step for patch splitting purposes.
|
|
This hangs in expensive_checks
|
|
Adjust the frame setup code for Windows ARM64 to attempt to align
pair-wise spills to 16-byte boundaries. This enables us to properly emit
the spills for custom clang calling convensions such as preserve most
which spills r9-r15 which are normally nonvolatile registers. Even when
using the ARM64EC opcodes for the unwinding, we cannot represent the
spill if it is unaligned.
|
|
`asm()` on function declarations is used for specifying the mangling.
But that specific spelling is a GNU extension unlike `__asm()`.
Found by building with `-std=c2y` in Clang's C frontend's config file.
|