| Age | Commit message | Author |
|
|
|
instruction with basic inline-asm
Finishes adding basic inline-asm callbr support for AMDGPU, started by
https://github.com/llvm/llvm-project/pull/149308.
|
|
A collection of small changes to get a number of lit tests working on
z/OS.
|
|
(#149308)
First batch of changes to add support for inline-asm callbr for the
AMDGPU backend.
|
|
targets (#165676)
If the PTEST is just using the ZF result and one of the operands is an
i32/i64 sign mask we can use the TESTPD/PS instructions instead and
avoid the use of an extra constant.
Fixes some codegen identified in #156233
|
|
Not sure if this warrants a PR, but I realized there was a typo in a
test filename from my previous PR #164387.
|
|
This reverts commit 78bf682cb9033cf6a5bbc733e062c7b7d825fdaf.
Original PR: #157463
Revert PR: #158566
The relevant buildbots have been updated to a ROCm version that does not
use the macros anymore to avoid the failures.
Implements SWDEV-522062.
|
|
(#165540)
This patch allows us to narrow single bit-test/twiddle operations for
larger than legal scalar integers to efficiently operate just on the i32
sub-integer block actually affected.
The BITOP(X,SHL(1,IDX)) patterns are split, with the IDX used to access
the specific i32 block as well as specific bit within that block.
BT comparisons are relatively simple, and build on the truncated
shifted loads fold from #165266.
BTC/BTR/BTS bit twiddling patterns need to match the entire RMW pattern
to safely confirm only one block is affected, but a similar approach is
taken and creates codegen that should allow us to further merge with
matching BT opcodes in a future patch (see #165291).
The resulting codegen is notably more efficient than the heavily
micro-coded memory folded variants of BT/BTC/BTR/BTS.
There is still some work to improve the bit insert 'init' patterns
included in bittest-big-integer.ll but I'm expecting this to be a
straightforward future extension.
Fixes #164225
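As a rough illustration of the block/bit split described above (a sketch of the idea, not the X86 lowering itself): the wide integer is modeled here as an array of little-endian 32-bit words, and `WideInt`, `testBit`, `setBit`, `resetBit` and `flipBit` are hypothetical names rather than LLVM APIs. The caller is assumed to pass an index below `32 * N`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a larger-than-legal integer: N i32 blocks,
// least significant block first.
template <std::size_t N> using WideInt = std::array<std::uint32_t, N>;

// BT: only the block selected by Idx / 32 is read.
template <std::size_t N> bool testBit(const WideInt<N> &X, unsigned Idx) {
  return (X[Idx / 32] >> (Idx % 32)) & 1u;
}

// BTS / BTR / BTC: read-modify-write of the single affected block.
template <std::size_t N> void setBit(WideInt<N> &X, unsigned Idx) {
  X[Idx / 32] |= (1u << (Idx % 32));
}
template <std::size_t N> void resetBit(WideInt<N> &X, unsigned Idx) {
  X[Idx / 32] &= ~(1u << (Idx % 32));
}
template <std::size_t N> void flipBit(WideInt<N> &X, unsigned Idx) {
  X[Idx / 32] ^= (1u << (Idx % 32));
}
```

For a 128-bit value, setting bit 37 touches only block 1 (bit 5 within that block) and leaves the other three blocks untouched.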
|
|
As noticed on #165676: if we're increasing the number of uses of an operand, we should freeze it.
|
|
It looks like the documentation for `llvm-cxxfilt`'s
`--[no-]strip-underscore` options wasn't updated when
https://github.com/llvm/llvm-project/pull/106233 was made.
CC @Michael137 (I don't have merge rights myself).
|
|
Pulled out of the abandoned patch #69710 to act as a baseline for #165694
|
|
|
|
This patch adds test cases that demonstrate missing dependencies in DA
caused by the lack of overflow handling. These issues will be addressed
by properly inserting overflow checks and bailing out when one is
detected.
It covers the following dependence test functions:
- Strong SIV
- Weak-Crossing SIV
- Weak-Zero SIV
- Symbolic RDIV
- GCD MIV
It does NOT cover:
- Exact SIV
- Exact RDIV
- Banerjee MIV
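To make the "check for overflow and bail out" idea concrete, here is a minimal sketch for a Strong SIV style test. It is not the code in DependenceAnalysis: `DepResult`, `subOverflows` and `strongSIV` are hypothetical names, the subscripts are assumed to be `Coeff*i + SrcConst` and `Coeff*i' + DstConst` with 64-bit constants, and the comparison of the distance against loop bounds is left out.

```cpp
#include <cstdint>
#include <limits>

enum class DepResult { Independent, Dependent, Unknown };

// Computes A - B into Out; returns true (leaving Out untouched) on overflow.
bool subOverflows(std::int64_t A, std::int64_t B, std::int64_t &Out) {
  const std::int64_t Min = std::numeric_limits<std::int64_t>::min();
  const std::int64_t Max = std::numeric_limits<std::int64_t>::max();
  if ((B > 0 && A < Min + B) || (B < 0 && A > Max + B))
    return true;
  Out = A - B;
  return false;
}

// Solves Coeff * Distance == SrcConst - DstConst, bailing out with Unknown
// whenever an intermediate value cannot be represented.
DepResult strongSIV(std::int64_t Coeff, std::int64_t SrcConst,
                    std::int64_t DstConst, std::int64_t &Distance) {
  std::int64_t Delta;
  if (Coeff == 0 || subOverflows(SrcConst, DstConst, Delta))
    return DepResult::Unknown;
  // INT64_MIN / -1 overflows as well, so treat it the same way.
  if (Delta == std::numeric_limits<std::int64_t>::min() && Coeff == -1)
    return DepResult::Unknown;
  if (Delta % Coeff != 0)
    return DepResult::Independent;
  Distance = Delta / Coeff;
  return DepResult::Dependent;
}
```

A caller would have to treat `Unknown` conservatively, that is, as a possible dependence.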
|
|
intrinsics. (#165400)
This is the first step in removing some NEON reduction intrinsics that
duplicate the behaviour of their llvm.vector.reduce counterpart.
NOTE: The i8/i16 variants differ in that the NEON versions return an i32
result. However, this looks to be more about making their code generation
convenient, with SelectionDAG discarding the extra bits. This is only
relevant for the next phase because the Clang usage always truncates
their result, making llvm.vector.reduce a drop-in replacement.
|
|
Currently all `runInTerminal` tests are skipped in debug builds because,
when attaching, it times out parsing the debug symbols of lldb-dap.
Add this test since it is running in a terminal.
|
|
(#165688)
Reverts llvm/llvm-project#165496
Due to flaky failures on Arm 32-bit since this change. Detailed in
https://github.com/llvm/llvm-project/pull/165496#issuecomment-3467209089.
|
|
Initial parsing/sema/codegen support for threadset clause in task and
taskloop directives [Section 14.8 in the OpenMP 6.0 spec]
|
|
Also rename map to Map, remove the m_ prefix from member variables and
fix the naming of the existing color variables.
|
|
MemoryAccess base class was included from Core.h when it was a subclass
of ExecutorProcessControl, but this changed in
0faa181434cf959110651fe974bef31e7390eba8
|
|
|
|
|
|
We've upgraded to LLVM 22 now, so we can remove a bunch of TODOs.
|
|
This documents two things:
* The recommended way to go about adding a new pass.
* The criteria for enabling a pass.
RFC: https://discourse.llvm.org/t/rfc-guidelines-for-adding-enabling-new-passes/88290
|
|
Allow the stack move optimization (which merges two allocas) when the
address of only one alloca is captured (and the provenance is not
captured). Both addresses need to be captured to observe that the
allocas were merged.
Fixes https://github.com/llvm/llvm-project/issues/165484.
|
|
Follow on from #164372
This changes the DW_AT_name for `_BitInt(N)` from `_BitInt` to `_BitInt(N)`
|
|
|
|
|
|
module (#165535)
This commit extracts some MIR-related code from `common.py` and
`update_mir_test_checks.py` into a dedicated `mir.py` module to improve
code organization. This is a preparation step for
https://github.com/llvm/llvm-project/pull/164965 and also moves some
pieces already moved by https://github.com/llvm/llvm-project/pull/140296
All code is intentionally moved verbatim with minimal necessary
adaptations:
* `log()` calls converted to `print(..., file=sys.stderr)` at `mir.py`
lines 62, 64 due to a `log` locality.
|
|
In `TextDiagnostic.cpp`, we're using column and byte indices
everywhere, but we were using integers for them, which made it hard to
know what to pass where, and what was produced. To make matters worse,
what `SourceManager` considers a "column" is a byte in `TextDiagnostic`.
Add `Bytes` and `Columns` structs, which are not related, so APIs using
them can differentiate between values interpreted as columns or bytes.
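A minimal sketch of the idea, not the actual `TextDiagnostic.cpp` code: the member name `V` and the helper `columnsBefore` are hypothetical, and the sketch assumes UTF-8 input where every code point is one column wide (tabs and wide characters are ignored).

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Two unrelated wrapper types: neither converts to the other implicitly.
struct Bytes { std::size_t V = 0; };
struct Columns { std::size_t V = 0; };

static_assert(!std::is_convertible<Bytes, Columns>::value,
              "a byte offset must not silently become a column");
static_assert(!std::is_convertible<Columns, Bytes>::value,
              "a column must not silently become a byte offset");

// Number of columns covered by the first Offset bytes of Line.
Columns columnsBefore(const std::string &Line, Bytes Offset) {
  Columns Result;
  for (std::size_t I = 0; I < Offset.V && I < Line.size(); ++I) {
    unsigned char C = static_cast<unsigned char>(Line[I]);
    if ((C & 0xC0) != 0x80) // UTF-8 continuation bytes add no column
      ++Result.V;
  }
  return Result;
}
```

With these types, passing a `Columns` value where `Bytes` is expected is a compile error instead of a silent mix-up between two integers.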
|
|
This patch addresses two use-after-move issues:
1. `Timing.cpp`: A variable was std::moved and then immediately passed to
an `assert()` check. Since the moved-from state made the assertion
condition trivially true, the check was effectively useless. The
`assert()` is removed.
2. `Query.cpp`: The `matcher` object was moved from and then subsequently
used as if it still retained valid state. The fix ensures there is no
subsequent use of the moved-from variable.
Testing:
`ninja check-mlir`
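For readers unfamiliar with the pattern, here is a small generic sketch of the safe ordering. It is unrelated to the actual `Timing.cpp` and `Query.cpp` code, and `Sink` and `store` are hypothetical names: any check on a value belongs before the `std::move`, and the variable is not read afterwards.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sink that takes ownership of its argument.
struct Sink {
  std::vector<std::unique_ptr<int>> Items;
  void take(std::unique_ptr<int> P) { Items.push_back(std::move(P)); }
};

void store(Sink &S, std::unique_ptr<int> Value) {
  // Check while Value still holds its state.
  assert(Value && "expected a non-null value");
  S.take(std::move(Value));
  // Value is moved-from here; asserting on it now would say nothing about
  // the object that was handed over, so it is not touched again.
}
```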
|
|
This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline.
It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag
to enable/disable the pass.
See the PR: https://github.com/llvm/llvm-project/pull/116953
|
|
|
|
Refine the logic to scalarize interleave group members: only skip
scalarization overhead for members being used as addresses. For others,
use the regular scalar memory op cost.
This currently doesn't trigger in practice as far as I could find, but
fixes a potential divergence between VPlan- and legacy cost models.
It fixes a concrete divergence with a follow-up patch,
https://github.com/llvm/llvm-project/pull/161276.
|
|
|
|
Add structural type conversion patterns for CF dialect ops. These
patterns are similar to the SCF structural type conversion patterns.
This commit adds missing functionality and is in preparation for #165180,
which changes the way blocks are converted. (Only entry blocks are
converted.)
|
|
Haiku requires linking in libnetwork.
Co-authored-by: Jérôme Duval <jerome.duval@gmail.com>
|
|
|
|
Local is already of type unsigned.
|
|
Identified with readability-duplicate-include.
|
|
Extends OpenACCSupport utilities to include recipe name generation and
error reporting for unsupported features, providing foundation for
variable privatization handling.
Changes:
- Add RecipeKind enum (private, firstprivate, reduction) for APIs that
request a specific kind of recipe
- Add getRecipeName() API to OpenACCSupport and OpenACCUtils that
generates recipe names from types (e.g.,
"privatization_memref_5x10xf32_")
- Add emitNYI() API to OpenACCSupport for graceful handling of
not-yet-implemented cases
- Generalize MemRefPointerLikeModel template to support
UnrankedMemRefType
- Add unit tests and integration tests for new APIs
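As a purely illustrative sketch of how a name such as "privatization_memref_5x10xf32_" could be derived from a type, and not the actual `getRecipeName()` implementation: the function `recipeName`, the enumerator spellings, the sanitizing rule, and the prefixes for the firstprivate and reduction kinds are all assumptions. Only the private example string comes from the description above.

```cpp
#include <cctype>
#include <string>

enum class RecipeKind { Private, FirstPrivate, Reduction };

// Builds a symbol-friendly name from a kind prefix and a printed type,
// replacing every non-alphanumeric character with '_'.
std::string recipeName(RecipeKind Kind, const std::string &TypeStr) {
  std::string Name;
  switch (Kind) {
  case RecipeKind::Private:
    Name = "privatization_";
    break;
  case RecipeKind::FirstPrivate:
    Name = "firstprivatization_"; // assumed prefix
    break;
  case RecipeKind::Reduction:
    Name = "reduction_"; // assumed prefix
    break;
  }
  for (char C : TypeStr)
    Name += std::isalnum(static_cast<unsigned char>(C)) ? C : '_';
  return Name;
}
```

Under these assumptions, the printed type `memref<5x10xf32>` with the private kind yields the name quoted in the list above.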
|
|
CreateProcess fails with ERROR_INVALID_PARAMETER when duplicate HANDLEs
are passed via `PROC_THREAD_ATTRIBUTE_HANDLE_LIST`. This can happen, for
example, if stdout and stdin are the same device (e.g. a bidirectional
named pipe), or if stdout and stderr are the same device.
Fixes https://github.com/msys2/MINGW-packages/issues/26030
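One way to avoid passing duplicates is to deduplicate the list before handing it to the attribute list. The sketch below shows only that step and is not the actual fix: `Handle` is a portable stand-in for the Win32 `HANDLE` type, and `uniqueHandles` is a hypothetical helper.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

using Handle = void *; // stand-in for HANDLE so the sketch builds anywhere

// Returns the handles with duplicates removed (order is not preserved).
std::vector<Handle> uniqueHandles(std::vector<Handle> Handles) {
  // std::less gives a total order even for unrelated pointers.
  std::sort(Handles.begin(), Handles.end(), std::less<Handle>());
  Handles.erase(std::unique(Handles.begin(), Handles.end()), Handles.end());
  return Handles;
}
```

If stdin and stdout refer to the same device, the resulting list then contains that handle once.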
|
|
|
|
The CRC optimization relies on stripping the auxiliary data completely,
and should hence be forbidden when it has a user in the exit-block.
Forbid this case, fixing a miscompile.
Fixes #165382.
|
|
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'. The
changes are similar to those in D114946 for AArch64.
Custom lowering and promotion are set for some FP16 strict ops to work
correctly.
This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
|
|
Fix getShadowAddress computation by adding ShadowBase if it is not zero.
Co-authored-by: anoopkg6 <anoopkg6@github.com>
|
|
And reduce the number of getLLVMStyleWithColumnLimit calls.
|
|
We currently use a background thread to read the DAP output. This means
the test thread and the background thread can race at times and we may
have inconsistent timing due to these races.
To improve the consistency, I've removed the reader thread and instead
switched to using the `selectors` module that wraps `select` in a
platform-independent way.
|
|
This PR introduces an allow-list for module metadata. It encompasses
the LLVM metadata nodes `llvm.ident` and `llvm.module.flags`, as well
as the generated `dx.` options.
Resolves: #164473.
|
|
When we create a `SparseIterator`, we sometimes wrap it in a
`FilterIterator`, which delegates _some_ calls to the underlying
`SparseIterator`.
After construction, e.g. in `makeNonEmptySubSectIterator()`, we call
`setSparseEmitStrategy()`. This sets the strategy only in one of the
filters -- if we call `setSparseEmitStrategy()` immediately after
creating the `SparseIterator`, then the wrapped `SparseIterator` will
have the right strategy, and the `FilterIterator` strategy will be
uninitialized; if we call `setSparseEmitStrategy()` after wrapping the
iterator in `FilterIterator`, then the opposite happens.
If we make `setSparseEmitStrategy()` a virtual method so that it's
included in the `FilterIterator` pattern, and then do all reads of
`emitStrategy` via a virtual method as well, it's pretty simple to
ensure that the value of `strategy` is being set consistently and
correctly.
Without this, the UB of strategy being uninitialized manifests as a
sporadic test failure in
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_strided_conv_2d_nhwc_hwcf.mlir,
when run downstream with the right flags (e.g. asan + assertions off).
The test sometimes fails with `ne_sub<trivial<dense[0,1]>>.begin' op
created with unregistered dialect`. It can also be directly observed w/
msan that this uninitialized read is the cause of that issue, but msan
causes other problems w/ this test.
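The delegation described above can be sketched in isolation as follows. This is a simplified model, not the MLIR sparse tensor code: `Strategy`, `Iterator`, `setStrategy` and `getStrategy` are hypothetical names. The point is that both the setter and the getter are virtual, so the wrapper forwards them and only one copy of the value is ever read or written.

```cpp
#include <memory>
#include <utility>

enum class Strategy { Unset, Functional, Debug };

class Iterator {
public:
  virtual ~Iterator() = default;
  virtual void setStrategy(Strategy S) { Strat = S; }
  virtual Strategy getStrategy() const { return Strat; }

private:
  Strategy Strat = Strategy::Unset;
};

// Wrapper that forwards both calls, so the order of wrapping and setting
// no longer matters.
class FilterIterator : public Iterator {
public:
  explicit FilterIterator(std::unique_ptr<Iterator> W)
      : Wrapped(std::move(W)) {}
  void setStrategy(Strategy S) override { Wrapped->setStrategy(S); }
  Strategy getStrategy() const override { return Wrapped->getStrategy(); }

private:
  std::unique_ptr<Iterator> Wrapped;
};
```

Whether the strategy is set before or after wrapping, reads through the wrapper observe the same value.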
|
|
|