| Age | Commit message | Author |
|
Currently, -gsplit-dwarf and -mrelax are incompatible options in
Clang. The issue is that .dwo files should not contain any
relocations, as they are not processed by the linker. However,
relaxable code emits relocations in DWARF for debug ranges that
reside in the .dwo file when DWARF fission is enabled.
This patch makes DWARF fission compatible with RISC-V relaxations.
It uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which
allow referencing addresses from .debug_addr instead of using
absolute addresses. This approach eliminates relocations from .dwo
files.
|
|
|
|
explain more about use-after-free in llvm-twine-local
add note about manually adjusting code after applying fix-it.
fixed: #154810
|
|
Dummy variables have an entry in `Program::Globals`, but they are not
added to `GlobalIndices`. When registering redeclarations, we used to
only patch up the global indices, but that left the dummy variables
alone. Update the dummy variables of all redeclarations as well.
Fixes https://github.com/llvm/llvm-project/issues/165952
|
|
(#166370)
The `FEnvSafeTest.cpp` test fails on AArch64 soft nofp configurations
because LLVM libc does not provide a floating-point environment in these
configurations.
This patch adds another preprocessor guard on `__ARM_FP` to disable the
test on those.
|
|
And recurse into records properly.
|
|
The search should proceed from CallInst to the beginning of BB since X2
can be rewritten and we need to catch the most recent write before the
call.
Patch by Yafet Beyene alulayafet@gmail.com
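The backward scan described above can be sketched as follows. This is only an illustration, not the patch's code: `Inst`, `findLastWriteBefore`, and the integer register numbers are hypothetical stand-ins for the real instruction and register types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one instruction in a basic block.
struct Inst {
  bool IsCall; // true for a call instruction
  int DefReg;  // register written by this instruction, or -1 for none
};

// Walk from the instruction just before the call at CallIdx back to the
// beginning of the block. The first write to Reg found this way is the most
// recent write before the call. Returns its index, or -1 if there is none.
int findLastWriteBefore(const std::vector<Inst> &BB, std::size_t CallIdx,
                        int Reg) {
  for (std::size_t I = CallIdx; I-- > 0;) {
    if (BB[I].DefReg == Reg)
      return static_cast<int>(I);
  }
  return -1;
}
```

Scanning forward from the start of the block and stopping at the first match would return a stale write whenever the register is rewritten before the call.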
|
|
Now that the `SourceManager::getExpansionLoc` and
`SourceManager::getSpellingLoc` functions are efficient, delete
unnecessary duplicated code in the `SourceManager::getDecomposedExpansionLoc`
and `SourceManager::getDecomposedSpellingLoc` methods.
|
|
If there are additional uses of the bit twiddled value as well as the
rmw store, we can replace them with a (re)loaded copy of the full width
integer value after the store.
There's some memory op chain handling to handle here - the additional
(re)load is chained after the new store and then any dependencies of the
original store are chained after the (re)load.
|
|
|
|
Reverts llvm/llvm-project#166210
Buildbot failures in the libc on GPU bot:
https://lab.llvm.org/buildbot/#/builders/10/builds/16711
|
|
This is a follow up of PR #165558. (1/n)
This patch updates the below mbarrier Ops to use
AnyTypeOf[] construct:
```
* mbarrier.arrive
* mbarrier.arrive.noComplete
* mbarrier.test.wait
* cp.async.mbarrier.arrive
```
* Updated existing tests accordingly.
* Verified locally that there are no new regressions in the `integration` tests.
* TODO: Two more Ops remain and will be migrated in a subsequent PR.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
|
|
This patch enables compile-time evaluation of AVX512 permutex2var
intrinsics in constexpr contexts.
Extend shuffle generic to handle both integer immediate and vector mask
operands.
Resolves #161335
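For reference, a scalar model of a two-source permute is sketched below. It assumes the usual selection rule for this family of instructions (the low bits of each index pick a lane, the next bit picks the source). The name `permute2` and the fixed 32-bit element type are illustrative choices, not part of the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar reference model of a two-source permute over N lanes (N a power of
// two). For each lane, the low log2(N) bits of the index pick an element and
// the next bit picks the source: 0 selects A, 1 selects B.
template <std::size_t N>
std::array<std::int32_t, N> permute2(const std::array<std::int32_t, N> &A,
                                     const std::array<std::uint32_t, N> &Idx,
                                     const std::array<std::int32_t, N> &B) {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
  std::array<std::int32_t, N> Out{};
  for (std::size_t I = 0; I < N; ++I) {
    std::size_t Elt = Idx[I] & (N - 1); // lane within the chosen source
    bool FromB = (Idx[I] & N) != 0;     // source selector bit
    Out[I] = FromB ? B[Elt] : A[Elt];
  }
  return Out;
}
```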
|
|
attribute lists. NFC. (#166523)
Makes it easier to compare constexpr/non-constexpr attribute defines
Allows clang-format to pack the attributes more efficiently
|
|
Split off from PR #163525, this standalone patch replaces simple cases
where undef is used as a value for arithmetic or getelementptr
instructions. This will reduce the likelihood of contributors hitting
the `undef deprecator` warning in github.
|
|
bodies (#166335)
Since commit 842622bf8bea782e9d9865ed78b0d8643f098122 adding support for
overloading interface methods, a `using` directive is emitted for any
interface method that does not require emission of a trait method,
including for methods that define a method body.
However, methods directly specifying a body (e.g., via the `methodBody`
parameter of `InterfaceMethod`) are implemented directly in the
interface class and are therefore not present in the associated trait.
The generated `using` directive then refers to a non-existent method of
the trait, resulting in an error upon compilation of the generated code.
This patch changes `DefGen::emitTraitMethods()`, such that
`genTraitMethodUsingDecl()` is not invoked for interface methods with a
body anymore.
|
|
parameter is non-const (#166102)
This patch enables `FoldOpIntoSelect` and `foldOpIntoPhi` for the cases
when Op's second parameter is a non-constant.
It doesn't seem to bring significant improvements, but the compile-time
impact is negligible.
|
|
There doesn't seem to be much of a reason why this should be a struct. Make it
a namespace instead.
|
|
`bugprone-std-namespace-modification` (#165659)
Closes [#157290](https://github.com/llvm/llvm-project/issues/157290)
|
|
|
|
|
|
This otherwise happens in ParseCaseExpression.
If we don't call this, we don't perform the usual arithmetic
conversions, etc.
|
|
Correct a typo in the triple that is used for the test. Because the OS
was not recognised, it would fall to the non-Windows code generation.
|
|
This allows more accurate alias analysis to apply at the bundle level.
This has a bunch of minor effects in post-RA scheduling that look mostly
beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic).
The pre-existing (and unchanged) test in
CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a
bundle with MMOs can be parsed successfully.
v2:
- use cloneMergedMemRefs
- add another test to explicitly check the MMO bundling behavior
v3:
- use poison instead of undef to initialize the global variable in the
test
|
|
Fixes https://github.com/llvm/llvm-project/issues/114402.
This patch accepts empty enums in C as a Microsoft extension and introduces
a new warning, `-Wmicrosoft-empty-enum`.
---------
Signed-off-by: yicuixi <qin_17914@126.com>
Co-authored-by: Erich Keane <ekeane@nvidia.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
|
|
Forked from llvm/test/CodeGen/X86
|
|
Avoid copying machine operands.
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
This reverts commit 93e860e694770f52a9eeecda88ba11173c291ef8.
hasOneUse still has the null check, and it seems bad to be logically
inconsistent across multiple of these predicate functions.
|
|
(#131759)
This isn't really the right check; we want to know that the intrinsic
does not perform a true function call to any code (in the module or
not). nocallback appears to be the closest thing to this property we
have now, though. Fixes theoretical miscompiles with intrinsics like
statepoint, which hide a call to a real function.
Also do the same for inferring no-agpr usage.
|
|
Folding a mask to a variable shift pair results in better code size as
long as they are scalars that are <= XLen.
Similar to https://github.com/llvm/llvm-project/pull/158069
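The message does not spell out the exact pattern, so the following is only one instance of the general idea: a mask built from a variable shift can be replaced by a pair of shifts that yields the same value. The function names are hypothetical, and the 64-bit width is an assumption standing in for XLen.

```cpp
#include <cstdint>

// Clear the low N bits with an explicit mask. N must be less than 64.
std::uint64_t clearLowBitsMask(std::uint64_t X, unsigned N) {
  return X & (~std::uint64_t{0} << N);
}

// Same result using a shift pair: shift right, then back left.
// N must be less than 64.
std::uint64_t clearLowBitsShifts(std::uint64_t X, unsigned N) {
  return (X >> N) << N;
}
```

The shift-pair form needs no all-ones constant and no separate AND, which is one plausible source of the code-size benefit.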
|
|
This fixes a bug in rematerialization where the liveness of the subreg mask
was incorrectly updated, causing a crash in the scheduler.
|
|
This generalizes `handleVectorPmaddIntrinsic()`:
- potentially handle floating-point type intrinsics (e.g.,
`llvm.x86.avx512bf16.dpbf16ps.512`). This usage is not enabled yet.
- "multiplication with an initialized zero guarantees that the
corresponding output becomes initialized" is now gated by a parameter
|
|
|
|
in WgToSg Pass (#165307)
This PR does the following:
1. Handle order attribute during the delinearization from linear
subgroup Id to multi-dim id.
2. Adds a transformation pattern for vector.transpose in wg to sg pass.
3. Updates CHECKS in the wg to sg tests
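As a rough sketch of item 1, delinearizing a linear id under an explicit dimension order could look like the code below. The convention that `Order` lists dimensions from fastest-varying to slowest-varying is an assumption made for this example, and `delinearize` is a hypothetical helper, not the pass's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a linear id into a multi-dimensional id. Order lists the dimensions
// from fastest-varying to slowest-varying (an assumed convention).
std::vector<std::int64_t> delinearize(std::int64_t LinearId,
                                      const std::vector<std::int64_t> &Shape,
                                      const std::vector<std::size_t> &Order) {
  std::vector<std::int64_t> Ids(Shape.size(), 0);
  for (std::size_t Dim : Order) {
    Ids[Dim] = LinearId % Shape[Dim]; // position along this dimension
    LinearId /= Shape[Dim];           // carry the rest to slower dimensions
  }
  return Ids;
}
```

With shape `{2, 4}` and linear id 5, the order `{1, 0}` gives `{1, 1}` while the order `{0, 1}` gives `{1, 2}`, so ignoring the order attribute assigns subgroups to the wrong coordinates.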
|
|
We were missing a couple of cases where we are not supposed to ignore
assignment results but did so, which resulted in compiler crashes. Fix
that.
Also start ignoring IgnoredExprs unless there are side effects (assignments) inside.
|
|
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.
CALL and RET_CALL do not have a description in td files, and it is not
currently possible to add one as these nodes have both variable operands
and variable results.
This also fixes a subtle bug detected by the enabled verification
functionality. `LOCAL_GET` is declared with `SDNPHasChain` property, and
thus should have both a chain operand and a chain result. The original
code created a node without a chain result, which caused a check in
`SDNodeInfo::verifyNode()` to fail.
Part of #119709.
Pull Request: https://github.com/llvm/llvm-project/pull/166259
|
|
fixes #165663
The bug was that we were using the initializer list's index to populate
the matrix. This meant that [0..n] would correlate to [0..n] indices of
the flattened matrix, which is why we were seeing the row-major order: [ 0
1 2 3 4 5 ]. To fix this, we can simply convert these indices to the
column-major order: [ 0 3 1 4 2 5 ].
The net effect of this is the layout of the matrix is now correct and we
don't need to change the MatrixSubscriptExpr indexing scheme.
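The index conversion can be illustrated with a small sketch. The 2x3 shape is an assumption chosen because it reproduces the sequence quoted above, and `columnMajorOrder` is a hypothetical helper rather than the code in the patch.

```cpp
#include <cstddef>
#include <vector>

// For each slot of a column-major flattened Rows x Cols matrix, return the
// index of the row-major initializer-list element that belongs there.
std::vector<std::size_t> columnMajorOrder(std::size_t Rows, std::size_t Cols) {
  std::vector<std::size_t> Order;
  Order.reserve(Rows * Cols);
  for (std::size_t Col = 0; Col < Cols; ++Col)
    for (std::size_t Row = 0; Row < Rows; ++Row)
      Order.push_back(Row * Cols + Col); // row-major index of (Row, Col)
  return Order;
}
```

For `Rows = 2` and `Cols = 3` this yields `0 3 1 4 2 5`.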
---------
Co-authored-by: Deric C. <cheung.deric@gmail.com>
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
|
|
|
|
The stripping of the notes was done on a line-by-line basis, which was
fragile and led to empty lines being removed everywhere in the file. Instead we
can strip it as a single block before splitting the input into multiple
lines.
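The message does not name the tool or the note format, so the sketch below only shows the general approach: remove the block in one step, then split what remains. The marker strings and the helpers `stripBlock` and `splitLines` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Remove everything from Begin through End (inclusive) in one step.
// Begin and End are hypothetical marker strings. If either is missing,
// the text is returned unchanged.
std::string stripBlock(std::string Text, const std::string &Begin,
                       const std::string &End) {
  std::size_t Start = Text.find(Begin);
  if (Start == std::string::npos)
    return Text;
  std::size_t Stop = Text.find(End, Start + Begin.size());
  if (Stop == std::string::npos)
    return Text;
  Text.erase(Start, Stop + End.size() - Start);
  return Text;
}

// Split on '\n', keeping empty lines so the rest of the file is untouched.
std::vector<std::string> splitLines(const std::string &Text) {
  std::vector<std::string> Lines;
  std::size_t Pos = 0;
  while (true) {
    std::size_t NL = Text.find('\n', Pos);
    if (NL == std::string::npos) {
      Lines.push_back(Text.substr(Pos));
      break;
    }
    Lines.push_back(Text.substr(Pos, NL - Pos));
    Pos = NL + 1;
  }
  return Lines;
}
```

Because the block is removed before splitting, blank lines elsewhere in the input survive.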
|
|
Just move all CUDA related intrinsics lowering to a separate file to
avoid clobbering the main Fortran intrinsic file.
|
|
Currently, MachineVerifier does not verify the early-clobber operand
constraint.
The only other machine operand constraint, TiedTo, is already verified.
|
|
This is in preparation of a future AMDGPU change where we are going to
create bundles before register allocation and want to rely on the
TwoAddressInstructionPass handling those bundles correctly.
v2:
- simplify the virtual register check and the test
|
|
Address spaces 10 and 11 are reserved for future use in the sense that
we plan to upstream their use.
Address space 12 is used by LLPC. It is used in a workaround for an
issue with SMEM accesses to PRT buffers that is specific to the LLPC
ecosystem and makes no sense to upstream.
|
|
Simplifies the code a bit.
|
|
(#164476)
This patch addresses the profile of 2 branches:
- one that compares the 2 limits, for which we have no information (the C1, C2, see https://reviews.llvm.org/D136233)
- one that is conditioned on a condition for which we have a profile, so we reuse it
Issue #147390
|
|
|
|
BOLT expects pre-aggregated profile entries to be unique, which holds
for externally aggregated traces (or branches+fall-through ranges).
Therefore, BOLT doesn't merge duplicate entries for faster processing.
However, such traces are not expressly prohibited and could come from
concatenated pre-aggregated profiles or otherwise.
Relax the assumption about no duplicate (branch-only) traces in
fall-through imputing.
Test Plan: updated callcont-fallthru.s
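To make "duplicate entries" concrete, the sketch below shows what merging them would mean: counts of entries with the same endpoints are summed. This only illustrates the concept; the message says BOLT skips merging, and it does not state how the patch tolerates duplicates. `Entry` and `mergeDuplicates` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical pre-aggregated branch entry: source, destination, count.
struct Entry {
  std::uint64_t From;
  std::uint64_t To;
  std::uint64_t Count;
};

using BranchKey = std::pair<std::uint64_t, std::uint64_t>;

// Sum the counts of entries that share the same (From, To) pair.
std::map<BranchKey, std::uint64_t>
mergeDuplicates(const std::vector<Entry> &Entries) {
  std::map<BranchKey, std::uint64_t> Merged;
  for (const Entry &E : Entries)
    Merged[BranchKey(E.From, E.To)] += E.Count;
  return Merged;
}
```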
|
|
The patch #166382 fixed most of these, but missed the fprintf_test ones.
|
|
4776451693f4a6bd18e50106edb4b3cfa766484f broke this because it started
running an existing pass using the NewPM, which caused ProfCheck to
catch existing issues. Disable it for now because we have not started
looking at anything in the Codegen pipeline. This pass is also only
enabled at O0 or if a function has optnone, so not super critical.
|
|
We don't need to have extra allocations when concatenating raw bodies.
|