| Age | Commit message (Collapse) | Author |
|
This was missed in 0ef522ff68fff4266bf85e7b7a507a16a8fd34ee
|
|
I think we need to keep the SelectionDAG code for volatile load/store so
we should support 4 byte alignment when possible.
|
|
(#168661)"
This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.
This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.
|
|
This maybe a bug which is introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9, and has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` cause they from
the `Copy` all.
|
|
This improves type safety and is less verbose. Use SimpleTy only where
an integer is needed like switches or emitting a VBR.
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
|
|
While I am at it, this patch uses const l-value references for
std::shared_ptr. We don't need to increment the reference count by
passing std::shared_ptr by value.
Identified with llvm-use-ranges.
|
|
Identified with bugprone-unused-local-non-trivial-variable.
|
|
isUniformAcrossVFsAndUFs
Extract the PreservesUniformity logic from isSingleScalar into a shared
static helper function. Update isUniformAcrossVFsAndUFs to use this
logic for VPWidenRecipe and VPInstruction, so that any opcode that
preserves uniformity is considered uniform-across-vf-and-uf if its
operands are.
This unifies the uniformity checking logic and makes it easier to extend
in the future.
This should effectively by NFC currently.
|
|
- Support serialization of the number of allocated preload kernarg SGPRs
- Support serialization of the first preload kernarg SGPR allocated
Together they enable reconstructing correctly MIR with preload kernarg
SGPRs.
|
|
The GNU documentation is ambiguous about the version index for
unversioned undefined symbols. The current specification at
https://sourceware.org/gnu-gabi/program-loading-and-dynamic-linking.txt
defines VER_NDX_LOCAL (0) as "The symbol is private, and is not
available outside this object."
However, this naming is misleading for undefined symbols. As suggested
in
discussions, VER_NDX_LOCAL should conceptually be VER_NDX_NONE and apply
to unversioned undefined symbols as well.
GNU ld has used index 0 for unversioned undefined symbols both before
version 2.35 (see https://sourceware.org/PR26002) and in the upcoming
2.46 release (see https://sourceware.org/PR33577). This change aligns
with GNU ld's behavior by switching from index 1 to index 0.
While here, add a test to dso-undef-extract-lazy.s that undefined
symbols of index 0 in DSO are treated as unversioned symbols.
|
|
Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction.
PR: https://github.com/llvm/llvm-project/pull/166099
|
|
If a threading path has cycles within it then the transformation is not
correct. This patch fixes a couple of cases that create such cycles.
Fixes https://github.com/llvm/llvm-project/issues/166868
|
|
(#169163)
Extends the `icmp(trunc(shl))` fold to handle any power of 2 constant as
the shift base, not just 1. This generalizes the following patterns by
adjusting the comparison offsets by `log2(Pow2)`.
```llvm
(trunc (1 << Y) to iN) == 0 --> Y u>= N
(trunc (1 << Y) to iN) != 0 --> Y u< N
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
; to
(trunc (Pow2 << Y) to iN) == 0 --> Y u>= N - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 0 --> Y u< N - log2(Pow2)
(trunc (Pow2 << Y) to iN) == 2**C --> Y == C - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 2**C --> Y != C - log2(Pow2)
```
Proof: https://alive2.llvm.org/ce/z/2zwTkp
|
|
For swift async code, we need to use a debug intrinsic that behaves like
an llvm.dbg.declare but can take any location type rather than just a
pointer or integer.
To solve this, a new debug instrinsic called llvm.dbg.declare_value has
been created, which behaves exactly like an llvm.dbg.declare but can
take non pointer and integer location types.
More information here:
https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269
This is the first patch as part of a stack of patches, with the one
succeeding it being: https://github.com/llvm/llvm-project/pull/168134
|
|
Function &F is the more standard abbreviation (~4000 uses in llvm versus
~300 uses).
|
|
supports -fbounds-safety (#169112)
This patch makes it possible to detect in LLDB shell and API tests if
`-fbounds-safety` is supported by the compiler used for testing. The
motivation behind this is to allow upstreaming
https://github.com/swiftlang/llvm-project/pull/11835 but with the tests
disabled in upstream because the full implementation of -fbounds-safety
isn't available in Clang yet.
For shell tests when -fbounds-safety is available the
`clang-bounds-safety` feature is available which means tests can be
annotated with `# REQUIRES: clang-bounds-safety`.
API tests that need -fbounds-safety support in the compiler can use the
new `@skipUnlessBoundsSafety` decorator.
rdar://165225507
|
|
|
|
This test checks that DSOMarkupPrinter::printDSOMarkup prints the module
and segment mappings, but that is only done if we can determine the GNU
build ID for the given object, and in many environments that is not
enabled by default (e.g. the FreeBSD Clang driver never enables it, and
various other Clang drivers only do so when ENABLE_LINKER_BUILD_ID is
opted into at configure time). GCC tends to enable it by default, and
many distributions enable it for Clang, so this has gone unnoticed for a
while, but this test has been failing on FreeBSD since its creation.
Fixes: 22b9404f09dc ("Optionally print symbolizer markup backtraces.")
See: https://github.com/llvm/llvm-project/issues/168891
|
|
Remove the SCEV-based check refined in
https://github.com/llvm/llvm-project/pull/156910, as InstSimplifyFolder
manages to simplify the generated code to false directly as well.
|
|
(NFC) (#156578)
Add detailed comments explaining why each function should/shouldn't be
unroll-and-jammed based on memory access patterns and dependencies.
Fix loop bounds to ensure array accesses are within array bounds:
* sub_sub_less: j starts from 1 (not 0) to ensure j-1 >= 0
* sub_sub_less_3d: k starts from 1 (not 0) to ensure k-1 >= 0
* sub_sub_outer_scalar: j starts from 1 (not 0) to ensure j-1 >= 0
|
|
Remove the Constraints class from dependence analysis as it is now unused.
|
|
truncation exists (#169022)
Fixes #169017
|
|
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"
The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.
|
|
of ExpandVariadic. (#168161)
This PR fixes the issue where profile metadata (`!prof`) is dropped from
the `VariadicWrapper` when `ExpandVariadics` runs in
`--expand-variadics-override=optimize` mode.
In optimize mode, the pass splits the original variadic function into
two parts:
- A **VariadicWrapper** (retaining the original name) that handles the
`va_list` setup.
- A **FixedArityReplacement** (new function) that contains the original
core logic.
During this process, the basic blocks and associated metadata are
spliced into the `FixedArityReplacement`. Consequently, the
`VariadicWrapper`—which serves as the entry point for callers—is left
without function entry count metadata.
This change explicitly copies the `MD_prof` metadata from the
`FixedArityReplacement` back to the `VariadicWrapper` after the split is
defined.
Co-authored-by: Jin Huang <jingold@google.com>
|
|
Global with invariant should be treated identically to
constant.
|
|
This works fine on main, but broke after a future patch.
|
|
only" (#169073)
Reverts llvm/llvm-project#168518
|
|
This PR adds a Load method for resources, which takes an additional
parameter by reference, status. It fills the status parameter with a 1
or 0, depending on whether or not the resource access was mapped.
CheckAccessFullyMapped is also added as an intrinsic, and called in the
production of this status bit.
Only addresses DXIL for the below issue:
https://github.com/llvm/llvm-project/issues/138910
Also only addresses the DXIL variant for the below issue:
https://github.com/llvm/llvm-project/issues/99204
|
|
Identified with modernize-loop-convert.
|
|
SI_SPILL/SI_RESTORE." (#169068)
PR causes build failures with expensive checks enabled
Reverts llvm/llvm-project#168546
|
|
chains (#168660)
Previously, the following:
%mul0 = mul nsw <8 x i32> %m00, %m01
%mul1 = mul nsw <8 x i32> %m10, %m11
%add0 = add <8 x i32> %mul0, splat (i32 32)
%add1 = add <8 x i32> %add0, %mul1
lowered to:
vsetivli zero, 8, e32, m2, ta, ma
vmul.vv v8, v8, v9
vmacc.vv v8, v11, v10
li a0, 32
vadd.vx v8, v8, a0
After this patch, now lowers to:
li a0, 32
vsetivli zero, 8, e32, m2, ta, ma
vmv.v.x v12, a0
vmadd.vv v8, v9, v12
vmacc.vv v8, v11, v10
Modeled on 0cc981e0 from the AArch64 backend.
C-code for the example case (`clang -O3 -S -mcpu=sifive-x280`):
```
int madd_fail(int a, int b, int * restrict src, int * restrict dst, int loop_bound) {
for (int i = 0; i < loop_bound; i += 2) {
dst[i] = src[i] * a + src[i + 1] * b + 32;
}
}
```
|
|
This is a workaround for the MSVC compiler, which attempts to generate a
copy assignment operator implementation for classes marked as
`__declspec(dllexport)`. Explicitly marking the copy assignment operator
as deleted works around the problem.
DevCom ticket:
https://developercommunity.microsoft.com/t/Classes-marked-with-__declspecdllexport/11003192
|
|
|
|
FileOffset::get already returns uint64_t.
Identified with readability-redundant-casting.
|
|
Identified with modernize-loop-convert.
|
|
Remove getSplitIteration.
A follow-up patch will also remove DVEntry::Splitable and Dependnece::isSplitable.
|
|
ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).
Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.
After the patch, TableGen guesses it has no side effects because the
added pattern uses only `arm_wlssetup` node, which has no side effects.
Add `SDNPSideEffect` to the node so that TableGen guesses the property
right, and also `hasSideEffects = 1` to the instruction in case ARM ever
sets `guessInstructionProperties` to false.
|
|
Some targets have a specific calling convention that should be used for
generated calls to runtime functions.
Pass that down and use it.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
|
|
Fix a problem exposed by #166483 using AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.
Fixes: #168761
|
|
Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
|
|
(#166170)
…nd update docs
|
|
|
|
This allows us to strip an unnecessary TypeSwitch.
|
|
Only apply forced instruction costs to recipes with underlying values to
match the legacy cost model. A VPlan may have a number of additional
VPInstructions without underlying values that are not considered for its
cost, and assigning forced costs to them would incorrectly inflate its
cost.
This fixes a cost divergence between legacy and VPlan-based cost models
with forced instruction costs.
PR: https://github.com/llvm/llvm-project/pull/168372
|
|
This patch enables the multi-group xnack replay mode by
configuring the hardware MODE register at kernel entry.
This aligns the hardware behavior with the compiler's
existing multi-group s_wait_xcnt insertion logic.
|
|
This patch replaces the delinearization function used in
LoopCacheAnalysis, switching from one that depends on type information
in GEPs to one that does not. Once this patch and
https://github.com/llvm/llvm-project/pull/161822 are landed, we can
delete `tryDelinearizeFixedSize` from Delienarization, which is an
optimization heuristic guided by GEP type information. After Polly
eliminates its use of `getIndexExpressionsFromGEP`, we will be able to
completely delete GEP-driven heuristics from Delinearization.
|
|
In 4 years the ELF debugger support plugin wasn't adapted to other
object formats or debugging approaches. After the renaming NFC in
https://github.com/llvm/llvm-project/pull/168343, this patch tailors the
plugin to ELF and section load-address patching. It allows removal of
abstractions and consolidate processing steps with the newly enabled
AllocActions from https://github.com/llvm/llvm-project/pull/168343.
The key change is to process debug sections in one place in a
post-allocation pass. Since we can handle the endianness of the ELF file
the single `visitSectionLoadAddresses()` visitor function now, we don't
need to track debug objects and sections in template classes anymore. We
keep using the `DebugObject` class and drop `DebugObjectSection`,
`ELFDebugObjectSection<ELFT>` and `ELFDebugObject`.
Furthermore, we now use the allocation's working memory for load-address
fixups directly. We can drop the `WritableMemoryBuffer` from the debug
object and most of the `finalizeWorkingMemory()` step, which saves one
copy of the entire debug object buffer. Inlining `finalizeAsync()` into
the pre-fixup pass simplifies quite some logic.
We still track `RegisteredObjs` here, because we want to free memory
once the corresponding code is freed. There will be a follow-up patch
that turns it into a dealloc action.
|
|
This PR adds hardware-measured latencies for all instructions defined in
Section 15 of the RVV specification: "Vector Mask Instructions" to the
SpacemiT-X60 scheduling model.
|
|
OpenMP 6.0 introduced a `fuse` directive, and with it a "loop sequence"
as the associated code. What used to be "loop association" has become
"loop-nest association".
Rename Association::Loop to LoopNest, add Association::LoopSeq to
represent the "loop sequence" association.
Change the association of fuse from "block" to "loop sequence".
|