| Age | Commit message (Collapse) | Author |
|
Need to check if the non-schedulable phi parent node has unique
operands, if the incoming node has copyables, and the node is
commutative. Otherwise, there might be issues with the correct
calculation of the dependencies.
Fixes #168589
|
|
Many of the def-use chain problems in the SLP vectorizer stem from the
fact that some nodes reuse the same instruction as their insertion
point. An insertion point is not an instruction, but the place between
instructions. To set it correctly, it is better to generate a pseudo
instruction immediately after the last instruction and use it as the
insertion point. This resolves the issues in most cases.
Fixes #168512 #168576
|
|
Given a set of pointers, check if they can be rearranged as follows (%s is a constant):
%b + 0 * %s + 0
%b + 0 * %s + 1
%b + 0 * %s + 2
...
%b + 0 * %s + w
%b + 1 * %s + 0
%b + 1 * %s + 1
%b + 1 * %s + 2
...
%b + 1 * %s + w
...
If the pointers can be rearranged into the above pattern, it means that the
memory can be accessed with strided loads of width `w` and stride `%s`.
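
The rearrangement check above can be sketched as follows. This is an illustrative, standalone approximation and not the code from the patch: `StridedPattern` and `matchStridedGroups` are hypothetical names, pointers are modelled as plain element offsets from the common base `%b`, and the group size is taken to be the length of the first contiguous run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: distance between group starts and the number
// of contiguous elements in each group.
struct StridedPattern {
  int64_t Stride;
  std::size_t GroupSize;
};

// Offsets are element offsets from a common base pointer. Returns the
// pattern if the offsets can be reordered as k * Stride + j, with
// j in [0, GroupSize) and k in [0, NumGroups).
std::optional<StridedPattern> matchStridedGroups(std::vector<int64_t> Offsets) {
  if (Offsets.empty())
    return std::nullopt;
  std::sort(Offsets.begin(), Offsets.end());
  // Normalize so that the smallest offset plays the role of %b + 0.
  const int64_t Base = Offsets.front();
  for (int64_t &O : Offsets)
    O -= Base;
  // The first contiguous run determines the group size.
  std::size_t GroupSize = 1;
  while (GroupSize < Offsets.size() &&
         Offsets[GroupSize] == Offsets[GroupSize - 1] + 1)
    ++GroupSize;
  // Fully contiguous accesses do not need a strided pattern.
  if (GroupSize == Offsets.size() || Offsets.size() % GroupSize != 0)
    return std::nullopt;
  const int64_t Stride = Offsets[GroupSize];
  // Reject duplicated or overlapping groups.
  if (Stride <= static_cast<int64_t>(GroupSize))
    return std::nullopt;
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    const int64_t Expected = static_cast<int64_t>(I / GroupSize) * Stride +
                             static_cast<int64_t>(I % GroupSize);
    if (Offsets[I] != Expected)
      return std::nullopt;
  }
  return StridedPattern{Stride, GroupSize};
}
```

For example, the offsets `{11, 0, 21, 1, 10, 20}` sort into three groups of two elements with a stride of 10, while `{0, 1, 10, 12}` is rejected because the second group is not contiguous.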
|
|
- Move file-local functions out of the `llvm` or anonymous namespace and
make them static.
- Use namespace qualifier to define `BoUpSLP` class and several template
specializations.
|
|
- Split from #165532. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.
API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
- AddressSpace, CostKind);
+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+ CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the
same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost
queries to the same attributes-based entry point.
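
The design idea behind the attributes object can be illustrated with a toy, standalone sketch. Everything here is hypothetical: `ToyMemOpAttrs`, `ToyMemOpKind`, `toyMaskedMemoryOpCost`, and the cost formula itself are invented for illustration and do not reflect the fields of `MemIntrinsicCostAttributes` or any real target cost model.

```cpp
#include <cassert>

// Hypothetical stand-in for the kind of memory operation being costed.
enum class ToyMemOpKind { MaskedLoad, MaskedStore };

// Hypothetical attributes object: all query parameters travel together,
// so adding a field later does not change the signature of the query.
struct ToyMemOpAttrs {
  ToyMemOpKind Kind;
  unsigned NumElements;
  unsigned ElementBytes;
  unsigned AlignmentBytes;
  unsigned AddressSpace = 0;
};

// Toy cost query (made-up formula): one unit per element, doubled when
// the access is not aligned to the full vector size.
unsigned toyMaskedMemoryOpCost(const ToyMemOpAttrs &Attrs) {
  assert(Attrs.NumElements > 0 && "expected a non-empty vector");
  const unsigned VectorBytes = Attrs.NumElements * Attrs.ElementBytes;
  const unsigned Cost = Attrs.NumElements;
  return Attrs.AlignmentBytes >= VectorBytes ? Cost : 2 * Cost;
}
```

A caller would fill in the struct once and pass it to every related query, which is what makes a shared entry point for gather/scatter, strided, and expand/compress queries practical.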
|
|
|
|
The compiler should not consider split vectorize nodes when checking
for non-schedulable PHI-based parent nodes. Only pure PHI nodes must be
considered: only they can be treated as explicit users, while split
nodes cannot.
Fixes #168268
|
|
Need to check if the non-copyable element is an instruction before actually
trying to check its NSW attribute.
|
|
isCommutable"
This reverts commit ddf5bb0a2e2d2dd77bce66173387d62ab7174d9f to fix
buildbots https://lab.llvm.org/buildbot/#/builders/11/builds/28083.
|
|
Need to check if the non-copyable element is an instruction before actually
trying to check its NSW attribute.
|
|
The patch adds support for sub instructions as the main instruction in
copyable elements. It also adds a check for whether the base instruction
is not profitable for selection when at least one instruction with the
main opcode is used as an immediate operand.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/163231
|
|
uses only
Care is needed when trying to match and/or build a copyable node from
instructions that are used only outside the block and whose operands
immediately precede them. In this case the insertion point might be the
same, which may cause a broken def-use chain.
Fixes #167366
|
|
Identified with bugprone-unused-local-non-trivial-variable.
|
|
Identified with bugprone-unused-local-non-trivial-variable.
|
|
used outside of the block only
If the current node is a copyable node, its parent is copyable too, and
the current node is still used only outside the block, it is better to
cancel scheduling for such a node; otherwise, a wrong def-use chain
might be built during vectorization.
Fixes #166775
|
|
a use in binop.
If the parent node is non-schedulable (only externally used instructions), and at least one instruction has multiple uses and is used in the binop, such copyable node should be created. Otherwise, it may contain a wrong def-use chain model, which cannot be detected effectively.
Fixes #166035
|
|
If the alternate operation is stricter than the main operation, we
cannot rely on the analysis of the main operation. In such a case, it is
better to avoid doing the analysis at all, since it may affect the
overall result and lead to incorrect optimization.
Fixes #165878
|
|
instruction
If the gather/buildvector node has a match, the matching node has a
scheduled copyable parent, and the parent node of the original node has
a last instruction that is non-schedulable and is part of the scheduled
copyable parent, such a matching node should be excluded as
non-matching, since it produces a wrong def-use chain.
Fixes #165435
|
|
Need to re-check the instruction with the non-schedulable parent only
if this parent has a user phi node (i.e. it is used only outside the
block) and the user instruction has a unique parent instruction.
Fixes issue reported in https://github.com/llvm/llvm-project/commit/20675ee67d048a42482c246e25b284637d55347c#commitcomment-168863594
|
|
If the instructions in the node do not require scheduling and are used
only outside the basic block, we still need to check whether their
operands are non-instructions too. Such nodes should be emitted at the
beginning of the block.
Fixes #165151
|
|
If the parent node is non-schedulable and it includes several copies of
the same instruction, its operands might be replaced by the copyable
nodes in multiple children nodes, and if the instruction is commutative,
they can be used in different operands. The compiler shall consider this
opportunity, taking into account that non-copyable children are
scheduled only once for the same parent instruction.
Fixes #164242
|
|
This reverts commit e7f370f910701b6c67d41dab80e645227692c58b to fix
buildbots https://lab.llvm.org/buildbot/#/builders/213/builds/1056.
|
|
If the parent node is non-schedulable and it includes several copies of
the same instruction, its operands might be replaced by the copyable
nodes in multiple children nodes, and if the instruction is commutative,
they can be used in different operands. The compiler shall consider this
opportunity, taking into account that non-copyable children are
scheduled only once for the same parent instruction.
Fixes #164242
|
|
If the main instruction in the copyables is a div-like instruction, the
compiler cannot pack duplicates by extending them with poisons: these
instructions, once vectorized, would result in undefined behavior.
Fixes #164185
|
|
The compiler shall not check for overflow of the number of copyable
operands counter, otherwise non-copyable operand can be counted as
copyable and lead to a compiler crash.
Fixes #164164
|
|
Move the checks that all strides are the same from `isStridedLoad` to a
new function `analyzeConstantStrideCandidate`. This is to reduce the
diff for the following MRs which will modify the logic in
`analyzeConstantStrideCandidate` to cover the case of widening of the
strided load. All the checks that are left in `isStridedLoad` will be
reused.
|
|
outside the block
If the copyable entry has a last instruction that is used only outside
the block, the insertion point for the vector code should be the last
instruction itself, not the following one. This prevents wrong def-use
sequences, which might otherwise be generated for the buildvector nodes.
Fixes #163404
|
|
There are 2 cases:
- either the `select` condition is a vector of bools, in which case we don't currently have a way to represent the per-element branch probabilities anyway;
- or the `select` condition is a scalar, for example from a `llvm.vector.reduce`. We could potentially try to do more here, for instance if the reduced vector contained conditions from other selects.
In either case, IIUC, chances are the `select` doesn't get lowered to a branch; at least I'm not seeing any evidence of that in an internal complex application (CSFDO + ThinLTO). It seems sufficient to mark the selects as unknown (for profiled functions); since that metadata carries with it the pass name (`DEBUG_TYPE`) that marked it as such, we can revisit this if we detect later lowerings of these selects that would have required an actual profile.
Issue #147390
|
|
Allows using And, Or, and Xor instructions as the base for copyables.
|
|
Need to insert the vector value for the postponed gather/buildvector
node after all uses not only if the vector value of the user node is a
phi, but also if the user node itself is a PHI node, which may produce a
vector phi + shuffle.
Fixes #162799
|
|
If the non-commutative user has several identical operands and at least
one of them (but not the first) is copyable, this needs to be taken into
account when calculating the number of dependencies. Otherwise, the
schedule bundle might not be scheduled correctly, causing a compiler
crash.
Fixes #162925
|
|
Undefs/poisons with divs in vector operations lead to undefined
behavior, so this combination is disabled.
Fixes #162663
|
|
Add the `analyzeRtStrideCandidate` function. In the future commits we're
going to add the capability to widen strided loads to it. So, in this
commit, we move the size / type checks into it, since it can possibly
change size / type of load.
|
|
Allow SDiv/UDiv as a main operation in copyables support
|
|
Enables Shl matching for the nodes, where copyable can be modelled as
shl %v, 0
|
|
This is needed to reduce the diff for the future work on widening
strided loads. Also, with this change we'll be able to re-use this for
the case when each pointer represents a start of a group of contiguous
loads.
|
|
We need to clear `TreeEntryToStridedPtrInfoMap` in `deleteTree`.
|
|
Xor with a 0 operand should not be compatible with multiplication-based
instructions, only with or/xor/add/sub.
Fixes #161140
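
The arithmetic reason can be sketched in a few lines. This is a standalone illustration, not vectorizer code: `ToyOpcode` and `applyWithZero` are hypothetical names. A value `%v` can be modelled as `op %v, 0` only when 0 is a right identity of `op`, which holds for or/xor/add/sub (and shl), but not for multiplication.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode list, used for illustration only.
enum class ToyOpcode { Add, Sub, Or, Xor, Shl, Mul };

// Computes `Op V, 0` on unsigned 32-bit values.
uint32_t applyWithZero(ToyOpcode Op, uint32_t V) {
  const uint32_t Zero = 0;
  switch (Op) {
  case ToyOpcode::Add:
    return V + Zero;
  case ToyOpcode::Sub:
    return V - Zero;
  case ToyOpcode::Or:
    return V | Zero;
  case ToyOpcode::Xor:
    return V ^ Zero;
  case ToyOpcode::Shl:
    return V << Zero;
  case ToyOpcode::Mul:
    return V * Zero; // Always 0: V is not preserved.
  }
  assert(false && "unhandled opcode");
  return 0;
}
```

With `V = 42`, every opcode except `Mul` returns 42, so `xor %v, 0` cannot stand in for a lane of a multiplication node.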
|
|
Need to find the last insertelement instruction in the list for the
copyable arguments, otherwise wrong def-use chain may be built
Fixes #160671
|
|
Move size checks inside `isStridedLoad`. In the future we plan to
possibly change the size and type of strided load there.
|
|
operands were copyable
If all operands of the non-schedulable nodes were previously only
copyables, need to clear the dependencies of the original schedule data
for such copyable operands and recalculate them to correctly handle
the number of dependencies.
Fixes #159406
|
|
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
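
The idiom can be sketched in standalone C++ as follows. This is a simplified illustration and not the PatternMatch implementation: `IsZero`, `matchValue`, and `matchFn` are hypothetical names, and plain integers stand in for IR values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical pattern type: anything exposing a match(V) member.
struct IsZero {
  bool match(int V) const { return V == 0; }
};

// Free function in the style of a pattern-matching entry point.
template <typename Val, typename Pattern>
bool matchValue(const Val &V, const Pattern &P) {
  return P.match(V);
}

// Functor factory: captures the pattern once so that it can be passed
// directly to algorithms such as std::all_of without a hand-written lambda.
template <typename Pattern> auto matchFn(Pattern P) {
  return [P](const auto &V) { return matchValue(V, P); };
}

// Before: all_of(Vals, [](int V) { return matchValue(V, IsZero{}); })
// After:  all_of(Vals, matchFn(IsZero{}))
bool allZero(const std::vector<int> &Vals) {
  return std::all_of(Vals.begin(), Vals.end(), matchFn(IsZero{}));
}
```

The functor only removes the boilerplate lambda; the matching logic stays in the pattern type.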
|
|
|
|
To avoid calculating the stride of a strided load twice, save it in a
map.
|
|
If the commutable instruction can be represented as a non-commutable
vector instruction (like add 0, %v can be represented as a part of sub
nodes with operation sub %v, 0), its operands might still be reordered,
and this should be accounted for when checking for copyables in operands.
Fixes #158293
|
|
Add a test to generate -1 stride load and flags to force this behaviour.
|
|
|
|
If the original instruction is going to be scheduled after the same
instruction has been scheduled as copyable, the dependencies need to be
recalculated. Otherwise, the dependencies may be calculated incorrectly.
|
|
bitcast to float
If the user node of the SExt/ZExt node is a bitcast to a floating-point
type, the node itself should not be considered legal to demote, since
the cast is still required to match the size of the floating-point type.
Fixes #157277
|
|
If a standalone schedule data relates to a vectorized instruction, still
need to schedule it as a part of pseudo-bundle to correctly handle
dependencies between its child nodes.
|