<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[SystemZ] Remove incorrect areInlineCompatible hook (#152494)</title>
<updated>2025-08-08T08:06:19+00:00</updated>
<author>
<name>Nikita Popov</name>
<email>npopov@redhat.com</email>
</author>
<published>2025-08-08T08:06:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=18e4f775c33af123772409ffde69dd424b98814a'/>
<id>18e4f775c33af123772409ffde69dd424b98814a</id>
<content type='text'>
This reverts https://github.com/llvm/llvm-project/pull/132976.

The PR incorrectly claimed that this makes inlining more liberal,
referencing the string comparison in TargetTransformInfoImpl.h.

However, the implementation that actually applies is the one in
BasicTTIImpl.h, which performs a feature subset comparison. As such,
this regressed inlining, most concerningly of functions without +vector
into functions with +vector.

Revert the change to restore the previous behavior.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts https://github.com/llvm/llvm-project/pull/132976.

The PR incorrectly claimed that this makes inlining more liberal,
referencing the string comparison in TargetTransformInfoImpl.h.

However, the implementation that actually applies is the one in
BasicTTIImpl.h, which performs a feature subset comparison. As such,
this regressed inlining, most concerningly of functions without +vector
into functions with +vector.

Revert the change to restore the previous behavior.</pre>
</div>
</content>
</entry>
<entry>
<title>[CostModel] Add a DstTy to getShuffleCost (#141634)</title>
<updated>2025-06-21T11:29:29+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-06-21T11:29:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=77941eba7f01fc6576b3e060a3fb9cad1a64f9ea'/>
<id>77941eba7f01fc6576b3e060a3fb9cad1a64f9ea</id>
<content type='text'>
A shuffle will take two input vectors and a mask, to produce a new
vector of size &lt;MaskElts x SrcEltTy&gt;. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations notably in the SLP
vectorizer but also in the generic cost routines where assumption about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen but the default lowering to promote for integer
vectors.

This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy used as the primary type used in shuffle cost routines, only using
DstTy where it was in the past (for InsertSubVector for example).

Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A shuffle will take two input vectors and a mask, to produce a new
vector of size &lt;MaskElts x SrcEltTy&gt;. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations notably in the SLP
vectorizer but also in the generic cost routines where assumption about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen but the default lowering to promote for integer
vectors.

This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy used as the primary type used in shuffle cost routines, only using
DstTy where it was in the past (for InsertSubVector for example).

Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.</pre>
</div>
</content>
</entry>
<entry>
<title>[CostModel] Make Op0 and Op1 const in getVectorInstrCost. NFC (#137631)</title>
<updated>2025-05-01T14:55:08+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-05-01T14:55:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=abd2c07e390c39830296ee70d4743663b02dc8df'/>
<id>abd2c07e390c39830296ee70d4743663b02dc8df</id>
<content type='text'>
This does not alter much at the moment, but allows const pointers to be
passed as Op0 and Op1, simplifying later patches</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This does not alter much at the moment, but allows const pointers to be
passed as Op0 and Op1, simplifying later patches</pre>
</div>
</content>
</entry>
<entry>
<title>[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830)</title>
<updated>2025-04-30T15:11:27+00:00</updated>
<author>
<name>Jonas Paulsson</name>
<email>paulson1@linux.ibm.com</email>
</author>
<published>2025-04-30T15:11:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=f5c8c1eedb44861cda8885a04813de999ca9d18a'/>
<id>f5c8c1eedb44861cda8885a04813de999ca9d18a</id>
<content type='text'>
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars" `
changed SLPVectorizer getScalarizationOverhead() to call
TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some
cases. This was due to X86 specific handlings in these (overridden) methods,
and unfortunately the general preference of TTI.getScalarizationOverhead()
was dropped. If VL is available it should always be preferred to use
getScalarizationOverhead(), and this is indeed the case for SystemZ which
has a special insertion instruction that can insert two GPR64s.

Then ` 33af951 "[SLP]Synchronize cost of gather/buildvector nodes with
codegen"` reworked SLPVectorizer getGatherCost() which together with
ad9909d caused the SystemZ test vec-elt-insertion.ll to fail.

This patch restores the SystemZ test and reverts the change in SLPVectorizer
getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always
called again. The ForPoisonSrc argument is now passed on to the TTI method
so that X86 can handle this as required.

Fixes: #135346</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars" `
changed SLPVectorizer getScalarizationOverhead() to call
TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some
cases. This was due to X86 specific handlings in these (overridden) methods,
and unfortunately the general preference of TTI.getScalarizationOverhead()
was dropped. If VL is available it should always be preferred to use
getScalarizationOverhead(), and this is indeed the case for SystemZ which
has a special insertion instruction that can insert two GPR64s.

Then ` 33af951 "[SLP]Synchronize cost of gather/buildvector nodes with
codegen"` reworked SLPVectorizer getGatherCost() which together with
ad9909d caused the SystemZ test vec-elt-insertion.ll to fail.

This patch restores the SystemZ test and reverts the change in SLPVectorizer
getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always
called again. The ForPoisonSrc argument is now passed on to the TTI method
so that X86 can handle this as required.

Fixes: #135346</pre>
</div>
</content>
</entry>
<entry>
<title>[SystemZ] Fix compile time regression in adjustInliningThreshold(). (#137527)</title>
<updated>2025-04-28T13:04:07+00:00</updated>
<author>
<name>Jonas Paulsson</name>
<email>paulson1@linux.ibm.com</email>
</author>
<published>2025-04-28T13:04:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=98b895da30c03b7061b8740d91c0e7998e69d091'/>
<id>98b895da30c03b7061b8740d91c0e7998e69d091</id>
<content type='text'>
Instead of always iterating over all GlobalVariable:s in the Module to
find the case where both Caller and Callee is using the same GV heavily,
first scan Callee (only if less than 200 instructions) for all GVs used
more than 10 times, and then do the counting for the Caller for just
those relevant GVs.

The limit of 200 instructions makes sense as this aims to inline a
relatively small function using a GV +10 times. 

This resolves the compile time problem with zig where it is on main
(compared to removing the heuristic) a 380% increase, but with this
change &lt;0.5% increase (total user compile time with opt).

Fixes #134714.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of always iterating over all GlobalVariable:s in the Module to
find the case where both Caller and Callee is using the same GV heavily,
first scan Callee (only if less than 200 instructions) for all GVs used
more than 10 times, and then do the counting for the Caller for just
those relevant GVs.

The limit of 200 instructions makes sense as this aims to inline a
relatively small function using a GV +10 times. 

This resolves the compile time problem with zig where it is on main
(compared to removing the heuristic) a 380% increase, but with this
change &lt;0.5% increase (total user compile time with opt).

Fixes #134714.
</pre>
</div>
</content>
</entry>
<entry>
<title>[CostModel] Remove optional from InstructionCost::getValue() (#135596)</title>
<updated>2025-04-23T06:46:27+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2025-04-23T06:46:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=98b6f8dc699d789d834e5b6d810ed217f560aad0'/>
<id>98b6f8dc699d789d834e5b6d810ed217f560aad0</id>
<content type='text'>
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.</pre>
</div>
</content>
</entry>
<entry>
<title>[TTI] Fix discrepancies in prototypes between interface and implementations (NFCI) (#136655)</title>
<updated>2025-04-22T08:40:12+00:00</updated>
<author>
<name>Sergei Barannikov</name>
<email>barannikov88@gmail.com</email>
</author>
<published>2025-04-22T08:40:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=3334c3597dd51f5a102e5005738e3bf4ef7530e2'/>
<id>3334c3597dd51f5a102e5005738e3bf4ef7530e2</id>
<content type='text'>
These are not diagnosed because implementations hide the methods of the base class rather than overriding them.
This works as long as a hiding function is callable with the same arguments as the same function from the base class.

Pull Request: https://github.com/llvm/llvm-project/pull/136655
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These are not diagnosed because implementations hide the methods of the base class rather than overriding them.
This works as long as a hiding function is callable with the same arguments as the same function from the base class.

Pull Request: https://github.com/llvm/llvm-project/pull/136655
</pre>
</div>
</content>
</entry>
<entry>
<title>[TTI] Make all interface methods const (NFCI) (#136598)</title>
<updated>2025-04-22T03:27:29+00:00</updated>
<author>
<name>Sergei Barannikov</name>
<email>barannikov88@gmail.com</email>
</author>
<published>2025-04-22T03:27:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0014b49482c0862c140149c650d653b4e41fa9b4'/>
<id>0014b49482c0862c140149c650d653b4e41fa9b4</id>
<content type='text'>
Making `TargetTransformInfo::Model::Impl` `const` makes sure all
interface methods are `const`, in `BasicTTIImpl`, its bases, and in all
derived classes.

Pull Request: https://github.com/llvm/llvm-project/pull/136598
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Making `TargetTransformInfo::Model::Impl` `const` makes sure all
interface methods are `const`, in `BasicTTIImpl`, its bases, and in all
derived classes.

Pull Request: https://github.com/llvm/llvm-project/pull/136598
</pre>
</div>
</content>
</entry>
<entry>
<title>[TTI] Constify BasicTTIImplBase::thisT() (NFCI) (#136575)</title>
<updated>2025-04-21T18:42:40+00:00</updated>
<author>
<name>Sergei Barannikov</name>
<email>barannikov88@gmail.com</email>
</author>
<published>2025-04-21T18:42:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=e0c1e23b99e9719d0a01ab7dfc0807d891004bd4'/>
<id>e0c1e23b99e9719d0a01ab7dfc0807d891004bd4</id>
<content type='text'>
The main change is making `thisT` method `const`, the rest of the
changes is fixing compilation errors (*).

(*) There are two tricky methods, `getVectorInstrCost()` and
`getIntImmCost()`.
They have several overloads; some of these overloads are typically
pulled in to derived classes using the `using` directive, and then
hidden by methods in the derived class.
The compiler does not complain if the hiding methods are not marked as
`const`, which means that clients will use the methods from the base
class. If after this change your target fails cost model tests, this
must be the reason. To resolve the issue you need  to make all hiding
overloads `const`. See the second commit in this PR.

Pull Request: https://github.com/llvm/llvm-project/pull/136575
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The main change is making `thisT` method `const`, the rest of the
changes is fixing compilation errors (*).

(*) There are two tricky methods, `getVectorInstrCost()` and
`getIntImmCost()`.
They have several overloads; some of these overloads are typically
pulled in to derived classes using the `using` directive, and then
hidden by methods in the derived class.
The compiler does not complain if the hiding methods are not marked as
`const`, which means that clients will use the methods from the base
class. If after this change your target fails cost model tests, this
must be the reason. To resolve the issue you need  to make all hiding
overloads `const`. See the second commit in this PR.

Pull Request: https://github.com/llvm/llvm-project/pull/136575
</pre>
</div>
</content>
</entry>
<entry>
<title>Implement areInlineCompatible for SystemZ using feature bitset (#132976)</title>
<updated>2025-04-07T22:50:30+00:00</updated>
<author>
<name>Andres Chavarria</name>
<email>84650073+chavandres@users.noreply.github.com</email>
</author>
<published>2025-04-07T22:50:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=9b63a92ca723293dfe8570d1b2881ce949f1f6cc'/>
<id>9b63a92ca723293dfe8570d1b2881ce949f1f6cc</id>
<content type='text'>
## What?
Implement `areInlineCompatible` for the SystemZ target using
FeatureBitset comparison.

## Why?
The default implementation in `TargetTransformInfoImpl.h` makes a string
comparison and only inlines when the target-cpu and the target-features
for caller and callee are the same. We are missing out on optimizations
when the callee has a subset of features of the caller.

## How?
Get the FeatureBitset of the caller and callee and check when callee is
a subset or equal to the caller's features. It's a similar
implementation to ARM, PowerPC...

## Testing?
Test cases check for when the callee is a subset of the caller, when
it's not a subset and when both are equals.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
## What?
Implement `areInlineCompatible` for the SystemZ target using
FeatureBitset comparison.

## Why?
The default implementation in `TargetTransformInfoImpl.h` makes a string
comparison and only inlines when the target-cpu and the target-features
for caller and callee are the same. We are missing out on optimizations
when the callee has a subset of features of the caller.

## How?
Get the FeatureBitset of the caller and callee and check when callee is
a subset or equal to the caller's features. It's a similar
implementation to ARM, PowerPC...

## Testing?
Test cases check for when the callee is a subset of the caller, when
it's not a subset and when both are equals.</pre>
</div>
</content>
</entry>
</feed>
