llvm-project.git/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp, branch users/mingmingl-llvm/samplefdo-profile-format

[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)

2025-08-27T11:43:52+00:00

First pass where we calculate the cost of the memory operation, as well
as the shuffles required. Interleaving by a factor of two should be
relatively cheap, as many ISAs have dedicated instructions to perform
the (de)interleaving. Several of these permutations can be combined for
an interleave stride of 4 and this is the highest stride we allow.

I've costed larger vectors, and more lanes, as more expensive because
not only is more work needed, but the risk of codegen going 'wrong'
rises dramatically. I also filled in a bit of cost modelling for vector
stores.

It appears the main vector plan to avoid is an interleave factor of 4
with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on
AArch64, and observe a geomean improvement of ~3%, with some kernels
improving by 40-60%.

I know there is still significant performance being left on the table,
so this will need more development along with the rest of the cost
model.
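
As a rough illustration of the shape of such a cost (memory accesses plus
shuffles, growing with interleave factor and lane count), here is a toy
sketch. The function name `interleavedCostSketch`, the formula, and every
constant are assumptions made up for this example; they are not the values
used in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Toy model only: the formula and weights are illustrative assumptions,
// not the costs returned by the patch's getInterleavedMemoryOpCost.
std::optional<unsigned> interleavedCostSketch(unsigned Factor, unsigned Lanes,
                                              unsigned ElemBits) {
  // Assumption: only factors 2 and 4 are modelled; anything else is rejected.
  if (Factor != 2 && Factor != 4)
    return std::nullopt;
  unsigned VecBits = Lanes * ElemBits;
  if (VecBits == 0)
    return std::nullopt;

  // Memory cost: one unit per 128-bit access needed to cover the whole group.
  unsigned MemOps = (VecBits * Factor + 127) / 128;

  // Shuffle cost: one round of permutes for factor 2, two rounds for factor 4,
  // applied per group member and scaled up as the lane count grows.
  unsigned Rounds = (Factor == 2) ? 1 : 2;
  unsigned LanePenalty = std::max(1u, Lanes / 4);
  unsigned ShuffleCost = Factor * Rounds * LanePenalty;

  return MemOps + ShuffleCost;
}
```

Under these made-up weights, a factor-2 group of v4i32 costs 4 and a
factor-4 group of v16i8 costs 36, which mirrors the intent of making the
v16i8, factor-4 plan unattractive to the vectorizer.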

[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)

2025-08-15T19:06:47+00:00

This PR reapplies https://github.com/llvm/llvm-project/pull/149461

The original `combineVectorSizedSetCCEquality` negated the result of the
setcc by returning a setcc with the same cond code, which inverted the
logic.

For example, with
```llvm
  %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR produces all_true and then also compares the result
equal to 0 (using the same SETEQ in the returned setcc), meaning that
semantically, it is effectively an icmp ne.

Instead, the PR should have used SETNE in the returned setcc. That way,
all_true returns 1, which is then compared ne 0, which is equivalent
to icmp eq.
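
The inversion can be checked with a small scalar model. This is only a
sketch: `laneEq`, `allTrue`, and `combinedSetCC` are hypothetical stand-ins
for the lane-wise compare, all_true, and the returned setcc, not code from
the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical model of a lane-wise equality compare: 0xFF where lanes match.
static Bytes16 laneEq(const Bytes16 &A, const Bytes16 &B) {
  Bytes16 Mask{};
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Mask[I] = (A[I] == B[I]) ? 0xFF : 0x00;
  return Mask;
}

// Hypothetical model of all_true: 1 if every lane is nonzero, else 0.
static int allTrue(const Bytes16 &Mask) {
  for (std::uint8_t Lane : Mask)
    if (Lane == 0)
      return 0;
  return 1;
}

enum class CondCode { SETEQ, SETNE };

// Models the returned setcc: all_true(...) compared against 0 with CC.
static bool combinedSetCC(const Bytes16 &A, const Bytes16 &B, CondCode CC) {
  int All = allTrue(laneEq(A, B));
  return CC == CondCode::SETEQ ? (All == 0) : (All != 0);
}
```

For two identical 16-byte blocks, `combinedSetCC(..., CondCode::SETNE)`
yields true (matching `icmp eq`), while `CondCode::SETEQ` yields false,
which is the inverted result the original PR produced.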

Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)

2025-08-13T07:41:44+00:00

Reverts llvm/llvm-project#149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it.

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746

[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461)

2025-08-12T18:04:37+00:00

Fixes https://github.com/llvm/llvm-project/issues/149230

Previously, even with simd enabled via `-mattr=+simd128`, the compiler
could not use v128 to optimize loads and setcc of i128, instead
legalizing them to consecutive i64s.

This PR adds support for setcc of i128 by converting it to v16i8's
anytrue and alltrue; consequently, this benefits memcmp of 16 bytes or
more (when simd128 is present).

The optimization is enabled when the comparison operand is either a load
or an integer of type i128, the comparison code is `EQ` or `NE`, and the
function does not have the `NoImplicitFloat` attribute.

Inspiration taken from RISCV's isel lowering.

[WebAssembly] Add support for memcmp expansion (#148298)

2025-07-20T17:27:42+00:00

Fixes https://github.com/llvm/llvm-project/issues/61400

Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll

[TTI] Plumb CostKind through getPartialReductionCost (#144953)

2025-06-19T22:29:56+00:00

Purely for the sake of being idiomatic with other TTI costing routines,
no direct motivation beyond that.

[NFC] Use more isa and isa_and_nonnull instead dyn_cast for predicates (#137393)

2025-05-13T14:34:42+00:00

Also fix some typos in comments

---------

Co-authored-by: Mehdi Amini

[CostModel] Make Op0 and Op1 const in getVectorInstrCost. NFC (#137631)

2025-05-01T14:55:08+00:00

This does not alter much at the moment, but allows const pointers to be
passed as Op0 and Op1, simplifying later patches

[TTI] Fix discrepancies in prototypes between interface and implementations (NFCI) (#136655)

2025-04-22T08:40:12+00:00

These are not diagnosed because implementations hide the methods of the base class rather than overriding them.
This works as long as a hiding function is callable with the same arguments as the same function from the base class.
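
A minimal sketch of how such a discrepancy goes unnoticed, assuming a
CRTP-style base like the TTI implementation classes. The type names and the
`getCost` signature here are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical CRTP base: dispatches to the derived class by name lookup,
// not through a virtual function, so there is no `override` to check.
template <typename T> struct CostBaseSketch {
  int getCost(unsigned Opcode, int Kind) const { return 1; }
  int query(unsigned Opcode, int Kind) const {
    return static_cast<const T *>(this)->getCost(Opcode, Kind);
  }
};

struct DriftedTargetSketch : CostBaseSketch<DriftedTargetSketch> {
  // The prototype has drifted (long instead of int). It hides the base
  // method and is still callable with the same arguments, so nothing is
  // diagnosed and the call silently converts int to long.
  int getCost(unsigned Opcode, long Kind) const { return 7; }
};
```

Here `DriftedTargetSketch{}.query(0u, 0)` still reaches the derived method
and returns 7, even though the two prototypes no longer match.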

Pull Request: https://github.com/llvm/llvm-project/pull/136655

[TTI] Constify BasicTTIImplBase::thisT() (NFCI) (#136575)

2025-04-21T18:42:40+00:00

The main change is making the `thisT` method `const`; the rest of the
changes fix the resulting compilation errors (*).

(*) There are two tricky methods, `getVectorInstrCost()` and
`getIntImmCost()`.
They have several overloads; some of these overloads are typically
pulled in to derived classes using the `using` directive, and then
hidden by methods in the derived class.
The compiler does not complain if the hiding methods are not marked as
`const`, which means that clients will use the methods from the base
class. If after this change your target fails cost model tests, this
must be the reason. To resolve the issue you need to make all hiding
overloads `const`. See the second commit in this PR.
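
The failure mode can be reproduced in a small standalone sketch. The class
names and return values are hypothetical; only the `thisT` and
`getIntImmCost` names echo the description above, and the real methods take
arguments that are omitted here.

```cpp
#include <cassert>

// Hypothetical CRTP base with a const thisT(), as after this change.
template <typename T> struct BaseSketch {
  const T *thisT() const { return static_cast<const T *>(this); }
  int getIntImmCost() const { return 1; } // base default
  int query() const { return thisT()->getIntImmCost(); }
};

struct StaleTargetSketch : BaseSketch<StaleTargetSketch> {
  using BaseSketch<StaleTargetSketch>::getIntImmCost;
  // Not const: it does not hide the const base overload pulled in by the
  // using-declaration, so calls through a const pointer pick the base one.
  int getIntImmCost() { return 42; }
};

struct FixedTargetSketch : BaseSketch<FixedTargetSketch> {
  using BaseSketch<FixedTargetSketch>::getIntImmCost;
  // Const: hides the base overload, so the target's cost is used again.
  int getIntImmCost() const { return 42; }
};
```

With these definitions, `StaleTargetSketch{}.query()` returns the base value
1 without any compiler diagnostic, while `FixedTargetSketch{}.query()`
returns 42.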

Pull Request: https://github.com/llvm/llvm-project/pull/136575