llvm-project.git

Age	Commit message	Author
2025-11-19	[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738)	Nishant Patel
	This PR adds a pattern for unrolling shape_cast given a targetShape. It is a follow-up to #164010, which was very general and used inserts and extracts on each element (which is also what LowerVectorShapeCast.cpp does). After doing some more research on use cases, we (me and @Jianhui-Li) realized that the previous version in #164010 is unnecessarily generic and doesn't fit our performance needs. Our use case requires that targetShape is contiguous in both the source and the result vector. This pattern only applies when contiguous slices can be extracted from the source vector and inserted into the result vector such that each slice remains in vector form with targetShape (and is not decomposed into scalars). In these cases, the unrolling proceeds as: vector.extract_strided_slice -> vector.shape_cast (on the unrolled slice) -> vector.insert_strided_slice
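The extract -> shape_cast -> insert sequence described in this commit can be illustrated with a small Python model. This is only a sketch under a simplifying assumption: the target shape is a contiguous run of `k` elements, and `k` divides the innermost dimension of both the source and the result. Vectors are modeled as nested lists, and the names `delinearize`, `get`, `make` and `unrolled_shape_cast` are hypothetical, not part of MLIR.

```python
from math import prod

def delinearize(linear, shape):
    """Convert a row-major linear index into a multi-dimensional index."""
    idx = []
    for dim in reversed(shape):
        idx.append(linear % dim)
        linear //= dim
    return tuple(reversed(idx))

def get(vec, idx):
    """Follow a (possibly partial) index into a nested list."""
    for i in idx:
        vec = vec[i]
    return vec

def make(shape):
    """Build a nested list of the given shape filled with None ('poison')."""
    if len(shape) == 1:
        return [None] * shape[0]
    return [make(shape[1:]) for _ in range(shape[0])]

def unrolled_shape_cast(src, src_shape, dst_shape, k):
    """Model of a shape_cast unrolled into contiguous slices of k elements."""
    assert prod(src_shape) == prod(dst_shape)
    assert src_shape[-1] % k == 0 and dst_shape[-1] % k == 0
    dst = make(dst_shape)
    for start in range(0, prod(src_shape), k):
        s = delinearize(start, src_shape)
        d = delinearize(start, dst_shape)
        # Models extract_strided_slice: k contiguous elements of one source row.
        chunk = get(src, s[:-1])[s[-1]:s[-1] + k]
        # The per-slice shape_cast is a no-op in this model (k elements either way).
        # Models insert_strided_slice: write the slice into one result row.
        get(dst, d[:-1])[d[-1]:d[-1] + k] = chunk
    return dst
```

For example, casting a 2x8 vector to 4x4 with `k = 4` moves four slices of four elements each, and no slice is ever broken down into scalars.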
2025-09-16	[mlir][Vector] Add patterns to lower `vector.shuffle` (#157611)	Diego Caballero
	This PR adds patterns to lower `vector.shuffle` with inputs with different vector sizes more efficiently. The current LLVM lowering for these cases degenerates to a sequence of `vector.extract` and `vector.insert` operations. With this PR, the smaller input is promoted to larger vector size by introducing an extra `vector.shuffle`.
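A minimal Python model of the promotion idea is sketched below. The smaller input is first widened by an extra shuffle that pads it with poison lanes, and the original mask is then remapped. The function names are hypothetical, and the use of `-1` as a poison mask index is a modeling assumption for this sketch.

```python
POISON = None  # stand-in for a poison lane (modeling assumption)

def shuffle(v1, v2, mask):
    """Model of a shuffle: mask indexes into the concatenation of v1 and v2."""
    concat = list(v1) + list(v2)
    return [POISON if i < 0 else concat[i] for i in mask]

def shuffle_with_promotion(v1, v2, mask):
    """Widen the smaller input with an extra shuffle, then shuffle equal sizes."""
    n1, n2 = len(v1), len(v2)
    if n1 == n2:
        return shuffle(v1, v2, mask)
    if n1 > n2:
        # Pad v2 up to n1 lanes; indices into v2 still start at n1, so the
        # original mask stays valid.
        v2 = shuffle(v2, v2, list(range(n2)) + [-1] * (n1 - n2))
        new_mask = list(mask)
    else:
        # Pad v1 up to n2 lanes; indices that pointed into v2 shift right.
        v1 = shuffle(v1, v1, list(range(n1)) + [-1] * (n2 - n1))
        shift = n2 - n1
        new_mask = [i + shift if i >= n1 else i for i in mask]
    return shuffle(v1, v2, new_mask)
```

In this model the promoted form produces the same lanes as the mixed-size shuffle while only ever shuffling inputs of equal size.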
2025-09-16	[MLIR][Vector] Add unrolling pattern for vector StepOp (#157752)	Nishant Patel
	This PR adds an unrolling pattern for the vector.step op to the VectorUnroll transform.
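One plausible way to unroll a step vector is to build a single step of the target size and add a constant offset per tile. The Python sketch below models that idea only; the commit message does not spell out the emitted ops, and `step` and `unrolled_step` are hypothetical names.

```python
def step(n):
    """Model of a step vector: [0, 1, ..., n-1]."""
    return list(range(n))

def unrolled_step(n, tile):
    """Build step(n) from one step(tile) plus a constant offset per tile."""
    assert n % tile == 0
    base = step(tile)
    result = [None] * n
    for offset in range(0, n, tile):
        # Models adding a broadcast constant offset to the base step.
        chunk = [b + offset for b in base]
        # Models inserting the tile into the full-size result.
        result[offset:offset + tile] = chunk
    return result
```

In this model, `unrolled_step(8, 4)` rebuilds the full 8-element step from two 4-element tiles.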
2025-09-15	[mlir][vector] Tidy-up testing for to/from_elements unrolling (#158309)	Andrzej Warzyński
	1. Remove `TestUnrollVectorToElements` and `TestUnrollVectorFromElements` test passes - these are not required. 2. Make "vector-from-elements-lowering.mlir" use TD Op for testing (for consistency with "vector-to-elements-lowering.mlir" and to make sure that the TD Op, `transform.apply_patterns.vector.unroll_from_elements`, is tested). 3. Unify `CHECK` prefixes (`CHECK-UNROLL` -> `CHECK`). 4. Rename `@to_elements_1d` as `@negative_unroll_to_elements_1d`, for consistency with its counterpart for `vector.from_elements` and to align with our testing guide (*). (*) https://mlir.llvm.org/getting_started/TestingGuide/#after-step-3-add-the-newly-identified-missing-case
2025-09-12	[mlir][vector] Add a new TD op to wrap unit-dim collapsing patterns (#157507)	Andrzej Warzyński
	Adds a new TD Op, * `apply_patterns.vector.drop_inner_most_unit_dims_from_xfer_ops`, which wraps the following Vector patterns: * `DropInnerMostUnitDimsTransferRead` * `DropInnerMostUnitDimsTransferWrite` This complements other existing unit-dimension–related patterns. To reduce duplication, the `TestVectorTransferCollapseInnerMostContiguousDims` pass has been removed. That pass was only used for testing, and its functionality is now covered by the newly added TD Op.
2025-09-11	[mlir][vector] Add vector.to_elements unrolling (#157142)	Erick Ochoa Lopez
	This PR adds support for unrolling `vector.to_elements`'s source operand. It transforms ```mlir %0:8 = vector.to_elements %v : vector<2x2x2xf32> ``` to ```mlir %v0 = vector.extract %v[0] : vector<2x2xf32> from vector<2x2x2xf32> %v1 = vector.extract %v[1] : vector<2x2xf32> from vector<2x2x2xf32> %0:4 = vector.to_elements %v0 : vector<2x2xf32> %1:4 = vector.to_elements %v1 : vector<2x2xf32> // %0:8 = %0:4 - %1:4 ``` This pattern will be applied until there are only 1-D vectors left. --------- Signed-off-by: hanhanW <hanhan0912@gmail.com> Co-authored-by: hanhanW <hanhan0912@gmail.com> Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
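The repeated application "until there are only 1-D vectors left" can be modeled as a recursion over the outermost dimension. The Python sketch below uses nested lists as vectors; `unroll_to_elements` is a hypothetical name, and the model only shows that the per-slice results concatenate into the row-major element order.

```python
def unroll_to_elements(v, rank):
    """Peel the outermost dimension until only 1-D vectors remain."""
    if rank == 1:
        # 1-D base case: the elements are produced directly.
        return list(v)
    elements = []
    for sub in v:
        # Each `sub` models one extracted (rank-1)-D slice of the source.
        elements.extend(unroll_to_elements(sub, rank - 1))
    return elements
```

For a 2x2x2 input this produces eight elements in the same order as the un-unrolled op would.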
2025-08-18	[mlir][vector] Support multi-dimensional vectors in ↵	Yang Bai
	VectorFromElementsLowering (#151175) This patch introduces a new unrolling-based approach for lowering multi-dimensional `vector.from_elements` operations. Implementation Details: 1. New Transform Pattern: Added `UnrollFromElements` that unrolls an N-D (N >= 2) from_elements op into (N-1)-D from_elements ops along the outermost dimension. 2. Utility Functions: Added `unrollVectorOp` to reuse the unroll algorithm of vector.gather for vector.from_elements. 3. Integration: Added the unrolling pattern to the convert-vector-to-llvm pass as a temporary transformation. 4. Use direct LLVM dialect operations instead of intermediate vector.insert operations for efficiency in `VectorFromElementsLowering`. Example: ```mlir // unroll %v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32> => %poison_2d = ub.poison : vector<2x2xf32> %vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32> %vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32> %vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32> %result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32> // convert-vector-to-llvm %v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32> => %poison_2d = ub.poison : vector<2x2xf32> %poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>> %poison_1d_0 = llvm.mlir.poison : vector<2xf32> %c0_0 = llvm.mlir.constant(0 : i64) : i64 %vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32> %c1_0 = llvm.mlir.constant(1 : i64) : i64 %vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32> %vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>> %poison_1d_1 = llvm.mlir.poison : vector<2xf32> %c0_1 = llvm.mlir.constant(0 : i64) : i64 %vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32> %c1_1 = llvm.mlir.constant(1 : i64) : i64 %vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32> %vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>> %result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32> ``` --------- Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com> Co-authored-by: Yang Bai <yangb@nvidia.com> Co-authored-by: James Newling <james.newling@gmail.com> Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
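The unrolling step in the example above splits the flat element list into equal groups, one per outermost index, and builds each lower-rank vector from its group. A small Python model of that grouping follows; it uses nested lists as vectors, and `unroll_from_elements` is a hypothetical name for this sketch only.

```python
from math import prod

def unroll_from_elements(elements, shape):
    """Build an N-D nested list from a flat, row-major list of elements."""
    if len(shape) == 1:
        # 1-D base case: a plain from_elements.
        assert len(elements) == shape[0]
        return list(elements)
    inner = prod(shape[1:])          # elements per outermost slice
    result = [None] * shape[0]       # models the initial poison vector
    for i in range(shape[0]):
        group = elements[i * inner:(i + 1) * inner]
        # Models inserting the (N-1)-D vector at outermost position i.
        result[i] = unroll_from_elements(group, shape[1:])
    return result
```

With four elements and shape `(2, 2)`, the two groups are `(%e0, %e1)` and `(%e2, %e3)`, matching the example in the commit message.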
2025-07-25	[mlir][NFC] update `mlir` create APIs (34/n) (#150660)	Maksim Levental
	See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25	[mlir][NFC] update `mlir/Dialect` create APIs (28/n) (#150641)	Maksim Levental
	See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-02	[mlir][vector][nfc] Rename ↵	Andrzej Warzyński
	`populateVectorTransferCollapseInnerMostContiguousDimsPatterns` (#145228) Renames `populateVectorTransferCollapseInnerMostContiguousDimsPatterns` as `populateDropInnerMostUnitDimsXferOpPatterns` + updates the corresponding comments. This addresses a TODO and makes the difference between these two `populate` methods clearer: * `populateDropUnitDimWithShapeCastPatterns`, * `populateDropInnerMostUnitDimsXferOpPatterns`.
2025-06-20	[mlir][vector] Add unroll patterns for vector.load and vector.store (#143420)	Nishant Patel
	This PR adds unroll patterns for vector.load and vector.store. It is a follow-up to #137558.
2025-06-05	[mlir][vector] add unroll pattern for broadcast (#142011)	Chao Chen
	This PR adds `UnrollBroadcastPattern` to `VectorUnroll` transform. To support this, it also extends `BroadcastOp` definition with `VectorUnrollOpInterface`
2025-05-23	[mlir] Use llvm::any_of (NFC) (#141317)	Kazu Hirata

2025-05-15	[mlir][vector] Address linearization comments (post commit) (#138075)	James Newling
	This PR adds some documentation to address comments in https://github.com/llvm/llvm-project/pull/136581 and a test for linearization across scf.for. This new test might be considered redundant by more experienced MLIRers, but might help newer users understand how to linearize scf/cf/func operations easily. The documentation added in this PR also tightens our definition of linearization, to now exclude unrolling (which creates multiple ops from 1 op). We hadn't really specified what linearization meant before.
2025-05-14	[mlir] [vector] Add linearization pattern for vector.create_mask (#138214)	Nishant Patel
	This PR is a breakdown [3 / 4] of PR #136193. It adds linearization patterns for vector.create_mask.
2025-05-07	Revert "[mlir][MemRef] Remove integer address space builders" (#138853)	Mehdi Amini
	Reverts llvm/llvm-project#138579. An integration test is broken on the mlir-nvidia* bots.
2025-05-06	[mlir][MemRef] Remove integer address space builders (#138579)	Krzysztof Drewniak
	The forms of the MemRef builder that took an integer argument instead of an attribute have been deprecated for years now, and have almost no upstream uses (the remaining ones are handled in this PR). Therefore, remove them.
2025-04-30	[mlir][vector] Linearization: push 'bit width' logic out of patterns (#136581)	James Newling
	[NFC] Vector linearization is a collection of rewrite patterns that reduce the rank of vector operands and results. In https://github.com/llvm/llvm-project/pull/83314 an option to ignore (make 'legal') operations with large inner-most dimensions was added. This current PR is a step towards making that option live outside of upstream MLIR. The motivation is to remove non-core functionality (I would like to use this pass, but would prefer not to deal with `targetVectorBitWidth` at all). As a follow-up to this PR, I propose that user(s) of the `targetVectorBitWidth` move the relevant code (now in mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp) to their code bases, and then eventually remove it from upstream. In addition, the tests need to be split out (I've intentionally not modified the lit tests here, to make it easier to confirm that this is an NFC). I'm happy to help make it easier to do this final step! The approach I've used is to move the logic pertaining to `targetVectorBitWidth` out of the patterns, and into the conversion target, which the end user can control outside of core MLIR.
2025-04-22	[mlir][vector] Sink vector.extract/splat into load/store ops (#134389)	Ivan Butygin
	``` %0 = vector.load %arg0[%arg1] : memref<?xf32>, vector<4xf32> %1 = vector.extract %0[1] : f32 from vector<4xf32> ``` Gets converted to: ``` %c1 = arith.constant 1 : index %0 = arith.addi %arg1, %c1 overflow<nsw> : index %1 = memref.load %arg0[%0] : memref<?xf32> ``` ``` %0 = vector.splat %arg2 : vector<1xf32> vector.store %0, %arg0[%arg1] : memref<?xf32>, vector<1xf32> ``` Gets converted to: ``` memref.store %arg2, %arg0[%arg1] : memref<?xf32> ```
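The two rewrites rest on simple index arithmetic: extracting lane `pos` from a vector loaded at `idx` reads the same memory as a scalar load at `idx + pos`, and storing a one-element splat is the same as storing the scalar. A minimal Python model of both equivalences is sketched below; the function names are hypothetical, and any legality checks the real pattern performs are not modeled.

```python
def vector_load(mem, idx, n):
    """Model of loading n contiguous elements starting at idx."""
    return mem[idx:idx + n]

def load_then_extract(mem, idx, n, pos):
    """Before: load a whole vector, then extract one lane."""
    return vector_load(mem, idx, n)[pos]

def sunk_scalar_load(mem, idx, pos):
    """After: a single scalar load at the adjusted index."""
    return mem[idx + pos]

def splat_then_store(mem, idx, value):
    """Before: splat a scalar into a 1-element vector and store it."""
    vec = [value]
    mem[idx:idx + 1] = vec

def sunk_scalar_store(mem, idx, value):
    """After: store the scalar directly."""
    mem[idx] = value
```

Both forms touch the same memory locations in this model; the scalar forms simply avoid materializing a vector.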
2025-03-24	[mlir][vector] Decouple unrolling gather and gather to llvm lowering (#132206)	Kunwar Grover
	This patch decouples unrolling vector.gather and lowering vector.gather to llvm.masked.gather. This is consistent with how vector.load, vector.store, vector.maskedload, vector.maskedstore lower to LLVM. Some interesting test changes from this patch: - 2D vector.gather lowering to llvm tests are deleted. This is consistent with other memory load/store ops. - There are still tests for 2D vector.gather, but the constant mask for these tests is modified. This is because with the updated lowering, one of the unrolled vector.gather disappears because it is masked off (also demonstrating why this is a better lowering path). Overall, this makes vector.gather take the same consistent path for lowering to LLVM as other load/store ops. Discourse Discussion: https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
2025-03-20	[mlir] Use *Set::insert_range (NFC) (#132326)	Kazu Hirata
	DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.
2024-12-20	[mlir] Enable decoupling two kinds of greedy behavior. (#104649)	Jacques Pienaar
	The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior: 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and lead to inefficiency. E.g., cases where one needs to create different phases in lowering and is required to apply patterns in a specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a commonly requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety) as the API arguments haven't changed between the two.
2024-12-01	[mlir] Fix typo in test vector transform pass descriptions (#118194)	Kai Sasaki
	Fix some typos in the description of vector transform passes.
2024-11-22	[MLIR] Move warp_execute_on_lane_0 from vector to gpu (#116994)	Petr Kurapov
	Please see the related RFC here: https://discourse.llvm.org/t/rfc-move-execute-on-lane-0-from-vector-to-gpu-dialect/82989. This patch does exactly one thing - moves the op to gpu.
2024-10-09	[MLIR] Vector: turn the ExtractStridedSlice rewrite pattern from #111541 ↵	Benoit Jacob
	into a canonicalization (#111614) This is a reasonable canonicalization because `extract` is more constrained than `extract_strided_slice`, so there is no loss of semantics here, just lifting an op to a special-case higher/constrained op. And the additional `shape_cast` is merely adding leading unit dims to match the original result type. Context: discussion on #111541. I wasn't sure how this would turn out, but in the process of writing this PR, I discovered at least 2 bugs in the pattern introduced in #111541, which shows the value of shared canonicalization patterns which are exercised on a high number of testcases. --------- Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
2024-10-08	[mlir][vector] Add pattern to rewrite contiguous ExtractStridedSlice into ↵	Benoit Jacob
	Extract (#111541) Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-08-16	[mlir][vector] Group re-order patterns together (#102856)	Andrzej Warzyński
	Group all patterns that re-order vector.transpose and vector.broadcast Ops (*) under `populateSinkVectorOpsPatterns`. These patterns are normally used to "sink" redundant Vector Ops, hence grouping together. Example: ```mlir %at = vector.transpose %a, [1, 0]: vector<4x2xf32> to vector<2x4xf32> %bt = vector.transpose %b, [1, 0]: vector<4x2xf32> to vector<2x4xf32> %r = arith.addf %at, %bt : vector<2x4xf32> ``` would get converted to: ```mlir %0 = arith.addf %a, %b : vector<4x2xf32> %r = vector.transpose %0, [1, 0] : vector<2x4xf32> ``` This patch also moves all tests for these patterns so that all of them are: * run under one test-flag: `test-vector-sink-patterns`, * located in one file: "vector-sink.mlir". To facilitate this change: * `-test-sink-vector-broadcast` is renamed as `test-vector-sink-patterns`, * "sink-vector-broadcast.mlir" is renamed as "vector-sink.mlir", * tests for `ReorderCastOpsOnBroadcast` and `ReorderElementwiseOpsOnTranspose` patterns are moved from "vector-reduce-to-contract.mlir" to "vector-sink.mlir", * `ReorderElementwiseOpsOnTranspose` patterns are removed from `populateVectorReductionToContractPatterns` and added to (newly created) `populateSinkVectorOpsPatterns`, * `ReorderCastOpsOnBroadcast` patterns are removed from `populateVectorReductionToContractPatterns` - these are already present in `populateSinkVectorOpsPatterns`. This should allow us better layering and more straightforward testing. For the latter, the goal is to be able to easily identify which pattern a particular test is exercising (especially when it's a specific pattern). NOTES FOR DOWNSTREAM USERS In order to preserve the current functionality, please make sure to add * `populateSinkVectorOpsPatterns`, wherever you are using `populateVectorReductionToContractPatterns`. Also, rename `populateSinkVectorBroadcastPatterns` as `populateSinkVectorOpsPatterns`. (*) I didn't notice any other re-order patterns.
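The transpose example in this commit relies on an elementwise op commuting with a transpose: adding two transposed operands gives the same result as transposing the sum, which needs one transpose instead of two. A small Python check of that identity follows; it uses nested lists as 2-D vectors, and the helper names are hypothetical.

```python
def transpose(m):
    """Transpose a 2-D nested list (permutation [1, 0])."""
    return [list(row) for row in zip(*m)]

def add(a, b):
    """Elementwise addition of two equally shaped 2-D nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def before_sinking(a, b):
    """Two transposes followed by one elementwise add."""
    return add(transpose(a), transpose(b))

def after_sinking(a, b):
    """One elementwise add followed by a single transpose."""
    return transpose(add(a, b))
```

Both functions return the same 2x4 result for 4x2 inputs; the sunk form performs only one transpose.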
2024-08-09	[mlir][vector] Add mask elimination transform (#99314)	Benjamin Maxwell
	This adds a new transform `eliminateVectorMasks()` which aims at removing scalable `vector.create_mask` ops that will be all-true at runtime. It attempts to do this by first simply pattern-matching the mask operands (similar to some canonicalizations). If that does not lead to an answer (is all-true? yes/no), then value bounds analysis will be used to find the lower bound of the unknown operands. If the lower bound is >= the corresponding mask vector type dim, then that dimension of the mask is all true. Note that the pattern matching prevents expensive value-bounds analysis in cases where the mask won't be all true. For example: ```mlir %mask = vector.create_mask %dynamicValue, %c2 : vector<8x4xi1> ``` From looking at `%c2` we can tell this is not going to be an all-true mask, so we don't need to run the value-bounds analysis for `%dynamicValue` (and can exit the transform early). Note: Eliminating create_masks here means replacing them with all-true constants (which will then lead to the masks folding away).
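The two-phase decision described above (cheap pattern matching first, value-bounds analysis only if needed) can be sketched in Python. This is a simplified model with fixed-size dimensions only: constants are plain integers, unknown operands are opaque names, and `lower_bound_of` is a hypothetical callback standing in for the value-bounds analysis.

```python
def is_all_true_mask(operands, shape, lower_bound_of):
    """Decide whether a create_mask with these operands is all-true."""
    # Phase 1: cheap check on constant operands. A constant smaller than
    # its dimension means the mask cannot be all-true, so exit early.
    for op, dim in zip(operands, shape):
        if isinstance(op, int) and op < dim:
            return False
    # Phase 2: run the (expensive) bounds analysis on unknown operands only.
    for op, dim in zip(operands, shape):
        if isinstance(op, int):
            continue
        lower = lower_bound_of(op)
        if lower is None or lower < dim:
            return False
    return True
```

For the `vector<8x4xi1>` example, the constant `2` is smaller than `4`, so the function returns `False` without ever calling `lower_bound_of` for `%dynamicValue`.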
2024-08-04	[mlir] Construct SmallVector with ArrayRef (NFC) (#101896)	Kazu Hirata

2024-04-18	[mlir][vector] Add support for linearizing Extract, ExtractStridedSlice, ↵	Charitha Saumya
	Shuffle VectorOps in VectorLinearize (#88204) This PR adds support for converting `vector.extract_strided_slice` and `vector.extract` operations to equivalent `vector.shuffle` operations that operate on linearized (1-D) vectors. `vector.shuffle` operations operating on n-D (n > 1) are also converted to equivalent shuffle operations working on linearized vectors.
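To see why a strided-slice extraction can become a shuffle on a linearized vector, it helps to compute the shuffle mask explicitly: each element of the slice maps to one row-major linear index of the source. The Python sketch below assumes unit strides; `linearize` and `strided_slice_shuffle_mask` are hypothetical helper names.

```python
from itertools import product

def linearize(idx, shape):
    """Row-major linear index of a multi-dimensional index."""
    linear = 0
    for i, dim in zip(idx, shape):
        linear = linear * dim + i
    return linear

def strided_slice_shuffle_mask(shape, offsets, sizes):
    """Shuffle mask selecting a unit-stride slice from the flattened source."""
    # Offsets and sizes may cover only the leading dims; trailing dims are
    # taken in full.
    full_offsets = list(offsets) + [0] * (len(shape) - len(offsets))
    full_sizes = list(sizes) + list(shape[len(sizes):])
    mask = []
    for rel in product(*(range(s) for s in full_sizes)):
        idx = [o + r for o, r in zip(full_offsets, rel)]
        mask.append(linearize(idx, shape))
    return mask
```

For a 4x4 source with offsets `[1, 2]` and sizes `[2, 2]`, the mask is `[6, 7, 10, 11]`, which picks exactly the slice elements out of the 16-element linearized vector.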
2024-03-28	[mlir][vector] Add support for scalable vectors to VectorLinearize (#86786)	Andrzej Warzyński
	Adds support for scalable vectors to patterns defined in VectorLinearize.cpp. Linearization is disabled in 2 notable cases: * vectors with more than 1 scalable dimension (we cannot represent vscale^2), * vectors initialised with arith.constant that's not a vector splat (such arith.constant Ops cannot be flattened).
2024-03-04	[mlir][vector]Add Vector bitwidth target to Linearize Vectorizable and ↵	Balaji V. Iyer
	Constant Ops (#83314) Added a new flag `targetVectorBitwidth` to capture bit-width input.
2024-02-28	[mlir][vector] Fix n-d transfer write distribution (#83215)	Quinn Dawkins
	Currently n-d transfer write distribution can be inconsistent with distribution of reductions if a value has multiple users, one of which is a transfer_write with a non-standard distribution map, and the other of which is a vector.reduction. We may want to consider removing the distribution map functionality in the future for this reason.
2024-02-21	[mlir][Vector] Add vector bitwidth target to xfer op flattening (#81966)	Diego Caballero
	This PR adds an optional bitwidth parameter to the vector xfer op flattening transformation so that the flattening doesn't happen if the trailing dimension of the read/written vector is larger than this bitwidth (i.e., we are already able to fill at least one vector register with that size).
2024-02-13	[mlir][vector] ND vectors linearization pass (#81159)	Ivan Butygin
	Common backends (LLVM, SPIR-V) only support 1D vectors. LLVM conversion handles ND vectors (N >= 2) as `array<array<... vector>>` and SPIR-V conversion doesn't handle them at all at the moment. Sometimes it's preferable to treat multidim vectors as linearized 1D. Add a pass to do this. Only constants and simple elementwise ops are supported for now. @krzysz00 I've extracted your result type conversion code from LegalizeToF32 and moved it to a common place. Also, add a ConversionPattern class operating on traits.
2023-12-18	[mlir][vector] Add pattern to break down reductions into arith ops (#75727)	Jakub Kuderski
	The number of vector elements considered 'small' enough to extract is parameterized. This is to avoid going into specialized reduction lowering when a single/couple of arith ops can do. Targets without dedicated reduction intrinsics can use that as an emulation path too. Depends on https://github.com/llvm/llvm-project/pull/75846.
2023-12-15	[mlir][vector] Add emulation patterns for vector masked load/store (#74834)	Hsiangkai Wang
	This patch converts ``` vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru ``` to ``` %ivalue = %pass_thru %m = vector.extract %mask[0] %result0 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1] %combined = vector.insert %v, %ivalue[0] scf.yield %combined } else { scf.yield %ivalue } %m = vector.extract %mask[1] %result1 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1 + 1] %combined = vector.insert %v, %result0[1] scf.yield %combined } else { scf.yield %result0 } ... ``` It also converts ``` vector.maskedstore %base[%idx_0, %idx_1], %mask, %value ``` to ``` %m = vector.extract %mask[0] scf.if %m { %extracted = vector.extract %value[0] memref.store %extracted, %base[%idx_0, %idx_1] } %m = vector.extract %mask[1] scf.if %m { %extracted = vector.extract %value[1] memref.store %extracted, %base[%idx_0, %idx_1 + 1] } ... ```
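The per-lane `scf.if` chains above are equivalent to a loop over the mask. The following Python sketch models the emulated semantics with a 2-D nested list as the memref; the function names are hypothetical and the loop stands in for the fully unrolled sequence of conditionals.

```python
def emulated_maskedload(base, idx0, idx1, mask, pass_thru):
    """Per-lane conditional scalar loads; masked-off lanes keep pass_thru."""
    result = list(pass_thru)
    for lane, enabled in enumerate(mask):
        if enabled:
            # Models memref.load at [idx0, idx1 + lane] plus vector.insert.
            result[lane] = base[idx0][idx1 + lane]
    return result

def emulated_maskedstore(base, idx0, idx1, mask, value):
    """Per-lane conditional scalar stores; masked-off lanes are untouched."""
    for lane, enabled in enumerate(mask):
        if enabled:
            # Models vector.extract plus memref.store at [idx0, idx1 + lane].
            base[idx0][idx1 + lane] = value[lane]
```

Masked-off lanes never touch memory in this model, which is the property the emulation has to preserve.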
2023-12-13	[mlir][vector] Add pattern to drop unit dim from elementwise(a, b)) (#74817)	Andrzej Warzyński
	For vectors with either leading or trailing unit dim, replaces: elementwise(a, b) with: sc_a = shape_cast(a) sc_b = shape_cast(b) res = elementwise(sc_a, sc_b) return shape_cast(res) The newly inserted shape_cast Ops fold (before elementwise Op) and then restore (after elementwise Op) the unit dim. Vectors `a` and `b` are required to be rank > 1. Example: ```mlir %mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32> %cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32> ``` gets converted to: ```mlir %B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32> %A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32> %mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32> %mul_sc = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32> %cast = vector.shape_cast %mul_sc : vector<1x[4]xf32> to vector<[4]xf32> ``` In practice, the bottom 2 shape_cast(s) will be folded away.
2023-12-12	[mlir][vector] Allow vector distribution with multiple written elements (#75122)	Jakub Kuderski
	Add a configuration option to allow vector distribution with multiple elements written by a single lane. This is so that we can perform vector multi-reduction with multiple results per workgroup.
2023-12-05	[mlir][Vector] Update patterns for flattening vector.xfer Ops (2/N) (#73523)	Andrzej Warzyński
	Updates patterns for flattening `vector.transfer_read` by relaxing the requirement that the "collapsed" indices are all zero. This enables collapsing cases like this one: ```mlir %2 = vector.transfer_read %arg4[%c0, %arg0, %arg1, %c0] ... : memref<1x43x4x6xi32>, vector<1x2x6xi32> ``` Previously only the following case would be considered for collapsing (all indices are 0): ```mlir %2 = vector.transfer_read %arg4[%c0, %c0, %c0, %c0] ... : memref<1x43x4x6xi32>, vector<1x2x6xi32> ``` Also adds some new comments and renames the `firstContiguousInnerDim` parameter as `firstDimToCollapse` (the latter better matches the actual meaning). Similar updates for `vector.transfer_write` will be implemented in a follow-up patch.
2023-11-22	[mlir][vector] Add patterns to simplify chained reductions (#73048)	Jakub Kuderski
	Chained reductions get created during vector unrolling. These patterns simplify them into a series of adds followed by a final reduction. This is preferred on GPU targets like SPIR-V/Vulkan where vector reduction gets lowered into subgroup operations that are generally more expensive than simple vector additions. For now, only the `add` combining kind is handled.
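The simplification rests on the identity reduce(a, reduce(b, acc)) = reduce(a + b, acc) for the `add` combining kind. The Python sketch below checks it on integers; the function names are hypothetical, and for floating-point inputs the reassociation can change rounding, which this model does not address.

```python
def reduction_add(vec, acc=0):
    """Model of an add reduction with an accumulator."""
    return sum(vec) + acc

def chained(a, b, acc):
    """Before: two reductions chained through the accumulator."""
    return reduction_add(a, reduction_add(b, acc))

def simplified(a, b, acc):
    """After: one elementwise vector add followed by a single reduction."""
    summed = [x + y for x, y in zip(a, b)]
    return reduction_add(summed, acc)
```

The simplified form replaces one of the two reductions with a plain vector addition, which is the cheaper operation on the targets mentioned above.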
2023-11-10	[mlir][vector] Root the transfer write distribution pattern on the warp op ↵	Quinn Dawkins
	(#71868) Currently when there is a mix of transfer read ops and transfer write ops that need to be distributed, because the pattern for write distribution is rooted on the transfer write, it is hard to guarantee that the write gets distributed after the read when the two aren't directly connected by SSA. This is likely still relatively unsafe when there are undistributable ops, but structurally these patterns are a bit difficult to work with. For now pattern benefits give fairly good guarantees for happy paths.
2023-09-12	Update some uses of `getAttr()` to be explicit about Inherent vs Discardable ↵	Mehdi Amini
	(NFC)
2023-08-22	[mlir][vector] Add support for scalable vectors in `trimLeadingOneDims`	Andrzej Warzynski
	This patch updates one specific hook in "VectorDropLeadUnitDim.cpp" to make sure that "scalable dims" are handled correctly. While this change affects multiple patterns, I am only adding one regression test that captures one specific case that affects me right now. I am also adding Vector dialect to the list of dependencies of `-test-vector-to-vector-lowering`. Otherwise my test case won't work as a standalone test. Differential Revision: https://reviews.llvm.org/D157993
2023-06-15	[mlir][Vector] Add pattern to reorder elementwise and broadcast ops	Andrzej Warzynski
	The new pattern will replace elementwise(broadcast) with broadcast(elementwise) when safe. This change affects tests for vectorising nD-extract. In one case ("vectorize_nd_tensor_extract_with_tensor_extract") I just trimmed the test and only preserved the key parts (scalar and contiguous load from the original Op). We could do the same with some other tests if that helps maintainability. Differential Revision: https://reviews.llvm.org/D152812
2023-06-09	[mlir][vector][transform] Expose tensor slice -> transfer folding patterns	Matthias Springer
	Add a new transform op to populate patterns: ApplyFoldTensorSliceIntoTransferPatternsOp. Differential Revision: https://reviews.llvm.org/D152531
2023-06-05	[mlir][Vector] Adds a pattern to fold `arith.extf` into `vector.contract`	Manish Gupta
	Consider mixed precision data type, i.e., F16 input lhs, F16 input rhs, F32 accumulation, and F32 output. This is typically written as F32 <= F16F16 + F32. During vectorization from linalg to vector for mixed precision data type (F32 <= F16F16 + F32), linalg.matmul introduces arith.extf on input lhs and rhs operands. "linalg.matmul"(%lhs, %rhs, %acc) ({ ^bb0(%arg1: f16, %arg2: f16, %arg3: f32): %lhs_f32 = "arith.extf"(%arg1) : (f16) -> f32 %rhs_f32 = "arith.extf"(%arg2) : (f16) -> f32 %mul = "arith.mulf"(%lhs_f32, %rhs_f32) : (f32, f32) -> f32 %acc = "arith.addf"(%arg3, %mul) : (f32, f32) -> f32 "linalg.yield"(%acc) : (f32) -> () }) There are backends that natively support mixed-precision data types and do not need the arith.extf. For example, NVIDIA A100 GPU has mma.sync.aligned.*.f32.f16.f16.f32 that can support mixed-precision data type. However, the presence of arith.extf in the IR introduces unnecessary casting, targeting F32 Tensor Cores instead of F16 Tensor Cores for NVIDIA backend. This patch adds a folding pattern to fold arith.extf into vector.contract. Differential Revision: https://reviews.llvm.org/D151918
2023-05-19	[mlir][Vector] Extend xfer_read(extract)->scalar load to support multiple uses	Diego Caballero
	This patch extends the vector.extract(vector.transfer_read) -> scalar load patterns to support vector.transfer_read with multiple uses. For now, we check that all the uses are vector.extract operations. Supporting multiple uses is predicated under a flag. Reviewed By: hanchung Differential Revision: https://reviews.llvm.org/D150812
2023-05-17	[mlir][vector] Separate out vector transfer + tensor slice patterns	Lei Zhang
	These patterns touch the structure generated from tiling so they affect later steps like bufferization and vector hoisting. Instead of putting them in canonicalization, this commit creates separate entry points for them to be called explicitly. This is NFC regarding the functionality and tests of those patterns. It also addresses two TODO items in the codebase. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D150702
2023-05-12	[mlir] Move casting calls from methods to function calls	Tres Popp
	The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Caveats include: - This clang-tidy script probably has more problems. - This only touches C++ code, so nothing that is being generated. Context: - https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" - Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This first patch was created with the following steps. The intention is to only do automated changes at first, so I waste less time if it's reverted, and so the first mass change is more clear as an example to other teams that will need to follow similar steps. Steps are described per line, as comments are removed by git: 0. Retrieve the change from the following to build clang-tidy with an additional check: https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check 1. Build clang-tidy 2. Run clang-tidy over your entire codebase while disabling all checks and enabling the one relevant one. Run on all header files also. 3. Delete .inc files that were also modified, so the next build rebuilds them to a pure state. 4. Some changes have been deleted for the following reasons: - Some files had a variable also named cast - Some files had not included a header file that defines the cast functions - Some files are definitions of the classes that have the casting methods, so the code still refers to the method instead of the function without adding a prefix or removing the method declaration at the same time. 
``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\ -header-filter=mlir/ mlir/ -fix rm -rf $BUILD_DIR/tools/mlir/*/.inc git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\ mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\ mlir/lib/**/IR/\ mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\ mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\ mlir/test/lib/Dialect/Test/TestTypes.cpp\ mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\ mlir/test/lib/Dialect/Test/TestAttributes.cpp\ mlir/unittests/TableGen/EnumsGenTest.cpp\ mlir/test/python/lib/PythonTestCAPI.cpp\ mlir/include/mlir/IR/ ``` Differential Revision: https://reviews.llvm.org/D150123