summaryrefslogtreecommitdiff
path: root/mlir
AgeCommit message (Collapse)Author
2025-11-22[MLIR][Python] remove PyYAML as a dep (#169145)Maksim Levental
PyYAML is not an actual use-time/runtime dependency of our bindings. It is necessary only if someone wants to regenerate `LinalgNamedStructuredOps.yaml`: https://github.com/llvm/llvm-project/blob/93097b2d47c87bf5eee0a2612d961c7a01831eab/mlir/tools/mlir-linalg-ods-gen/update_core_linalg_named_ops.sh.in#L29 This PR does the minimal refactor to remove the need during actual run/use time.
2025-11-21[MLIR] [XeGPU] Fix dropSgLayoutAndData & dropInstData in SliceAttr (#168618)Nishant Patel
2025-11-21[mlir][presburger] Fix PresburgerSpace comment (#167292)lonely eagle
2025-11-21[MLIR] Drop use of REQUIRES:shell from tests (#168989)Aiden Grossman
This patch drops two instances of REQUIRES: shell from MLIR tests. This feature does not mean much given the internal shell is the default for MLIR. It does prevent these tests from running on Windows, but it does not seem like there is anything inherent to these tests preventing them from running on Windows (minus maybe the lack of spirv-tools, which is explicitly required anyways.
2025-11-21[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608)Nick Sarnie
Some targets have a specific calling convention that should be used for generated calls to runtime functions. Pass that down and use it. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-11-21[acc][flang] Implement acc interface for tracking type descriptors (#168982)Razvan Lupusoru
FIR operations that use derived types need to have type descriptor globals available on device when offloading. Examples of this can be seen in `CUFDeviceGlobal` which ensures that such type descriptor uses work on device for CUF. Similarly, this is needed for OpenACC. This change introduces a new interface to the OpenACC dialect named `IndirectGlobalAccessOpInterface` which can be attached to operations that may result in generation of accesses that use type descriptor globals. This functionality is needed for the `ACCImplicitDeclare` pass that is coming in a follow-up change which implicitly ensures that all referenced globals are available in OpenACC compute contexts. The interface provides a `getReferencedSymbols` method that collects all global symbols referenced by an operation. When a symbol table is provided, the implementation for FIR recursively walks type descriptor globals to find all transitively referenced symbols. Note that alternately this could have been implemented in different ways: - Codegen could implicitly generate such type globals as needed by changing the technique that relies on populating them during lowering (eg generate them directly in gpu.module during codegen). - This interface could attach to types instead of operations for a potentially more conservative implementation which maps all type descriptors even if the underlying implementation using it won't necessarily need such mapping. The technique chosen here is consistent with `CUFDeviceGlobal` (which walks operations inside `prepareImplicitDeviceGlobals`) and avoids conservative mapping of all type descriptors.
2025-11-21[mlir][ROCDL] Adds wmma scaled intrinsics for gfx1250 (#165915)Muzammiluddin Syed
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
2025-11-21Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049)Fabian Mora
Reverts llvm/llvm-project#165873 The revert is triggered by a failing integration test on a couple of buildbots.
2025-11-21[MLIR][GPU] subgroup_mma fp64 extension (#165873)Giacomo Castiglioni
This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type. The extension requires special handling during the lowering to `nvvm` due to the return type for load ops for fragment a and b (they return a scalar instead of a struct).
2025-11-21[mlir][py][c] Enable setting block arg locations. (#169033)Jacques Pienaar
This enables changing the location of a block argument. Follows the approach for updating type of block arg.
2025-11-21[MLIR] Apply clang-tidy fixes for readability-container-size-empty in ↵Mehdi Amini
AffineOps.cpp (NFC)
2025-11-21[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in ↵Mehdi Amini
AsmPrinter.cpp (NFC)
2025-11-21[MLIR] Apply clang-tidy fixes for readability-container-size-empty in ↵Mehdi Amini
ShardOps.cpp (NFC)
2025-11-21[MLIR][XeGPU][TransformOps] Add slice_dims argument to set_op_layout_attr ↵Tuomas Kärnä
and set_desc_layout (#168929) `set_op_layout_attr` and `set_desc_layout` transform ops wrap `xegpu.layout` in an `xegpu.slice` attribute if `slice_dims` argument is set.
2025-11-21[mlir][llvm] Handle debug record import edge cases (#168774)Tobias Gysi
This commit enables the direct import of debug records by default and fixes issues with two edge cases: - Detect early on if the address operand is an argument list (calling getAddress() for argument lists asserts) - Use getAddress() to check if the address operand is null, which means the address operand is an empty metadata node, which currently is not supported. - Add support for debug label records. This is a follow-up to: https://github.com/llvm/llvm-project/pull/167812
2025-11-21[mlir][linalg] Clean up op verifiers without custom checks(NFC) (#168712)Longsheng Mou
This PR removes op verifiers that do not implement any custom verification logic.
2025-11-20[MLIR] [XeGPU] Add distribution pattern for vector.constant_mask from Wg To ↵Nishant Patel
Sg (#168118)
2025-11-20[mlir][SCF] Add `scf::tileAndFuseConsumer` that tiles a consumer into a ↵MaheshRavishankar
given tiled loop nest. (#167634) The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and loops they are part of), tries to find the consumer of these slices (all slices are expected to be the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to just start from the consumer, look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer you can find the slices of the operands that are accessed within the loop which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest. The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. The use of the consumer directly avoids all those pain points. In time the `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. There is a lot of tech-debt that has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleanedup. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained. The test for `scf::tileAndFuseConsumerUsingSlices` is copied to `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. All the tests that were there in this file are now using the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-11-20[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in ↵Mehdi Amini
ValueBoundsOpInterface.cpp (NFC)
2025-11-20[mlir] Replace `llvm::OwningArrayRef` with `std::vector` (#168803)David Stone
There are several places where we use `llvm::OwningArrayRef`. The interface to this requires us to first construct temporary storage, then allocate space and set the allocated memory to 0, then copy the values we actually want into that memory, then move the array into place. Instead we can just do it all inline in a single pass by using `std::vector`. In one case we actually allocate a completely separate container and then allocate + copy the data over because `llvm::OwningArrayRef` does not (and can't) support `push_back`. Note that `llvm::SmallVector` is not a suitable replacement here because we rely on reference stability on move construction: when the outer container reallocates, we need the the contents of the inner containers to be fixed in memory, and `llvm::SmallVector` does not give us that guarantee.
2025-11-20[mlir][spirv] Add support for SwitchOp (#168713)Igor Wodiany
The dialect implementation mostly copies the one of `cf.switch`, but aligns naming to the SPIR-V spec.
2025-11-20[mlir][tosa] Fix select folder when operands are broadcast (#165481)Luke Hutton
This commit addresses a crash in the dialects folder. The currently folder assumes no broadcasting of the input operand happens and therefore the folder can complain that the returned value was not the same shape as the result. For now, this commit ensures no folding happens when broadcasting is involved. In the future, folding with a broadcast could likely be supported by inserting a `tosa.tile` operation before returning the operand. This type of transformation is likely better suited for a canonicalization pass. This commit only aims to avoid the crash.
2025-11-20[MLIR][SCFToGPU] Guard operands before AffineApplyOp::create to avoid crash ↵Shashi Shankar
(#167959) This fixes a crash in SCF→GPU when building the per‑dim index for mapped scf.parallel. **Change**: - Map step/lb through cloningMap, then run ensureLaunchIndependent. - If either is still unavailable at launch scope, emit a match‑failure; otherwise build the affine.apply. **Why this is correct:** - Matches how the pass already handles launch bounds; avoids creating an op with invalid operands and replaces a segfault with a clear diagnostic. **Tests**: - Added two small regressions that lower to gpu.launch and exercise the affine.apply path. Fixes : #167654 Signed-off-by: Shashi Shankar <shashishankar1687@gmail.com>
2025-11-20[mlir][memref] Generalize dead store detection to all view-like ops (#168507)Simone Pellegrini
The dead alloc elimination pass previously considered only subviews when checking for dead stores. This change generalizes the logic to support all view-like operations, ensuring broader coverage.
2025-11-20[mlir][Vector] Add support for scalable vectors to `ScanToArithOps` (#123117)Andrzej Warzyński
Note, scalable reductions dims are left as a TODO.
2025-11-20[MLIR] Apply clang-tidy fixes for bugprone-argument-comment in ↵Mehdi Amini
SparseBufferRewriting.cpp (NFC)
2025-11-20[MLIR] Apply clang-tidy fixes for readability-container-size-empty in ↵Mehdi Amini
LinalgOps.cpp (NFC)
2025-11-20[MLIR] Apply clang-tidy fixes for readability-identifier-naming in ↵Mehdi Amini
LLVMToLLVMIRTranslation.cpp (NFC)
2025-11-20[MLIR][LLVM] Extend DIScopeForLLVMFuncOp to handle cross-file operatio… ↵Zichen Lu
(#167844) The current `DIScopeForLLVMFuncOp` pass handles debug information for inlined code by processing `CallSiteLoc` attributes. However, some compilation scenarios compose code from multiple source files directly into a single function without generating `CallSiteLoc`. **Scenario:** ```python # a.py def kernel_a(tensor): print("a: {}", tensor) # a.py:3 jit_func_b(tensor) # Calls b.py code # b.py def func_b(tensor): print("b: {}", tensor) # b.py:7 ``` The scenario executes Python at compile-time and directly inserts operations from `b.py` into the kernel function, resulting in MLIR like: ```mlir @kernel_a(...) { print("a: {}", %arg0) loc(#loc_a) // a.py:3 print("b: {}", %arg0) loc(#loc_b) // b.py:7 <- FileLineColLoc, not CallSiteLoc } loc(#loc_kernel) // a.py:1 #loc1 = loc("a.py":3:.) #loc2 = loc("b.py":7:.) #loc_a = loc("print"(#loc1)) #loc_b = loc("print"(#loc2)) ``` ```llvm !6 = !DIFile(filename: "a.py", directory: "...") !9 = distinct !DISubprogram(name: "...", linkageName: "...", scope: !6, file: !6, line: 13, ...) !10 = !DILocation(line: 7, column: ., scope: !9) // Points to kernel's DISubprogram, not correct ```
2025-11-20Revert "[mlir][Pass] Fix crash when applying a pass to an optional ↵Matthias Springer
interface" (#168847) Reverts llvm/llvm-project#168499
2025-11-20[mlir][Pass] Fix crash when applying a pass to an optional interface (#168499)Matthias Springer
Interfaces can be optional: whether an op implements an interface or not can depend on the state of the operation. ``` // An optional code block for adding additional "classof" logic. This can // be used to better enable "optional" interfaces, where an entity only // implements the interface if some dynamic characteristic holds. // `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the // interface instance being checked. code extraClassOf = ""; ``` The current `Pass::canScheduleOn(RegisteredOperationName)` is insufficient. This commit adds an additional overload to inspect `Operation *`. This commit fixes a crash when scheduling an `InterfacePass` for an optional interface on an operation that does not actually implement the interface.
2025-11-20[mlir] Make remove-dead-values remove block and successorOperands before ↵lonely eagle
delete ops (#166766) Reland https://github.com/llvm/llvm-project/pull/165725, fix the Failed test by removing successor operands before delete operations. Following the deletion of cond.branch, its successor operands will subsequently be removed.
2025-11-19[tosa] : Relax dynamic dimension checks for batch for conv decompositions ↵Sayan Saha
(#168764) This PR relaxes the validation checks to allow input/output data to have dynamic batch dimensions.
2025-11-19[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738)Nishant Patel
This PR adds pattern for unrolling shape_cast given a targetShape. This PR is a follow up of #164010 which was very general and was using inserts and extracts on each element (which is also LowerVectorShapeCast.cpp is doing). After doing some more research on use cases, we (me and @Jianhui-Li ) realized that the previous version in #164010 is unnecessarily generic and doesn't fit our performance needs. Our use case requires that targetShape is contiguous in both source and result vector. This pattern only applies when contiguous slices can be extracted from the source vector and inserted into the result vector such that each slice remains in vector form with targetShape (and not decompose to scalars). In these cases, the unrolling proceeds as: vector.extract_strided_slice -> vector.shape_cast (on the slice unrolled) -> vector.insert_strided_slice
2025-11-19[MLIR][Conversion] XeGPU to XeVM: Use adaptor for getting base address from ↵Sang Ik Lee
memref. (#168610) adaptor already lowers memref to base address. Conversion patterns should use it instead of generating code to get base address from memref.
2025-11-19[MLIR][NVVM] Doc fixes (#168716)Guray Ozen
2025-11-19[tosa] : Enhance EqualizeRanks to handle dynamic dimensions. (#168564)Sayan Saha
Legalizing following IR to `tosa` using `tf-tosa-opt` from `tensorflow` repo: ``` func.func @main(%arg0: tensor<?x?x?x?xf32>) -> tensor<?x?x?x5xf32> { %0 = "tfl.pseudo_const"() <{value = dense<0.000000e+00> : tensor<5xf32>}> : () -> tensor<5xf32> %1 = tfl.add(%arg0, %0) <{fused_activation_function = "NONE"}> : (tensor<?x?x?x?xf32>, tensor<5xf32>) -> tensor<?x?x?x5xf32> return %1 : tensor<?x?x?x5xf32> } ``` fails with ``` error: 'tosa.add' op operands don't have matching ranks %1 = tfl.add(%arg0, %0) <{fused_activation_function = "NONE"}> : (tensor<?x?x?x?xf32>, tensor<5xf32>) -> tensor<?x?x?x5xf32> ^ tfl.mlir:3:10: note: see current operation: %1 = "tosa.add"(%arg0, %0) : (tensor<?x?x?x?xf32>, tensor<5xf32>) -> tensor<?x?x?x5xf32> // -----// IR Dump After TosaLegalizeTFLPass Failed (tosa-legalize-tfl) //----- // "func.func"() <{function_type = (tensor<?x?x?x?xf32>) -> tensor<?x?x?x5xf32>, sym_name = "main"}> ({ ^bb0(%arg0: tensor<?x?x?x?xf32>): %0 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<5xf32>}> : () -> tensor<5xf32> %1 = "tosa.add"(%arg0, %0) : (tensor<?x?x?x?xf32>, tensor<5xf32>) -> tensor<?x?x?x5xf32> "func.return"(%1) : (tensor<?x?x?x5xf32>) -> () }) : () -> () ``` This is because of the following check in `computeReshapeOutput` called from `EqualizeRanks` function: ``` if (lowerRankDim != 1 && higherRankDim != 1 && lowerRankDim != higherRankDim) return failure(); ``` Based on the broadcast semantics defined in https://mlir.llvm.org/docs/Traits/Broadcastable/#dimension-inference I think it's legal to allow `lowerRankDim != higherRankDim` if one of them is dynamic. At runtime verifier should enforce that 1. if lowerRankDim is dynamic and higherRankDim is static then the dynamic dim matches the static dim and vice-versa 2. if both are dynamic, they should match It's not necessary to error out during the op construction time.
2025-11-19[mlir][LLVM] Resync memory effect attribute with LLVM IR (#168568)darkbuck
- Add missing locations, namely 'ErrnoMem', 'TargetMem0', and 'TargetMem1'.
2025-11-19[MLIR][Python] make sure stubs get installed with ↵Maksim Levental
LLVM_DISTRIBUTION_COMPONENTS (#168407) Fixes https://github.com/llvm/llvm-project/issues/168393. Also adds top-level `MLIR_PYTHON_STUBGEN_ENABLED` CMake option.
2025-11-19[mlir][vector] Missing indices on vectorization of 1-d reduction to 1-ranked ↵Simone Pellegrini
memref (#166959) Vectorization of a 1-d reduction where the output variable is a 1-ranked memref can generate an invalid `vector.transfer_write` with no indices for the memref, e.g.: vector.transfer_write"(%vec, %buff) <{...}> : (vector<f32>, memref<1xf32>) -> () This patch solves the problem by providing the expected amount of indices (i.e. matching the rank of the memref).
2025-11-19[mlir][tensor] Drop unused AffineExpr variable (NFC) (#168651)Longsheng Mou
2025-11-19[MLIR][ODS] Fully qualify namespace for mlir::Attribute in ODS generated ↵BogdanDragosV
code (#168536) ODS generate code can be included and used outside of the `mlir` namespace and so references to symbols in the mlir namespace must be fully qualified.
2025-11-19[MLIR][NVVM] Add operations and interfacesGuray Ozen
2025-11-19[mlir][tosa] Fix linker failure in build bots introduced by #165581 (#168581)Luke Hutton
This commit fixes linker failures evident on some failing build bots.
2025-11-19Reland "[MLIR][NVVM] Add tcgen05.mma MLIR Ops (#164356)" (#168638)Pradeep Kumar
Reland commit fb829bf11feeb53f815a3abf539e63ec3a23ed3d with additional fixes relating to post-merge CI failure ``` /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp: In function ‘constexpr llvm::nvvm::CTAGroupKind getNVVMCtaGroupKind(mlir::NVVM::CTAGroupKind)’: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/llvm/include/llvm/Support/ErrorHandling.h:165:36: error: call to non-constexpr function ‘void llvm::llvm_unreachable_internal(const char*, const char*, unsigned int)’ ::llvm::llvm_unreachable_internal(msg, __FILE__, __LINE__) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp:73:3: note: in expansion of macro ‘llvm_unreachable’ llvm_unreachable("unsupported cta_group value"); ^ ```
2025-11-19[mlir] Use dictionary order to order the pass decl (NFC) (#168648)lonely eagle
2025-11-19[MLIR][NVVM] Move docs to correct folderGuray Ozen
2025-11-18[MLIR][XeGPU] Allow create mem desc from 2d memref (#167767)Jianhui Li
This PR relax the create_mem_desc's restriction on source memref, allowing it to be a 2d memref.
2025-11-18[OpenACC] add cl::values to ACCImplicitRoutineOptions (#168601)Scott Manley
Add the cl::values to the pass options so an assert is not reached when trying to generate a reproducer e.g. "unknown data value for option"
2025-11-18[mlir][acc][flang] Introduce OpenACC interfaces for globals (#168614)Razvan Lupusoru
Introduce two new OpenACC operation interfaces for identifying global variables and their address computations: - `GlobalVariableOpInterface`: Identifies operations that define global variables. Provides an `isConstant()` method to query whether the global is constant. - `AddressOfGlobalOpInterface`: Identifies operations that compute the address of a global variable. Provides a `getSymbol()` method to retrieve the symbol reference. This is being done in preparation for `ACCImplicitDeclare` pass which will automatically ensure that `acc declare` is applied to globals when needed. The following operations now implement these interfaces: - `memref::GlobalOp` implements `GlobalVariableOpInterface` - `memref::GetGlobalOp` implements `AddressOfGlobalOpInterface` - `fir::GlobalOp` implements `GlobalVariableOpInterface` - `fir::AddrOfOp` implements `AddressOfGlobalOpInterface`