path: root/mlir/include
Age Commit message Author
2025-11-21[MLIR] [XeGPU] Fix dropSgLayoutAndData & dropInstData in SliceAttr (#168618)Nishant Patel
2025-11-21[mlir][presburger] Fix PresburgerSpace comment (#167292)lonely eagle
2025-11-21[acc][flang] Implement acc interface for tracking type descriptors (#168982)Razvan Lupusoru
FIR operations that use derived types need to have type descriptor globals available on device when offloading. Examples of this can be seen in `CUFDeviceGlobal`, which ensures that such type descriptor uses work on device for CUF. Similarly, this is needed for OpenACC.

This change introduces a new interface to the OpenACC dialect named `IndirectGlobalAccessOpInterface`, which can be attached to operations that may result in generation of accesses that use type descriptor globals. This functionality is needed for the `ACCImplicitDeclare` pass that is coming in a follow-up change, which implicitly ensures that all referenced globals are available in OpenACC compute contexts.

The interface provides a `getReferencedSymbols` method that collects all global symbols referenced by an operation. When a symbol table is provided, the implementation for FIR recursively walks type descriptor globals to find all transitively referenced symbols.

Note that this could alternatively have been implemented in different ways:
- Codegen could implicitly generate such type globals as needed by changing the technique that relies on populating them during lowering (e.g. generate them directly in gpu.module during codegen).
- This interface could attach to types instead of operations for a potentially more conservative implementation which maps all type descriptors even if the underlying implementation using it won't necessarily need such mapping.

The technique chosen here is consistent with `CUFDeviceGlobal` (which walks operations inside `prepareImplicitDeviceGlobals`) and avoids conservative mapping of all type descriptors.
2025-11-21[mlir][ROCDL] Adds wmma scaled intrinsics for gfx1250 (#165915)Muzammiluddin Syed
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
2025-11-21Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049)Fabian Mora
Reverts llvm/llvm-project#165873

The revert is triggered by a failing integration test on a couple of buildbots.
2025-11-21[MLIR][GPU] subgroup_mma fp64 extension (#165873)Giacomo Castiglioni
This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type. The extension requires special handling during the lowering to `nvvm` due to the return type for load ops for fragment a and b (they return a scalar instead of a struct).
2025-11-21[mlir][py][c] Enable setting block arg locations. (#169033)Jacques Pienaar
This enables changing the location of a block argument. Follows the approach for updating type of block arg.
2025-11-21[MLIR][XeGPU][TransformOps] Add slice_dims argument to set_op_layout_attr and set_desc_layout (#168929)Tuomas Kärnä
The `set_op_layout_attr` and `set_desc_layout` transform ops wrap `xegpu.layout` in an `xegpu.slice` attribute if the `slice_dims` argument is set.
2025-11-21[mlir][llvm] Handle debug record import edge cases (#168774)Tobias Gysi
This commit enables the direct import of debug records by default and fixes issues with two edge cases:
- Detect early on if the address operand is an argument list (calling getAddress() for argument lists asserts).
- Use getAddress() to check if the address operand is null, which means the address operand is an empty metadata node, which is currently not supported.

It also adds support for debug label records.

This is a follow-up to: https://github.com/llvm/llvm-project/pull/167812
2025-11-21[mlir][linalg] Clean up op verifiers without custom checks(NFC) (#168712)Longsheng Mou
This PR removes op verifiers that do not implement any custom verification logic.
2025-11-20[mlir][SCF] Add `scf::tileAndFuseConsumer` that tiles a consumer into a given tiled loop nest. (#167634)MaheshRavishankar
The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and the loops they are part of), tries to find the consumer of these slices (all slices are expected to have the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`.

A more natural way of doing consumer fusion is to start from the consumer and look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer, you can find the slices of the operands that are accessed within the loop, which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest.

The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. Using the consumer directly avoids all those pain points.

In time, `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. There is a lot of tech debt that has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleaned up. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained.

The tests for `scf::tileAndFuseConsumerOfSlices` are copied from `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. All the tests that were in the original file now use the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-11-20[mlir] Replace `llvm::OwningArrayRef` with `std::vector` (#168803)David Stone
There are several places where we use `llvm::OwningArrayRef`. The interface to this requires us to first construct temporary storage, then allocate space and set the allocated memory to 0, then copy the values we actually want into that memory, then move the array into place. Instead we can just do it all inline in a single pass by using `std::vector`. In one case we actually allocate a completely separate container and then allocate + copy the data over because `llvm::OwningArrayRef` does not (and can't) support `push_back`.

Note that `llvm::SmallVector` is not a suitable replacement here because we rely on reference stability on move construction: when the outer container reallocates, we need the contents of the inner containers to be fixed in memory, and `llvm::SmallVector` does not give us that guarantee.
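The reference-stability argument can be illustrated with a small standalone sketch. It is not taken from the patch: the function name `innerStorageSurvivesOuterGrowth` and the element values are made up for illustration. It relies only on the fact that moving a `std::vector` hands over its heap buffer, so pointers into an inner vector stay valid when the outer vector reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: returns true if the heap
// buffers of the inner vectors stay at the same addresses after the outer
// vector is forced to reallocate.
bool innerStorageSurvivesOuterGrowth() {
  std::vector<std::vector<int>> outer;
  outer.push_back({1, 2, 3});
  outer.push_back({4, 5});

  // Remember where the inner elements live.
  const int *first = outer[0].data();
  const int *second = outer[1].data();

  // Grow the outer container until its capacity changes, i.e. until it
  // has reallocated at least once.
  const std::size_t oldCapacity = outer.capacity();
  while (outer.capacity() == oldCapacity)
    outer.emplace_back();
  assert(outer.capacity() != oldCapacity);

  // The inner vectors were moved, so their buffers were handed over intact.
  return outer[0].data() == first && outer[1].data() == second;
}
```

A container that stores its elements inline, as `llvm::SmallVector` may do for small sizes, has to relocate those elements when it is moved, so the same check would not be guaranteed to pass.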
2025-11-20[mlir][spirv] Add support for SwitchOp (#168713)Igor Wodiany
The dialect implementation mostly copies the one of `cf.switch`, but aligns naming to the SPIR-V spec.
2025-11-20Revert "[mlir][Pass] Fix crash when applying a pass to an optional interface" (#168847)Matthias Springer
Reverts llvm/llvm-project#168499
2025-11-20[mlir][Pass] Fix crash when applying a pass to an optional interface (#168499)Matthias Springer
Interfaces can be optional: whether an op implements an interface or not can depend on the state of the operation.

```
// An optional code block for adding additional "classof" logic. This can
// be used to better enable "optional" interfaces, where an entity only
// implements the interface if some dynamic characteristic holds.
// `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the
// interface instance being checked.
code extraClassOf = "";
```

The current `Pass::canScheduleOn(RegisteredOperationName)` is insufficient. This commit adds an additional overload to inspect `Operation *`. This commit fixes a crash when scheduling an `InterfacePass` for an optional interface on an operation that does not actually implement the interface.
2025-11-19[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738)Nishant Patel
This PR adds a pattern for unrolling shape_cast given a targetShape. This PR is a follow-up to #164010, which was very general and used inserts and extracts on each element (which is also what LowerVectorShapeCast.cpp does). After doing some more research on use cases, we (me and @Jianhui-Li) realized that the previous version in #164010 is unnecessarily generic and doesn't fit our performance needs.

Our use case requires that targetShape is contiguous in both the source and the result vector. This pattern only applies when contiguous slices can be extracted from the source vector and inserted into the result vector such that each slice remains in vector form with targetShape (and is not decomposed into scalars). In these cases, the unrolling proceeds as: vector.extract_strided_slice -> vector.shape_cast (on the slice unrolled) -> vector.insert_strided_slice
2025-11-19[MLIR][NVVM] Doc fixes (#168716)Guray Ozen
2025-11-19[mlir][LLVM] Resync memory effect attribute with LLVM IR (#168568)darkbuck
- Add missing locations, namely 'ErrnoMem', 'TargetMem0', and 'TargetMem1'.
2025-11-19[MLIR][ODS] Fully qualify namespace for mlir::Attribute in ODS generated code (#168536)BogdanDragosV
ODS-generated code can be included and used outside of the `mlir` namespace, so references to symbols in the `mlir` namespace must be fully qualified.
2025-11-19Reland "[MLIR][NVVM] Add tcgen05.mma MLIR Ops (#164356)" (#168638)Pradeep Kumar
Reland commit fb829bf11feeb53f815a3abf539e63ec3a23ed3d with additional fixes relating to post-merge CI failure

```
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp: In function ‘constexpr llvm::nvvm::CTAGroupKind getNVVMCtaGroupKind(mlir::NVVM::CTAGroupKind)’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/llvm/include/llvm/Support/ErrorHandling.h:165:36: error: call to non-constexpr function ‘void llvm::llvm_unreachable_internal(const char*, const char*, unsigned int)’
   ::llvm::llvm_unreachable_internal(msg, __FILE__, __LINE__)
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp:73:3: note: in expansion of macro ‘llvm_unreachable’
   llvm_unreachable("unsupported cta_group value");
   ^
```
2025-11-19[mlir] Use dictionary order to order the pass decl (NFC) (#168648)lonely eagle
2025-11-18[MLIR][XeGPU] Allow create mem desc from 2d memref (#167767)Jianhui Li
This PR relaxes the create_mem_desc restriction on the source memref, allowing it to be a 2D memref.
2025-11-18[OpenACC] add cl::values to ACCImplicitRoutineOptions (#168601)Scott Manley
Add the cl::values to the pass options so an assert is not reached when trying to generate a reproducer e.g. "unknown data value for option"
2025-11-18[mlir][acc][flang] Introduce OpenACC interfaces for globals (#168614)Razvan Lupusoru
Introduce two new OpenACC operation interfaces for identifying global variables and their address computations:
- `GlobalVariableOpInterface`: Identifies operations that define global variables. Provides an `isConstant()` method to query whether the global is constant.
- `AddressOfGlobalOpInterface`: Identifies operations that compute the address of a global variable. Provides a `getSymbol()` method to retrieve the symbol reference.

This is being done in preparation for the `ACCImplicitDeclare` pass, which will automatically ensure that `acc declare` is applied to globals when needed.

The following operations now implement these interfaces:
- `memref::GlobalOp` implements `GlobalVariableOpInterface`
- `memref::GetGlobalOp` implements `AddressOfGlobalOpInterface`
- `fir::GlobalOp` implements `GlobalVariableOpInterface`
- `fir::AddrOfOp` implements `AddressOfGlobalOpInterface`
2025-11-18Revert "[MLIR][NVVM] Add tcgen05.mma MLIR Ops" (#168583)Mehdi Amini
Reverts llvm/llvm-project#164356

The bots are broken.
2025-11-18[mlir][tosa] Add a pass to narrow i64 to i32 (#165581)Luke Hutton
This pass aims to narrow i64 types on TOSA operations to i32. It can be useful for legalizations from various frameworks. It comes with the following options:
- "aggressive-rewrite" - This option is typically able to narrow more values, but may impact numerical behaviour if not used carefully.
- "convert-function-boundaries" - If enabled, parameters/results to/from a function may be narrowed. Otherwise, casts are inserted to preserve the I/O of the function.

Currently the non-aggressive mode is very limited, targeting an argmax -> cast sequence that has been observed during legalization as well as some data layout operations that can always narrow. Support for more operations will be added in the future.

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Co-authored-by: Vitalii Shutov <vitalii.shutov@arm.com>
Co-authored-by: Shubham <shubham@arm.com>
Co-authored-by: Declan Flavin <declan.flavin@arm.com>
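To make the "may impact numerical behaviour" caveat concrete, here is a minimal sketch of the range condition under which an i64 value can be narrowed to i32 without changing its value. This is not how the pass decides what to rewrite; `fitsInI32` and `narrowChecked` are hypothetical helpers used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `value` is representable as a 32-bit
// signed integer, i.e. narrowing would not change it.
bool fitsInI32(std::int64_t value) {
  return value >= std::numeric_limits<std::int32_t>::min() &&
         value <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: narrow a value that is known to be in range.
std::int32_t narrowChecked(std::int64_t value) {
  assert(fitsInI32(value) && "value does not fit in i32");
  return static_cast<std::int32_t>(value);
}
```

Values outside this range are exactly the ones an aggressive rewrite could silently alter, which is why that option has to be used with care.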
2025-11-18[MLIR][NVVM] Move the docs to markdown file (#168375)Guray Ozen
2025-11-18[MLIR][NVVM] Add tcgen05.mma MLIR Ops (#164356)Pradeep Kumar
This commit adds support for the tcgen05.mma family of instructions in the NVVM MLIR dialect and lowers them to LLVM intrinsics. Please refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-mma-instructions) for more information.
2025-11-18[mlir][SCF] Add pass option to deactivate pattern rollback (#168481)Matthias Springer
Add a pass option to `convert-scf-to-cf` to deactivate pattern rollback for better performance. The lowering patterns from SCF->CF benefit a lot from this feature because `splitBlock` is expensive in the rollback driver.
2025-11-18[mlir][NVVM] Add no-rollback option to NVVM lowering passes (#168477)Matthias Springer
Add pass options to run lowerings to NVVM without pattern rollback. This makes the dialect conversions easier to debug and improves performance/memory usage.
2025-11-17[mlir][acc] Add ACCImplicitRoutine pass for implicit `acc routine` (#168433)Razvan Lupusoru
This change adds the ACCImplicitRoutine pass which implements the OpenACC specification for implicit routine directives (OpenACC 3.4 spec, section 2.15.1).

According to the specification: "If no explicit routine directive applies to a procedure whose definition appears in the program unit being compiled, then the implementation applies an implicit routine directive to that procedure if any of the following conditions holds: The procedure is called or its address is accessed in a compute region."

The pass automatically generates `acc.routine` operations for functions called within OpenACC compute constructs or within existing routine functions that do not already have explicit routine directives. It recursively applies implicit routine directives while avoiding infinite recursion when dependencies form cycles.

Key features:
- Walks through all OpenACC compute constructs (parallel, kernels, serial) to identify function calls
- Creates implicit `acc.routine` operations for functions without explicit routine declarations
- Recursively processes existing `acc.routine` operations to handle transitive dependencies
- Avoids infinite recursion through proper tracking of processed routines
- Respects device-type specific bind clauses to skip routines bound to different device types

Requirements:
- Function operations must implement `mlir::FunctionOpInterface` to be identified and associated with routine directives.
- Call operations must implement `mlir::CallOpInterface` to detect function calls and traverse the call graph.
- Optionally pre-register `acc::OpenACCSupport` if custom behavior is needed for determining if a symbol use is valid within GPU regions (such as functions which are already considered for offloading even without `acc routine` markings)

Co-authored-by: delaram-talaashrafi <dtalaashrafi@nvidia.com>
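The recursive, cycle-safe traversal described above can be sketched on a toy call graph. This is only an illustration of the idea, not the pass itself: it uses plain strings instead of MLIR operations, ignores bind clauses and device types, and the names `CallGraph` and `collectImplicitRoutines` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns the functions that would need an implicit routine marking:
// everything reachable from calls made in compute regions that does not
// already have an explicit routine. Callees of explicit routines are still
// followed, so transitive dependencies are covered.
std::set<std::string>
collectImplicitRoutines(const CallGraph &graph,
                        const std::vector<std::string> &calledInComputeRegions,
                        const std::set<std::string> &explicitRoutines) {
  std::set<std::string> implicitRoutines;
  std::set<std::string> visited;
  std::vector<std::string> worklist(calledInComputeRegions.begin(),
                                    calledInComputeRegions.end());
  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    // Each function is processed once, which terminates on call cycles.
    if (!visited.insert(fn).second)
      continue;
    if (explicitRoutines.count(fn) == 0)
      implicitRoutines.insert(fn);
    auto it = graph.find(fn);
    if (it == graph.end())
      continue;
    for (const std::string &callee : it->second)
      worklist.push_back(callee);
  }
  return implicitRoutines;
}
```

The `visited` set plays the role of the "tracking of processed routines" mentioned in the feature list: a mutually recursive pair of functions is visited once each instead of forever.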
2025-11-17[mlir][XeGPU] Use DistributeLayoutAttr instead of LayoutAttr for load gather/scatter ops (#167850)Dmitry Chigarev
The PR changes the layout attribute type for `xegpu::LoadGatherOp/StoreScatterOp` from `LayoutAttr` to `DistributeLayoutAttr` to also support `xegpu.slice` layouts. Initially we [wanted to restrict slice layouts](https://github.com/llvm/llvm-project/pull/163414#discussion_r2478978798) from the attribute, but now it turns out there are actually valid use cases for that:

```mlir
gpu.func @distribute_load_slice_attr() {
  %2 = memref.alloca() {alignment = 1024} : memref<4096xf32>
  %offset = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8], sg_data = [32], inst_data = [16]> } dense<0> : vector<256xindex>
  %mask = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8], sg_data = [32], inst_data = [16]> } dense<1> : vector<256xi1>
  %3 = xegpu.load %2[%offset], %mask <{chunk_size = 1, layout = #xegpu.slice<#xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>, dims = [0]>}> { layout_result_0 = #xegpu.slice<#xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>, dims = [0]> } : memref<4096xf32>, vector<256xindex>, vector<256xi1> -> vector<256xf32>
  %4 = vector.broadcast %3 {layout_result_0 = #xegpu.layout<sg_layout = [8, 8], sg_data = [32, 32], inst_data = [8, 16]>} : vector<256xf32> to vector<256x256xf32>
  gpu.return
}
```

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
2025-11-17Fix side effects for LLVM integer operations (udiv, sdiv) incorrectly marked as Pure (#166648)Jeremy Furtek
This MR modifies side effect traits of some integer arithmetic operations in the LLVM dialect.

Prior to this MR, the LLVM dialect `sdiv` and `udiv` operations were marked as `Pure` through `tblgen` inheritance of the `LLVM_ArithmeticOpBase` class. The `Pure` trait allowed incorrect hoisting of `sdiv`/`udiv` operations by the `loop-invariant-code-motion` pass.

This MR modifies the `sdiv` and `udiv` LLVM operations to have traits and code motion behavior identical to their counterparts in the `arith` dialect, which were established by the commit/review below.

https://github.com/llvm/llvm-project/commit/ed39825be48805b174d3177f1d8d41ed84784d18
https://reviews.llvm.org/D137814
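Why hoisting a division is unsafe can be shown with ordinary integers. The sketch below is an illustration under stated assumptions, not code from the patch: `sdivIsDefined` and `sumOfRepeatedQuotient` are hypothetical names, and 32-bit operands are assumed. Integer division is undefined when the divisor is zero, and signed division additionally overflows for the minimum value divided by -1, so executing a division that the original program would have skipped can introduce undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if the signed division lhs / rhs is well defined.
bool sdivIsDefined(std::int32_t lhs, std::int32_t rhs) {
  if (rhs == 0)
    return false; // division by zero
  // The most negative value divided by -1 is not representable.
  return !(lhs == std::numeric_limits<std::int32_t>::min() && rhs == -1);
}

// The quotient is loop-invariant, but it is only evaluated when the loop
// body runs. With tripCount == 0 the division never executes, so calling
// this with rhs == 0 is fine. Moving the division in front of the loop
// would evaluate it unconditionally and break that case.
std::int32_t sumOfRepeatedQuotient(std::int32_t lhs, std::int32_t rhs,
                                   int tripCount) {
  std::int32_t sum = 0;
  for (int i = 0; i < tripCount; ++i) {
    assert(sdivIsDefined(lhs, rhs) && "division executed with bad operands");
    sum += lhs / rhs;
  }
  return sum;
}
```

An operation marked `Pure` tells the optimizer that this kind of unconditional evaluation is always acceptable, which is the assumption the change removes for `sdiv` and `udiv`.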
2025-11-17Add 'exact' flag to arith.shrui/shrsi/divsi/divui operations (#165923)Jeremy Furtek
This MR adds support for the `exact` flag to the `arith.shrui/shrsi/divsi/divui` operations. The semantics are identical to those of the LLVM dialect and the LLVM language reference.

This MR also modifies the mechanism for converting `arith` dialect **attributes** to corresponding **properties** in the `LLVM` dialect. (As a specific example, the integer overflow flags `nsw/nuw` are **properties** in the `LLVM` dialect, as opposed to attributes.)

Previously, attribute converter classes were required to have a specific method to support integer overflow flags:

```C++
template <typename SourceOp, typename TargetOp>
class AttrConvertPassThrough {
public:
  ...
  LLVM::IntegerOverflowFlags getOverflowFlags() const {
    return LLVM::IntegerOverflowFlags::none;
  }
};
```

This method was required, even for `arith` source operations that did not use integer overflow flags (e.g. `AttrConvertFastMathToLLVM`).

This MR modifies the interface required by `arith` dialect attribute converters to instead provide a (possibly NULL) properties attribute:

```C++
template <typename SourceOp, typename TargetOp>
class AttrConvertPassThrough {
public:
  ...
  Attribute getPropAttr() const { return {}; }
};
```

For `arith` operations with attributes that map to `LLVM` dialect **properties**, the attribute converter can create a `DictionaryAttr` containing target properties and return that attribute from the attribute converter's `getPropAttr()` method. The `arith` attribute conversion framework will set the `propertiesAttr` of an `OperationState`, and the target operation's `setPropertiesFromAttr()` method will be invoked to set the properties when the target operation is created. The `AttrConvertOverflowToLLVM` class in this MR uses the new approach.
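As a reminder of what `exact` promises, here is a small sketch of the conditions from the LLVM language reference, written for unsigned 32-bit values. An exact division must leave no remainder and an exact right shift must not shift out any set bits; otherwise the result is poison. The helper names `udivIsExact` and `lshrIsExact` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `lhs / rhs` may carry the `exact` flag,
// i.e. the division leaves no remainder.
bool udivIsExact(std::uint32_t lhs, std::uint32_t rhs) {
  assert(rhs != 0 && "division by zero is undefined regardless of exact");
  return lhs % rhs == 0;
}

// Hypothetical helper: true if `value >> amount` may carry the `exact`
// flag, i.e. every bit that is shifted out is zero.
bool lshrIsExact(std::uint32_t value, unsigned amount) {
  assert(amount < 32 && "shift amount must be smaller than the bit width");
  const std::uint32_t shiftedOut = value & ((std::uint32_t{1} << amount) - 1);
  return shiftedOut == 0;
}
```

For instance, 12 / 4 and 8 >> 3 satisfy the condition, while 13 / 4 and 9 >> 3 do not.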
2025-11-17[MLIR][SparseTensor] Dense Outer Loop Ordering Strategy (#160168)Govind Malasani
This PR builds upon the infrastructure set up for Sparse Tensor Loop Ordering Heuristics (#154656) by adding a preference to have dense loops outer and sparse loops inner. As always I'd love to get feedback and know if there's any other direction to go with this work that might be better.
2025-11-17[MLIR][NVVM][Docs] Explain memory spaces (#168059)Guray Ozen
2025-11-17[MLIR][NVVM][NFC] Re-order mem_scope and shared_space attrs (#168348)Durgadoss R
The mbarrier Ops also require access to the `mem_scope` and `shared_space` attributes. Hence, this patch moves their definitions to the beginning of the file alongside the other attribute definitions.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-11-17[MLIR] Add verification that symbol operations must not have results (#168390)Tim Noack
This patch adds verification to the `SymbolOpInterface` to enforce the design constraint that symbol operations must not produce SSA results, as documented in [Symbols and SymbolTables](https://mlir.llvm.org/docs/SymbolsAndSymbolTables/#defining-or-declaring-a-symbol). This is a follow-up of #168376
2025-11-17[mlir][amdgpu] Fix documentation and verifiers (#167369)Erick Ochoa Lopez
2025-11-17[MLIR][NVVM] Add support for shared::cta destination (#168056)Durgadoss R
This patch adds support for shared::cta as destination space in the TMA non-tensor copy Op (from global to shared::cta).
* Appropriate verifier checks are added.
* Unit tests are added to verify the lowering.

The related intrinsic changes were merged through PR #167508.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-11-15[mlir][MemRef] Add UB as a dependent dialect and use `ub.poison` for Mem2Reg (#168066)Fabian Mora
This patch adds `ub` as a dependent dialect to `memref`, and uses `ub.poison` as the default value in `AllocaOp::getDefaultValue` for the mem2reg pass. This aligns the behavior of `mem2reg` with LLVM, where loading a value before having a value should be poison.

Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
2025-11-14[mlir][acc] Check legality of symbols in acc regions (#167957)Razvan Lupusoru
This PR adds a new utility function to check whether symbols used in OpenACC regions are legal for offloading. Functions must be marked with `acc routine` or be built-in intrinsics. Global symbols must be marked with `acc declare`. The utility is designed to be extensible, and the OpenACCSupport analysis has been updated to allow handling of additional symbols that do not necessarily use OpenACC attributes but are marked in a way that still guarantees the symbol will be available when offloading. For example, in the Flang implementation, CUF attributes can be validated as legal symbols.
2025-11-14[MLIR] Extend vector.scatter to accept tensor as base (#165548)Ryutaro Okada
This PR makes the following improvements to `vector.scatter` and its lowering pipeline:
- In addition to `memref`, accept a ranked `tensor` as the base operand of `vector.scatter`, similar to `vector.transfer_write`.
- Implement bufferization support for `vector.scatter`, so that tensor-based scatter ops can be fully lowered to memref-based forms.

This is worthwhile for completing the functionality of the map_scatter decomposition. Full discussion can be found here: https://github.com/iree-org/iree/issues/21135

Signed-off-by: Ryutaro Okada <1015ryu88@gmail.com>
2025-11-14[MLIR][LLVM] Debug info: import debug records directly (#167812)Bruno Cardoso Lopes
This effectively means we no longer need to call into `llvmModule->convertFromNewDbgValues()`. Added a flag to allow users to access the old behavior.
2025-11-14[mlir][tosa] Allow int64 index tensors in gather/scatter (#167894)Luke Hutton
This commit ensures that gather and scatter operations with int64 index tensors can be created. This aligns with the EXT_INT64 extension.
2025-11-14[Linalg] Add basic infra to add matchers for linalg.*conv*/*pool* ops (#163724)Abhishek Varma
-- This commit includes the basic infra/utilities to add matchers for linalg.*conv*/*pool* ops - such that given a `linalg.generic` op it identifies which linalg.*conv*/*pool* op it is.
-- It adds a few representative linalg.*conv*/*pool* ops to demo the matchers' capability and does so as part of `linalg-specialize-generic-ops` pass.
-- The goal is directed towards addressing the aim of [[RFC] Op explosion in Linalg](https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863) iteratively for `*conv*/*pooling*` ops.
-- This is part-1 of a series of PRs aimed to add matchers for Convolution ops.
-- For further details, refer to https://github.com/llvm/llvm-project/pull/163374#pullrequestreview-3341048722

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
2025-11-13[mlir][ROCDL] Refactor wmma intrinsics to use attributes not operands where possible (#167041)Muzammiluddin Syed
The current implementation of the WMMA intrinsic ops as they are defined in the ROCDL tablegen is incorrect. They represent as operands what should be attributes such as `clamp`, `opsel`, `signA/signB`. This change performs a refactoring to bring it in line with what we expect.

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
2025-11-13[mlir][NVVM] Make sure barrier reduction attr can roundtrip (#167958)Valentin Clement (バレンタイン クレメン)
The IR could not be round-tripped through mlir-opt. Update the assembly format and add round-trip tests.

```
mlir-opt mlir/test/Target/LLVMIR/nvvm/barrier.mlir | mlir-opt
<stdin>:6:5: error: cannot name an operation with no results
    %0 = nvvm.barrier <and> %arg2 -> i32
```
2025-11-13[ROCDL] Added missing cluster.ids op (gfx1250) (#167890)Ravil Dorozhinskii
Added missing cluster.ids op - i.e., along x, y, and z dims. Extended all rocdl tests
2025-11-13[MLIR][LLVMIR] Add {s,u}cmp intrinsics to LLVM dialect (#167870)Robert Konicar