llvm-project.git/mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp, branch main

[mlir][amdgpu] Add lowerings for ScaledExtPacked816 (#168123)

2025-11-17T21:51:52+00:00

* Adds lowerings for amdgpu.scaled_ext_packed816
* Updates verifiers

[mlir][amdgpu] Fix documentation and verifiers (#167369)

2025-11-17T13:34:21+00:00

[mlir][amdgpu][rocdl] Add gfx1250 wmma ops (#165064)

2025-10-28T16:42:39+00:00

Update `amdgpu.wmma` op definition and implement amdgpu to rocdl
conversion for new variants.

[mlir][amdgpu] Update mfma assembly format with intrinsic shape (#165037)

2025-10-25T09:58:43+00:00

Use the same format as introduced for wmma by
https://github.com/llvm/llvm-project/pull/164920.

Also make `blocks` default to 1.

[mlir][amdgpu] Add explicit intrinsic shape to wmma (#164920)

2025-10-24T16:21:33+00:00

This is in preparation for adding support for gfx1250 wmma intrinsics,
which include many more possible shapes.

Instead of guessing the wave32/wave64 mode based on element types and
vector sizes, require the intrinsic shapes to be set explicitly as
attributes.

[mlir][amdgpu] Add scaled_ext_packed{8,16} operations (#159830)

2025-10-17T16:58:03+00:00

[mlir][AMGPU] Replace use of SmallVector with ArrayRef, NFC (#163770)

2025-10-16T14:41:22+00:00

Improve the choice of container class by replacing SmallVector with ArrayRef
(https://llvm.org/docs/ProgrammersManual.html#llvm-adt-arrayref-h). Also infer template types where possible.
Follow-up to https://github.com/llvm/llvm-project/pull/155951.
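
As a rough illustration of why a non-owning view is preferable for read-only parameters, here is a minimal sketch using only the standard library. `ArrayView` is a hypothetical stand-in for `llvm::ArrayRef` (a pointer plus a length), and `product` is an invented helper; neither is taken from the actual patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::ArrayRef<T>: a pointer plus a length.
// It does not own or copy the elements it refers to.
template <typename T>
class ArrayView {
public:
  // Implicit conversion from a vector, so callers can pass one directly.
  ArrayView(const std::vector<T> &v) : data_(v.data()), size_(v.size()) {}
  ArrayView(const T *data, std::size_t size) : data_(data), size_(size) {}

  const T *begin() const { return data_; }
  const T *end() const { return data_ + size_; }
  std::size_t size() const { return size_; }

private:
  const T *data_;
  std::size_t size_;
};

// Invented helper: a read-only parameter taken as a view, so passing a
// container only copies a pointer and a size, never the elements.
long long product(ArrayView<long long> shape) {
  long long result = 1;
  for (long long dim : shape)
    result *= dim;
  return result;
}
```

The view must not outlive the buffer it points to, which is why this style suits function parameters rather than stored members.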

---------

Signed-off-by: Muzammiluddin Syed

[mlir][amdgpu] Add Inliner interface (#162873)

2025-10-10T18:34:00+00:00

All the `amdgpu` dialect ops can be inlined.

---------

Signed-off-by: Ivan Butygin

[mlir][AMDGPU] Add canonicalization pattern to pack scales for ScaledMFMAOp (#155951)

2025-09-18T19:25:14+00:00

The ScaledMFMAOp accepts scales as a vector of 4 bytes
(`vector<4xf8E8M0FNU>`) that can be stored in a single register with a
particular scale accessed using the `OpSel` attribute. Currently, we
only use one byte in this 4-byte vector, resulting in 3 wasted
registers.

This is fixed by identifying when single byte extractions are performed
and rewriting them into extractions of 4-byte vectors.

Example:
```
  %unit = vector.extract %ScaleSrc[offsets] : f8E8M0FNU from vector
  %scale = vector.insert %unit, ... : f8E8M0FNU into vector<4xf8E8M0FNU>
  amdgpu.scaled_mfma(%scale[0] * ...
```
to
```
  %reshaped = vector.shape_cast %ScaleSrc : vector to vector 
  %scale = vector.extract %reshaped[?] : vector<4xf8E8M0FNU> from vector
  amdgpu.scaled_mfma(%scale[0-3] * ...
```
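
The index arithmetic behind this rewrite can be sketched as follows. This is an illustration only, not the code from the patch: `PackedScaleIndex`, `linearize`, and `packScaleIndex` are hypothetical names, and the sketch assumes a row-major scale source whose element count is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: where a single scale byte lands after packing.
struct PackedScaleIndex {
  int64_t group; // index of the 4-byte vector to extract after the reshape
  int64_t opsel; // byte within that vector (0-3), as selected by OpSel
};

// Row-major linearization of `offsets` within `shape`.
int64_t linearize(const std::vector<int64_t> &offsets,
                  const std::vector<int64_t> &shape) {
  assert(offsets.size() == shape.size());
  int64_t flat = 0;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    assert(offsets[i] >= 0 && offsets[i] < shape[i]);
    flat = flat * shape[i] + offsets[i];
  }
  return flat;
}

// Maps the offsets of a single-byte extraction to the 4-byte group that
// contains it and the position of the byte inside that group.
PackedScaleIndex packScaleIndex(const std::vector<int64_t> &offsets,
                                const std::vector<int64_t> &shape) {
  int64_t flat = linearize(offsets, shape);
  return {flat / 4, flat % 4};
}
```

For example, with a scale source of shape 2x8, `packScaleIndex({1, 5}, {2, 8})` linearizes to element 13, which falls in group 3 at byte 1.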

---------

Signed-off-by: Muzammiluddin Syed

[MLIR] Apply clang-tidy fixes for readability-identifier-naming in AMDGPUDialect.cpp (NFC)

2025-09-18T17:28:46+00:00