summaryrefslogtreecommitdiff
path: root/mlir/test/Dialect/Quant/Bytecode
AgeCommit message (Collapse)Author
2025-03-23Sub-channel quantized type implementation (#120172)Sandeep Dasgupta
This is an implementation for [RFC: Supporting Sub-Channel Quantization in MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694). In order to make the review process easier, the PR has been divided into the following commit labels: 1. **Add implementation for sub-channel type:** Includes the class design for `UniformQuantizedSubChannelType`, printer/parser and bytecode read/write support. The existing types (per-tensor and per-axis) are unaltered. 2. **Add implementation for sub-channel type:** Lowering of `quant.qcast` and `quant.dcast` operations to Linalg operations. 3. **Adding C/Python Apis:** We first define he C-APIs and build the Python-APIs on top of those. 4. **Add pass to normalize generic ....:** This pass normalizes sub-channel quantized types to per-tensor per-axis types, if possible. A design note: - **Explicitly storing the `quantized_dimensions`, even when they can be derived for ranked tensor.** While it's possible to infer quantized dimensions from the static shape of the scales (or zero-points) tensor for ranked data tensors ([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3) for background), there are cases where this can lead to ambiguity and issues with round-tripping. ``` Consider the example: tensor<2x4x!quant.uniform<i8:f32:{0:2, 0:2}, {{s00:z00, s01:z01}}>> ``` The shape of the scales tensor is [1, 2], which might suggest that only axis 1 is quantized. While this inference is technically correct, as the block size for axis 0 is a degenerate case (equal to the dimension size), it can cause problems with round-tripping. Therefore, even for ranked tensors, we are explicitly storing the quantized dimensions. Suggestions welcome! PS: I understand that the upcoming holidays may impact your schedule, so please take your time with the review. There's no rush.
2023-06-23Fix bytecode reader/writer on big-endian platformsUlrich Weigand
This makes the bytecode reader/writer work on big-endian platforms. The only problem was related to encoding of multi-byte integers, where both reader and writer code make implicit assumptions about endianness of the host platform. This fixes the current test failures on s390x, and in addition allows to remove the UNSUPPORTED markers from all other bytecode-related test cases - they now also all pass on s390x. Also adding a GFAIL_SKIP to the MultiModuleWithResource unit test, as this still fails due to an unrelated endian bug regarding decoding of external resources. Differential Revision: https://reviews.llvm.org/D153567 Reviewed By: mehdi_amini, jpienaar, rriddle
2022-12-09[MLIR/S90x] Convert tests to check 'target=...'Paul Robinson
Part of the project to eliminate special handling for triples in lit expressions.
2022-10-17[mlir][quant] Initial bytecode encoding for quantized typesJacques Pienaar
Add bytecode encoding for quantized types. These mostly follow the storage representation of these. Differential Revision: https://reviews.llvm.org/D136004