<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp, branch main</title>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices (#166569)</title>
<updated>2025-11-11T23:37:15+00:00</updated>
<author>
<name>Hanumanth</name>
<email>hhanuman@mathworks.com</email>
</author>
<published>2025-11-11T23:37:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=81964597f9918e1f294f5b9de27ee662005b8c58'/>
<id>81964597f9918e1f294f5b9de27ee662005b8c58</id>
<content type='text'>
I hit another runtime verification issue (similar to
https://github.com/llvm/llvm-project/pull/164878) while working with
TFLite models. The verifier is incorrectly rejecting
`tensor.extract_slice` operations when extracting an empty slice
(size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces `offset &lt;
dim_size`. This makes sense for non-empty slices, but it's too strict
for empty slices, causing false positives that lead to spurious runtime
assertions.

**Simple example that demonstrates the issue:**

```mlir
func.func @extract_empty_slice(%tensor: tensor&lt;?xf32&gt;, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor&lt;?xf32&gt; to tensor&lt;?xf32&gt;
  return
}
```

For the above example, the check evaluates `10 &lt; 10` which is false, so
verification fails. However, I believe this operation should be valid -
we're extracting zero elements, so there's no actual out-of-bounds
access.

**Real-world repro from the TensorFlow Lite models:**

This issue manifests while lowering TFLite models, and many of our
system tests fail because of it. The simplified version below shows
the problematic pattern.

In this code, `%extracted_slice_0` becomes an empty tensor when SSA
value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`.
The operation extracts zero elements along dimension 0, which is
semantically valid but fails runtime verification.

```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor&lt;10x4x1xf32&gt;) -&gt; tensor&lt;10x4x1xf32&gt; {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() &lt;{values = dense&lt;0&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %1 = "tosa.const"() &lt;{values = dense&lt;1&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %2 = "tosa.const"() &lt;{values = dense&lt;10&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %3 = "tosa.const"() &lt;{values = dense&lt;-1&gt; : tensor&lt;2xi32&gt;}&gt; : () -&gt; tensor&lt;2xi32&gt;
  %4 = "tosa.const"() &lt;{values = dense&lt;0&gt; : tensor&lt;2xi32&gt;}&gt; : () -&gt; tensor&lt;2xi32&gt;
  %5 = "tosa.const"() &lt;{values = dense&lt;0.000000e+00&gt; : tensor&lt;1x4x1xf32&gt;}&gt; : () -&gt; tensor&lt;1x4x1xf32&gt;
  %c4_1 = tosa.const_shape  {values = dense&lt;1&gt; : tensor&lt;1xindex&gt;} : () -&gt; !tosa.shape&lt;1&gt;
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;) -&gt; (tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;) {
    %7 = tosa.greater %2, %arg1 : (tensor&lt;i32&gt;, tensor&lt;i32&gt;) -&gt; tensor&lt;i1&gt;
    %extracted = tensor.extract %7[] : tensor&lt;i1&gt;
    scf.condition(%extracted) %arg1, %arg2 : tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;
  } do {
  ^bb0(%arg1: tensor&lt;i32&gt;, %arg2: tensor&lt;10x4x1xf32&gt;):
    %7 = tosa.add %arg1, %1 : (tensor&lt;i32&gt;, tensor&lt;i32&gt;) -&gt; tensor&lt;i32&gt;
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor&lt;i32&gt;, !tosa.shape&lt;1&gt;) -&gt; tensor&lt;1xi32&gt;
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor&lt;1xi32&gt;, tensor&lt;2xi32&gt;) -&gt; tensor&lt;3xi32&gt;
    
    %extracted_0 = tensor.extract %9[%c0] : tensor&lt;3xi32&gt;
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x4x1xf32&gt;
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor&lt;i32&gt;, !tosa.shape&lt;1&gt;) -&gt; tensor&lt;1xi32&gt;
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor&lt;1xi32&gt;, tensor&lt;2xi32&gt;) -&gt; tensor&lt;3xi32&gt;
    
    %extracted_1 = tensor.extract %14[%c0] : tensor&lt;3xi32&gt;
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor&lt;3xi32&gt;
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor&lt;3xi32&gt;
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x4x1xf32&gt;
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor&lt;?x4x1xf32&gt;, tensor&lt;1x4x1xf32&gt;, tensor&lt;?x4x1xf32&gt;) -&gt; tensor&lt;10x4x1xf32&gt;
    
    scf.yield %7, %19 : tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;
  }
  
  return %6#1 : tensor&lt;10x4x1xf32&gt;
}
```
**The fix:**

Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 &lt;= offset &lt;= dim_size`
- Non-empty slice (size &gt; 0): require `0 &lt;= offset &lt; dim_size`
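
A minimal Python sketch of the relaxed rule for a single dimension, for
illustration only. `offset_in_bounds` is a hypothetical helper and not
code from this patch; the real check is generated as assertions by the
runtime verification pass.

```python
def offset_in_bounds(offset, size, dim_size):
    """Model of the relaxed per-dimension offset check (hypothetical helper)."""
    # An empty slice may start exactly at the end of the dimension,
    # so the exclusive upper bound grows by one when size is 0.
    upper = dim_size + 1 if size == 0 else dim_size
    # range(0, upper) contains exactly the integers 0, 1, ..., upper - 1.
    return offset in range(0, upper)


print(offset_in_bounds(10, 0, 10))  # empty slice starting at the boundary
print(offset_in_bounds(10, 1, 10))  # non-empty slice starting at the boundary
print(offset_in_bounds(9, 1, 10))   # last valid start for a non-empty slice
```

For a dimension of size 10, the three calls print `True`, `False`, and
`True`.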


**Question for reviewers:**
Should we also relax the static verifier to allow this edge case?
Currently, the static verifier rejects the following IR:

```mlir
%tensor = arith.constant dense&lt;1.0&gt; : tensor&lt;10xf32&gt;
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor&lt;10xf32&gt; to tensor&lt;0xf32&gt;
```
Since we're allowing it at runtime for dynamic shapes, it seems
inconsistent to reject it statically. However, I wanted to get feedback
before making that change - this PR focuses only on the runtime
verification fix for dynamic shapes.

P.S. We have a similar issue with `memref.subview`. I will send a
separate patch for the issue.

Co-authored-by: Hanumanth Hanumantharayappa &lt;hhanuman@ah-hhanuman-l.dhcp.mathworks.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I hit another runtime verification issue (similar to
https://github.com/llvm/llvm-project/pull/164878) while working with
TFLite models. The verifier is incorrectly rejecting
`tensor.extract_slice` operations when extracting an empty slice
(size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces `offset &lt;
dim_size`. This makes sense for non-empty slices, but it's too strict
for empty slices, causing false positives that lead to spurious runtime
assertions.

**Simple example that demonstrates the issue:**

```mlir
func.func @extract_empty_slice(%tensor: tensor&lt;?xf32&gt;, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor&lt;?xf32&gt; to tensor&lt;?xf32&gt;
  return
}
```

For the above example, the check evaluates `10 &lt; 10` which is false, so
verification fails. However, I believe this operation should be valid -
we're extracting zero elements, so there's no actual out-of-bounds
access.

**Real-world repro from the TensorFlow Lite models:**

This issue manifests while lowering TFLite models, and many of our
system tests fail because of it. The simplified version below shows
the problematic pattern.

In this code, `%extracted_slice_0` becomes an empty tensor when SSA
value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`.
The operation extracts zero elements along dimension 0, which is
semantically valid but fails runtime verification.

```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor&lt;10x4x1xf32&gt;) -&gt; tensor&lt;10x4x1xf32&gt; {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() &lt;{values = dense&lt;0&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %1 = "tosa.const"() &lt;{values = dense&lt;1&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %2 = "tosa.const"() &lt;{values = dense&lt;10&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %3 = "tosa.const"() &lt;{values = dense&lt;-1&gt; : tensor&lt;2xi32&gt;}&gt; : () -&gt; tensor&lt;2xi32&gt;
  %4 = "tosa.const"() &lt;{values = dense&lt;0&gt; : tensor&lt;2xi32&gt;}&gt; : () -&gt; tensor&lt;2xi32&gt;
  %5 = "tosa.const"() &lt;{values = dense&lt;0.000000e+00&gt; : tensor&lt;1x4x1xf32&gt;}&gt; : () -&gt; tensor&lt;1x4x1xf32&gt;
  %c4_1 = tosa.const_shape  {values = dense&lt;1&gt; : tensor&lt;1xindex&gt;} : () -&gt; !tosa.shape&lt;1&gt;
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;) -&gt; (tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;) {
    %7 = tosa.greater %2, %arg1 : (tensor&lt;i32&gt;, tensor&lt;i32&gt;) -&gt; tensor&lt;i1&gt;
    %extracted = tensor.extract %7[] : tensor&lt;i1&gt;
    scf.condition(%extracted) %arg1, %arg2 : tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;
  } do {
  ^bb0(%arg1: tensor&lt;i32&gt;, %arg2: tensor&lt;10x4x1xf32&gt;):
    %7 = tosa.add %arg1, %1 : (tensor&lt;i32&gt;, tensor&lt;i32&gt;) -&gt; tensor&lt;i32&gt;
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor&lt;i32&gt;, !tosa.shape&lt;1&gt;) -&gt; tensor&lt;1xi32&gt;
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor&lt;1xi32&gt;, tensor&lt;2xi32&gt;) -&gt; tensor&lt;3xi32&gt;
    
    %extracted_0 = tensor.extract %9[%c0] : tensor&lt;3xi32&gt;
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x4x1xf32&gt;
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor&lt;i32&gt;, !tosa.shape&lt;1&gt;) -&gt; tensor&lt;1xi32&gt;
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor&lt;1xi32&gt;, tensor&lt;2xi32&gt;) -&gt; tensor&lt;3xi32&gt;
    
    %extracted_1 = tensor.extract %14[%c0] : tensor&lt;3xi32&gt;
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor&lt;3xi32&gt;
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor&lt;3xi32&gt;
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x4x1xf32&gt;
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor&lt;?x4x1xf32&gt;, tensor&lt;1x4x1xf32&gt;, tensor&lt;?x4x1xf32&gt;) -&gt; tensor&lt;10x4x1xf32&gt;
    
    scf.yield %7, %19 : tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;
  }
  
  return %6#1 : tensor&lt;10x4x1xf32&gt;
}
```
**The fix:**

Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 &lt;= offset &lt;= dim_size`
- Non-empty slice (size &gt; 0): require `0 &lt;= offset &lt; dim_size`


**Question for reviewers:**
Should we also relax the static verifier to allow this edge case?
Currently, the static verifier rejects the following IR:

```mlir
%tensor = arith.constant dense&lt;1.0&gt; : tensor&lt;10xf32&gt;
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor&lt;10xf32&gt; to tensor&lt;0xf32&gt;
```
Since we're allowing it at runtime for dynamic shapes, it seems
inconsistent to reject it statically. However, I wanted to get feedback
before making that change - this PR focuses only on the runtime
verification fix for dynamic shapes.

P.S. We have a similar issue with `memref.subview`. I will send a
separate patch for the issue.

Co-authored-by: Hanumanth Hanumantharayappa &lt;hhanuman@ah-hhanuman-l.dhcp.mathworks.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][tensor] Fix runtime verification for `tensor.extract_slice` when size dimension value is 0 (#164878)</title>
<updated>2025-10-27T18:43:18+00:00</updated>
<author>
<name>Hanumanth</name>
<email>hhanuman@mathworks.com</email>
</author>
<published>2025-10-27T18:43:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a6788b52468fb1bf661ce76f95ad92d0050bd35e'/>
<id>a6788b52468fb1bf661ce76f95ad92d0050bd35e</id>
<content type='text'>
Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `tensor.extract_slice` operations where one of the
dimensions had a size of 0.

The `tensor.extract_slice` runtime verification logic was
unconditionally generating checks for the position of the last element
(`offset + (size - 1) * stride`). When `size` is 0, this causes the
assertion condition to always be false, leading to runtime failures even
though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size &gt; 0` to avoid generating spurious assert statements.
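
A minimal Python sketch of the endpoint check for a single dimension,
for illustration only. `last_pos` and `endpoint_in_bounds` are
hypothetical helpers, not code from this patch, and the sketch models
only the endpoint check, not the separate offset check.

```python
def last_pos(offset, size, stride):
    # Position of the last element read along one dimension.
    return offset + (size - 1) * stride


def endpoint_in_bounds(offset, size, stride, dim_size):
    # Hypothetical model of the conditional endpoint check.
    if size == 0:
        # An empty slice reads no element, so there is no endpoint to check.
        return True
    return last_pos(offset, size, stride) in range(0, dim_size)


print(last_pos(0, 0, 1))                # -1: why the unconditional check failed
print(endpoint_in_bounds(0, 0, 1, 10))  # True: check skipped for an empty slice
print(endpoint_in_bounds(8, 3, 1, 10))  # False: last element would be at 10
```

With `size` equal to 0 the computed last position is `offset - stride`,
which is -1 for the common case of offset 0 and stride 1, so an
unconditional lower-bound check on it can never pass.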

This issue was discovered through a LiteRT model, where a dynamic shape
calculation resulted in a zero-sized dimension being passed to
`tensor.extract_slice`.

The following is a simplified IR snippet from the model. After running
the runtime verification pass, an assertion that always fails is
generated because the SSA value `%3` becomes 0.

```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor&lt;10x4x1xf32&gt;) -&gt; tensor&lt;?x?x?xf32&gt; {
  %cst = arith.constant dense&lt;0&gt; : tensor&lt;1xi32&gt;
  %cst_0 = arith.constant dense&lt;-1&gt; : tensor&lt;2xi32&gt;
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor&lt;3xi32&gt;
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor&lt;1xi32&gt; into tensor&lt;3xi32&gt;
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor&lt;2xi32&gt; into tensor&lt;3xi32&gt;
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor&lt;3xi32&gt;
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor&lt;3xi32&gt;
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor&lt;3xi32&gt;
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x?x?xf32&gt;
  return %extracted_slice : tensor&lt;?x?x?xf32&gt;
}
```

The issue can be reproduced more simply with the following test case.
When the runtime verification pass is applied to this code and `dim_0`
is `0`, it generates an assertion that will always fail at runtime.

```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor&lt;10x4x1xf32&gt;,
                                      %dim_0: index,
                                      %dim_1: index,
                                      %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1]
    : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x?x?xf32&gt;
  return
}

func.func @test_zero_size_extraction() {
  %input = arith.constant dense&lt;1.0&gt; : tensor&lt;10x4x1xf32&gt;
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2)
    : (tensor&lt;10x4x1xf32&gt;, index, index, index) -&gt; ()
  return
}
```

P.S. We probably have a similar issue with `memref.subview`. I will
check this and send a separate PR for the issue.

---------

Co-authored-by: Hanumanth Hanumantharayappa &lt;hhanuman@ah-hhanuman-l.dhcp.mathworks.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `tensor.extract_slice` operations where one of the
dimensions had a size of 0.

The `tensor.extract_slice` runtime verification logic was
unconditionally generating checks for the position of the last element
(`offset + (size - 1) * stride`). When `size` is 0, this causes the
assertion condition to always be false, leading to runtime failures even
though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size &gt; 0` to avoid generating spurious assert statements.

This issue was discovered through a LiteRT model, where a dynamic shape
calculation resulted in a zero-sized dimension being passed to
`tensor.extract_slice`.

The following is a simplified IR snippet from the model. After running
the runtime verification pass, an assertion that always fails is
generated because the SSA value `%3` becomes 0.

```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor&lt;10x4x1xf32&gt;) -&gt; tensor&lt;?x?x?xf32&gt; {
  %cst = arith.constant dense&lt;0&gt; : tensor&lt;1xi32&gt;
  %cst_0 = arith.constant dense&lt;-1&gt; : tensor&lt;2xi32&gt;
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor&lt;3xi32&gt;
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor&lt;1xi32&gt; into tensor&lt;3xi32&gt;
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor&lt;2xi32&gt; into tensor&lt;3xi32&gt;
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor&lt;3xi32&gt;
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor&lt;3xi32&gt;
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor&lt;3xi32&gt;
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x?x?xf32&gt;
  return %extracted_slice : tensor&lt;?x?x?xf32&gt;
}
```

The issue can be reproduced more simply with the following test case.
When the runtime verification pass is applied to this code and `dim_0`
is `0`, it generates an assertion that will always fail at runtime.

```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor&lt;10x4x1xf32&gt;,
                                      %dim_0: index,
                                      %dim_1: index,
                                      %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1]
    : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x?x?xf32&gt;
  return
}

func.func @test_zero_size_extraction() {
  %input = arith.constant dense&lt;1.0&gt; : tensor&lt;10x4x1xf32&gt;
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2)
    : (tensor&lt;10x4x1xf32&gt;, index, index, index) -&gt; ()
  return
}
```

P.S. We probably have a similar issue with `memref.subview`. I will
check this and send a separate PR for the issue.

---------

Co-authored-by: Hanumanth Hanumantharayappa &lt;hhanuman@ah-hhanuman-l.dhcp.mathworks.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>[MLIR] Reuse AsmState to enable fast generate-runtime-verification pass; add location-only pass option (#160331)</title>
<updated>2025-10-08T10:48:34+00:00</updated>
<author>
<name>Hanchenng Wu</name>
<email>42194432+HanchengWu@users.noreply.github.com</email>
</author>
<published>2025-10-08T10:48:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=a6d1a52b8da9cb3c351a086180f8b871f0fd2a6e'/>
<id>a6d1a52b8da9cb3c351a086180f8b871f0fd2a6e</id>
<content type='text'>
The pass generate-runtime-verification generates additional runtime op
verification checks.

Currently, the pass is extremely expensive. For example, with a
MobileNet v2 SSD network (converted to MLIR), running this pass alone in
debug mode takes 30 minutes. The same has been observed for other
networks as small as 5 MB.

The culprit is the line "op-&gt;print(stream, flags);" in the function
"RuntimeVerifiableOpInterface::generateErrorMessage" in the file
mlir/lib/Interfaces/RuntimeVerifiableOpInterface.cpp.

As we are printing the op with all the names of the operands in the
middle end, we are constructing a new SSANameState for each
op-&gt;print(...) call. Thus, we are doing a new SSA analysis for each
error message printed.

Perf profiling shows that 98% of the time is spent in the
constructor of SSANameState.

This change refactors the message generator. We use a top-level
AsmState and reuse it for every op-&gt;print(stream, asmState) call. With a
release build, this change reduces the pass execution time from ~160
seconds to 0.3 seconds on my machine.
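
A minimal Python sketch of the idea, for illustration only. `NameState`,
`print_all_fresh`, and `print_all_shared` are hypothetical stand-ins and
not MLIR APIs; the sketch only counts how often the expensive naming
state is built.

```python
class NameState:
    """Stand-in for an expensive whole-function naming analysis (hypothetical)."""

    builds = 0

    def __init__(self, ops):
        NameState.builds += 1
        # Number every op once; this is the costly step being modeled.
        self.names = {op: "%" + str(i) for i, op in enumerate(ops)}


def print_all_fresh(ops):
    # One new NameState per printed op: total work grows quadratically.
    return [NameState(ops).names[op] for op in ops]


def print_all_shared(ops):
    # Build the state once and reuse it for every print.
    state = NameState(ops)
    return [state.names[op] for op in ops]


ops = ["load", "add", "store"]
NameState.builds = 0
fresh = print_all_fresh(ops)
fresh_builds = NameState.builds
NameState.builds = 0
shared = print_all_shared(ops)
shared_builds = NameState.builds
print(fresh == shared, fresh_builds, shared_builds)
```

Both variants produce the same names, but the shared variant builds the
state once instead of once per op.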

This change also adds verbose options to the
generate-runtime-verification pass:
- verbose 0: print only the source location with the error message.
- verbose 1: print the full op, including the names of the operands.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pass generate-runtime-verification generates additional runtime op
verification checks.

Currently, the pass is extremely expensive. For example, with a
MobileNet v2 SSD network (converted to MLIR), running this pass alone in
debug mode takes 30 minutes. The same has been observed for other
networks as small as 5 MB.

The culprit is the line "op-&gt;print(stream, flags);" in the function
"RuntimeVerifiableOpInterface::generateErrorMessage" in the file
mlir/lib/Interfaces/RuntimeVerifiableOpInterface.cpp.

As we are printing the op with all the names of the operands in the
middle end, we are constructing a new SSANameState for each
op-&gt;print(...) call. Thus, we are doing a new SSA analysis for each
error message printed.

Perf profiling shows that 98% of the time is spent in the
constructor of SSANameState.

This change refactors the message generator. We use a top-level
AsmState and reuse it for every op-&gt;print(stream, asmState) call. With a
release build, this change reduces the pass execution time from ~160
seconds to 0.3 seconds on my machine.

This change also adds verbose options to the
generate-runtime-verification pass:
- verbose 0: print only the source location with the error message.
- verbose 1: print the full op, including the names of the operands.</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][NFC] update `mlir/Dialect` create APIs (23/n) (#149930)</title>
<updated>2025-07-23T14:16:52+00:00</updated>
<author>
<name>Maksim Levental</name>
<email>maksim.levental@gmail.com</email>
</author>
<published>2025-07-23T14:16:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=8fff238b2c363b036ce9e7bf7abab3acafc87ab2'/>
<id>8fff238b2c363b036ce9e7bf7abab3acafc87ab2</id>
<content type='text'>
See https://github.com/llvm/llvm-project/pull/147168 for more info.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
See https://github.com/llvm/llvm-project/pull/147168 for more info.</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir] Remove unused includes (NFC) (#148119)</title>
<updated>2025-07-11T18:59:26+00:00</updated>
<author>
<name>Kazu Hirata</name>
<email>kazu@google.com</email>
</author>
<published>2025-07-11T18:59:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=5e0de68626828009c4cc09e2ce984f9c9634f6f6'/>
<id>5e0de68626828009c4cc09e2ce984f9c9634f6f6</id>
<content type='text'>
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.</pre>
</div>
</content>
</entry>
<entry>
<title>[mlir][tensor] Add runtime verification for `cast`/`dim`/`extract`/`insert`/`extract_slice` (#141332)</title>
<updated>2025-06-05T03:06:47+00:00</updated>
<author>
<name>Matthias Springer</name>
<email>me@m-sp.org</email>
</author>
<published>2025-06-05T03:06:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=e4c8ff94e7a30589ab6dc6dbb6151e1424ce3432'/>
<id>e4c8ff94e7a30589ab6dc6dbb6151e1424ce3432</id>
<content type='text'>
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`</pre>
</div>
</content>
</entry>
</feed>
