19 files changed, 178 insertions, 149 deletions
diff --git a/mlir/docs/DefiningDialects/AttributesAndTypes.md b/mlir/docs/DefiningDialects/AttributesAndTypes.md
index 022bdad9fe51..b99186391d71 100644
--- a/mlir/docs/DefiningDialects/AttributesAndTypes.md
+++ b/mlir/docs/DefiningDialects/AttributesAndTypes.md
@@ -136,7 +136,7 @@ def My_IntegerAttr : MyDialect_Attr<"Integer", "int"> {
   /// Here we've defined two parameters, one is a "self" type parameter, and the
   /// other is the integer value of the attribute. The self type parameter is
   /// specially handled by the assembly format.
-  let parameters = (ins AttributeSelfTypeParameter<"">:$type, "APInt":$value);
+  let parameters = (ins AttributeSelfTypeParameter<"">:$type, APIntParameter<"">:$value);
 
   /// Here we've defined a custom builder for the type, that removes the need to pass
   /// in an MLIRContext instance; as it can be infered from the `type`.
diff --git a/mlir/docs/Dialects/Mesh.md b/mlir/docs/Dialects/Mesh.md
deleted file mode 100644
index 5eb6569c7044..000000000000
--- a/mlir/docs/Dialects/Mesh.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# 'mesh' Dialect
-
-The `mesh` dialect contains a set of attributes, operations and interfaces that
-are useful for representing sharding and communication on a device mesh
-cluster.
-
-[TOC]
-
-## Collective Communication Operations
-There are a number of operations in the Mesh dialect to facilitate
-communication between devices in a mesh.
-It is assumed that the user is familiar with collective operations.
-[Wikipedia](https://en.wikipedia.org/wiki/Collective_operation) has a good
-explanation.
-The main addition is that the collectives in this dialect have mesh
-semantics.
-
-### Device groups
-The operation attributes `mesh` and `mesh_axes` specifies a list of device mesh
-axes that partition the devices into disjoint groups.
-The collective operation is performed between devices in the same group.
-Devices that have the same coordinates outside of axes `mesh_axes` are in the
-same group.
-A group is described by its multi-index along the axes outside of `mesh_axes`.
-For example if we have a device mesh of size `2x3x4x5` and the partition mesh
-axes list is `[0, 1]` then devices are partitioned into the groups
-`{ { (i, j, k, m) | 0<=i<2, 0<=j<3 } | 0<=k<4, 0<=m<5 }`.
-The device groups would be `{ (k, m) | 0<=k<4, 0<=m<5 }`.
-Devices (1, 0, 2, 3) and (1, 1, 2, 3) will be in the same group.
-Device (1, 0, 2, 4) will be in another group.
-Some collective operations like all-to-all and all-gather care about the
-order of devices.
-The order of device in a device group is induced by the order of axes in
-`mesh_axes`.
-The axes are ordered from outer to inner.
-If we have an axis list `[3, 1]` then device `(i, 1, k, 0)` will precede
-both devices `(i, 0, k, 1)` and `(i, 2, k, 0)`.
-
-### In-group Device
-Some operations like `broadcast`, `scatter` and `send` specify devices in each
-device-group.
-These devices are represented with their multi-index over the mesh axes that
-are not constant within a device group.
-These are the axes specified by `mesh_axes` attribute.
-
-For Example on a 3D mesh an operation with `mesh_axes = [0, 2]` would specify
-an in-group device with `(i, j)`. Then for each group with index `g` on the
-second axis, the in-group device would be `(i, g, j)`.
-### Purity
-Collectives that involve the whole device group to perform a single operation
-are pure. The exceptions are `send` and `recv`.
-
-There is an assumption that the execution is SPMD.
-Not only that each process runs the same program, but that at the point of
-execution of a collective operation, all processes are in a coherent state.
-All compiler transformations must be consistent.
-Collective operations in the IR that may correspond to the same runtime
-collective operation must be transformed in a consistent manner.
-For example if a collective operation is optimized out, than it must also
-not appear in any path of execution on any process.
-
-Having the operations as `Pure` implies that if an interpreter is to execute
-the IR containing the `mesh` collectives, all processes would execute the same
-line when they reach a pure collective operation.
-This requirement stems from the need to be compatible with general optimization
-passes like dead code and common sub-expression elimination.
-
-## Operations
-
-[include "Dialects/MeshOps.md"]
-
-## Attributes
-
-[include "Dialects/MeshAttrs.md"]
diff --git a/mlir/docs/Dialects/Shard.md b/mlir/docs/Dialects/Shard.md
new file mode 100644
index 000000000000..eb6ff6150e47
--- /dev/null
+++ b/mlir/docs/Dialects/Shard.md
@@ -0,0 +1,92 @@
+# 'shard' Dialect
+
+The 'shard' dialect defines a set of attributes, operations, and interfaces for
+working with tensor sharding and device communication.
+
+It’s inspired by [GSPMD](*General and Scalable Parallelization for ML Computation Graphs*).
+
+Originally, the dialect was called `mesh`, but it was renamed to better reflect
+what it actually does.
+
+[TOC]
+
+## Collective Communication Operations
+
+The 'shard' dialect includes several collective operations that help coordinate
+communication between devices arranged in a grid.
+
+If you’re not already familiar with collective operations, [this Wikipedia
+article](https://en.wikipedia.org/wiki/Collective_operation) is a good starting
+point.
+
+Unlike traditional collectives that are defined in terms of message-passing
+between explicit buffers on each process, the collectives in this dialect work
+at a higher level. They’re defined in terms of how data moves across the
+dimensions of a tensor, and the participating processes are inferred from how
+the tensor is sharded - not specified manually.
+
+### Device Groups
+
+Each collective operation runs within a group of devices. You define groups
+using the `grid` and `grid_axes` attributes, which describe how to slice the
+full device grid into smaller groups.
+
+Devices that have the same coordinates *outside* the listed `grid_axes` belong
+to the same group.
+
+Example: Say your device grid is shaped `2×3×4×5`, and you set
+`grid_axes = [0, 1]`. This splits the grid into groups by fixing axes 2 and 3. You’d get groups like:
+
+```
+{ { (i, j, k, m) | 0 ≤ i < 2, 0 ≤ j < 3 } | 0 ≤ k < 4, 0 ≤ m < 5 }
+```
+
+So the groups are identified by the coordinates `(k, m)`, and devices like
+`(1, 0, 2, 3)` and `(1, 1, 2, 3)` are in the same group. But `(1, 0, 2, 4)`
+is in a different group.
+
+For some collectives (like `all-to-all`), the order of devices in the group
+matters. The device order is based on the order of axes in `grid_axes`, from
+outermost to innermost.
+
+Example: If `grid_axes = [3, 1]`, then device `(i, 1, k, 0)` comes before
+`(i, 0, k, 1)` and `(i, 2, k, 0)`.
+
+### In-group Devices
+
+Some operations (like `broadcast`, `scatter`, and `send`) refer to a specific
+device within each group. These in-group devices are identified using their
+coordinates over the axes listed in `grid_axes`.
+
+Example: In a 3D grid with `grid_axes = [0, 2]`, an in-group device is specified
+as `(i, j)`. If a group is fixed at coordinate `g` on axis 1, then the full
+device index would be `(i, g, j)`.
+
+### Purity and Execution Model
+
+Collective operations involve all devices in a group (e.g. `all-gather`,
+`all-to-all`) and are considered pure. Operations like `send` and `recv` are not
+collective and are not pure.
+
+The execution model assumes SPMD (Single Program, Multiple Data):
+
+* Every process runs the same program.
+* At any collective operation, all processes are in sync.
+
+This means compiler optimizations must treat collective ops carefully. For
+example, if a collective is removed during optimization, it must be removed from
+*every* path and *every* process that would have participated - otherwise, you’ll
+get undefined behavior at runtime.
+
+Marking these ops as pure also helps with standard compiler passes like dead
+code elimination and common subexpression elimination. It ensures that when the
+program is executed, all devices hit the same line of code at the same time
+during collectives and so avoid dead-locks.
+
+## Operations
+
+[include "Dialects/ShardOps.md"]
+
+## Attributes
+
+[include "Dialects/ShardAttrs.md"]
diff --git a/mlir/docs/Dialects/Transform.md b/mlir/docs/Dialects/Transform.md
index 5f79116dd00b..7164cb74f0a8 100644
--- a/mlir/docs/Dialects/Transform.md
+++ b/mlir/docs/Dialects/Transform.md
@@ -415,10 +415,22 @@ ops rather than having the methods directly act on the payload IR.
 
 [include "Dialects/TransformOps.md"]
 
+## Tuning Extension Operaiton
+
+[include "Dialects/TuneExtensionOps.md"]
+
 ## Affine Transform Operations
 
 [include "Dialects/AffineLoopTransformOps.md"]
 
+## ARM Neon Transform Operations
+
+[include "Dialects/ArmNeonVectorTransformOps.md"]
+
+## ARM SVE Transform Operations
+
+[include "Dialects/ArmSVEVectorTransformOps.md"]
+
 ## Bufferization Transform Operations
 
 [include "Dialects/BufferizationTransformOps.md"]
diff --git a/mlir/docs/Dialects/Vector.md b/mlir/docs/Dialects/Vector.md
index ebeb0a2de0ff..6c8949d70b4a 100644
--- a/mlir/docs/Dialects/Vector.md
+++ b/mlir/docs/Dialects/Vector.md
@@ -294,7 +294,7 @@ LLVM instructions are prefixed by the `llvm.` dialect prefix (e.g.
 `llvm.insertvalue`). Such ops operate exclusively on 1-D vectors and aggregates
 following the [LLVM LangRef](https://llvm.org/docs/LangRef.html). MLIR
 operations are prefixed by the `vector.` dialect prefix (e.g.
-`vector.insertelement`). Such ops operate exclusively on MLIR `n-D` `vector`
+`vector.insert`). Such ops operate exclusively on MLIR `n-D` `vector`
 types.
 
 ### Alternatives For Lowering an n-D Vector Type to LLVM
diff --git a/mlir/docs/Dialects/emitc.md b/mlir/docs/Dialects/emitc.md
index e2288f518dae..6d09e93b895a 100644
--- a/mlir/docs/Dialects/emitc.md
+++ b/mlir/docs/Dialects/emitc.md
@@ -18,6 +18,8 @@ The following convention is followed:
     GCC or Clang.
 *   If `emitc.array` with a dimension of size zero is used, then the code
     requires [a GCC extension](https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html).
+*   If `aligned_alloc` is passed to an `emitc.call_opaque` operation, then C++17 
+    or C11 is required.
 *   Else the generated code is compatible with C99.
 
 These restrictions are neither inherent to the EmitC dialect itself nor to the
diff --git a/mlir/docs/Interfaces.md b/mlir/docs/Interfaces.md
index bf590ac3351e..7e1c5fe07567 100644
--- a/mlir/docs/Interfaces.md
+++ b/mlir/docs/Interfaces.md
@@ -563,7 +563,7 @@ def MyInterface : OpInterface<"MyInterface"> {
         template <typename ConcreteOp>
         struct Model : public Concept {
           Operation *create(OpBuilder &builder, Location loc) const override {
-            return builder.create<ConcreteOp>(loc);
+            return ConcreteOp::create(builder, loc);
           }
         }
       };
@@ -574,7 +574,7 @@ def MyInterface : OpInterface<"MyInterface"> {
     }],
       "Operation *", "create", (ins "OpBuilder &":$builder, "Location":$loc),
       /*methodBody=*/[{
-        return builder.create<ConcreteOp>(loc);
+        return ConcreteOp::create(builder, loc);
     }]>,
 
     InterfaceMethod<[{
diff --git a/mlir/docs/PDLL.md b/mlir/docs/PDLL.md
index 9839d1d0df76..c6e352fd647d 100644
--- a/mlir/docs/PDLL.md
+++ b/mlir/docs/PDLL.md
@@ -1483,7 +1483,7 @@ be defined by specifying a string code block after the rewrite declaration:
 
 ```pdll
 Rewrite BuildOp(value: Value) -> (foo: Op<my_dialect.foo>, bar: Op<my_dialect.bar>) [{
-  return {rewriter.create<my_dialect::FooOp>(value), rewriter.create<my_dialect::BarOp>()};
+  return {my_dialect::FooOp::create(rewriter, value), my_dialect::BarOp::create(rewriter)};
 }];
 
 Pattern {
@@ -1508,7 +1508,7 @@ translated into:
 
 ```c++
 std::tuple<my_dialect::FooOp, my_dialect::BarOp> BuildOp(Value value) {
-  return {rewriter.create<my_dialect::FooOp>(value), rewriter.create<my_dialect::BarOp>()};
+  return {my_dialect::FooOp::create(rewriter, value), my_dialect::BarOp::create(rewriter)};
 }
 ```
 
@@ -1530,7 +1530,7 @@ below describes the various result translation scenarios:
 
 ```pdll
 Rewrite createOp() [{
-  rewriter.create<my_dialect::FooOp>();
+  my_dialect::FooOp::create(rewriter);
 }];
 ```
 
@@ -1538,7 +1538,7 @@ In the case where a native `Rewrite` has no results, the native function returns
 
 ```c++
 void createOp(PatternRewriter &rewriter) {
-  rewriter.create<my_dialect::FooOp>();
+  my_dialect::FooOp::create(rewriter);
 }
 ```
 
@@ -1546,7 +1546,7 @@ void createOp(PatternRewriter &rewriter) {
 
 ```pdll
 Rewrite createOp() -> Op<my_dialect.foo> [{
-  return rewriter.create<my_dialect::FooOp>();
+  return my_dialect::FooOp::create(rewriter);
 }];
 ```
 
@@ -1555,7 +1555,7 @@ native type for that single result:
 
 ```c++
 my_dialect::FooOp createOp(PatternRewriter &rewriter) {
-  return rewriter.create<my_dialect::FooOp>();
+  return my_dialect::FooOp::create(rewriter);
 }
 ```
 
diff --git a/mlir/docs/Passes.md b/mlir/docs/Passes.md
index e9d22d1e3dfa..9df32666415b 100644
--- a/mlir/docs/Passes.md
+++ b/mlir/docs/Passes.md
@@ -72,9 +72,9 @@ This document describes the available MLIR passes and their contracts.
 
 [include "MemRefPasses.md"]
 
-## 'mesh' Dialect Passes
+## 'shard' Dialect Passes
 
-[include "MeshPasses.md"]
+[include "ShardPasses.md"]
 
 ## 'ml\_program' Dialect Passes
 
diff --git a/mlir/docs/Rationale/RationaleLinalgDialect.md b/mlir/docs/Rationale/RationaleLinalgDialect.md
index 7b5137ede3ae..8975b0a7d515 100644
--- a/mlir/docs/Rationale/RationaleLinalgDialect.md
+++ b/mlir/docs/Rationale/RationaleLinalgDialect.md
@@ -118,7 +118,7 @@ pragmatic solution. The following non-exhaustive list refers to some of the
 projects that influenced Linalg design:
 
 - [ONNX](https://onnx.ai/),
-- [LIFT](https://www.lift-project.org/),
+- [LIFT](https://lift-project.github.io/),
 - [XLA](https://www.tensorflow.org/xla/architecture),
 - [Halide](https://halide-lang.org/) and [TVM](https://tvm.apache.org/),
 - [TACO](http://tensor-compiler.org/),
@@ -171,12 +171,12 @@ Linalg hopes to additionally address the following:
   other, thus simplifying the intermediate representation.
 
 ### Lessons from LIFT<a name="lessonslift"></a>
-[LIFT](https://www.lift-project.org/) is a system to write computational
+[LIFT](https://lift-project.github.io/) is a system to write computational
 kernels based on functional abstractions. Transformations are
 represented by additional nodes in the IR, whose semantics are at the
 level of the algorithm (e.g. `partialReduce`).
 LIFT applies and composes transformations by using [local rewrite
-rules](https://www.lift-project.org/presentations/2015/ICFP-2015.pdf) that
+rules](https://lift-project.github.io/publications/2015/steuwer15generating.pdf) that
 embed these additional nodes directly in the functional abstraction.
 
 Similarly to LIFT, Linalg uses local rewrite rules implemented with the MLIR
@@ -194,9 +194,9 @@ Linalg builds on, and helps separate concerns in the LIFT approach as follows:
 LIFT is expected to further influence the design of Linalg as it evolves. In
 particular, extending the data structure abstractions to support non-dense
 tensors can use the experience of LIFT abstractions for
-[sparse](https://www.lift-project.org/publications/2016/harries16sparse.pdf)
+[sparse](https://lift-project.github.io/publications/2016/harries16sparse.pdf)
 and [position-dependent
-arrays](https://www.lift-project.org/publications/2019/pizzuti19positiondependentarrays.pdf).
+arrays](https://lift-project.github.io/publications/2019/pizzuti19positiondependentarrays.pdf).
 
 ### Lessons from XLA<a name="lessonsxla"></a>
 [XLA](https://www.tensorflow.org/xla/architecture) is one of the first
diff --git a/mlir/docs/Tutorials/QuickstartRewrites.md b/mlir/docs/Tutorials/QuickstartRewrites.md
index 0c890659b0ee..cbb6f03e93e6 100644
--- a/mlir/docs/Tutorials/QuickstartRewrites.md
+++ b/mlir/docs/Tutorials/QuickstartRewrites.md
@@ -130,7 +130,7 @@ def : Pat<(TF_LeakyReluOp:$old_value, $arg, F32Attr:$a),
 ```c++
 static Value createTFLLeakyRelu(PatternRewriter &rewriter, Operation *op,
                                 Value operand, Attribute attr) {
-  return rewriter.create<mlir::TFL::LeakyReluOp>(
+  return mlir::TFL::LeakyReluOp::create(rewriter,
       op->getLoc(), operands[0].getType(), /*arg=*/operands[0],
       /*alpha=*/cast<FloatAttr>(attrs[0]));
 }
@@ -194,10 +194,10 @@ LogicalResult circt::MulOp::canonicalize(MulOp op, PatternRewriter &rewriter) {
   // mul(x, c) -> shl(x, log2(c)), where c is a power of two.
   if (inputs.size() == 2 && matchPattern(inputs.back(), m_RConstant(value)) &&
       value.isPowerOf2()) {
-    auto shift = rewriter.create<rtl::ConstantOp>(op.getLoc(), op.getType(),
+    auto shift = rtl::ConstantOp::create(rewriter, op.getLoc(), op.getType(),
                                                   value.exactLogBase2());
     auto shlOp =
-        rewriter.create<comb::ShlOp>(op.getLoc(), inputs[0], shift);
+        comb::ShlOp::create(rewriter, op.getLoc(), inputs[0], shift);
     rewriter.replaceOpWithNewOp<MulOp>(op, op.getType(),
                                        ArrayRef<Value>(shlOp));
     return success();
diff --git a/mlir/docs/Tutorials/Toy/Ch-2.md b/mlir/docs/Tutorials/Toy/Ch-2.md
index 039417c9c9a1..81e41615ee55 100644
--- a/mlir/docs/Tutorials/Toy/Ch-2.md
+++ b/mlir/docs/Tutorials/Toy/Ch-2.md
@@ -521,7 +521,7 @@ def ConstantOp : Toy_Op<"constant"> {
 
   // Add custom build methods for the constant operation. These methods populate
   // the `state` that MLIR uses to create operations, i.e. these are used when
-  // using `builder.create<ConstantOp>(...)`.
+  // using `ConstantOp::create(builder, ...)`.
   let builders = [
     // Build a constant with a given constant tensor value.
     OpBuilder<(ins "DenseElementsAttr":$value), [{
diff --git a/mlir/docs/Tutorials/Toy/Ch-4.md b/mlir/docs/Tutorials/Toy/Ch-4.md
index 1275d36de353..e9abe36afc4d 100644
--- a/mlir/docs/Tutorials/Toy/Ch-4.md
+++ b/mlir/docs/Tutorials/Toy/Ch-4.md
@@ -300,7 +300,7 @@ struct ToyInlinerInterface : public DialectInlinerInterface {
   Operation *materializeCallConversion(OpBuilder &builder, Value input,
                                        Type resultType,
                                        Location conversionLoc) const final {
-    return builder.create<CastOp>(conversionLoc, resultType, input);
+    return CastOp::create(builder, conversionLoc, resultType, input);
   }
 };
 ```
@@ -445,7 +445,7 @@ When processing an operation like described, we query if it registered the
 
 ```c++
   // Ask the operation to infer its output shapes.
-  LLVM_DEBUG(llvm::dbgs() << "Inferring shape for: " << *op << "\n");
+  LDBG() << "Inferring shape for: " << *op;
 
   /// We check if an operation has a particular interface by casting.
   if (ShapeInference shapeOp = dyn_cast<ShapeInference>(op)) {
diff --git a/mlir/docs/Tutorials/Toy/Ch-5.md b/mlir/docs/Tutorials/Toy/Ch-5.md
index d483cd8bba21..17cd6bb412a9 100644
--- a/mlir/docs/Tutorials/Toy/Ch-5.md
+++ b/mlir/docs/Tutorials/Toy/Ch-5.md
@@ -91,13 +91,11 @@ doesn't matter. See `ConversionTarget::getOpInfo` for the details.
 After the conversion target has been defined, we can define how to convert the
 *illegal* operations into *legal* ones. Similarly to the canonicalization
 framework introduced in [chapter 3](Ch-3.md), the
-[`DialectConversion` framework](../../DialectConversion.md) also uses
-[RewritePatterns](../QuickstartRewrites.md) to perform the conversion logic.
-These patterns may be the `RewritePatterns` seen before or a new type of pattern
-specific to the conversion framework `ConversionPattern`. `ConversionPatterns`
+[`DialectConversion` framework](../../DialectConversion.md) uses a special kind
+of `ConversionPattern` to perform the conversion logic. `ConversionPatterns`
 are different from traditional `RewritePatterns` in that they accept an
-additional `operands` parameter containing operands that have been
-remapped/replaced. This is used when dealing with type conversions, as the
+additional `operands` (or `adaptor`) parameter containing operands that have
+been remapped/replaced. This is used when dealing with type conversions, as the
 pattern will want to operate on values of the new type but match against the
 old. For our lowering, this invariant will be useful as it translates from the
 [TensorType](../../Dialects/Builtin.md/#rankedtensortype) currently being
@@ -106,38 +104,23 @@ look at a snippet of lowering the `toy.transpose` operation:
 
 ```c++
 /// Lower the `toy.transpose` operation to an affine loop nest.
-struct TransposeOpLowering : public mlir::ConversionPattern {
-  TransposeOpLowering(mlir::MLIRContext *ctx)
-      : mlir::ConversionPattern(TransposeOp::getOperationName(), 1, ctx) {}
-
-  /// Match and rewrite the given `toy.transpose` operation, with the given
-  /// operands that have been remapped from `tensor<...>` to `memref<...>`.
-  llvm::LogicalResult
-  matchAndRewrite(mlir::Operation *op, ArrayRef<mlir::Value> operands,
-                  mlir::ConversionPatternRewriter &rewriter) const final {
-    auto loc = op->getLoc();
+struct TransposeOpLowering : public OpConversionPattern<toy::TransposeOp> {
+  using OpConversionPattern<toy::TransposeOp>::OpConversionPattern;
 
-    // Call to a helper function that will lower the current operation to a set
-    // of affine loops. We provide a functor that operates on the remapped
-    // operands, as well as the loop induction variables for the inner most
-    // loop body.
-    lowerOpToLoops(
-        op, operands, rewriter,
-        [loc](mlir::PatternRewriter &rewriter,
-              ArrayRef<mlir::Value> memRefOperands,
-              ArrayRef<mlir::Value> loopIvs) {
-          // Generate an adaptor for the remapped operands of the TransposeOp.
-          // This allows for using the nice named accessors that are generated
-          // by the ODS. This adaptor is automatically provided by the ODS
-          // framework.
-          TransposeOpAdaptor transposeAdaptor(memRefOperands);
-          mlir::Value input = transposeAdaptor.input();
-
-          // Transpose the elements by generating a load from the reverse
-          // indices.
-          SmallVector<mlir::Value, 2> reverseIvs(llvm::reverse(loopIvs));
-          return rewriter.create<mlir::AffineLoadOp>(loc, input, reverseIvs);
-        });
+  LogicalResult
+  matchAndRewrite(toy::TransposeOp op, OpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const final {
+    auto loc = op->getLoc();
+    lowerOpToLoops(op, rewriter,
+                   [&](OpBuilder &builder, ValueRange loopIvs) {
+                     Value input = adaptor.getInput();
+
+                     // Transpose the elements by generating a load from the
+                     // reverse indices.
+                     SmallVector<Value, 2> reverseIvs(llvm::reverse(loopIvs));
+                     return affine::AffineLoadOp::create(builder, loc, input,
+                                                         reverseIvs);
+                   });
     return success();
   }
 };
diff --git a/mlir/docs/Tutorials/Toy/Ch-6.md b/mlir/docs/Tutorials/Toy/Ch-6.md
index e8a68b5f9ee3..529de5530420 100644
--- a/mlir/docs/Tutorials/Toy/Ch-6.md
+++ b/mlir/docs/Tutorials/Toy/Ch-6.md
@@ -47,7 +47,7 @@ static FlatSymbolRefAttr getOrInsertPrintf(PatternRewriter &rewriter,
   // Insert the printf function into the body of the parent module.
   PatternRewriter::InsertionGuard insertGuard(rewriter);
   rewriter.setInsertionPointToStart(module.getBody());
-  rewriter.create<LLVM::LLVMFuncOp>(module.getLoc(), "printf", llvmFnType);
+  LLVM::LLVMFuncOp::create(rewriter, module.getLoc(), "printf", llvmFnType);
   return SymbolRefAttr::get("printf", context);
 }
 ```
diff --git a/mlir/docs/Tutorials/Toy/Ch-7.md b/mlir/docs/Tutorials/Toy/Ch-7.md
index dce3490aeace..0f50c49a5f64 100644
--- a/mlir/docs/Tutorials/Toy/Ch-7.md
+++ b/mlir/docs/Tutorials/Toy/Ch-7.md
@@ -488,9 +488,9 @@ mlir::Operation *ToyDialect::materializeConstant(mlir::OpBuilder &builder,
                                                  mlir::Type type,
                                                  mlir::Location loc) {
   if (isa<StructType>(type))
-    return builder.create<StructConstantOp>(loc, type,
+    return StructConstantOp::create(builder, loc, type,
                                             cast<mlir::ArrayAttr>(value));
-  return builder.create<ConstantOp>(loc, type,
+  return ConstantOp::create(builder, loc, type,
                                     cast<mlir::DenseElementsAttr>(value));
 }
 ```
diff --git a/mlir/docs/Tutorials/UnderstandingTheIRStructure.md b/mlir/docs/Tutorials/UnderstandingTheIRStructure.md
index 30b50cb09490..f7c62f23068e 100644
--- a/mlir/docs/Tutorials/UnderstandingTheIRStructure.md
+++ b/mlir/docs/Tutorials/UnderstandingTheIRStructure.md
@@ -178,7 +178,7 @@ inside a single block (or a single region), however it is frequently interesting
 to traverse the IR in a nested fashion. To this end MLIR exposes the `walk()`
 helper on `Operation`, `Block`, and `Region`. This helper takes a single
 argument: a callback method that will be invoked for every operation recursively
-nested under the provided entity.
+nested under the provided entity (as well as this initial operation).
 
 ```c++
   // Recursively traverse all the regions and blocks nested inside the function
diff --git a/mlir/docs/Tutorials/transform/Ch0.md b/mlir/docs/Tutorials/transform/Ch0.md
index ac3989a09e54..dc4b753f98ca 100644
--- a/mlir/docs/Tutorials/transform/Ch0.md
+++ b/mlir/docs/Tutorials/transform/Ch0.md
@@ -46,7 +46,7 @@ When no support is available, such an operation can be transformed into a loop:
 %c8 = arith.constant 8 : index
 %init = arith.constant 0.0 : f32
 %result = scf.for %i = %c0 to %c8 step %c1 iter_args(%partial = %init) -> (f32) {
-  %element = vector.extractelement %0[%i : index] : vector<8xf32>
+  %element = vector.extract %0[%i] : f32 into vector<8xf32>
   %updated = arith.addf %partial, %element : f32
   scf.yield %updated : f32
 }
@@ -145,7 +145,7 @@ linalg.generic {
   %c0 = arith.constant 0.0 : f32
   %0 = arith.cmpf ogt %in_one, %c0 : f32
   %1 = arith.select %0, %in_one, %c0 : f32
-  linalg.yield %1 : f32 
+  linalg.yield %1 : f32
 }
 ```
 
@@ -185,7 +185,7 @@ In the case of `linalg.generic` operations, the iteration space is implicit and
 For example, tiling the matrix multiplication presented above with tile sizes `(2, 8)`, we obtain a loop nest around a `linalg.generic` expressing the same operation on a `2x8` tensor.
 
 ```mlir
-// A special "multi-for" loop that supports tensor-insertion semantics 
+// A special "multi-for" loop that supports tensor-insertion semantics
 // as opposed to implicit updates. The resulting 8x16 tensor will be produced
 // by this loop.
 // The trip count of iterators is computed dividing the original tensor size,
@@ -202,9 +202,9 @@ For example, tiling the matrix multiplication presented above with tile sizes `(
   // Take slices of inputs and outputs. Only the "i" and "j" dimensions are sliced.
   %lhs_slice = tensor.extract_slice %lhs[%3, 0] [2, 10] [1, 1]
              : tensor<8x10xf32> to tensor<2x10xf32>
-  %rhs_slice = tensor.extract_slice %rhs[0, %4] [10, 8] [1, 1] 
+  %rhs_slice = tensor.extract_slice %rhs[0, %4] [10, 8] [1, 1]
              : tensor<10x16xf32> to tensor<10x8xf32>
-  %result_slice = tensor.extract_slice %shared[%3, %4] [2, 8] [1, 1] 
+  %result_slice = tensor.extract_slice %shared[%3, %4] [2, 8] [1, 1]
                 : tensor<8x16xf32> to tensor<2x8xf32>
 
   // This is exactly the same operation as before, but now operating on smaller
@@ -214,7 +214,7 @@ For example, tiling the matrix multiplication presented above with tile sizes `(
                    affine_map<(i, j, k) -> (k, j)>,
                    affine_map<(i, j, k) -> (i, j)>],
   iterator_types = ["parallel", "parallel", "reduction"]
-  } ins(%lhs_slice, %rhs_slice : tensor<2x10xf32>, tensor<10x8xf32>) 
+  } ins(%lhs_slice, %rhs_slice : tensor<2x10xf32>, tensor<10x8xf32>)
     outs(%result_slice : tensor<2x8xf32>) -> tensor<2x8xf32> {
   ^bb0(%lhs_one: f32, %rhs_one: f32, %init_one: f32):
     %0 = arith.mulf %lhs_one, %rhs_one : f32
@@ -238,15 +238,15 @@ After materializing loops with tiling, another key code generation transformatio
 1. the subset (slice) of the operand that is used by the tile, and
 2. the tensor-level structured operation producing the whole tensor that is being sliced.
 
-By inverting the `indexing_map` and applying it to the set of elements accessed through the slice, we can compute the part of the iteration space of the operation defining the full tensor necessary to compute the tile. Thus fusion boils down to replacing the `tensor.extract_slice` operation with the tile of the `linalg.generic` producing the original operand. 
+By inverting the `indexing_map` and applying it to the set of elements accessed through the slice, we can compute the part of the iteration space of the operation defining the full tensor necessary to compute the tile. Thus fusion boils down to replacing the `tensor.extract_slice` operation with the tile of the `linalg.generic` producing the original operand.
 
 Let us assume that the matrix multiplication operation is followed by another operation that multiplies each element of the resulting matrix with itself. This trailing elementwise operation has a 2D iteration space, unlike the 3D one in matrix multiplication. Nevertheless, it is possible to tile the trailing operation and then fuse the producer of its operand, the matmul, into the loop generated by tiling. The untiled dimension will be used in its entirety.
 
 
 ```mlir
 // Same loop as before.
-%0 = scf.forall (%i, %j) in (4, 2) 
-     shared_outs(%shared = %init) 
+%0 = scf.forall (%i, %j) in (4, 2)
+     shared_outs(%shared = %init)
      -> (tensor<8x16xf32>, tensor<8x16xf32>) {
   // Scale the loop induction variables by the tile sizes.
   %1 = affine.apply affine_map<(d0) -> (d0 * 2)>(%i)
@@ -286,7 +286,7 @@ Let us assume that the matrix multiplication operation is followed by another op
     indexing_maps = [affine_map<(i, j) -> (i, j)>,
                      affine_map<(i, j) -> (i, j)>],
     iterator_types = ["parallel", "parallel"]
-  } ins(%partial : tensor<2x8xf32>)   
+  } ins(%partial : tensor<2x8xf32>)
    outs(%shared_slice : tensor<2x8xf32>) {
   ^bb0(%in: f32, %out: f32):
     %5 = arith.mulf %in, %in : f32
diff --git a/mlir/docs/Tutorials/transform/Ch3.md b/mlir/docs/Tutorials/transform/Ch3.md
index fa788d13e205..eeab77043a4e 100644
--- a/mlir/docs/Tutorials/transform/Ch3.md
+++ b/mlir/docs/Tutorials/transform/Ch3.md
@@ -139,7 +139,21 @@ void MyExtension::init() {
 ```
 
 This type is now directly available in the Transform dialect and can be used in operations.
+In the previous tablegen definition, the type of `$call` must be `Transform_ConcreteOp<“func.call”>`,
+By adding `CallOpInterfaceHandle` as an allowed type for `$call`, the corresponding handle
+is allowed to be to any op implementing the interface.
 
+```tablegen
+def ChangeCallTargetOp : ... {
+    let arguments = (ins
+    // Allow the handle to be to concrete `func.call` ops as well as any op implementing
+    // the `CallOpInterface`.
+    AnyTypeOf<[Transform_ConcreteOpType<"func.call">, CallOpInterfaceHandle]>:$call,
+    StrAttr:$new_target); 
+}
+```
+
+We can then add the following code to `sequence.mlir` and run it with the interpreter.
 
 ```mlir
   // Cast to our new type.
@@ -172,7 +186,7 @@ def CallToOp : Op<Transform_Dialect, "my.call_to_op",
   let results = (outs TransformHandleTypeInterface:$transformed);
 
   // Provide nice syntax.
-  let assemblyFormat = "$call attr-dict `:` functional-type(inputs, outputs)";
+  let assemblyFormat = "$call attr-dict `:` functional-type(operands, results)";
 
   // Declare the function implementing the interface for a single payload operation.
   let extraClassDeclaration = [{