buildGlobalOptimizationPassPipeline
-
IREE::Util::createSimplifyGlobalAccessesPass
这个pass主要做这几件事:- 将不可变
global tensor
的 load 提前到了 block 的开头,将global tensor
的 store 安全地挪到 block 的结尾。 - 进行以下化简:
- 如果
load after store
,则把 load 直接替换成 store 的 source。比如,
转换成,store %0, @p %1 = load @p return %1
store %0, @p return %0
- 如果
store after store
,则直接消除前一个 store
转换成,store %0, @p store %1, @p
store %1, @p
- 如果
load after load
,则消除后一个 load
转换成,%0 = load @p %1 = load @p return %1
%0 = load @p return %0
- 如果
- 将不可变
-
IREE::Util::createApplyPatternsPass
执行IREE::Util dialect ODS
中定义的Canonicalization Patterns
,并执行 block 和跳转命令参数化简操作。- block 参数化简
br ^bb1(%0, %0 : index, index) ^bb1(%arg0: index, %arg1: index): ...
折叠相同的参数,化简为
br ^bb1(%0 : index) ^bb1(%arg0: index): // %arg1 remapped to %arg0 ...
- 跳转命令参数消除
func.func @foo(%arg0: index) { br ^bb1(%arg0 : index) ^bb1(%0: index): ... }
消除参数后,
func.func @foo(%arg0: index) { br ^bb1 ^bb1: // %0 remapped to %arg0 ... }
-
IREE::Util::createFoldGlobalsPass
这个 pass 继续对global tensor
的 load 和 store 操作进行优化,主要包括:- 内联常量 store,比如
util.global mutable @a : i32 func.func @fool { %c5 = arith.constant 5 : i32 util.global.store %c5, @a : i32 return }
转换成,
util.global @a = 5 : i32
- 內联常量 load,比如
util.global @a = 5 : i32 func.func @fool { %1 = util.global.load @a : i32 ... }
转换成,
func.func @fool { %1 = arith.constant 5 : i32 ... }
- 重命名互为链式的
global tensor
。 - 如果一个
mutable global tensor
只在 init 函数中被 store 过,则将它修改为 immutable。 - 删除没有 load 过的
global tensor
。 - 合并相同初始值的
immutable global tensor
-
IREE::Flow::createTensorPadToTensorInsertSlicePass
将tensor.pad
转换为linalg.fill + tensor.insert_slice
。func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<1x1xf32> %padded = tensor.pad %0 low[1, 2] high[3, 4] { ^bb0(%arg1: index, %arg2: index): tensor.yield %cst : f32 } : tensor<1x1xf32> to tensor<5x7xf32> %1 = hal.tensor.export %padded : tensor<5x7xf32> -> !hal.buffer_view return %1 : !hal.buffer_view }
转换为,
func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<1x1xf32> %1 = tensor.empty() : tensor<5x7xf32> %2 = linalg.fill ins(%cst : f32) outs(%1 : tensor<5x7xf32>) -> tensor<5x7xf32> %inserted_slice = tensor.insert_slice %0 into %2[1, 2] [1, 1] [1, 1] : tensor<1x1xf32> into tensor<5x7xf32> %3 = hal.tensor.export %inserted_slice : tensor<5x7xf32> -> !hal.buffer_view return %3 : !hal.buffer_view }
-
mlir::createConvertElementwiseToLinalgPass
把 elementwise 算子(带有Elementwise traits
的 op)转换成linalg generic op
,方便后续对elementwise op
做算子融合。arith dialect
和math dialect
的 op 都是 Elementwise 的,所以实际上这个 pass 会把arith dialect
和math dialect lower
到linalg dialect
。func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2x3xf32> %1 = arith.addf %0, %0 : tensor<2x3xf32> %2 = hal.tensor.export %1 : tensor<2x3xf32> -> !hal.buffer_view return %2 : !hal.buffer_view }
转换成,
func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2x3xf32> %1 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%0, %0 : tensor<2x3xf32>, tensor<2x3xf32>) outs(%0 : tensor<2x3xf32>) { ^bb0(%in: f32, %in_0: f32, %out: f32): %3 = arith.addf %in, %in_0 : f32 linalg.yield %3 : f32 } -> tensor<2x3xf32> %2 = hal.tensor.export %1 : tensor<2x3xf32> -> !hal.buffer_view return %2 : !hal.buffer_view }
-
mlir::createLinalgFoldUnitExtentDimsPass
消除长度为 的维度或者循环。func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<1x3xf32> %1 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%0 : tensor<1x3xf32>) outs(%0 : tensor<1x3xf32>) { ^bb0(%in: f32, %out: f32): %3 = arith.addf %in, %in : f32 linalg.yield %3 : f32 } -> tensor<1x3xf32> %2 = hal.tensor.export %1 : tensor<1x3xf32> -> !hal.buffer_view return %2 : !hal.buffer_view }
转换成,
func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<1x3xf32> %collapsed = tensor.collapse_shape %0 [[0, 1]] : tensor<1x3xf32> into tensor<3xf32> %collapsed_0 = tensor.collapse_shape %0 [[0, 1]] : tensor<1x3xf32> into tensor<3xf32> %1 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%collapsed : tensor<3xf32>) outs(%collapsed_0 : tensor<3xf32>) { ^bb0(%in: f32, %out: f32): %3 = arith.addf %in, %in : f32 linalg.yield %3 : f32 } -> tensor<3xf32> %expanded = tensor.expand_shape %1 [[0, 1]] : tensor<3xf32> into tensor<1x3xf32> %2 = hal.tensor.export %expanded : tensor<1x3xf32> -> !hal.buffer_view return %2 : !hal.buffer_view }
linalg.generic
由 2 层循环缩减成了单层循环 -
createInterchangeGenericOpsPass
循环维度变换。将 reduction 循环维度交换到最内层,相应的 parallel 循环维度被交换到外层。// sum(%arg0: tensor<2x3xf32>, 0) -> tensor<3xf32> func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2x3xf32> %1 = tensor.empty() : tensor<3xf32> %2 = linalg.fill ins(%cst : f32) outs(%1 : tensor<3xf32>) -> tensor<3xf32> %3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>], iterator_types = ["reduction", "parallel"]} ins(%0 : tensor<2x3xf32>) outs(%2 : tensor<3xf32>) { ^bb0(%in: f32, %out: f32): %5 = arith.addf %in, %out : f32 linalg.yield %5 : f32 } -> tensor<3xf32> %4 = hal.tensor.export %3 : tensor<3xf32> -> !hal.buffer_view return %4 : !hal.buffer_view }
交换循环之后转换成,
func.func @foo(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2x3xf32> %1 = tensor.empty() : tensor<3xf32> %2 = linalg.fill ins(%cst : f32) outs(%1 : tensor<3xf32>) -> tensor<3xf32> %3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%0 : tensor<2x3xf32>) outs(%2 : tensor<3xf32>) { ^bb0(%in: f32, %out: f32): %5 = arith.addf %in, %out : f32 linalg.yield %5 : f32 } -> tensor<3xf32> %4 = hal.tensor.export %3 : tensor<3xf32> -> !hal.buffer_view return %4 : !hal.buffer_view }
-
memref::createResolveShapedTypeResultDimsPass
-
mlir::createCanonicalizerPass
-
mlir::createCSEPass
-
createFusionOfTensorOpsPass
主要做 elementwise 的算子融合,其次也会将tensor.expand_shape
转换成linalg generic op
,方便进行算子融合。elementwise 算子融合的条件:
- producer 和 comsumer 都是
linalg generic op
,且都为 tensor 语义。 - producer 只有一个 user。
- producer 所有维度的迭代类型都是 parallel,consumer 的 index map 必须和 producer 具有相同的循环嵌套层数。
- producer 结果的 index map 必须是 Permutation,即结果的每个元素有且仅 store 一次(输出是 pointwise 的)。
- consumer 可以包含 reduction 迭代类型,但需要保证融合后输入的 index map 可以覆盖每一个迭代维度,理由是如果缺失就无法确定该维度的循环边界。
// reduce(mul(arg0, arg1), 0) // for (int d0 = 0; d0 < n; ++d0) { // temp[d0] = arg0[d0] * arg1[d0]; // } // result = 0; // for (int d0 = 0; d0 < n; ++d0) { // result += temp[d0]; // } func.func @foo(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2xf32> %1 = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<2xf32> %2 = tensor.empty() : tensor<2xf32> %3 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%0, %1 : tensor<2xf32>, tensor<2xf32>) outs(%2 : tensor<2xf32>) { ^bb0(%in: f32, %in_0: f32, %out: f32): %8 = arith.mulf %in, %in_0 : f32 linalg.yield %8 : f32 } -> tensor<2xf32> %4 = tensor.empty() : tensor<f32> %5 = linalg.fill ins(%cst : f32) outs(%4 : tensor<f32>) -> tensor<f32> %6 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> ()>], iterator_types = ["reduction"]} ins(%3 : tensor<2xf32>) outs(%5 : tensor<f32>) { ^bb0(%in: f32, %out: f32): %8 = arith.addf %in, %out : f32 linalg.yield %8 : f32 } -> tensor<f32> %7 = hal.tensor.export %6 : tensor<f32> -> !hal.buffer_view return %7 : !hal.buffer_view }
融合mul和reduce之后转换成,
// result = 0; // for (int d0 = 0; d0 < n; ++d0) { // result += arg0[d0] * arg1[d0]; // } func.func @foo(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2xf32> %1 = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<2xf32> %2 = tensor.empty() : tensor<f32> %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<f32>) -> tensor<f32> %4 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>, affine_map<(d0) -> ()>], iterator_types = ["reduction"]} ins(%0, %1 : tensor<2xf32>, tensor<2xf32>) outs(%3 : tensor<f32>) { ^bb0(%in: f32, %in_0: f32, %out: f32): %6 = arith.mulf %in, %in_0 : f32 %7 = arith.addf %6, %out : f32 linalg.yield %7 : f32 } -> tensor<f32> %5 = hal.tensor.export %4 : tensor<f32> -> !hal.buffer_view return %5 : !hal.buffer_view }
- producer 和 comsumer 都是
-
mlir::createLinalgDetensorizePass
将 0-D Tensor 转换为它的基础元素类型。 -
mlir::createCanonicalizerPass
-
mlir::createCSEPass
-
createSplitReductionPass
将 matmul 和 topk 的单次 reduce 分成两次 reduce 操作(一次 batch matmul 和一次 add)。默认不开启,设置--iree-flow-split-matmul-reduction>=2
可开启。func.func @test(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<128x256xf32> %1 = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<256x256xf32> %2 = linalg.init_tensor [128, 256] : tensor<128x256xf32> %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<128x256xf32>) -> tensor<128x256xf32> %4 = linalg.matmul ins(%0, %1 : tensor<128x256xf32>, tensor<256x256xf32>) outs(%3 : tensor<128x256xf32>) -> tensor<128x256xf32> %5 = hal.tensor.export %4 : tensor<128x256xf32> -> !hal.buffer_view return %5 : !hal.buffer_view }
--iree-flow-split-matmul-reduction=2
转换成,func.func @test(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<128x256xf32> %1 = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<256x256xf32> %2 = linalg.init_tensor [128, 256] : tensor<128x256xf32> %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<128x256xf32>) -> tensor<128x256xf32> %4 = tensor.expand_shape %0 [[0], [1, 2]] : tensor<128x256xf32> into tensor<128x2x128xf32> %5 = tensor.expand_shape %1 [[0, 1], [2]] : tensor<256x256xf32> into tensor<2x128x256xf32> %6 = linalg.init_tensor [2, 128, 256] : tensor<2x128x256xf32> %7 = linalg.fill ins(%cst : f32) outs(%6 : tensor<2x128x256xf32>) -> tensor<2x128x256xf32> %8 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d1, d0, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"]} ins(%4, %5 : tensor<128x2x128xf32>, tensor<2x128x256xf32>) outs(%7 : tensor<2x128x256xf32>) attrs = {__internal_linalg_transform__ = "SPLIT", linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>]} { ^bb0(%arg2: f32, %arg3: f32, %arg4: f32): %11 = arith.mulf %arg2, %arg3 : f32 %12 = arith.addf %arg4, %11 : f32 linalg.yield %12 : f32 } -> tensor<2x128x256xf32> %9 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1, d2)>], iterator_types = ["reduction", "parallel", "parallel"]} ins(%8 : tensor<2x128x256xf32>) outs(%3 : tensor<128x256xf32>) attrs = {__internal_linalg_transform__ = "SPLIT"} { ^bb0(%arg2: f32, %arg3: f32): %11 = arith.addf %arg2, %arg3 : f32 linalg.yield %11 : f32 } -> tensor<128x256xf32> %10 = hal.tensor.export %9 : tensor<128x256xf32> -> !hal.buffer_view return %10 : !hal.buffer_view }
-
createInterchangeGenericOpsPass
循环维度变换。将 reduction 循环维度交换到最内层,相应的 parallel 循环维度被交换到外层。 -
createInterchangeTransposeGenericOpsPass
当输入 indexing map 是 permutation 时,交换循环维度使得输入的 indexing map 是 identity 的,其作用是使得输入尽可能变成连续访存。 -
createDispatchWithTransformDialect
根据transform dialect
对算子进行调度和派遣,需要另外加载一个transform dialect
的 module 文件,默认不做该变换。transform dialect
定义了一套调度规则,用于引导目标 IR 进行变换,比如循环展开、tiling 等。 -
createFormDispatchRegionsPass
以包含reduction loop
的linalg op
或named linalg op
为中心(root),按一定规则合并 producers 和 comsumers,划分出dispatch region
子图。dispatch region
是 IREE 中的原子执行单元,dispatch region
内部可以直接复用输入和输出的内存,从而避免了内部的内存分配操作,内存分配只发生在dispatch region
的边界,同时dispatch region
之间会自动插入同步操作。func.func @predict(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2x10xf32> %1 = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<10x5xf32> %2 = hal.tensor.import %arg2 : !hal.buffer_view -> tensor<5xf32> %3 = tensor.empty() : tensor<2x5xf32> %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2x5xf32>) -> tensor<2x5xf32> %5 = linalg.matmul ins(%0, %1 : tensor<2x10xf32>, tensor<10x5xf32>) outs(%4 : tensor<2x5xf32>) -> tensor<2x5xf32> %6 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%5, %2 : tensor<2x5xf32>, tensor<5xf32>) outs(%3 : tensor<2x5xf32>) { ^bb0(%in: f32, %in_0: f32, %out: f32): %8 = arith.addf %in, %in_0 : f32 linalg.yield %8 : f32 } -> tensor<2x5xf32> %7 = hal.tensor.export %6 : tensor<2x5xf32> -> !hal.buffer_view return %7 : !hal.buffer_view }
转换成,
func.func @predict(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} { %cst = arith.constant 0.000000e+00 : f32 %0 = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<2x10xf32> %1 = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<10x5xf32> %2 = hal.tensor.import %arg2 : !hal.buffer_view -> tensor<5xf32> %3 = tensor.empty() : tensor<2x5xf32> %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2x5xf32>) -> tensor<2x5xf32> %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %c2 = arith.constant 2 : index %c1_0 = arith.constant 1 : index %5 = affine.apply affine_map<()[s0, s1, s2] -> ((s1 - s0) ceildiv s2)>()[%c0, %c2, %c1_0] %c0_1 = arith.constant 0 : index %c5 = arith.constant 5 : index %c1_2 = arith.constant 1 : index %6 = affine.apply affine_map<()[s0, s1, s2] -> ((s1 - s0) ceildiv s2)>()[%c0_1, %c5, %c1_2] %7 = flow.dispatch.region[%5, %6] -> (tensor<2x5xf32>) { %9 = linalg.matmul ins(%0, %1 : tensor<2x10xf32>, tensor<10x5xf32>) outs(%4 : tensor<2x5xf32>) -> tensor<2x5xf32> %10 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%9, %2 : tensor<2x5xf32>, tensor<5xf32>) outs(%3 : tensor<2x5xf32>) { ^bb0(%in: f32, %in_3: f32, %out: f32): %11 = arith.addf %in, %in_3 : f32 linalg.yield %11 : f32 } -> tensor<2x5xf32> flow.return %10 : tensor<2x5xf32> } count(%arg3: index, %arg4: index) -> (index, index, index) { %x, %y, %z = flow.dispatch.workgroup_count_from_dag_root %arg3, %arg4 flow.return %x, %y, %z : index, index, index } %8 = hal.tensor.export %7 : tensor<2x5xf32> -> !hal.buffer_view return %8 : !hal.buffer_view }
-
createFormDispatchWorkgroupsPass
将dispatch region
转换成dispatch work group
的形式,并将 cloneable 的 op(比如tensor.fill
、tensor.empty
等)拷贝到 work group 中。如果在linalg
层做了tiling
,该 pass 也会把tiling
引入的tensor.extract_slice
和tensor.insert_slice
尽可能转换成flow.tensor.slice和flow.tensor.update
,转换不了的后续再转换成flow.dispatch.tensor.load
和flow.dispatch.tensor.store
-
createCaptureDispatchDynamicDimsPass
由于flow.dispatch.workgroups
的参数中动态形状 tensor 被替换成了!flow.dispatch.tensor
和相应的动态维度 index,该 pass 捕获 workgroups 参数中的动态维度 index,插入flow.dispatch.tie_shape
将参数中的动态维度 index 和!flow.dispatch.tensor
进行绑定。 -
mlir::createCanonicalizerPass
-
createCSEPass
-
createInitializeEmptyTensorsPass
如果tensor.empty op
的 user 中存在非 linalg 或 IREE LinalgExt op,则把该tensor.empty op
转换成flow.tensor.empty
或flow.tensor.splat op
。 -
IREE::Flow::createOutlineDispatchRegionsPass
把每个dispatch region
转换成flow.executable + flow.dispatch op
。 -
IREE::Util::createStripDebugOpsPass
消除DebugOnly op。 -
mlir::createCanonicalizerPass
-
IREE::Flow::createDeduplicateExecutablesPass
消除重复的flow.executable
。 -
IREE::Flow::createInjectDispatchTracingPass
注入跟踪运行时 dispatch 函数输入和输出信息的 op。默认不开启。 -
IREE::Flow::createCleanupTensorShapesPass
删除flow.tensor.tie_shape op
,并确认 module 中不再包含tensor.dim
和tensor.rank
这两类形状查询 op。 -
mlir::createCanonicalizerPass
-
mlir::createCSEPass
-
mlir::createCanonicalizerPass
-
mlir::createCSEPass
-
mlir::createSymbolDCEPass
未完待续…
标签:hal,tensor,linalg,buffer,编译,f32,buildGlobalOptimizationPassPipeline,iree,view From: https://blog.csdn.net/qq_38342510/article/details/140759512