Template-based MLIR Compiler

Posted: 2024-08-06 10:55:24

The repository contains the sources for building the template-based MLIR compiler, together with the LLVM sources it depends on (commit 5d4927, with some modifications). The compiler compiles and executes MLIR programs composed of supported operations, similar to mlir-cpu-runner (several sample programs are included). On first execution, it generates the required templates and persists them.
The artifact also contains the modified LingoDB sources with the template-based code-generation backend integrated, and Polygeist (commit fd4194b) for converting C files to operations of the upstream MLIR dialects. Sample MLIR programs and scripts for preparing and running the benchmarks from Figures 2-5 are included.

Benchmarks

Reproducible Artifact

Reproduction

  • As each of the dependent projects (LingoDB, Polygeist, and our approach) requires its own LLVM version, reproducing the results requires building the LLVM project three times.
  • Experiments from the paper:
    • Microbenchmarks (x86 only)
    • PolyBenchC (x86 only)
    • LingoDB (x86 only)
    • Coremark (x86 and aarch64)
    • SPEC (x86 and aarch64)

Requirements

  • Linux operating system on x86 or aarch64
  • Podman container runtime
  • Disk space: 40 GB (x86); 20 GB (aarch64)
  • DRAM: 32 GB (x86); 16 GB (aarch64)

Setup

Arrange the folders as follows:

src
 |- mlir-codegen <- run scripts inside here
 | |- spec-data (spec only; spec benchmark input data)
 | |- spec-programs (spec only; spec benchmark mlir programs)
 | \- results <- results will appear here
 |- llvm-project
 |- coremark
 |- lingo-db
 |- mlir-codegen-lingodb
 |- Polygeist
 \- PolybenchC-4.2.1
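Before starting the container, it can help to confirm that the checkouts sit where the scripts expect them. The following is a minimal sketch, not part of the artifact; the function name `check_src_layout` is hypothetical. It only checks the directories shown in the tree above; spec-data, spec-programs and results are created later, so they are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the artifact): verify that the
# sibling checkouts from the folder tree exist under the given src root.
check_src_layout() {
  local src_root="$1"
  local missing=0
  local dir
  for dir in mlir-codegen llvm-project coremark lingo-db \
             mlir-codegen-lingodb Polygeist PolybenchC-4.2.1; do
    if [ ! -d "$src_root/$dir" ]; then
      echo "missing directory: $src_root/$dir" >&2
      missing=1
    fi
  done
  # Exit status 0 only if every expected directory is present.
  return "$missing"
}
```

Run it from inside mlir-codegen as `check_src_layout ..`; a non-zero exit status means at least one directory is missing.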
  1. Run everything from inside the mlir-codegen directory.
  2. Build the container image for the build and runtime environment:
    podman build --squash . -t mlir-codegen-build
  3. Run the build container with the above folder structure mounted:
    podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
  4. Build the dependent projects (you may want to adjust the CMAKE_BUILD_PARALLEL_LEVEL environment variable to control the number of compile jobs; the default is 2):
    make prepare-x86 (on x86) or make prepare-aarch64 (on aarch64)

On some hosts, we encountered a sporadic internal compiler error in gcc while building LLVM; in these cases, rerun the target until it finishes successfully.
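Rerunning a failed target by hand can be automated with a small retry loop. This is a hypothetical helper, not shipped with the artifact; it assumes, as the note above implies, that rerunning the make target simply resumes the build, and it caps the number of attempts so that a persistent (non-sporadic) error does not loop forever.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): rerun a command until it
# exits with status 0, giving up after a maximum number of attempts.
# Usage: retry_until_success MAX_ATTEMPTS COMMAND [ARGS...]
retry_until_success() {
  local max_attempts="$1"
  shift
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying: $*" >&2
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_until_success 3 make prepare-x86` reruns the x86 preparation target at most three times.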

  5. (SPEC only; run on the target machine) Prepare the SPEC programs and data. As we cannot distribute the SPEC benchmarks and data, some manual effort is required. Export the SPEC_BENCHSPEC_FOLDER environment variable to point to the benchspec folder of the unpacked SPEC benchmark data, then run make spec to create and fill the spec-data and spec-programs folders.

On aarch64, first remove the -m64 option from benchspec/CPU/525.x264_r/src/simple-build-ldecod_r-525.sh, as gcc does not accept -m64 on aarch64.
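The -m64 removal can be scripted with sed. This is a minimal sketch, not part of the artifact: the helper name `strip_m64` is hypothetical, it assumes GNU sed (for `-i`, `-E` and the `:a ... ta` loop), and it assumes -m64 appears in the script as a whitespace-separated token, so longer flags such as a hypothetical `-m640` are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): delete every standalone
# -m64 token from a build script in place. Requires GNU sed.
strip_m64() {
  local script="$1"
  if [ ! -f "$script" ]; then
    echo "no such file: $script" >&2
    return 1
  fi
  # Each pass removes the first standalone -m64 and keeps the surrounding
  # whitespace; the :a/ta loop repeats until no match is left, so adjacent
  # "-m64 -m64" tokens are removed as well.
  sed -i -E ':a;s/(^|[[:space:]])-m64([[:space:]]|$)/\1\2/;ta' "$script"
}
```

Assuming SPEC_BENCHSPEC_FOLDER points at the benchspec folder itself, the call would be `strip_m64 "$SPEC_BENCHSPEC_FOLDER/CPU/525.x264_r/src/simple-build-ldecod_r-525.sh"`, run before make spec.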

Execution

  1. Run the benchmarks (except SPEC) using the architecture-specific make targets (make benchmark-x86 or make benchmark-aarch64). The SPEC benchmarks can be run on both architectures using make benchmark-spec. The benchmarks write log files to the results directory.

Visualization

  1. Visualize the results as diagrams similar to those presented in the paper using the make viz target. It writes the diagrams as PDF files to the results folder for whichever benchmark result files are present.

Summary

To reproduce all result diagrams in the results folder, run the following (omit the SPEC commands to skip reproducing the SPEC results):

x86
podman build . -t mlir-codegen-build
podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
make prepare-x86
SPEC_BENCHSPEC_FOLDER=[...]/benchspec make spec
make benchmark-x86
make benchmark-spec
make viz
AArch64
podman build . -t mlir-codegen-build
podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
make prepare-aarch64
SPEC_BENCHSPEC_FOLDER=[...]/benchspec make spec
make benchmark-aarch64
make benchmark-spec
make viz
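The two command sequences above differ only in the architecture suffix of the prepare and benchmark targets. A small wrapper can pick the suffix from `uname -m`; this is a hypothetical sketch, not part of the artifact, and it assumes the Linux machine names `x86_64` and `aarch64`. The SPEC steps stay manual because they need SPEC_BENCHSPEC_FOLDER.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the artifact): map the machine name
# reported by `uname -m` to the suffix used by the prepare-*/benchmark-*
# make targets.
arch_suffix() {
  case "$1" in
    x86_64) echo "x86" ;;
    aarch64) echo "aarch64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the non-SPEC make targets of the summary, in order, one per line.
summary_targets() {
  local suffix
  suffix="$(arch_suffix "$1")" || return 1
  printf '%s\n' "prepare-$suffix" "benchmark-$suffix" viz
}
```

Inside the build container, `for t in $(summary_targets "$(uname -m)"); do make "$t" || break; done` would then run the non-SPEC steps for the host architecture and stop at the first failing target.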

Evaluation and expected results

The x86-64 benchmarks in the paper were run on an older server CPU. Because instruction latencies and memory access costs differ on modern desktop CPUs, results may differ slightly. The following applies to both architectures (x86-64 and aarch64): the experiments reproduce the compilation and execution times of the respective benchmarks, visualized as diagrams similar to the ones presented in the paper. Overall, our approach should compile one to two orders of magnitude faster than the LLVM backends, with an execution-time slowdown below 3x (at least in the geomean).
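The geomean criterion can be checked by hand from per-benchmark slowdowns (execution time of our approach divided by that of the LLVM baseline). The sketch below is hypothetical and not part of the artifact: it does not parse the artifact's log files, whose format is not described here, and simply expects the slowdowns as whitespace-separated numbers on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): geometric mean of
# whitespace-separated positive numbers read from stdin, e.g. per-benchmark
# slowdowns. Computed as exp(mean(log x)) so the product cannot overflow.
geomean() {
  awk '
    { for (i = 1; i <= NF; i++) { sum += log($i); n++ } }
    END {
      if (n == 0) { exit 1 }   # no input: report failure
      printf "%.4f\n", exp(sum / n)
    }'
}
```

For instance, `echo 2 8 | geomean` prints 4.0000; a result below 3 matches the expectation stated above.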

Microbenchmarks

This experiment additionally reproduces the effect of each individually applied optimization.
The impact of the individual optimizations can vary heavily with the exact CPU: the faster the benchmarks execute, the smaller the difference between the individual stages. Depending on memory access costs, register caching may be less effective (with barely any improvement over the template calling convention).

LingoDB

On faster systems, the speedup factor for total query runtime may differ, because execution time accounts for a larger share of the total runtime with our approach than with the others.
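Why the split matters can be seen from the arithmetic: the speedup for total query runtime is (compile_B + exec_B) / (compile_A + exec_A), so a system that mainly speeds up execution changes the factor more for the approach whose total is dominated by execution. This is a hypothetical helper with made-up numbers, not measurements from the paper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: speedup of approach A over approach B for total query
# runtime, where total = compilation time + execution time.
# Arguments: compile_a exec_a compile_b exec_b (all in the same time unit).
total_runtime_speedup() {
  awk -v ca="$1" -v ea="$2" -v cb="$3" -v eb="$4" \
    'BEGIN { printf "%.2f\n", (cb + eb) / (ca + ea) }'
}
```

With the illustrative times `total_runtime_speedup 1 9 40 10` the factor is 5.00; halving only the execution times (`total_runtime_speedup 1 4.5 40 5`) raises it to 8.18.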

PolyBenchC, SPEC and Coremark

Expect roughly an order of magnitude difference in compilation time between the three approaches, and execution-time results in the geomean similar to those in the paper. Because the results are normalized to the execution time of LLVM's optimized code-generation backend, results for individual benchmarks may shift considerably: a faster baseline execution time yields comparably higher slowdowns for the other approaches.
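The normalization effect can be illustrated by dividing each execution time by the baseline time. This is a hypothetical sketch, not the artifact's plotting code; the input format "name baseline_time other_time" is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalize execution times to a baseline.
# Reads lines "name baseline_time other_time" from stdin and prints
# "name slowdown", where slowdown = other_time / baseline_time.
# Lines with a non-positive baseline or the wrong field count are skipped.
normalize_to_baseline() {
  awk 'NF == 3 && $2 > 0 { printf "%s %.2f\n", $1, $3 / $2 }'
}
```

For the same execution time of 2.0, `echo "x 1.0 2.0" | normalize_to_baseline` prints `x 2.00`, while a baseline twice as fast (`echo "x 0.5 2.0" | normalize_to_baseline`) prints `x 4.00`, which is why individual benchmarks can shift noticeably.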

From: https://www.cnblogs.com/hongyugao/p/18344718
