dbt docs generate: a brief overview

Date: 2024-05-01 09:04:23

Tags: docs self results generate catalog path manifest dbt

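At its core, dbt docs generate collects metadata about the dbt project (including project-level information) and about the related tables (those backing dbt models), then displays it through the bundled docs page.
The current approach is static: the artifacts are generated first and are then parsed and rendered purely in the browser. There are several ways to publish the result: dbt's own docs serve command, or your own static web server (nginx or S3). A brief walkthrough follows.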
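Because the output is static, hosting it yourself needs nothing more than a file server pointed at the target directory. The following is a minimal sketch using only the Python standard library. The helper name `serve_docs` is hypothetical, and it assumes `dbt docs generate` has already written `index.html`, `manifest.json` and `catalog.json` into that directory.

```python
import functools
import http.server
import threading


def serve_docs(target_dir: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve an already generated docs directory over HTTP.

    Hypothetical helper, not part of dbt. port=0 lets the OS pick a free port.
    The server runs in a daemon thread, so the caller keeps control.
    """
    # Bind the handler to the docs directory instead of the current directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=target_dir
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `serve_docs("target", 8080)` would expose the page at `http://127.0.0.1:8080/index.html`; call `server.shutdown()` when done. nginx or an S3 bucket plays the same role in production.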

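Internals

The actual work is done by the GenerateTask class, which inherits from CompileTask.

CLI decorators
As shown below, the command depends on profile, project, runtime_config and manifest. Although write=False is passed, the manifest file still ends up being written (GenerateTask.run calls write_manifest when compilation is enabled), because docs depends on it.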

@requires.profile
@requires.project
@requires.runtime_config
@requires.manifest(write=False)
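To make the dependency order concrete, here is a toy sketch of how such a decorator stack can build up a shared context. It is not dbt's implementation: the `requires` factory, the dictionary context and the placeholder values are all assumptions made for illustration. The point it shows is that the top decorator is the outermost wrapper, so it runs first and each later step can rely on what the earlier ones stored.

```python
import functools
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


def requires(key: str, *needs: str, factory: Callable[[Context], Any]):
    """Hypothetical decorator factory: store `key` in ctx before the command runs."""

    def decorator(func: Callable[[Context], Any]) -> Callable[[Context], Any]:
        @functools.wraps(func)
        def wrapper(ctx: Context) -> Any:
            missing = [name for name in needs if name not in ctx]
            if missing:
                raise RuntimeError(f"{key} requires {missing} to be set first")
            ctx[key] = factory(ctx)
            return func(ctx)

        return wrapper

    return decorator


# Placeholder values; a real tool would load files and parse the project here.
profile = requires("profile", factory=lambda ctx: {"target": "dev"})
project = requires("project", "profile", factory=lambda ctx: {"name": "demo"})
runtime_config = requires(
    "runtime_config", "profile", "project",
    factory=lambda ctx: {**ctx["profile"], **ctx["project"]},
)
manifest = requires("manifest", "runtime_config", factory=lambda ctx: {"nodes": {}})


@profile
@project
@runtime_config
@manifest
def docs_generate(ctx: Context) -> List[str]:
    """By the time the command body runs, every dependency is in ctx."""
    return sorted(ctx)
```

Reordering the stack (for example, putting `@manifest` on top) makes the toy version fail fast with a `RuntimeError`, which is the kind of guard a decorator-based dependency chain is meant to provide.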

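GenerateTask

The main method of this class is run. It compiles the project, copies the static web assets, fetches the catalog (table metadata) and writes the manifest via write_manifest. The code is fairly easy to follow: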

# Excerpt from core/dbt/task/generate.py (imports omitted)
class GenerateTask(CompileTask):
    def run(self) -> CatalogArtifact:
        compile_results = None
        if self.args.compile:
            compile_results = CompileTask.run(self)
            # If any node failed to compile, return an empty catalog early.
            if any(r.status == NodeStatus.Error for r in compile_results):
                fire_event(CannotGenerateDocs())
                return CatalogArtifact.from_results(
                    nodes={},
                    sources={},
                    generated_at=datetime.utcnow(),
                    errors=None,
                    compile_results=compile_results,
                )

        # Copy the static docs page into the target directory.
        shutil.copyfile(
            DOCS_INDEX_FILE_PATH, os.path.join(self.config.project_target_path, "index.html")
        )

        # Refresh each configured asset directory under the target directory.
        for asset_path in self.config.asset_paths:
            to_asset_path = os.path.join(self.config.project_target_path, asset_path)
            if os.path.exists(to_asset_path):
                shutil.rmtree(to_asset_path)
            if os.path.exists(asset_path):
                shutil.copytree(asset_path, to_asset_path)

        if self.manifest is None:
            raise DbtInternalError("self.manifest was None in run!")

        # Query the database for table and column metadata.
        adapter = get_adapter(self.config)
        with adapter.connection_named("generate_catalog"):
            fire_event(BuildingCatalog())
            catalog_table, exceptions = adapter.get_catalog(self.manifest)

        # Turn each result row into a plain dict keyed by column name.
        catalog_data: List[PrimitiveDict] = [
            dict(zip(catalog_table.column_names, map(dbt.utils._coerce_decimal, row)))
            for row in catalog_table
        ]

        catalog = Catalog(catalog_data)

        errors: Optional[List[str]] = None
        if exceptions:
            errors = [str(e) for e in exceptions]

        nodes, sources = catalog.make_unique_id_map(self.manifest)
        results = self.get_catalog_results(
            nodes=nodes,
            sources=sources,
            generated_at=datetime.utcnow(),
            compile_results=compile_results,
            errors=errors,
        )

        # Write the catalog artifact, and the manifest when compilation ran.
        path = os.path.join(self.config.project_target_path, CATALOG_FILENAME)
        results.write(path)
        if self.args.compile:
            write_manifest(self.manifest, self.config.project_target_path)

        if exceptions:
            fire_event(WriteCatalogFailure(num_exceptions=len(exceptions)))
        fire_event(CatalogWritten(path=os.path.abspath(path)))
        return results
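The catalog_data step above is the least obvious line in run, so here is a standalone sketch of the same idea using plain tuples instead of an agate table. The function names are hypothetical, and the assumption that the coercion simply converts Decimal values to float (so the rows can be serialized to JSON) is mine rather than something the excerpt shows.

```python
import json
from decimal import Decimal
from typing import Any, Dict, Iterable, List, Sequence


def coerce_decimal(value: Any) -> Any:
    """Convert Decimal to float (assumed behaviour); leave other values untouched."""
    return float(value) if isinstance(value, Decimal) else value


def rows_to_dicts(
    column_names: Sequence[str], rows: Iterable[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Pair each row's values with the column names, coercing Decimals on the way."""
    return [dict(zip(column_names, map(coerce_decimal, row))) for row in rows]


# Made-up sample rows in the shape a catalog query might return.
columns = ("table_name", "column_name", "column_index")
rows = [("orders", "id", Decimal("1")), ("orders", "amount", Decimal("2"))]
records = rows_to_dicts(columns, rows)
serialized = json.dumps(records)  # would raise TypeError if Decimals remained
```

Without the coercion, `json.dumps` rejects Decimal values, which is presumably why the conversion happens before the catalog is written.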

get_catalog: fetching the catalog with the Manifest

The core idea is to collect the schema information from the Manifest and then query the database to obtain the actual catalog data.

def get_catalog(self, manifest: Manifest) -> Tuple[agate.Table, List[Exception]]:
    # Map each database to the set of schemas used by the manifest.
    schema_map = self._get_catalog_schemas(manifest)

    with executor(self.config) as tpe:
        futures: List[Future[agate.Table]] = []
        for info, schemas in schema_map.items():
            if len(schemas) == 0:
                continue
            name = ".".join([str(info.database), "information_schema"])
            # One catalog query per database, each on its own connection.
            fut = tpe.submit_connected(
                self, name, self._get_one_catalog, info, schemas, manifest
            )
            futures.append(fut)

        # Merge finished results and collect any exceptions raised.
        catalogs, exceptions = catch_as_completed(futures)

    return catalogs, exceptions
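The fan-out pattern in get_catalog (one query per database, results merged, failures collected instead of raised) can be sketched with the standard library alone. This is an illustration under assumptions, not dbt's code: `collect_catalog`, the `fetch_one` callback and the row shape are hypothetical, and a plain `ThreadPoolExecutor` stands in for dbt's connection-aware executor.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, str]


def collect_catalog(
    schema_map: Dict[str, Set[str]],
    fetch_one: Callable[[str, Set[str]], List[Row]],
    max_workers: int = 4,
) -> Tuple[List[Row], List[BaseException]]:
    """Run one fetch per database in parallel; return merged rows and errors."""
    rows: List[Row] = []
    exceptions: List[BaseException] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fetch_one, database, schemas)
            for database, schemas in schema_map.items()
            if schemas  # skip databases with no schemas, as the original does
        ]
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                exceptions.append(error)  # keep going; report failures at the end
            else:
                rows.extend(future.result())
    return rows, exceptions
```

Collecting exceptions rather than raising on the first one matches how run later reports them: a partial catalog is still written, and the failures are recorded in its errors field.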

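Notes

For the web rendering side, see the references below. dagster also maintains an open-source implementation (supercharged-dbt-docs) that aims for faster loading and parsing.

References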

core/dbt/task/generate.py
https://docs.getdbt.com/reference/commands/cmd-docs
https://github.com/dbt-labs/dbt-docs
https://github.com/dagster-io/supercharged-dbt-docs

From: https://www.cnblogs.com/rongfengliang/p/18117117
