首页 > 系统相关 >GaussDB(DWS)性能调优,解决DM区大内存占用问题

GaussDB(DWS)性能调优,解决DM区大内存占用问题

时间:2024-07-04 09:10:58浏览次数:18  
标签:DWS LT1 DM JOIN GaussDB period id key ID

本文分享自华为云社区《GaussDB(DWS)性能调优:DM区优化案例——维度表关联条件存在会计期》,作者: O泡果奶~。

当前DM(P1、P3、CBGDM)存在维度表与主表关联时使用会计期作为关联条件,会导致出现大内存占用或未识别数据倾斜的问题

【场景一】f.period_id = 维度表.period_id

1.1、【问题描述】

主表和维度表关联过程中将会计期作为关联条件,导致维度表未进行分区剪枝,可能会产生大内存占用的情况

1.2、【原始SQL】

仅呈现SQL中的问题,详细SQL见附件

FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND F.PERIOD_ID = PP.PERIOD_ID
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

1.3、【性能分析】

image.png
image.png
image.png
从上图的执行计划可以看出,由于用会计期作为关联条件,导致维度表未进行分区剪枝,数据量大,不但产生了数据倾斜,同时还由于数据量大出现了关联下盘,大大降低了sql执行性能。
主表只有一个会计期,可以识别出对应的会计期,然后对SQL进行如下改写:

FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND PP.PERIOD_ID = '202406'
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

经优化后,执行计划如下图所示,维度表进行了分区剪枝,数据量减少,缓解了数据倾斜,也避免了关联下盘的问题。
image.png
image.png

【场景二】f left join 维度表 on f.period_id = 维度表.period_id and 维度表.period_id = ‘会计期’

2.1、【问题描述】

主表和维度表关联过程中将会计期作为关联条件,同时还为维度表会计期进行赋值,可能会产生数据倾斜未识别的情况

2.2、【原始SQL】

FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT1.period_id = LT2.period_id 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.period_id = LT3.period_id 
        AND LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.period_id = LT4.period_id 
        AND LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.period_id = LT5.period_id 
        AND LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.period_id = LT6.period_id 
        AND LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.period_id = LT7.period_id 
        AND LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.period_id = LT8.period_id 
        AND LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.period_id = LT9.period_id 
        AND LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.period_id = LT10.period_id 
        AND LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.period_id = LT11.period_id 
        AND LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.period_id = LT12.period_id 
        AND LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.period_id = LT13.period_id 
        AND LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.period_id = LT14.period_id 
        AND LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.period_id = LT15.period_id 
        AND LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.period_id = LT16.period_id 
        AND LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.period_id = LT17.period_id 
        AND LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.period_id = LT18.period_id 
        AND LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.period_id = LT19.period_id 
        AND LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.period_id = LT20.period_id 
        AND LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

2.3、【性能分析】

image.png
image.png
上图的执行计划可以看出,在主表一开始关联过程中就存在数据倾斜,导致SQL执行性能差。
image.png
image.png
详细执行计划中,虽然维度表进行了分区剪枝,但由于使用了 left join,导致关联条件中维度表的常量period_id不能直接赋值给主表period_id,主表关联后的结果重分布时将period_id作为了分布键之一,这会影响优化器的倾斜优化。
可以将f.period_id = 维度表.period_id这一关联条件删掉,对sql进行如下改写

FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

改写后,执行计划如下所示

image.png

可以看出,执行计划不但进行了分区剪枝,同时优化器还进行了倾斜优化,提高了SQL执行性能

点击关注,第一时间了解华为云新鲜技术~

 

标签:DWS,LT1,DM,JOIN,GaussDB,period,id,key,ID
From: https://www.cnblogs.com/huaweiyun/p/18282903

相关文章

  • Yi.Admin+Blazor上手教程系列2
    基础设施层,存放各种通用方法、ORM等领域共享层,存放枚举、常量、事件参数、对应用层的dto领域层,存放业务的内部实现(领域服务)、领域事件。对外也就是对应用层可以暴露实体、对应用层的dto应用抽象层,存放应用层抽象、系统对外的dto应用层,存放对业务的组装(领域服务),对外暴露的dto,主......
  • Yi.Admin+Blazor上手教程系列1
    1、clone项目后可以删除的实例文件2、认识各个模块Doman.Shared共享层最底层是Doman.Shared,共享层,这里存放各个常量、枚举、不依赖各个模块的通用类,Doman.Shared,共享层是一层很简单、不包含业务的模块,让它尽可能简单。Domain领域层只依赖Doman.Shared的Domain领域层,你的......
  • idm 支持断点续传吗 idm 断点续传如何使用 idm断点续传怎么解决 idm下载中断后无法继
    断点续传功能,让我再也不会惧怕下载大型文件。在断点续传的帮助下,用户可以随时暂停下载任务,并在空闲时继续之前的下载进程。下载文件不惧网络波动,断点续传让下载过程更稳定。有关idm支持断点续传吗,idm断点续传如何使用的问题,本文将进行详细介绍。一、idm支持断点续传吗最......
  • 【云原生】Kubernetes-kubeadm升级版本
    一、版本升级当我们要用到新版本的一些功能和特性的时候或者当前版本太旧无法满足需要的时候势必要对Kubernetes集群进行升级。1.1、升级Master节点1.1.1、腾空节点[root@master~]#kubectlgetnodeNAMESTATUSROLESAGEVERSIONmaster......
  • README-zh
    vue-admin-template这是一个极简的vueadmin管理后台。它只包含了ElementUI&axios&iconfont&permissioncontrol&lint,这些搭建后台必要的东西。线上地址国内访问目前版本为v4.0+基于vue-cli进行构建,若你想使用旧版本,可以切换分支到tag/3.11.0,它不依赖vue-c......
  • WPF Image scalertransform translatetransform mvvm,custom delegate command Comman
    //xaml<Windowx:Class="WpfApp187.MainWindow"xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"xmlns:d="http://schemas.mi......
  • BPI-M4 Berry HDMI测试
    HDMI转HDMI1、使用HDMI线连接屏幕和BPI-M4Berry2、板子启动后,观察HDMI屏幕是否有输出。PS:请注意,有的笔记本虽然有HDMI接口,但是接口只具备输出功能,没有输入功能。所以,使用前先确定好接口是否具备HDMIIN功能。如果启动后,屏幕没有显示,先检查一下线有没有接紧。如果HDMI线没......
  • 6、Django-管理员界面-admin
    概念:Django的admin界面是自动生成的,它根据你的模型类自动创建表单和列表视图。你只需将模型类注册到admin界面,就可以轻松地管理和操作数据库中的数据。admin界面提供了各种功能,包括:列表视图:以表格形式展示数据库中的数据,支持分页、搜索和排序功能,方便快速浏览和筛选数据。表......
  • OFDM的缺点与关键技术
    子载波间干扰英文简写ICI,ICI可能由各种原因引起在多径信道中,CP小于最大附加时延时收发系统载波频率偏差和采样偏差收发系统相对移动,存在多普勒频移ICI是制约OFDM系统性能的主要重要因素之一对频率偏差敏感----->同步技术,信道估计技术,ICI消除算法高的峰均比,对PA的线性要求......
  • OFDM关键技术——同步技术
    同一符号内各子载波受到不同的相位偏转及幅度衰减,k越大偏差越大;相位偏转与子载波序号k和符号序号i均有关,序号越大,相位偏转越大;引入了ICI,系统信噪比降低;当采样偏差效果累积到一定程度(每隔   个样点)会产生整数倍采样间隔偏差,即符号定时偏差,可能带来ISI。......