首页 > 其他分享 >使用presto 进行跨库数据对比

使用presto 进行跨库数据对比

时间:2023-06-21 15:22:05浏览次数:57  
标签:name presto column 跨库 list db table array 对比

 

明细+汇总

 

 1 SELECT 
 2 concat(
 3 'select a.* from ('
 4 ,concat(' ' ,array_join (TRANSFORM(table_catalog_schema ,_->concat('select id, map(array[' ,column_name_str ,'],split(' ,array_join(TRANSFORM(column_name_list ,_->concat('coalesce(cast('    ,CASE WHEN element_at(column_type_list,array_position(column_name_list,_)) LIKE 'decimal%' OR element_at(column_type_list,array_position(column_name_list,_)) ='double' THEN  concat('REGEXP_REPLACE(cast(',_,' as varchar),','''([0]+$)''',',','''''',')') WHEN element_at(column_type_list,array_position(column_name_list,_)) LIKE 'timestam%' OR element_at(column_type_list,array_position(column_name_list,_)) LIKE 'datetime' THEN  concat('substring( cast(',_,' as varchar) ,1,19)') WHEN element_at(column_type_list,array_position(column_name_list,_)) = 'json'  THEN  concat('json_format(json ','''{"":""}''',')') ELSE _  END    ,' as varchar)',',',chr(39),'NULL',chr(39),')')),concat('||','''^''','||'))  ,' ,''^''))' ,' from ',_,'.',table_name  ,' where '  , CASE WHEN contains(column_name_list,'create_time')  THEN concat( 'create_time',' between ' ,'date_trunc(''day'',current_timestamp -INTERVAL ''1'' DAY  )' ,' and ' ,'date_add(''HOUR'',-1,cast(date_format(current_timestamp,''%Y-%m-%d %H:00:00'') AS timestamp))') ELSE '1=1' END  ,CASE WHEN array_position(table_catalog_schema,_)<> dcnt THEN ' union all ' ELSE ' ' END  )),' '))                                            
 5 ,') a left join  ('
 6 ,concat(' ',concat('select id, map(array[',column_name_str ,'],split(' ,array_join(TRANSFORM(column_name_list ,_->concat('coalesce(cast('    ,CASE WHEN element_at(column_type_list,array_position(column_name_list,_)) LIKE 'decimal%' OR element_at(column_type_list,array_position(column_name_list,_)) ='double' THEN  concat('REGEXP_REPLACE(cast(',_,' as varchar),','''([0]+$)''',',','''''',')') WHEN element_at(column_type_list,array_position(column_name_list,_)) LIKE 'timestam%' OR element_at(column_type_list,array_position(column_name_list,_)) LIKE 'datetime' THEN  concat('substring( cast(',_,' as varchar) ,1,19)') WHEN element_at(column_type_list,array_position(column_name_list,_)) = 'json'  THEN  concat('json_format(json ','''{"":""}''',')') ELSE _  END    ,' as varchar)',',',chr(39),'NULL',chr(39),')')),concat('||','''^''','||')) ,' ,''^''))' ,' from doris.',doris_tabName  ,' where ' , CASE WHEN contains(column_name_list,'create_time') THEN concat( 'create_time',' between ','date_trunc(''day'',current_timestamp -INTERVAL ''1'' DAY  )',' and ','date_add(''HOUR'',-1,cast(date_format(current_timestamp,''%Y-%m-%d %H:00:00'') AS timestamp))') ELSE '1=1' END ,' and msg_is_valid =1 ' ))
 7 ,') b on a.id=b.id WHERE b.id IS NULL '  
 8 ) detal_SQL  -- 根据明细对比的SQL
 9 ,concat(
10     'select a.*,b.*,a.cnt-b.cnt from (' 
11     ,array_join (TRANSFORM(table_catalog_schema,_->concat('select  ''',substr(_,strpos(_,'.')+1)||'.'||table_name,''' cate_log_schema, count(1) cnt  from ',_,'.',table_name ,' where ' , CASE WHEN contains(column_name_list,'create_time') THEN concat( 'create_time',' between ','date_trunc(''day'',current_timestamp -INTERVAL ''1'' DAY  )',' and ','date_add(''HOUR'',-1,cast(date_format(current_timestamp,''%Y-%m-%d %H:00:00'') AS timestamp))') ELSE '1=1' END ,CASE WHEN array_position(table_catalog_schema,_)<> dcnt THEN ' group by 1 union all ' ELSE ' group by 1' END )),' ')
12     ,') a left join ( '    
13     ,concat('select  concat(msg_source_db,''.'',msg_source_table) cate_log_schema',', count(1) cnt  from doris.',doris_tabName ,' where ' , CASE WHEN contains(column_name_list,'create_time') THEN concat( 'create_time',' between ','date_trunc(''day'',current_timestamp -INTERVAL ''1'' DAY  )',' and  ','date_add(''HOUR'',-1,cast(date_format(current_timestamp,''%Y-%m-%d %H:00:00'') AS timestamp))') ELSE '1=1' END ,' and msg_is_valid =1  group by 1' )
14     ,') b on a.cate_log_schema=b.cate_log_schema '
15 ) agg_Sql_Str  -- 根据汇总数据对比的Sql
16 FROM (
17 SELECT table_name
18 ,max(doris_tabName) doris_tabName
19 --,array_agg(DISTINCT doris_tabName) doris_tabName
20 ,array_agg(DISTINCT  concat(table_catalog,'.',table_schema)) table_catalog_schema
21 ,count(DISTINCT  concat(table_catalog,'.',table_schema)) dcnt
22 ,max(column_name_list) column_name_list
23 ,max(column_name_str) column_name_str
24 ,max(columnCnt) columnCnt
25 ,max(column_type_list) column_type_list
26 FROM (
27 SELECT table_catalog,table_schema,table_name, array_join(array_agg(concat('''',column_name,'''')),',') column_name_str,array_agg(column_name ORDER BY ordinal_position) column_name_list ,count(column_name) columnCnt ,array_agg(data_type ORDER BY ordinal_position ) column_type_list FROM oms4_som01.information_schema.columns WHERE table_schema NOT IN ('information_schema','sys','pt','maxwell','db_oms4_som998','db_ecs_oms4_som_nike_prod') GROUP BY 1,2,3
28 UNION ALL  SELECT table_catalog,table_schema,table_name, array_join(array_agg(concat('''',column_name,'''')),',') column_name_str,array_agg(column_name ORDER BY ordinal_position) column_name_list ,count(column_name) columnCnt ,array_agg(data_type ORDER BY ordinal_position ) column_type_list  FROM oms4_som02.information_schema.columns  WHERE table_schema NOT IN ('information_schema','sys','pt','maxwell','db_oms4_som999') GROUP BY 1,2,3
29 UNION ALL  SELECT table_catalog,table_schema,table_name, array_join(array_agg(concat('''',column_name,'''')),',') column_name_str,array_agg(column_name ORDER BY ordinal_position) column_name_list ,count(column_name) columnCnt ,array_agg(data_type ORDER BY ordinal_position ) column_type_list  FROM oms4_baseinfo.information_schema.columns  WHERE table_schema NOT IN ('db_sec_compare_prod') GROUP BY 1,2,3
30 UNION ALL  SELECT table_catalog,table_schema,table_name, array_join(array_agg(concat('''',column_name,'''')),',') column_name_str,array_agg(column_name ORDER BY ordinal_position) column_name_list ,count(column_name) columnCnt ,array_agg(data_type ORDER BY ordinal_position ) column_type_list FROM oms4_finance.information_schema.columns   WHERE table_schema NOT IN ('db_sec_compare_prod') GROUP BY 1,2,3
31 )t INNER JOIN (
32 SELECT *
33 FROM (VALUES -- (doris表名, 源库表明)
34 ('db_ods.oms4_db_base_info_bi_brand','bi_brand'),
35 ('db_ods.oms4_db_base_info_bi_sales_entity_brand_ref','bi_sales_entity_brand_ref'),
36 ('db_ods.oms4_db_base_info_bi_sys_goods_tax_rate','bi_sys_goods_tax_rate'),
37 ('db_ods.oms4_rf_refund','rf_refund'),
38 ('db_ods.oms4_db_oms4_som_trade_order_guide_info','trade_order_guide_info'),
39 ('db_ods.oms4_db_oms4_som_trade_order_item','trade_order_item'),
40 ('db_ods.oms4_db_oms4_som_trade_order_member_info','trade_order_member_info'),
41 ('db_ods.oms4_db_oms4_som_trade_order_payment_info','trade_order_payment_info'),
42 ('db_ods.oms4_db_oms4_som_trade_return_order','trade_return_order')
43 --('db_ods.oms4_db_oms4_som_trade_order_guide_info','trade_order_guide_info'),
44 --('db_ods.oms4_db_oms4_som_trade_order_delivery_info','trade_order_delivery_info'),
45 --('db_ods.oms4_db_oms4_som_so_order_timing_promise','so_order_timing_promise')
46 ) AS tab (doris_tabName,src_db_tabName)
47 ) s ON t.table_name=s.src_db_tabName
48 GROUP BY table_name
49 ) tt
View Code

 

标签:name,presto,column,跨库,list,db,table,array,对比
From: https://www.cnblogs.com/linbo3168/p/17494197.html

相关文章

  • h265编码是什么意思,H265的编码和H264的编码有什么区别,视频编码h.264和mpeg4的对比
    MPEG-4标准则是基于对象和内容的编码方式,和传统的图像帧编码方式不同,它只处理图像帧与帧之间的差异元素,抛弃相同图像元素,因此大大减少了合成多媒体文件的体积,从而以较小的文件体积同样可得到高清晰的还原图像。换句话说,相同的原始图像,MPEG-4编码标准具有更高的压缩比。H.264编码技......
  • 对比2023年最受欢迎的8款项目管理工具
    在当今的工作环境中,项目管理软件已成为提升团队效率、推动项目进度的重要工具。从任务分配到协同合作,从进度追踪到文档共享,一个好用的项目管理工具能大大提高工作效率,简化项目流程。但是,在众多项目管理软件中,我们应该如何选择呢?哪一个才是我们真正需要的呢?在本文中,我们将对比2023......
  • Bash_Perl_Tcl语法对比
    system:Nowyouareaprogrammingexpert.IwillaskyougrammarquestionsandyouwillprovideacomparisontableofthesyntaxforBash,Tcl,andPerlinmarkdownformatbasedonmyquestions.PleasereplyinChinese.Thankyou.user:变量assistant:......
  • 声明式API replica controller vs replica set 对比
    1.在命令式API中,你可以直接发出服务器要执行的命令,例如: “运行容器”、“停止容器”等。在声明性API中,你声明系统要执行的操作,系统将不断向该状态驱动。可以想象成手动驾驶和自动驾驶系统。(多了会删除,少了会自动增加)  因此,在Kubernetes中,你创建一个API对象(使用CLI或RESTAPI)来表......
  • 各种数据类型的取值范围对比
    char:-128--127  unsignedchar: 0--255 byte:-2^7~2^7-1,即-128~127。1字节。Byte。末尾加Bshort:-2^15~2^15-1,即-32768~32767。2字节。Short。末尾加S有符号int:-2^31~2^31-1,即-2147483648~2147483647。4字节。Integer。无符号int:0~2^32-1。long:-2^63~......
  • 相同更改数据量的前提下,单次COMMIT和多次COMMIT对日志空间浪费的影响对比
    LGWR进程按照顺序写在线日志,中间不会跳跃,而且LGWR进程不会在同一个日志快写2次,即使一次写入的日志快只占几个字节,下次不会再用了,这就造成日志空间的浪费。Oracle做一次Commit,就会触发LGWR进程进行日志缓冲到日志文件的写入操作,因此可以说更改相同数据量的前提下,如果提交过于频繁,产......
  • 【对比】ChatGPT Plus与ChatGPT实操对比体验
    前言......
  • C++创建对象的4种方式对比
    三种方式分别为等号,圆括号,大括号,最后一种是等号和大括号一起使用等号和圆括号是以前就存在的,大括号是新的语法,其具有2个优点:防止变窄转换:大括号不支持变窄转换,等号和圆括号为了向下兼容支持变窄转换免疫C++最令人头疼的解析:C++规定任何可以被解析为一个声明的东西必......
  • FireDac三种方式批量添加数据的性能对比
    我有个程序,需要从CSV中读入数据,对数据进行分析后,然后插入另一个sqlite数据库的数据表。在我的程序中使用了virtualstringtree和Firedac,数据大约有13000条,需要转存的数据有11000条左右,转存的字段有8条,除了一条是boolean类型的外都是string类型。1、直接插入记录我刚开始采用的......
  • Python - Java vs Python对比
     工具类最大堆,最小堆Java-PriorityQueue<T>https://www.cnblogs.com/frankcui/p/12125210.html#_label23 Python- heapq库https://blog.csdn.net/wangws_sb/article/details/124108070 注意:python中的heapq库只有最小堆,没有最大堆,当使用最大堆时,可以在插入元素时......