首页 > 其他分享 >【Hive】MapJoin限制场景之一(MapJoin Followed by Union)

【Hive】MapJoin限制场景之一(MapJoin Followed by Union)

时间:2022-11-10 21:01:13浏览次数:39  
标签:Followed Union MapJoin act date table

举例说明Hive实现MapJoin限制场景之一,MapJoin Followed by Union

Hive的MapJoin逻辑会有几个限制场景:

  • Union Followed by a MapJoin
  • Lateral View Followed by a MapJoin
  • Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By) Followed by MapJoin
  • MapJoin Followed by Union
  • MapJoin Followed by Join
  • MapJoin Followed by MapJoin

官网链接:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions

实际举例MapJoin限制场景之一,MapJoin Followed by Union

不能实现Map Join

set hive.auto.convert.join=true; -- 显式设置mapjoin
SELECT pvid
     , req_time
FROM dataware.table_a a
LEFT JOIN
     (   
         SELECT
             id, act_name
         FROM default.table_b
         WHERE dt = '{@date}'

         UNION ALL

         SELECT
             id, act_name
         FROM default.table_c
         WHERE dt = '{@date}'
     ) dim_act ON a.pvid = dim_act.pvid
WHERE dt = '{@date}'

image.png

可以广播:

set hive.auto.convert.join=true; -- 显式设置mapjoin
SELECT pvid
     , req_time
     , click_cnt
FROM dataware.table_a a
         LEFT JOIN
     (   
         SELECT
             id, act_name
         FROM default.table_b
         WHERE dt = '{@date}'
     ) dim_act ON a.adpos_id = dim_act.id
WHERE dt = '{@date}'

没有Union的场景,小表可以广播,运行过程产生两个Job。 job 0,查询小表进行广播 job 1,实现MapJoin,没有Shuffle过程,大表没有数据分发过程产生 image.png image.png

标签:Followed,Union,MapJoin,act,date,table
From: https://blog.51cto.com/u_10120275/5842095

相关文章