举例说明Hive实现MapJoin限制场景之一,MapJoin Followed by Union
Hive的MapJoin逻辑会有几个限制场景:
- Union Followed by a MapJoin
- Lateral View Followed by a MapJoin
- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By) Followed by MapJoin
- MapJoin Followed by Union
- MapJoin Followed by Join
- MapJoin Followed by MapJoin
官网链接:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions
实际举例MapJoin限制场景之一,MapJoin Followed by Union
不能实现Map Join
set hive.auto.convert.join=true; -- 显式设置mapjoin
SELECT pvid
, req_time
FROM dataware.table_a a
LEFT JOIN
(
SELECT
id, act_name
FROM default.table_b
WHERE dt = '{@date}'
UNION ALL
SELECT
id, act_name
FROM default.table_c
WHERE dt = '{@date}'
) dim_act ON a.pvid = dim_act.pvid
WHERE dt = '{@date}'
可以广播:
set hive.auto.convert.join=true; -- 显式设置mapjoin
SELECT pvid
, req_time
, click_cnt
FROM dataware.table_a a
LEFT JOIN
(
SELECT
id, act_name
FROM default.table_b
WHERE dt = '{@date}'
) dim_act ON a.adpos_id = dim_act.id
WHERE dt = '{@date}'
没有Union的场景,小表可以广播,运行过程产生两个Job。 job 0,查询小表进行广播 job 1,实现MapJoin,没有Shuffle过程,大表没有数据分发过程产生
标签:Followed,Union,MapJoin,act,date,table From: https://blog.51cto.com/u_10120275/5842095