Overview
Hive has supported the MERGE operation since version 2.2, and MERGE can only be executed on tables that support ACID (transactional tables).
Syntax
MERGE INTO <target table> AS T
USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES <value list>
Table creation statements
CREATE DATABASE merge_data;

CREATE TABLE merge_data.transactions(
  ID int,
  TranValue string,
  last_update_user string)
PARTITIONED BY (tran_date string)
CLUSTERED BY (ID) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

CREATE TABLE merge_data.merge_source(
  ID int,
  TranValue string,
  tran_date string)
STORED AS ORC;
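If you want to confirm that the target table really was created as a transactional (ACID) table, its table properties can be inspected; this check is optional and not part of the original walkthrough:

SHOW TBLPROPERTIES merge_data.transactions;   -- should include transactional=true
-- or, for the full table description:
DESCRIBE FORMATTED merge_data.transactions;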
Loading the data
INSERT INTO merge_data.merge_source VALUES
  (1, 'value_01', '20170410'),
  (4, NULL, '20170410'),
  (7, 'value_77777', '20170413'),
  (8, NULL, '20170413'),
  (8, 'value_08', '20170415'),
  (11, 'value_11', '20170415');

INSERT INTO merge_data.transactions PARTITION (tran_date) VALUES
  (1, 'value_01', 'creation', '20170410'),
  (2, 'value_02', 'creation', '20170410'),
  (3, 'value_03', 'creation', '20170410'),
  (4, 'value_04', 'creation', '20170410'),
  (5, 'value_05', 'creation', '20170413'),
  (6, 'value_06', 'creation', '20170413'),
  (7, 'value_07', 'creation', '20170413'),
  (8, 'value_08', 'creation', '20170413'),
  (9, 'value_09', 'creation', '20170413'),
  (10, 'value_10', 'creation', '20170413');
Loading data into the target table runs into the following errors.
Error 1:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one
static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
Fix: set hive.exec.dynamic.partition.mode=nonstrict;
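Alternatively, as the error message suggests, the strict-mode check can also be satisfied by naming the partition statically instead of relaxing the mode; a sketch for a single partition, using one row from the data above:

INSERT INTO merge_data.transactions PARTITION (tran_date = '20170410')
VALUES (1, 'value_01', 'creation');   -- partition value given statically, so no dynamic partition column is needed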
Error 2:
FAILED: SemanticException [Error 10265]: This command is not allowed on an ACID table
merge_data.transactions with a non-ACID transaction manager. Failed command:
INSERT INTO merge_data.transactions PARTITION (tran_date) VALUES
Fix: set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
Error 3:
FAILED: RuntimeException [Error 10264]: To use DbTxnManager you must set hive.support.concurrency=true
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:1186)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1324)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Fix: set hive.support.concurrency=true;
After changing these three parameters, the data loads successfully.
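For convenience, here are the three session settings from the errors above collected in one place, so they can be run before loading the data (assuming a Hive CLI or Beeline session):

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.support.concurrency=true;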
The MERGE operation
MERGE INTO merge_data.transactions AS T
USING merge_data.merge_source AS S
ON T.ID = S.ID AND T.tran_date = S.tran_date
WHEN MATCHED AND (T.TranValue != S.TranValue AND S.TranValue IS NOT NULL)
  THEN UPDATE SET TranValue = S.TranValue, last_update_user = 'merge_update'
WHEN MATCHED AND S.TranValue IS NULL
  THEN DELETE
WHEN NOT MATCHED
  THEN INSERT VALUES (S.ID, S.TranValue, 'merge_insert', S.tran_date);
Viewing the results
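One simple way to inspect the merged table is a query along these lines (the ORDER BY is only for readability):

SELECT ID, TranValue, last_update_user, tran_date
FROM merge_data.transactions
ORDER BY tran_date, ID;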
With that, we have a fully working MERGE operation in Hive!
Extension: the DELETE operation
Since the table is already transactional, let's also try a delete operation.
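As a minimal sketch of what that looks like on an ACID table (the WHERE predicate below is purely illustrative and not from the original example):

DELETE FROM merge_data.transactions
WHERE tran_date = '20170410';   -- hypothetical predicate; removes all rows in that partition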
Importing data from a temporary table
INSERT INTO TABLE merge_data.transactions PARTITION (tran_date)
SELECT ID, TranValue, last_update_user, tran_date
FROM merge_data.transactions_tmp
CLUSTER BY ID;
CLUSTER BY (id) - hash-partitions the rows (getPartition) on the given column and also sorts by that same column, ascending by default.
DISTRIBUTE BY (uid) - only specifies the column used to hash-partition the rows in getPartition.
SORT BY (uid ASC) - specifies the sort column and the sort order within each reducer.
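In other words, the CLUSTER BY ID above is shorthand for an explicit DISTRIBUTE BY plus SORT BY; a sketch of the equivalent form of the same load:

INSERT INTO TABLE merge_data.transactions PARTITION (tran_date)
SELECT ID, TranValue, last_update_user, tran_date
FROM merge_data.transactions_tmp
DISTRIBUTE BY ID
SORT BY ID ASC;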