首页 > 其他分享 >Hive Merge详解

Hive Merge详解

时间:2023-08-04 09:46:36浏览次数:31  
标签:value java merge hive Merge 详解 Hive apache org

说明
Hive在2.2版本之后开始支持Merge操作,并且Merge只能在支持ACID的表上执行

语法

MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>

建表语句

CREATE DATABASE merge_data;

CREATE TABLE merge_data.transactions(
ID int,
TranValue string,
last_update_user string)
PARTITIONED BY (tran_date string)
CLUSTERED BY (ID) into 5 buckets
STORED AS ORC TBLPROPERTIES ('transactional'='true');

CREATE TABLE merge_data.merge_source(
ID int,
TranValue string,
tran_date string)
STORED AS ORC;

导入数据

INSERT INTO merge_data.merge_source VALUES
(1, 'value_01', '20170410'),
(4, NULL, '20170410'),
(7, 'value_77777', '20170413'),
(8, NULL, '20170413'),
(8, 'value_08', '20170415'),
(11, 'value_11', '20170415');

INSERT INTO merge_data.transactions PARTITION (tran_date) VALUES
(1, 'value_01', 'creation', '20170410'),
(2, 'value_02', 'creation', '20170410'),
(3, 'value_03', 'creation', '20170410'),
(4, 'value_04', 'creation', '20170410'),
(5, 'value_05', 'creation', '20170413'),
(6, 'value_06', 'creation', '20170413'),
(7, 'value_07', 'creation', '20170413'),
(8, 'value_08', 'creation', '20170413'),
(9, 'value_09', 'creation', '20170413'),
(10, 'value_10','creation', '20170413');

在导入目标表数据会遇到以下问题

错误一:

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one 
static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
解决方法:
set hive.exec.dynamic.partition.mode=nonstrict;

错误二:

FAILED: SemanticException [Error 10265]: This command is not allowed on an ACID table merge_data.
transactions with a non-ACID transaction manager. Failed command: INSERT INTO merge_data.
transactions PARTITION (tran_date) VALUES
解决方法:
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

错误三:

FAILED: RuntimeException [Error 10264]: To use DbTxnManager you must set hive.support.concurrency=true
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:1186)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1324)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
解决方法:
set hive.support.concurrency=true;

修改完以上三个参数之后,导入数据成功

Merge操作

MERGE INTO merge_data.transactions AS T
USING merge_data.merge_source AS S
ON T.ID = S.ID and T.tran_date = S.tran_date
WHEN MATCHED AND (T.TranValue != S.TranValue AND S.TranValue IS NOT NULL) THEN UPDATE SET
TranValue = S.TranValue
,last_update_user = 'merge_update'
WHEN MATCHED AND S.TranValue IS NULL THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (
S.ID
, S.TranValue
, 'merge_insert'
, S.tran_date
);

查看结果

至此,我们就完美实现了Hive的Merge操作!


扩展delete操作
既然我们已经是事务表了那让我们试一试删除操作

从临时表导入数据

insert into table merge_data.transactions PARTITION (tran_date)
select ID,TranValue,last_update_user,tran_date from merge_data.transactions_tmp
cluster by id;
cluster by (id) - 指定getPartition以哪个字段来进行hash散列,并且排序字段也是指定的字段,默认以正序进行排序
distribute by(uid) – 指定getPartition以哪个字段来进行hash散列
sort by(uid asc) – 指定排序字段,以及排序规则

 

标签:value,java,merge,hive,Merge,详解,Hive,apache,org
From: https://www.cnblogs.com/sx66/p/17605051.html

相关文章

  • 9.2.Config Server 配置规则详解
    9.2.ConfigServer配置规则详解在上面,我们用于测试的配置文件:futurecloud.ymlfuturecloud-dev.ymlfuturecloud-test.ymlfuturecloud-pre.ymlfuturecloud-stable.ymlfuturecloud-apigetway-zuul.yml“-”前面的部分可以随便定义,一般我们用应用名来定义,后面的deb、test…也可以随......
  • spring-mvc系列:详解@RequestMapping注解(value、method、params、header等)
    目录一、@RequestMapping注解的功能二、@RequestMapping注解的位置三、@RequestMapping注解的value属性四、@RequestMapping注解的method属性五、@RequestMapping注解的params属性六、@RequestMapping注解的header属性七、SpringMVC支持ant分格的路径八、SpringMVC支持路径中的占......
  • 【jmeter系列】仅一次控制器+正则表达式用法详解
    一、仅一次控制器做性能测试经常遇到这样的问题:要做接口压力测试,但是需要登录接口返回的token,但是不需要对登录接口进行压测,这个时候jmeter仅一次控制器就起到了关键性的作用。具体操作如下:1、添加仅一次控制器,并在控制器下添加登录接口 2、登录接口返回响应数据-Responesb......
  • 大数据不就是写sql吗?—— Hive:把sql解析后用MapReduce跑 SparkSQL:把sql解析后
    应届生小祖参加了个需求分析会回来后跟我说被产品怼了一句: "不就是写SQL吗,要那么久吗" 我去,欺负我小弟,这我肯定不能忍呀,于是我写了一篇文章发在了公司的wiki: 贴出来给大家看看,省略了一些敏感的内容。当然内部版言辞也会温和一点,嘻嘻在哪里写SQL? 这个问题高级点的问法是用哪种SQ......
  • update-alternatives 使用详解
    alternatives管理方式$ls-l/usr/bin/pythonlrwxrwxrwx1rootroot2411202017/usr/bin/python->/etc/alternatives/python$ls-l/etc/alternatives/pythonlrwxrwxrwx1rootroot1811212017/etc/alternatives/python->/usr/bin/python2.7python这个可执行命......
  • JAVA系列之类加载机制详解
    文章目录一、类加载过程二、类生命周期1、加载2、验证3、准备4、解析5、初始化三、双亲委派机制四、自定义类加载器的应用1、源代码加密2、隔离加载类3、热加载/热部署4、扩展加载源类的加载机制?双亲委派机制?什么是类加载器?自定义类加载器有哪些应用场景?通常,在关于Java的......
  • mysql事务和索引详解
    mysql事务和索引详解1.事务注:事务就是一组操作的集合,成功一起成功,失败一起失败事务控制:开启事务:starttransaction;/begin;提交事务:commit;回滚事务:rollback;使用方法:首先graphTBid1[开启事务]-->id2{执行mysql}id2-->|语句全部执行成功|id3>提交事务]id2-->|语......
  • 《Ext详解与实践》节选:自定义单元格的显示格式
    有时候,需要在Grid中显示一个链接,或根据数字的大小显示不同的颜色,又或者使用固定格式显示记录编号等,这就需要自定义单元格的显示格式。在Ext的Grid可以很容易的实现自定义单元格的显示格式。在定义Grid的ColumnModel的时候,在需要自定义显示格式的列中加入参数renderer,该参数指向一个......
  • 《Ext详解与实践》节选:文件上传
    rel="File-List"href="file:///C:%5CDOCUME%7E1%5CADMINI%7E1%5CLOCALS%7E1%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml">rel="Edit-Time-Data"href="file:///C:%5CDOCUME%7E1%5CADMINI%7E1%5CLOCALS%7E1%5CTemp%5Cmsohtml1%5C01......
  • java mybatis一级缓存二级缓存三级缓存详解
    在MyBatis中,存在一级缓存、二级缓存和三级缓存的概念。下面我将为你详细解释这些缓存的含义和作用:一级缓存:一级缓存是MyBatis默认开启的缓存机制,它是指在同一个SqlSession对象中,执行相同的SQL语句时,会将查询结果缓存到内存中。一级缓存的作用范围仅限于当前的SqlSession,当......