
Hive Installation and Configuration

Time: 2023-09-21 10:05:31


Requirements:
    Java 1.6
    Hadoop 0.20.x

Reference: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallingHivefromaStableRelease

1) Download Hive
http://hive.apache.org/releases.html

2) Install Hive
tar zxvf hive-0.7.0-bin.tar.gz

3) Create a symlink
root@hadoop1:/opt# ln -sf /opt/hadoop/hive-0.7.0-bin/ /opt/hadoop/hive

4) Set environment variables
export HIVE_HOME=/opt/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
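
The exports above last only for the current shell session. As a minimal sketch, assuming a Bourne-style shell, lines like the following could go in a startup file such as ~/.profile. The file choice and the helper name prepend_path are assumptions, not part of the original setup. The helper adds the Hive bin directory to PATH only when it is missing, so sourcing the file again does not make PATH grow.

```shell
# prepend_path is a hypothetical helper: put a directory at the front of PATH
# unless PATH already contains it.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

HIVE_HOME=/opt/hadoop/hive            # the symlink created during installation
export HIVE_HOME
prepend_path "$HIVE_HOME/bin"
export PATH
```

After sourcing the file, `command -v hive` should print the path of the launcher if the installation is in place.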
5) Running Hive

  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse
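
The four commands above can be wrapped in a small script. This is only a sketch: the names HADOOP, DRY_RUN, run, and prepare_hive_dirs are hypothetical, and the script defaults to a dry run that prints the commands instead of executing them, so it can be tried on a machine without Hadoop.

```shell
# HADOOP resolves to $HADOOP_HOME/bin/hadoop when HADOOP_HOME is set,
# otherwise to a plain "hadoop" looked up on PATH (an assumption).
HADOOP="${HADOOP_HOME:+$HADOOP_HOME/bin/}hadoop"
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create each directory Hive needs and make it group-writable.
prepare_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        run "$HADOOP" fs -mkdir "$dir"
        run "$HADOOP" fs -chmod g+w "$dir"
    done
}

prepare_hive_dirs
```

Setting DRY_RUN=0 before running the script executes the commands for real.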

root@hadoop1:/opt/hadoop/hive/bin# ./hive
Hive history file=/tmp/root/hive_job_log_root_201107121412_939983567.txt


DDL Operations
hive> CREATE TABLE pokes (foo INT, bar STRING);  
OK
Time taken: 0.42 seconds
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);  
OK
Time taken: 0.099 seconds
hive> SHOW TABLES;
OK
invites
pokes
Time taken: 0.222 seconds
hive> SHOW TABLES '.*s';
OK
invites
pokes
Time taken: 0.134 seconds
hive> DESCRIBE invites;
OK
foo    int    
bar    string    
ds    string    
Time taken: 0.174 seconds
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
OK
Time taken: 0.147 seconds
hive>  ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
OK
Time taken: 0.115 seconds
hive> DROP TABLE pokes;
OK
Time taken: 1.054 seconds
hive> show tables;
OK
invites
Time taken: 0.131 seconds
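
Note that pokes was dropped at the end of this session, while the DML section loads data into it, so the table has to be created again first. A minimal sketch of doing that without the interactive prompt, using the CLI's `hive -e` option and assuming the `hive` launcher is on PATH. The function name recreate_pokes is hypothetical, and when `hive` is not found the function only prints the statement it would have run.

```shell
# The same statement used for pokes at the start of the DDL session.
STATEMENT='CREATE TABLE pokes (foo INT, bar STRING);'

# recreate_pokes is a hypothetical helper name.
recreate_pokes() {
    if command -v hive >/dev/null 2>&1; then
        hive -e "$STATEMENT"          # run one statement and exit
    else
        echo "hive not found on PATH; would run: $STATEMENT"
    fi
}

recreate_pokes
```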

DML Operations
Load a file from the local filesystem
hive> LOAD DATA LOCAL INPATH '/opt/hadoop/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Copying data from file:/opt/hadoop/hive/examples/files/kv1.txt
Copying file: file:/opt/hadoop/hive/examples/files/kv1.txt
Loading data to table default.pokes
Deleted hdfs://hadoop1:9000/user/hive/warehouse/pokes
OK
Time taken: 0.318 seconds

hive> select * from pokes limit 10;
OK
238    val_238
86    val_86
311    val_311
27    val_27
165    val_165
409    val_409
255    val_255
278    val_278
98    val_98
484    val_484
Time taken: 0.137 seconds

-- Load into a partitioned table
hive> LOAD DATA LOCAL INPATH '/opt/hadoop/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
Copying data from file:/opt/hadoop/hive/examples/files/kv2.txt
Copying file: file:/opt/hadoop/hive/examples/files/kv2.txt
Loading data to table default.invites partition (ds=2008-08-15)
OK
Time taken: 0.394 seconds
hive> select * from invites limit 10;
OK
474    val_475    NULL    2008-08-15
281    val_282    NULL    2008-08-15
179    val_180    NULL    2008-08-15
291    val_292    NULL    2008-08-15
62    val_63    NULL    2008-08-15
271    val_272    NULL    2008-08-15
217    val_218    NULL    2008-08-15
135    val_136    NULL    2008-08-15
167    val_168    NULL    2008-08-15
468    val_469    NULL    2008-08-15
Time taken: 0.217 seconds

hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
root@hadoop1:/opt/hadoop/hive/bin# ./hive
Hive history file=/tmp/root/hive_job_log_root_201107121431_842989549.txt
hive> LOAD DATA LOCAL INPATH '/opt/hadoop/hive/examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
Copying data from file:/opt/hadoop/hive/examples/files/kv3.txt
Copying file: file:/opt/hadoop/hive/examples/files/kv3.txt
Loading data to table default.invites partition (ds=2008-08-08)
OK
Time taken: 6.787 seconds
hive> select * from invites limit 10;
OK
238    val_238    NULL    2008-08-08
NULL        NULL    2008-08-08
311    val_311    NULL    2008-08-08
NULL    val_27    NULL    2008-08-08
NULL    val_165    NULL    2008-08-08
NULL    val_409    NULL    2008-08-08
255    val_255    NULL    2008-08-08
278    val_278    NULL    2008-08-08
98    val_98    NULL    2008-08-08
NULL    val_484    NULL    2008-08-08
Time taken: 0.589 seconds

SQL Operations
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15' limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/root/root_20110712144040_e058253d-bb7f-45b9-97b8-f6c78c5483b1.log
Job running in-process (local Hadoop)
2011-07-12 14:40:52,786 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
OK
474
281
179
291
62
271
217
135
167
468
Time taken: 3.62 seconds

hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/root/root_20110712144141_7b6e4021-a419-42b5-a6eb-c45010872c0a.log
Job running in-process (local Hadoop)
2011-07-12 14:41:39,056 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Ended Job = -1864542964, job is filtered out (removed at runtime).
Moving data to: hdfs://hadoop1:9000/tmp/hive-root/hive_2011-07-12_14-41-36_001_2590472032748705056/-ext-10000
Moving data to: /tmp/hdfs_out
OK
Time taken: 3.247 seconds

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/root/root_20110712144242_920398b9-3c37-431b-b088-dcffe1c54aa2.log
Job running in-process (local Hadoop)
2011-07-12 14:42:19,666 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Copying data to local directory /tmp/local_out
Copying data to local directory /tmp/local_out
OK
Time taken: 3.189 seconds


lpxuan@hadoop1:/tmp/local_out$ more 000000_0
238val_238
86val_86
311val_311
27val_27
165val_165
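
The two columns look fused because Hive writes this output with its default field delimiter, Ctrl-A (octal 001), which does not show up in the text above. A minimal sketch for viewing such a file with visible separators follows. The helper name show_fields is hypothetical, and the sample row is made up to match the shape of the output above.

```shell
# show_fields is a hypothetical helper: rewrite Hive's default Ctrl-A
# field delimiter as a tab so the columns become visible.
show_fields() {
    tr '\001' '\t'
}

# Self-contained demonstration on one row shaped like the file above.
printf '238\001val_238\n' | show_fields
```

On the real output it would be used as `show_fields < /tmp/local_out/000000_0`.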

--group by operation
hive> SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20110712144545_01fc3105-f98f-4d77-841f-61c5d65f80fc.log
Job running in-process (local Hadoop)
2011-07-12 14:45:45,313 null map = 0%,  reduce = 0%
2011-07-12 14:45:53,745 null map = 100%,  reduce = 0%
2011-07-12 14:45:55,748 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
    3
val_100    1
val_101    2
val_79    1
val_81    1
val_83    2
val_86    1
val_87    1
val_88    2
val_9    1
val_90    3
val_92    1
val_94    3
val_95    1
val_98    3
..
Time taken: 18.354 seconds


--join
hive>  SELECT t1.bar, t1.foo, t2.foo FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20110712144848_0aa68f57-4d70-4cdc-abb7-fa48a5e379dd.log
Job running in-process (local Hadoop)
2011-07-12 14:48:55,650 null map = 0%,  reduce = 0%
2011-07-12 14:48:56,653 null map = 100%,  reduce = 0%
2011-07-12 14:48:57,659 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
val_100    100    99
val_100    100    99
val_103    103    102
val_103    103    102
val_105    105    104
val_105    105    104
val_105    105    104
val_11    11    10
val_111    111    110
val_118    118    117
Time taken: 8.686 seconds

============================================================

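Setting Up a Hive Platform

Hive is a data warehouse platform built on Hadoop that makes ETL work convenient. It defines a SQL-like query language, HQL, and translates the queries users write into MapReduce programs that run on Hadoop.

This article explains how to set up a Hive platform. Assume three machines, hadoop1, hadoop2, and hadoop3, all with Hadoop-0.19.2 installed (Hive supports many Hadoop versions) and with correctly configured hosts files. Hive is deployed on hadoop1.

The simplest and quickest deployment

Hadoop-0.19.2 ships with Hive bundled, at version 0.3.0.

First start Hadoop: sh $HADOOP_HOME/bin/start-all.sh

Then start Hive: sh $HADOOP_HOME/contrib/hive/bin/hive

The Hive command-line interface is now running, and you can type commands directly to run Hive jobs.

This deployment uses Derby in embedded mode. It is simple and quick, but it cannot serve multiple users at the same time, so it is suitable only for simple testing and not for production. We therefore change Hive's default configuration to improve availability.

A multi-user deployment with a web interface

The most widely used Hive version at the moment is hive-0.4.1, so we use it to set up the platform.

First, download hive-0.4.1: svn co http://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1/ hive-0.4.1

Next, edit the build file shims/ivy.xml in the downloaded source so that it reads as follows (for Hadoop 0.19.2):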

<ivy-module version="2.0"> 
    <info organisation="org.apache.hadoop.hive" module="shims"/> 
    <dependencies> 
        <dependency org="hadoop" name="core" rev="0.19.2"> 
          <artifact name="hadoop" type="source" ext="tar.gz"/> 
        </dependency> 
        <conflict manager="all" /> 
    </dependencies> 
</ivy-module>

Then build Hive with ant: ant package

After a successful build, the build/dist directory contains the built files. Set $HIVE_HOME to this directory.

Edit conf/hive-default.xml. The main changes are:

<property> 
  <name>javax.jdo.option.ConnectionURL</name> 
  <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value> 
  <description>JDBC connect string for a JDBC metastore</description> 
</property> 
<property> 
  <name>javax.jdo.option.ConnectionDriverName</name> 
  <value>org.apache.derby.jdbc.ClientDriver</value> 
  <description>Driver class name for a JDBC metastore</description> 
</property>
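
Hive also reads conf/hive-site.xml, and values set there take precedence over hive-default.xml, so the same two properties can be kept in a separate file and the defaults left untouched. A minimal sketch under that assumption follows. CONF_DIR is a hypothetical variable that defaults to a scratch directory, so running the sketch never overwrites a real configuration.

```shell
# CONF_DIR is a hypothetical variable; point it at $HIVE_HOME/conf to apply
# the file for real. By default it is a scratch directory.
CONF_DIR="${CONF_DIR:-${TMPDIR:-/tmp}/hive-conf-sketch}"
mkdir -p "$CONF_DIR"

# The two properties are the ones shown above; the quoted heredoc keeps
# the text literal (no shell expansion).
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hive-site.xml"
```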

On hadoop1, download and install the Apache Derby database: wget http://labs.renren.com/apache-mirror/db/derby/db-derby-10.5.3.0/db-derby-10.5.3.0-bin.zip

After unpacking Derby, set $DERBY_HOME.

Then start Derby's Network Server: sh $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0

Next, copy derbyclient.jar and derbytools.jar from $DERBY_HOME/lib to $HIVE_HOME/lib.

Start Hadoop: sh $HADOOP_HOME/bin/start-all.sh

Finally, start Hive's web interface: sh $HIVE_HOME/bin/hive --service hwi

Hive is now deployed. Open http://hadoop1:9999/hwi/ in a browser (if that does not work, replace hadoop1 with the actual IP address, for example http://10.210.152.17:9999/hwi/).

This deployment uses Derby in client/server mode, which allows multiple users to connect at the same time, and it provides a web interface for convenience. This is the recommended deployment.

About the Hive schema

Both deployments described above use Derby to store Hive's schema information. Other databases, such as MySQL, can store it instead.

See this article for how to replace Derby with MySQL: http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx

The schema information can also be stored in HDFS by editing conf/hive-default.xml as follows:

<property> 
  <name>hive.metastore.rawstore.impl</name> 
  <value>org.apache.hadoop.hive.metastore.FileStore</value> 
  <description>Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database</description> 
</property>

From: https://blog.51cto.com/u_16255870/7548648
