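Learning Hadoop: Day 3
Introduction to Hive
Hive basics
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and offers SQL-style querying by translating SQL statements into MapReduce jobs. Its main advantage is a low learning curve: simple MapReduce-style statistics can be written quickly in SQL-like statements, with no need to develop a dedicated MapReduce application, which makes Hive well suited to statistical analysis in a data warehouse.
Hive defines a simple SQL-like query language called HQL (Hive Query Language) that lets users familiar with SQL query the data. The language also lets developers familiar with MapReduce plug in custom mappers and reducers for complex analyses that the built-in mappers and reducers cannot handle.
Hive depends on the infrastructure that Hadoop provides. It uses a relational data model and exposes a SQL interface; the data is stored in HDFS, and most queries are executed by MapReduce. The exception is simple full-table reads such as select * from table, which read the table files directly and do not generate a MapReduce job.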
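To make the file-to-table mapping concrete, here is a minimal sketch. The sample data, the table name employees_demo, and the file names are all hypothetical and are not part of the original setup. The script writes a small comma-delimited file and an HQL script that maps it to a table, then runs the script only if the hive command is available.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical sample data and table name, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/employees.csv" <<'EOF'
1,alice,sales
2,bob,sales
3,carol,engineering
EOF

# HiveQL that maps the delimited file to a table and queries it.
cat > "$workdir/demo.hql" <<EOF
CREATE TABLE IF NOT EXISTS employees_demo (
  id INT,
  name STRING,
  dept STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '$workdir/employees.csv'
OVERWRITE INTO TABLE employees_demo;

-- Plain projection: typically served without a MapReduce job.
SELECT * FROM employees_demo;

-- Aggregation: compiled into a MapReduce job on a MapReduce-backed Hive.
SELECT dept, COUNT(*) AS headcount FROM employees_demo GROUP BY dept;
EOF

# Run the script only where the hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$workdir/demo.hql"
else
  echo "hive not found; generated script is at $workdir/demo.hql"
fi
```

Whether the plain select avoids MapReduce depends on the Hive configuration, so treat the comments in the HQL as the usual behaviour rather than a guarantee.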
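Advantages of Hive
Hive has the following advantages:
- It provides a SQL interface, so data stored in HDFS can be queried directly with SQL statements, which is convenient for data analysts doing analysis and mining.
- It can process large data sets: because Hive is built on Hadoop, it can use Hadoop's distributed computing capacity to handle massive amounts of data.
- It provides a rich data model and data manipulation language, making it easy to define and manage tables and to run many kinds of queries and operations.
- HiveQL can be extended to support more complex analysis and processing needs.
- Compared with other Hadoop components (such as Pig), Hive has a lower learning curve and is a better fit for data analysts and other non-programmers.
Overall, Hive is a powerful and flexible data warehouse tool that helps users process and analyze large data sets more efficiently.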
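How Hive uses MapReduce for complex queries
Hive executes a complex query by compiling the SQL into one or more MapReduce jobs. The main steps are:
- Query parsing and optimization:
  - When a user submits a SQL query, Hive first parses it and checks that its syntax and semantics are valid.
  - The parsed query is then passed to the optimizer, which tries to find the most efficient way to execute it.
- Execution plan generation:
  - The optimized query is turned into a logical execution plan that describes each step of the query.
  - The logical plan is then converted into a physical execution plan containing the concrete MapReduce jobs and any other required operations.
- Conversion to MapReduce jobs:
  - Hive converts each stage of the physical plan into a corresponding MapReduce job. A complex query may require several MapReduce jobs chained together.
  - Each MapReduce job consists of mappers and reducers: mappers process the input data and emit intermediate key-value pairs, and reducers merge the mapper output to produce the result.
- Job execution:
  - Hive submits the jobs to the Hadoop cluster, and Hadoop schedules and runs them according to the cluster's current state and available resources.
  - Mappers and reducers run in parallel on the cluster's worker nodes, typically the same machines that host the DataNodes, and process the data distributed across the cluster.
- Intermediate data and final results:
  - During execution, the intermediate data produced by the mappers is written to local disk on the nodes where they run and is then shuffled to the reducers. When a query needs several jobs, the output of each job is written to a temporary directory in HDFS for the next job to read.
  - Reducers read the intermediate data, merge and aggregate it, and produce the final result, which can also be written to HDFS for later use.
- Returning the query result:
  - Once all MapReduce jobs have finished, Hive collects and assembles their output and returns the final query result to the user.
Note that some complex queries need several MapReduce stages, for example to repartition, sort, and merge data during execution, so that the result is correct and the query stays efficient.
Hive also offers several optimization mechanisms to improve query performance, such as columnar storage, data compression, and indexes. To varying degrees they reduce the amount of data read and the cost of computation, which speeds up query execution.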
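The map, shuffle, and reduce phases described above can be imitated on a single machine with a shell pipeline. This is only a sketch of the idea behind a query like select dept, count(*) ... group by dept; it is not the code Hive generates, and the sample records and function names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical input: one "id,name,dept" record per line.
records='1,alice,sales
2,bob,sales
3,carol,engineering
4,dave,sales'

# Map: emit one "key<TAB>1" pair per record, keyed by department.
map_phase() {
  awk -F',' '{ print $3 "\t" 1 }'
}

# Shuffle: sort so that identical keys end up adjacent.
shuffle_phase() {
  LC_ALL=C sort
}

# Reduce: sum the values of each run of identical keys.
reduce_phase() {
  awk -F'\t' '
    NR > 1 && $1 != key { print key "\t" total; total = 0 }
    { key = $1; total += $2 }
    END { if (NR > 0) print key "\t" total }
  '
}

result=$(printf '%s\n' "$records" | map_phase | shuffle_phase | reduce_phase)
printf '%s\n' "$result"
```

On a real cluster the same three phases run on many nodes at once, with the sort and transfer between mappers and reducers handled by Hadoop rather than by a local sort.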
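Setting up Hive
- Install the packages
- Initialize the database
- Configure Hive's configuration files
Install the packages
- Unpack the archive
tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src/ # unpack the archive
mv /usr/local/src/apache-hive-2.0.0-bin /usr/local/src/hive # rename to a shorter path
chown -R hadoop:hadoop /usr/local/src/hive # make the hadoop user the owner
- Remove the preinstalled MariaDB library package
rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64
- Install the MySQL packages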
[root@master ~]# cd /opt/software/mysql-5.7.18/
[root@master mysql-5.7.18]# ls
mysql-community-client-5.7.18-1.el7.x86_64.rpm
mysql-community-common-5.7.18-1.el7.x86_64.rpm
mysql-community-devel-5.7.18-1.el7.x86_64.rpm
mysql-community-libs-5.7.18-1.el7.x86_64.rpm
mysql-community-server-5.7.18-1.el7.x86_64.rpm
[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-common-5.7.18-1.e################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-libs-5.7.18-1.el7################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-client-5.7.18-1.e################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-server-5.7.18-1.e################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-devel-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-devel-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-devel-5.7.18-1.el################################# [100%]
[root@master mysql-5.7.18]#
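The five rpm commands above follow the dependency order common, libs, client, server, devel. A small loop can express the same order. The sketch below only prints the commands by default; the DRY_RUN variable and the function name are assumptions introduced for this example, while the directory and version are the ones used in this walkthrough.

```shell
#!/usr/bin/env bash
set -eu

# Directory and version used in this walkthrough.
pkg_dir=/opt/software/mysql-5.7.18
version=5.7.18-1.el7.x86_64

# Dependency order: common -> libs -> client -> server -> devel.
components="common libs client server devel"

# DRY_RUN=1 (default) only prints the commands; run with DRY_RUN=0 as root to install.
DRY_RUN=${DRY_RUN:-1}

install_packages() {
  for c in $components; do
    rpm_file="$pkg_dir/mysql-community-$c-$version.rpm"
    if [ "$DRY_RUN" = "0" ]; then
      rpm -ivh "$rpm_file"
    else
      echo "rpm -ivh $rpm_file"
    fi
  done
}

install_plan=$(install_packages)
printf '%s\n' "$install_plan"
```

On a systemd-based EL7 host the server would then typically be started with systemctl start mysqld; the first start is what writes the temporary root password to /var/log/mysqld.log.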
Initialize the database
[root@master ~]# cat /var/log/mysqld.log |grep password
2024-04-07T01:33:10.307521Z 1 [Note] A temporary password is generated for root@localhost: ikwD5rXxmj*m
2024-04-07T01:36:19.453456Z 3 [Note] Access denied for user 'root'@'localhost' (using password: NO)
2024-04-07T01:36:27.839546Z 4 [Note] Your password has expired. To log in you must change it using a client that supports expired passwords.
2024-04-07T01:54:37.744901Z 6 [Note] Access denied for user 'root'@'localhost' (using password: NO)
2024-04-07T01:54:50.535552Z 7 [Note] Access denied for user 'root'@'localhost' (using password: YES)
2024-04-07T01:55:20.016078Z 8 [Note] Access denied for user 'root'@'localhost' (using password: NO)
2024-04-07T01:55:24.878580Z 9 [Note] Access denied for user 'root'@'localhost' (using password: YES)
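The grep above returns every log line that mentions a password, including the failed login attempts. If only the temporary password itself is wanted, it can be pulled out with awk. This is a sketch: the function name is hypothetical, and it runs against a scratch copy of two of the log lines shown above rather than the real /var/log/mysqld.log.

```shell
#!/usr/bin/env bash
set -eu

# Print the most recent temporary root password found in a mysqld log file.
extract_temp_password() {
  # The password is the last whitespace-separated field of the matching line.
  awk '/A temporary password is generated for root@localhost/ { pw = $NF }
       END { if (pw != "") print pw }' "$1"
}

# Demo on a scratch file holding two of the log lines shown above.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2024-04-07T01:33:10.307521Z 1 [Note] A temporary password is generated for root@localhost: ikwD5rXxmj*m
2024-04-07T01:36:19.453456Z 3 [Note] Access denied for user 'root'@'localhost' (using password: NO)
EOF

temp_password=$(extract_temp_password "$sample_log")
printf '%s\n' "$temp_password"
```

On the server itself the call would be extract_temp_password /var/log/mysqld.log, run as a user allowed to read that file.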
[root@master ~]# mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root:
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration
of the plugin.
Using existing password for root.
Estimated strength of the password: 100
Change the password for root ? ((Press y|Y for Yes, any other key for No) : n
... skipping.
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.
Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
... skipping.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
- Dropping test database...
Success.
- Removing privileges on test database...
Success.
Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.
All done!
Grant the root user access to the MySQL databases from both localhost and remote hosts.
[root@master ~]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.7.18 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> grant all privileges on *.* to root@'localhost'
-> identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all privileges on *.* to root@'%' identified by
-> 'Password123$'
-> ;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from mysql.user where user='root';
+------+-----------+
| user | host |
+------+-----------+
| root | % |
| root | localhost |
+------+-----------+
2 rows in set (0.00 sec)
mysql>
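Configure the Hive component
1. Environment variables
- Set the Hive environment variables (appended to /etc/profile)
# set hive environment
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
- Apply the environment variables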
[root@master ~]# source /etc/profile
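If this step is scripted rather than typed into an editor, it helps to append each line only when it is missing, so that repeated runs do not duplicate entries. The sketch below assumes a hypothetical helper named append_once and uses a scratch file in place of /etc/profile so that it is safe to run anywhere.

```shell
#!/usr/bin/env bash
set -eu

# Append a line to a file only if that exact line is not already present.
append_once() {
  line=$1
  file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# A scratch file stands in for /etc/profile.
profile=$(mktemp)

# Run twice to show that nothing is duplicated on the second pass.
for pass in 1 2; do
  append_once '# set hive environment' "$profile"
  append_once 'export HIVE_HOME=/usr/local/src/hive' "$profile"
  append_once 'export PATH=$PATH:$HIVE_HOME/bin' "$profile"
done

cat "$profile"
```

Pointing profile at /etc/profile and running the script as root would apply the same three lines to the real file.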
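2. Hive configuration file
- Create hive-site.xml from the template
[root@master ~]# su - hadoop
[hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template /usr/local/src/hive/conf/hive-site.xml
- Edit hive-site.xml: metastore database connection and Hive temporary file paths
# Set the MySQL database connection; inside XML the "&" in the URL must be written as "&amp;"
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>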
# Set the password of the MySQL root user
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Password123$</value>
<description>password to use against metastore database</description>
</property>
# Metastore schema version verification; if the value is already false, leave it unchanged
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in is compatible with one from
Hive jars. Also disable automatic
False: Warn if the version information stored in metastore doesn't match
with one from in Hive jars.
</description>
</property>
# Configure the database driver
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
# Set the database user name, javax.jdo.option.ConnectionUserName, to root
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
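One detail that is easy to miss in the connection URL is that "&" is a reserved character in XML and has to appear as "&amp;" inside hive-site.xml. The sketch below shows one way to generate a property element with the value escaped. The helper names xml_escape and hive_property are hypothetical and are not part of Hive.

```shell
#!/usr/bin/env bash
set -eu

# Escape the characters that are special inside XML text.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one <property> element in the hive-site.xml layout.
hive_property() {
  printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' \
    "$(xml_escape "$1")" "$(xml_escape "$2")"
}

url_property=$(hive_property javax.jdo.option.ConnectionURL \
  'jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false')
printf '%s\n' "$url_property"
```

The printed element can be compared with the entry in hive-site.xml; an unescaped "&" there usually makes the file fail to parse when Hive starts.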
# In the following properties, replace ${system:java.io.tmpdir}/${system:user.name} with
# the /usr/local/src/hive/tmp directory or one of its subdirectories.
# Four settings need to be changed:
<property>
<name>hive.querylog.location</name>
<value>/usr/local/src/hive/tmp</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/src/hive/tmp</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/src/hive/tmp/resources</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/src/hive/tmp/operation_logs</value>
</property>
# Create the temporary folder tmp in the Hive installation directory.
[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
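The placeholder replacement in hive-site.xml can also be done with sed instead of by hand. The sketch below works on a scratch file with two sample entries; their values are assumptions modelled on the placeholder named in this step, not a copy of the real template. It rewrites only the ${system:java.io.tmpdir}/${system:user.name} pair, so any entry that uses a different placeholder still has to be edited manually.

```shell
#!/usr/bin/env bash
set -eu

hive_tmp=/usr/local/src/hive/tmp

# Scratch file standing in for hive-site.xml; the two values are sample
# placeholders, not a full copy of the real template.
site=$(mktemp)
cat > "$site" <<'EOF'
<property>
<name>hive.querylog.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>
</property>
EOF

# Replace the literal placeholder pair with the fixed directory.
# "$" and "." are escaped so sed matches them literally; "|" is the delimiter
# because the replacement contains "/". A temp file avoids non-portable "sed -i".
sed 's|\${system:java\.io\.tmpdir}/\${system:user\.name}|'"$hive_tmp"'|g' "$site" > "$site.new"
mv "$site.new" "$site"

cat "$site"
```

Running grep 'system:java.io.tmpdir' on the real file afterwards shows which entries, if any, still need attention.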
# Setup complete
From: https://blog.csdn.net/m0_74752717/article/details/137449938