
Hive Components (Intermediate/Advanced)


1. Experiment 1: Hive Component Deployment

1. Task 1: Environment Requirements

1.1. Step 1: Extract the installation files
[root@master ~]# ls
anaconda-ks.cfg               jdk-8u152-linux-x64.tar.gz
apache-hive-2.0.0-bin.tar.gz  mysql-5.7.18.zip
hadoop-2.7.1.tar.gz           student.csv
hbase-1.2.1-bin.tar.gz        zookeeper-3.4.8.tar.gz
[root@master ~]# tar xf apache-hive-2.0.0-bin.tar.gz -C /usr/local/src

[root@master ~]# cd /usr/local/src
[root@master src]# ls
apache-hive-2.0.0-bin  hadoop  hbase  jdk  zookeeper
[root@master src]# chown -R hadoop:hadoop /usr/local/src
[root@master src]# mv apache-hive-2.0.0-bin/ hive
[root@master src]# ll
total 4
drwxr-xr-x. 11 hadoop hadoop  172 May 23 23:21 hadoop
drwxr-xr-x. 10 hadoop hadoop  194 Jun  3 04:29 hbase
drwxr-xr-x.  8 hadoop hadoop  159 Jun  6 22:07 hive
drwxr-xr-x.  8 hadoop hadoop  255 Sep 14  2017 jdk
drwxr-xr-x. 12 hadoop hadoop 4096 May 23 22:08 zookeeper
#Configure the environment variables
[root@master src]# vi /etc/profile.d/hive.sh

export HIVE_HOME=/usr/local/src/hive
export PATH=$HIVE_HOME/bin:$PATH 
export HIVE_CONF_DIR=$HIVE_HOME/conf 

2. Task 2: Install and Start MySQL

2.1. Step 1: Remove MariaDB
[root@master ~]# rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
[root@master ~]# rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64

Note:

rpm -qa: list all installed packages

rpm -e packagename: remove a package

rpm -e --nodeps packagename: force-remove a package, ignoring dependency checks

rpm -q packagename: query whether a package is installed

rpm -ivh packagename: install a package

2.2. Step 2: Install MySQL from the rpm packages
[root@master ~]# yum -y install unzip
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cqu.edu.cn
 * extras: mirrors.aliyun.com
 * updates: mirrors.cqu.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package unzip.x86_64 0:6.0-24.el7_9 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================
 Package       Arch           Version                Repository       Size
===========================================================================
Installing:
 unzip         x86_64         6.0-24.el7_9           updates         172 k

Transaction Summary
===========================================================================
Install  1 Package

Total download size: 172 k
Installed size: 369 k
Downloading packages:
warning: /var/cache/yum/x86_64/7/updates/packages/unzip-6.0-24.el7_9.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Public key for unzip-6.0-24.el7_9.x86_64.rpm is not installed
unzip-6.0-24.el7_9.x86_64.rpm                         | 172 kB   00:00     
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Importing GPG key 0xF4A80EB5:
 Userid     : "CentOS-7 Key (CentOS 7 Official Signing Key) <[email protected]>"
 Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5
 Package    : centos-release-7-5.1804.el7.centos.x86_64 (@anaconda)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows:
2:postfix-2.10.1-6.el7.x86_64 has missing requires of libmysqlclient.so.18()(64bit)
2:postfix-2.10.1-6.el7.x86_64 has missing requires of libmysqlclient.so.18(libmysqlclient_18)(64bit)
  Installing : unzip-6.0-24.el7_9.x86_64                               1/1 
  Verifying  : unzip-6.0-24.el7_9.x86_64                               1/1 

Installed:
  unzip.x86_64 0:6.0-24.el7_9                                              

Complete!
[root@master ~]# ls
anaconda-ks.cfg               mysql-5.7.18.zip
apache-hive-2.0.0-bin.tar.gz  mysql-connector-java-5.1.46.jar
hadoop-2.7.1.tar.gz           student.csv
hbase-1.2.1-bin.tar.gz        zookeeper-3.4.8.tar.gz
jdk-8u152-linux-x64.tar.gz
[root@master ~]# unzip mysql-5.7.18.zip 
Archive:  mysql-5.7.18.zip
   creating: mysql-5.7.18/
  inflating: mysql-5.7.18/mysql-community-client-5.7.18-1.el7.x86_64.rpm  
  inflating: mysql-5.7.18/mysql-community-common-5.7.18-1.el7.x86_64.rpm  
  inflating: mysql-5.7.18/mysql-community-devel-5.7.18-1.el7.x86_64.rpm  
  inflating: mysql-5.7.18/mysql-community-libs-5.7.18-1.el7.x86_64.rpm  
  inflating: mysql-5.7.18/mysql-community-server-5.7.18-1.el7.x86_64.rpm  
[root@master ~]# ls
anaconda-ks.cfg               mysql-5.7.18
apache-hive-2.0.0-bin.tar.gz  mysql-5.7.18.zip
hadoop-2.7.1.tar.gz           mysql-connector-java-5.1.46.jar
hbase-1.2.1-bin.tar.gz        student.csv
jdk-8u152-linux-x64.tar.gz    zookeeper-3.4.8.tar.gz
#Install the packages in the following order
[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm 
warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-common-5.7.18-1.e################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm 
warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-libs-5.7.18-1.el7################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm 
warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-client-5.7.18-1.e################################# [100%]
[root@master mysql-5.7.18]# rpm -ivh mysql-community-devel-5.7.18-1.el7.x86_64.rpm 
warning: mysql-community-devel-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-devel-5.7.18-1.el################################# [100%]
[root@master mysql-5.7.18]# yum -y install mysql-community-server-5.7.18-1.el7.x86_64.rpm 
Loaded plugins: fastestmirror
Examining mysql-community-server-5.7.18-1.el7.x86_64.rpm: mysql-community-server-5.7.18-1.el7.x86_64
Marking mysql-community-server-5.7.18-1.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-server.x86_64 0:5.7.18-1.el7 will be installed
--> Processing Dependency: net-tools for package: mysql-community-server-5.7.18-1.el7.x86_64
Loading mirror speeds from cached hostfile
 * base: mirrors.cqu.edu.cn
 * extras: mirrors.aliyun.com
 * updates: mirrors.cqu.edu.cn
......
#Then check the MySQL installation with rpm -qa | grep mysql
[root@master mysql-5.7.18]# rpm -qa | grep mysql
mysql-community-libs-5.7.18-1.el7.x86_64
mysql-community-devel-5.7.18-1.el7.x86_64
mysql-community-common-5.7.18-1.el7.x86_64
mysql-community-client-5.7.18-1.el7.x86_64
mysql-community-server-5.7.18-1.el7.x86_64
#Edit /etc/my.cnf and add
[root@master mysql-5.7.18]# vi /etc/my.cnf

default-storage-engine=innodb
innodb_file_per_table
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server=utf8
#Start the mysqld service and check its status
[root@master mysql-5.7.18]# systemctl start mysqld
[root@master mysql-5.7.18]# systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2023-06-06 22:24:38 EDT; 7s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 1570 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 1497 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 1573 (mysqld)
   CGroup: /system.slice/mysqld.service
           └─1573 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

Jun 06 22:24:34 master systemd[1]: Starting MySQL Server...
Jun 06 22:24:38 master systemd[1]: Started MySQL Server.
#Look up the initial root password
[root@master ~]# cat /var/log/mysqld.log | grep password
2023-06-07T02:24:35.479906Z 1 [Note] A temporary password is generated for root@localhost: 1(2ly2sR=sjl
#Reset the password and run the security configuration
[root@master ~]# mysql_secure_installation 

Securing the MySQL server deployment.

Enter password for user root: 

The existing password for the user account root has expired. Please set a new password.

New password: 

Re-enter new password: 
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration
of the plugin.
Using existing password for root.

Estimated strength of the password: 100 
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y

New password: 

Re-enter new password: 

Estimated strength of the password: 100 
Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.


Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n

 ... skipping.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.


Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
 - Dropping test database...
Success.

 - Removing privileges on test database...
Success.

Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.

All done! 
#Note: answer n to "Disallow root login remotely?" so that remote root login stays allowed; answer y to all the other prompts.
2.3. Step 3: Create the Hive user and metadata database
[root@master ~]# mysql -uroot -p
Enter password:     #The password is Passwd123!
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database hive_db;
Query OK, 1 row affected (0.01 sec)

mysql> create user hive identified by 'Passwd123!';
Query OK, 0 rows affected (0.01 sec)

mysql> grant all privileges on *.* to hive@'%' identified by 'Passwd123!';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all privileges on *.* to 'root'@'%'identified by 'Passwd123!';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye

create database hive_db: creates the hive_db database that will serve as Hive's metadata store.

create user hive identified by 'Passwd123!': creates the Hive access user, with user name hive and password Passwd123!.

grant all privileges on *.* to hive@'%' identified by 'Passwd123!': the general form is grant <privilege 1>, <privilege 2>, ... <privilege n> on <database>.<table> to <user>@<host>. This gives the hive user all privileges on all tables of all databases and allows it to connect from any address ('%').

flush privileges: reloads MySQL's system privilege tables.
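The grant above is deliberately broad for a lab setup. As a hedged sketch, a production deployment would usually restrict the hive user to the metastore database only:

mysql> grant all privileges on hive_db.* to 'hive'@'%';
mysql> flush privileges;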

3. Task 3: Configure Hive Parameters

3.1. Step 1: Configure hive-site.xml
#Copy the template hive-default.xml.template to hive-site.xml, then edit the parameter values below.
[hadoop@master src]$ cp /usr/local/src/hive/conf/hive-default.xml.template /usr/local/src/hive/conf/hive-site.xml
[hadoop@master src]$ cd hive/conf/
[hadoop@master conf]$ ls
beeline-log4j2.properties.template    hive-log4j2.properties.template
hive-default.xml.template             hive-site.xml
hive-env.sh.template                  ivysettings.xml
hive-exec-log4j2.properties.template  llap-daemon-log4j2.properties.template
[hadoop@master conf]$ vi hive-site.xml 

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive_db?createDatabaseIfNotExist=true</value>
</property>
<!--MySQL user name-->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<!--password of the hive user in MySQL-->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Passwd123!</value>
</property>
<!--MySQL JDBC driver-->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/src/hive/tmp</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/src/hive/tmp/${hive.session.id}_resources</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/local/src/hive/tmp</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/src/hive/tmp/operation_logs</value>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>master</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
</configuration>

javax.jdo.option.ConnectionURL: the JDBC connection string. By default Hive uses its built-in metastore_db; since MySQL now serves as the metadata store, a JDBC connection must be established, and hive_db here is the MySQL database created in Task 2 that stores the metadata.

javax.jdo.option.ConnectionUserName: the hive user that connects to MySQL, created in Task 2.

javax.jdo.option.ConnectionPassword: the password of that MySQL user, set in Task 2.

javax.jdo.option.ConnectionDriverName: the database connection driver.

hive.downloaded.resources.dir: temporary directory for remotely downloaded resources.

hive.server2.webui.host: host address of the hiveserver2 web UI.

hive.server2.webui.port: port of the hiveserver2 web UI.

Beyond these, Hive's default configuration file is hive-default.xml.template. For any parameter the user does not override, Hive starts with the default value read from that file.

Other important parameters include hive.metastore.warehouse.dir, the HDFS directory where the data warehouse is stored, and hive.exec.scratchdir, the directory where Hive keeps the map/reduce execution plans for the different stages along with intermediate output.
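Inside the Hive CLI, the effective value of any of these parameters can be checked with the set command, for example:

hive> set hive.metastore.warehouse.dir;
hive> set hive.exec.scratchdir;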

#master
[hadoop@master ~]$ jps
1905 DFSZKFailoverController
1205 QuorumPeerMain
2021 ResourceManager
1707 JournalNode
2140 NodeManager
3037 Jps
1390 NameNode
1503 DataNode
#slave1
[hadoop@slave1 ~]$ jps
1442 DataNode
1556 JournalNode
1732 NodeManager
1333 NameNode
1643 DFSZKFailoverController
1230 QuorumPeerMain
1902 ResourceManager
2030 Jps
#slave2
[hadoop@slave2 ~]$ jps
1282 QuorumPeerMain
1592 NodeManager
1385 DataNode
1485 JournalNode
1725 Jps

[hadoop@master ~]$ hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@master ~]$ hdfs dfs -ls /user
Found 2 items
drwx------   - hadoop supergroup          0 2023-05-30 23:23 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2023-06-06 23:20 /user/hive
[hadoop@master ~]$ hdfs dfs -ls /user/hive
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2023-06-06 23:20 /user/hive/warehouse
[hadoop@master ~]$ hdfs dfs -ls /user/hive/warehouse
[hadoop@master ~]$ hdfs dfs -chmod g+w /user/hive/warehouse
[hadoop@master ~]$ hdfs dfs -ls /user/hive
Found 1 items
drwxrwxr-x   - hadoop supergroup          0 2023-06-06 23:20 /user/hive/warehouse
[hadoop@master ~]$ mkdir -p /usr/local/src/hive/tmp

3.2. Step 2: Configure hive-env.sh
[hadoop@master conf]$ vi hive-env.sh
#Append at the end of the file
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export HIVE_CONF_DIR=/usr/local/src/hive/conf
export HIVE_AUX_JARS_PATH=/usr/local/src/hive/lib
#Copy the MySQL JDBC driver jar into Hive's lib directory.
[hadoop@master ~]$ cp mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/ 
#Initialize the metastore schema
[hadoop@master ~]$ schematool -initSchema -dbType mysql 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://master:3306/hive_db?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 hive
Thu Jun 08 05:13:46 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.mysql.sql
Thu Jun 08 05:13:46 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Thu Jun 08 05:13:47 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
[hadoop@master ~]$ 
#Log in to MySQL and verify the metastore tables
[hadoop@master ~]$ mysql -u root -pPasswd123!
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use hive_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive_db         |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
+---------------------------+
55 rows in set (0.00 sec)

mysql> quit
Bye

3.3. Step 3: Start Hive

With the Hadoop cluster and MySQL running, type hive at the command prompt.

[hadoop@master ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
......
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
#Create a new table and verify it in MySQL
hive> show databases; 
OK
default
Time taken: 0.571 seconds, Fetched: 1 row(s)
hive> create database hive_test_db; 
OK
Time taken: 0.14 seconds
hive> use hive_test_db; 
OK
Time taken: 0.018 seconds
hive>  create table t_user(id int, name string); 
OK
Time taken: 0.164 seconds
hive>  show tables; 
OK
t_user
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive> exit;
#Open MySQL and look at the table metadata Hive created
[hadoop@master ~]$ mysql -u root -pPasswd123!
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use hive_db; 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from TBLS;
+--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+
|      1 |  1686215906 |     2 |                0 | hadoop |         0 |     1 | t_user   | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)

mysql> quit
Bye
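The databases Hive registers can be inspected the same way. A hypothetical query against the DBS metastore table (column names as in the 2.0.0 schema listed earlier):

mysql> use hive_db;
mysql> select DB_ID, NAME, DB_LOCATION_URI from DBS;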
4. Task 4: Remote Access to Hive with the beeline CLI

4.1. Step 1: Modify Hadoop's core-site.xml

To access Hive through the beeline CLI, the Hadoop cluster must be configured to grant proxy-user privileges to the hadoop user that Hive runs as.

[hadoop@master ~]$  vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
[hadoop@slave1 ~]$  vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
[hadoop@slave2 ~]$  vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
#Add the following to each file
     <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
     </property>
     <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
     </property>
#Start the hiveserver2 service (typically with hive --service hiveserver2). Startup takes a while; do not close this terminal.

4.2. Step 2: Add a remote-access node

Assume remote access happens from the slave1 node. slave1 first needs the Hive component, which can simply be copied over from the master node.

[root@master ~]# scp -r /usr/local/src/hive root@slave1:/usr/local/src/

[root@master ~]# chown -R hadoop:hadoop /usr/local/src
#Configure the hadoop user's environment variables on slave1 so the Hive commands are available.
[root@slave1 ~]# vi /etc/profile.d/hive.sh

export HIVE_HOME=/usr/local/src/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$HIVE_HOME/bin:$PATH

[root@slave1 ~]# su - hadoop
Last login: Sat Jun 10 02:11:42 EDT 2023 from 192.168.100.1 on pts/0

4.3. Step 3: Remote access

Once beeline is connected over JDBC, Hive can be operated remotely. Run the following command on the slave1 node as the hadoop user.

[hadoop@slave1 ~]$ beeline -u jdbc:hive2://master:10000
Connecting to jdbc:hive2://master:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connected to: Apache Hive (version 2.0.0)
Driver: Hive JDBC (version 2.0.0)
23/06/10 02:25:16 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.0.0 by Apache Hive
0: jdbc:hive2://master:10000> 
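From this prompt, ordinary HiveQL statements work. As a quick, hypothetical smoke test (!quit is beeline's exit command):

0: jdbc:hive2://master:10000> show databases;
0: jdbc:hive2://master:10000> !quit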


Experiment 2: Hive Database and Table Operations

1. Task 1: Hive Database Operations

1.1. Step 1: Create a database
[hadoop@master ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Sat Jun 10 04:22:04 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:05 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:05 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:05 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:06 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:06 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:06 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Jun 10 04:22:06 EDT 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create database school;
OK
Time taken: 0.596 seconds
hive> show databases;
OK
default
hive_test_db
school
Time taken: 0.178 seconds, Fetched: 3 row(s)

Syntax: CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name

[COMMENT database_comment] //a description of the database

[LOCATION hdfs_path] //where the database is stored on HDFS

[WITH DBPROPERTIES (property_name=property_value, ...)];
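Putting the optional clauses together, a hedged example (the database name, comment, path, and property are illustrative only):

hive> create database if not exists school_demo
    > comment 'demo database for the school lab'
    > location '/user/hive/warehouse/school_demo.db'
    > with dbproperties('creater'='H3C');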

1.2. Step 2: Alter a database and show its details
#Add a custom property recording the creator as H3C.
hive> alter database school set dbproperties('creater'='H3C');
OK
Time taken: 0.039 seconds
hive> desc database extended school;
OK
school		hdfs://mycluster/user/hive/warehouse/school.db	hadoop	USER	{creater=H3C}
Time taken: 0.014 seconds, Fetched: 1 row(s)
#Change the database owner to root.
hive> alter database school set owner user root;
OK
Time taken: 0.024 seconds
hive> desc database extended school;
OK
school		hdfs://mycluster/user/hive/warehouse/school.db	root	USER	{creater=H3C}
Time taken: 0.009 seconds, Fetched: 1 row(s)
#Syntax: desc database [extended] database_name;
1.3. Step 3: Drop a database and show all databases
hive> drop database school;
Moved: 'hdfs://mycluster/user/hive/warehouse/school.db' to trash at: hdfs://mycluster/user/hadoop/.Trash/Current
OK
Time taken: 0.17 seconds
hive> show databases;
OK
default
hive_test_db
Time taken: 0.006 seconds, Fetched: 2 row(s)
#Syntax: show databases;
#drop database <database_name>;
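Note that drop database fails if the database still contains tables. A hedged variant that removes the database together with its tables (reusing the hypothetical name from above):

hive> drop database if exists school_demo cascade;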

2. Task 2: Hive Table Operations

2.1. Step 1: Create tables

Create a teacher table in the school database, with staff number, name, email, and class attributes. Before creating it, switch to the target database with the use statement.

hive> create database school;
OK
Time taken: 0.031 seconds
hive> use school;
OK
Time taken: 0.014 seconds
hive> create table teacher(
    > num int,
    > name string,
    > email map<string,int>,
    > class array<string>);
OK
Time taken: 0.193 seconds

The ordinary tables Hive creates by default are called managed (internal) tables. Hive manages their data itself and stores it under the warehouse directory by default; the location can be changed in Hive's configuration file hive-site.xml. The created table can be inspected through the Hadoop web UI.


Partitioned tables exist to avoid brute-force full scans: each partition is an independent folder on HDFS, so Hive partitioning is really HDFS directory splitting. Note that the column list given when creating the table must not include the partition column; partition columns are declared separately afterwards with the keyword partitioned by.

hive> create table teacher(
    > num int,
    > name string,
    > email map<string,int>,
    > class array<string>)
    > partitioned by (age int)
    > row format delimited fields terminated by ",";
OK
Time taken: 0.042 seconds
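Data loaded into a partitioned table must name the target partition. A hypothetical load into the age=30 partition (the file path is illustrative):

hive> load data local inpath '/home/hadoop/teacher30.txt' into table teacher partition (age=30);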

Bucketed tables: a table or a partition can be further subdivided into buckets, a finer-grained division of the data that can give more efficient queries. Storage-wise, buckets differ from partitions in that a partition is stored as a directory with its data files inside, while each bucket is stored as a single file containing its data. Note that the bucketing column must already exist in the table's column list; it is named separately with the keyword clustered by, along with the number of buckets.

hive> drop table teacher;
Moved: 'hdfs://mycluster/user/hive/warehouse/school.db/teacher' to trash at: hdfs://mycluster/user/hadoop/.Trash/Current
OK
Time taken: 0.093 seconds
hive> create table teacher(
    > num int,
    > name string,
    > email map<string,int>,
    > class array<string>)
    > clustered by (num) into 4 buckets
    > row format delimited fields terminated by ",";
OK
Time taken: 0.058 seconds
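Bucketing pays off at query time, since TABLESAMPLE can read a single bucket instead of scanning the whole table. A sketch against the table above:

hive> select * from teacher tablesample(bucket 1 out of 4 on num);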

2.2. Step 2: Show all tables and their details
hive> show tables;
OK
teacher
Time taken: 0.019 seconds, Fetched: 1 row(s)
hive> desc teacher;
OK
num                 	int                 	                    
name                	string              	                    
email               	map<string,int>     	                    
class               	array<string>       	                    
Time taken: 0.045 seconds, Fetched: 4 row(s)
hive> desc formatted teacher;
OK
# col_name            	data_type           	comment             
	 	 
num                 	int                 	                    
name                	string              	                    
email               	map<string,int>     	                    
class               	array<string>       	                    
	 	 
# Detailed Table Information	 	 
Database:           	school              	 
Owner:              	hadoop              	 
CreateTime:         	Sat Jun 10 04:33:22 EDT 2023	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://mycluster/user/hive/warehouse/school.db/teacher	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1686386002          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
Compressed:         	No                  	 
Num Buckets:        	4                   	 
Bucket Columns:     	[num]               	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.032 seconds, Fetched: 29 row(s)
#Syntax: desc <table_name>;
#An existing table's definition can also be copied.
hive> create table teacher2 like teacher;
OK
Time taken: 0.071 seconds
#Syntax: create table <new_table_name> like <existing_table_name>;
2.3. Step 3: Alter tables
#Rename a table.
hive> alter table teacher rename to new_teacher;
OK
Time taken: 0.068 seconds
#Syntax: alter table <table_name> rename to <new_table_name>;
#Change a column's name, data type, comment, or position
hive> alter table new_teacher change num number string comment 'the num of teacher, change datatype to string ' after name;
OK
Time taken: 0.064 seconds
#Syntax: alter table <table_name> change [column] <col_old_name> <col_new_name> column_type [comment col_comment] [first|after column_name];
#Add or replace columns
hive> alter table new_teacher add columns(age int);
OK
Time taken: 0.051 seconds
#Syntax: alter table <table_name> add|replace columns (<col_name> data_type [comment col_comment], ...);
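Note that replace columns swaps out the entire column list rather than appending to it. A hedged example that would leave new_teacher with only the two columns named:

hive> alter table new_teacher replace columns(name string, age int);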
2.4. Step 4: Drop and truncate tables, then exit
hive> drop table teacher2;
Moved: 'hdfs://mycluster/user/hive/warehouse/school.db/teacher2' to trash at: hdfs://mycluster/user/hadoop/.Trash/Current
OK
Time taken: 0.076 seconds
#Syntax: drop table <table_name>;
hive> truncate table new_teacher;
OK
Time taken: 0.055 seconds
#Syntax: truncate table <table_name>;
hive> exit;

Experiment 3: Hive Queries

1. Task 1: Data Import

1.1. Step 1: Insert a single row and query it
hive> use school;
OK
Time taken: 0.443 seconds
hive> create table student(
    > num int,
    > name string,
    > class string,
    > body map<string,int>,
    > exam array<string>)
    > row format delimited
    > fields terminated by '|'
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 0.418 seconds
hive> create table lib(
    > num int,
    > book string)
    > row format delimited
    > fields terminated by '|'
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 0.069 seconds
hive> create table price(
    > book string,
    > price int)
    > row format delimited
    > fields terminated by '|'
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 0.054 seconds
hive> insert into student (num,name,class,body,exam) select 20200101,'Wlen','grade 2',map('height',175,'weight',60),array('80','70');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1686377752035_0001, Tracking URL = http://master:8088/proxy/application_1686377752035_0001/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2023-06-10 04:45:24,467 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:45:34,734 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.23 sec
MapReduce Total cumulative CPU time: 1 seconds 230 msec
Ended Job = job_1686377752035_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/user/hive/warehouse/school.db/student/.hive-staging_hive_2023-06-10_04-44-58_884_4811180455081269010-1/-ext-10000
Loading data to table school.student
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.23 sec   HDFS Read: 5145 HDFS Write: 119 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 230 msec
OK
Time taken: 37.224 seconds
1.2. Step 2: Bulk import

Inserting data row by row triggers MapReduce jobs and is inefficient, so Hive data is normally bulk-loaded with LOAD. Prepare the data files; they can reside on the local filesystem or on HDFS.

hive>  load data local inpath '/home/hadoop/student.txt' into table student;
Loading data to table school.student
OK
Time taken: 0.228 seconds
hive>  load data local inpath '/home/hadoop/lib.txt' into table lib;
Loading data to table school.lib
OK
Time taken: 0.16 seconds
hive>  load data local inpath '/home/hadoop/price.txt' into table price;
Loading data to table school.price
OK
Time taken: 0.137 seconds
hive> select * from student;
OK
20200101	Wlen	grade 2	{"height":175,"weight":60}	["80","70"]
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]
Time taken: 0.245 seconds, Fetched: 8 row(s)
hive> select * from lib;
OK
20200102	War and Peace
20200104	Math
20200105	Notre Dame DE Paris
20200107	Romance of The Three Kingdoms
20200117	Shuihu Quanchuan
20200117	Feng Menglong
20200117	Waking World Hengyan
20200104	how to use hive?
20200104	hbase
20200102	Chronicles
20200102	Hadoop
Time taken: 0.04 seconds, Fetched: 11 row(s)
hive> select * from price;
OK
War and Peace	55
Math	40
Notre Dame DE Paris	36
Romance of The Three Kingdoms	22
Shuihu Quanchuan	202
Feng Menglong	100
Waking World Hengyan	40
how to use hive?	40
hbase	66
Chronicles	22
Hadoop	45
Time taken: 0.048 seconds, Fetched: 11 row(s)

Syntax: load data local inpath '<path>' [overwrite] into table <table_name>;

When the source path is on HDFS, the load is effectively a move of the data within HDFS.

Syntax: load data inpath '<path>' [overwrite] into table <table_name>;
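A hypothetical HDFS-side load (the source path is illustrative; the file is moved rather than copied, and overwrite replaces the table's existing data):

hive> load data inpath '/tmp/student.txt' overwrite into table student;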

2. Task 2: SELECT Queries

2.1. Step 1: Simple queries

Hive's basic SELECT syntax is essentially the same as standard SQL, supporting WHERE, DISTINCT, GROUP BY, ORDER BY, HAVING, LIMIT, subqueries, and more; in every database system, SELECT is the most used and most complex statement.

hive> select * from student;
OK
20200101	Wlen	grade 2	{"height":175,"weight":60}	["80","70"]
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]
Time taken: 0.245 seconds, Fetched: 8 row(s)
hive> select * from lib;
OK
20200102	War and Peace
20200104	Math
20200105	Notre Dame DE Paris
20200107	Romance of The Three Kingdoms
20200117	Shuihu Quanchuan
20200117	Feng Menglong
20200117	Waking World Hengyan
20200104	how to use hive?
20200104	hbase
20200102	Chronicles
20200102	Hadoop
Time taken: 0.04 seconds, Fetched: 11 row(s)
hive> select * from price;
OK
War and Peace	55
Math	40
Notre Dame DE Paris	36
Romance of The Three Kingdoms	22
Shuihu Quanchuan	202
Feng Menglong	100
Waking World Hengyan	40
how to use hive?	40
hbase	66
Chronicles	22
Hadoop	45
Time taken: 0.048 seconds, Fetched: 11 row(s)

2.2. Step 2: Conditional queries

1. The where clause

A where clause is a boolean expression and can be combined with AND and OR for complex queries. Note how the complex-typed columns are accessed.

hive> select * from student where class = 'grade 4';
OK
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
Time taken: 0.245 seconds, Fetched: 2 row(s)
hive> select * from student where exam[0] = 96 or exam[1]=77;
OK
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
Time taken: 0.07 seconds, Fetched: 3 row(s)
hive> select * from student where body['height'] = 170;
OK
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]
Time taken: 0.04 seconds, Fetched: 7 row(s)
#2. The distinct and all keywords
#distinct removes duplicate rows; without it the default is all. distinct launches an MR job to sort the data, which works well on large datasets.
hive> select class from student;
OK
grade 2
grade 1
grade 3
grade 4
grade 5
grade 2
grade 4
grade 2
Time taken: 0.054 seconds, Fetched: 8 row(s)
hive> select distinct class from student;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0002, Tracking URL = http://master:8088/proxy/application_1686377752035_0002/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2023-06-10 04:54:40,739 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:54:50,973 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.68 sec
2023-06-10 04:54:57,112 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.68 sec
MapReduce Total cumulative CPU time: 2 seconds 680 msec
Ended Job = job_1686377752035_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.68 sec   HDFS Read: 9093 HDFS Write: 40 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 680 msec
OK
grade 1
grade 2
grade 3
grade 4
grade 5
Time taken: 36.816 seconds, Fetched: 5 row(s)
#3. group by and having
#group by groups the result rows by column; having then filters the grouped result further. The difference from where: aggregate functions cannot appear in a where clause, but they can appear after having.
hive> select class ,count(*) from student group by class;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0003, Tracking URL = http://master:8088/proxy/application_1686377752035_0003/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2023-06-10 04:55:39,728 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:55:48,359 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2023-06-10 04:55:55,613 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.26 sec
MapReduce Total cumulative CPU time: 2 seconds 260 msec
Ended Job = job_1686377752035_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.26 sec   HDFS Read: 9761 HDFS Write: 50 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 260 msec
OK
grade 1	1
grade 2	3
grade 3	1
grade 4	2
grade 5	1
Time taken: 36.105 seconds, Fetched: 5 row(s)
hive> select class ,count(*) num from student group by class having num >=2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0004, Tracking URL = http://master:8088/proxy/application_1686377752035_0004/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2023-06-10 04:56:17,054 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:56:24,305 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.38 sec
2023-06-10 04:56:32,490 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.73 sec
MapReduce Total cumulative CPU time: 2 seconds 730 msec
Ended Job = job_1686377752035_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.73 sec   HDFS Read: 10154 HDFS Write: 20 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 730 msec
OK
grade 2	3
grade 4	2
Time taken: 32.365 seconds, Fetched: 2 row(s)
2.3. Step 3: LIMIT and UNION

Syntax:

select * from <table_name> limit [offset,] rows;

select_statement UNION [ALL | DISTINCT] select_statement [UNION [ALL | DISTINCT] select_statement ...];

LIMIT bounds how many rows a query returns; with big data, queries that would emit too many rows need such a limit. UNION presents the combined result set of several selects and can pull from multiple tables.

hive> select * from student limit 2,4;
OK
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]
Time taken: 0.096 seconds, Fetched: 4 row(s)
hive> select class from student union select num from student;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0005, Tracking URL = http://master:8088/proxy/application_1686377752035_0005/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2023-06-10 04:57:37,641 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:57:45,827 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.88 sec
2023-06-10 04:57:49,955 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.79 sec
MapReduce Total cumulative CPU time: 1 seconds 790 msec
Ended Job = job_1686377752035_0005
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 1.79 sec   HDFS Read: 10465 HDFS Write: 112 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 790 msec
OK
20200101
20200102
20200103
20200104
20200105
20200106
20200107
20200117
grade 1
grade 2
grade 3
grade 4
grade 5
Time taken: 30.427 seconds, Fetched: 13 row(s)

2.4. Step 4: order by vs. sort by

order by performs a global sort; it can take several columns and sorts lexicographically by default. On large datasets order by is very inefficient, and in many cases a global order is not needed, so sort by can be used instead. The two queries below both sort by the first exam score and happen to give similar output, but with large data sort by lets you raise the reducer count so that each reducer produces its own sorted file: each reducer's output is internally sorted, while the combined result is not globally sorted. order by, in contrast, runs everything through a single reducer.

hive> set mapreduce.job.reduces=3;
hive> select * from student sort by exam[0];
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0006, Tracking URL = http://master:8088/proxy/application_1686377752035_0006/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2023-06-10 04:58:55,198 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:59:01,623 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.16 sec
2023-06-10 04:59:06,792 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 2.26 sec
2023-06-10 04:59:09,909 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3.25 sec
2023-06-10 04:59:11,981 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.92 sec
MapReduce Total cumulative CPU time: 4 seconds 920 msec
Ended Job = job_1686377752035_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 4.92 sec   HDFS Read: 19744 HDFS Write: 403 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 920 msec
OK
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200101	Wlen	grade 2	{"height":175,"weight":60}	["80","70"]
Time taken: 30.54 seconds, Fetched: 8 row(s)
hive> select * from student order by exam[0];
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0007, Tracking URL = http://master:8088/proxy/application_1686377752035_0007/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2023-06-10 04:59:41,697 Stage-1 map = 0%,  reduce = 0%
2023-06-10 04:59:46,858 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.76 sec
2023-06-10 04:59:55,017 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.66 sec
MapReduce Total cumulative CPU time: 1 seconds 660 msec
Ended Job = job_1686377752035_0007
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 1.66 sec   HDFS Read: 10215 HDFS Write: 403 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 660 msec
OK
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]
20200101	Wlen	grade 2	{"height":175,"weight":60}	["80","70"]
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
Time taken: 27.343 seconds, Fetched: 8 row(s)

2.5. Step 5: Multi-table JOIN queries

Syntax: select <columns> from <table1> join <table2> on <table1.col> = <table2.col> [join <table3> on <table.col> = <table.col>] ...

join lets a query combine several tables; the join condition must involve the tables being joined, and joining n tables requires at least n-1 join conditions. For example, joining three tables requires at least two conditions. This example uses two tables: a student table keyed by the student number num, and a library table holding student numbers and the titles of borrowed books.

#Query the books each student has borrowed.
hive>  select * from student join lib on student.num =lib.num;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0.log
2023-06-10 05:00:46	Starting to launch local task to process map join;	maximum memory = 477626368
2023-06-10 05:00:47	Dump the side-table for tag: 1 with group count: 5 into file: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-00-40_633_7839578738073197683-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2023-06-10 05:00:47	Uploaded 1 File to: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-00-40_633_7839578738073197683-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (570 bytes)
2023-06-10 05:00:47	End of local task; Time Taken: 1.637 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1686377752035_0008, Tracking URL = http://master:8088/proxy/application_1686377752035_0008/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0008
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2023-06-10 05:01:06,116 Stage-3 map = 0%,  reduce = 0%
2023-06-10 05:01:13,517 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.24 sec
MapReduce Total cumulative CPU time: 1 seconds 240 msec
Ended Job = job_1686377752035_0008
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 1.24 sec   HDFS Read: 8649 HDFS Write: 819 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 240 msec
OK
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	War and Peace
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Chronicles
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Hadoop
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	Math
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	how to use hive?
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	hbase
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]	20200105	Notre Dame DE Paris
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]	20200107	Romance of The Three Kingdoms
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Shuihu Quanchuan
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Feng Menglong
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Waking World Hengyan
Time taken: 34.973 seconds, Fetched: 11 row(s)
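In practice you seldom need every column from both tables. Projecting only the columns of interest keeps the output readable; a minimal sketch against the same tables:

hive> select student.name, lib.book from student join lib on student.num = lib.num;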
#Left outer join, again querying each student's borrowed books. The difference is that every row of the left table is kept; for example, in the output below student numbers 20200103 and 20200106 have no borrowing record, so their lib columns show NULL.
hive>  select * from student left outer join lib on student.num =lib.num;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0.log
2023-06-10 05:02:07	Starting to launch local task to process map join;	maximum memory = 477626368
2023-06-10 05:02:08	Dump the side-table for tag: 1 with group count: 5 into file: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-02-02_274_1444802490687647390-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2023-06-10 05:02:08	Uploaded 1 File to: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-02-02_274_1444802490687647390-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (570 bytes)
2023-06-10 05:02:08	End of local task; Time Taken: 1.381 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1686377752035_0009, Tracking URL = http://master:8088/proxy/application_1686377752035_0009/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0009
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2023-06-10 05:02:22,255 Stage-3 map = 0%,  reduce = 0%
2023-06-10 05:02:30,440 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.85 sec
MapReduce Total cumulative CPU time: 850 msec
Ended Job = job_1686377752035_0009
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 0.85 sec   HDFS Read: 8324 HDFS Write: 988 SUCCESS
Total MapReduce CPU Time Spent: 850 msec
OK
20200101	Wlen	grade 2	{"height":175,"weight":60}	["80","70"]	NULL	NULL
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	War and Peace
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Chronicles
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Hadoop
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]	NULL	NULL
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	Math
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	how to use hive?
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	hbase
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]	20200105	Notre Dame DE Paris
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]	NULL	NULL
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]	20200107	Romance of The Three Kingdoms
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Shuihu Quanchuan
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Feng Menglong
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Waking World Hengyan
Time taken: 29.213 seconds, Fetched: 14 row(s)
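The NULL rows a left outer join produces can also be used as an anti-join to find students who have borrowed nothing at all; a sketch under the same table assumptions:

hive> select student.* from student left outer join lib on student.num = lib.num where lib.num is null;

With the data above this should return only the rows for 20200101, 20200103 and 20200106.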
#The counterpart of the left join is the right outer join, which keeps every row of the right table.
hive> select * from student right outer join lib on student.num =lib.num;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0.log
2023-06-10 05:03:57	Starting to launch local task to process map join;	maximum memory = 477626368
2023-06-10 05:03:58	Dump the side-table for tag: 0 with group count: 8 into file: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-03-52_769_7025334573066874629-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile20--.hashtable
2023-06-10 05:03:58	Uploaded 1 File to: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-03-52_769_7025334573066874629-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile20--.hashtable (842 bytes)
2023-06-10 05:03:58	End of local task; Time Taken: 1.274 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1686377752035_0010, Tracking URL = http://master:8088/proxy/application_1686377752035_0010/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0010
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2023-06-10 05:04:12,244 Stage-3 map = 0%,  reduce = 0%
2023-06-10 05:04:22,726 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.58 sec
MapReduce Total cumulative CPU time: 1 seconds 580 msec
Ended Job = job_1686377752035_0010
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 1.58 sec   HDFS Read: 7853 HDFS Write: 819 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 580 msec
OK
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	War and Peace
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	Math
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]	20200105	Notre Dame DE Paris
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]	20200107	Romance of The Three Kingdoms
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Shuihu Quanchuan
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Feng Menglong
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Waking World Hengyan
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	how to use hive?
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	hbase
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Chronicles
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Hadoop
Time taken: 32.02 seconds, Fetched: 11 row(s)
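A right outer join is simply a left outer join with the table order swapped, so the query above is equivalent to the following sketch (the result rows are the same; only the column order differs):

hive> select * from lib left outer join student on student.num = lib.num;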

#A full outer join combines the behavior of the left and right joins and shows all rows from both tables; rows without a match get NULL in the missing columns.
hive> select * from student full outer join lib on student.num =lib.num; 
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1686377752035_0011, Tracking URL = http://master:8088/proxy/application_1686377752035_0011/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0011
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 3
2023-06-10 05:04:58,995 Stage-1 map = 0%,  reduce = 0%
2023-06-10 05:05:10,224 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.04 sec
2023-06-10 05:05:15,439 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.24 sec
2023-06-10 05:05:16,474 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 5.46 sec
2023-06-10 05:05:21,686 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.93 sec
MapReduce Total cumulative CPU time: 6 seconds 930 msec
Ended Job = job_1686377752035_0011
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 3   Cumulative CPU: 6.93 sec   HDFS Read: 30074 HDFS Write: 988 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 930 msec
OK
20200101	Wlen	grade 2	{"height":175,"weight":60}	["80","70"]	NULL	NULL
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	hbase
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	how to use hive?
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	Math
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]	20200107	Romance of The Three Kingdoms
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Hadoop
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Chronicles
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	War and Peace
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]	20200105	Notre Dame DE Paris
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Waking World Hengyan
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Feng Menglong
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Shuihu Quanchuan
20200103	Will	grade 3	{"height":170,"weight":61}	["66","77"]	NULL	NULL
20200106	ZhangSan	grade 2	{"height":170,"weight":61}	["85","63"]	NULL	NULL
Time taken: 36.909 seconds, Fetched: 14 row(s)
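Because either side's key column can be NULL in a full outer join, it is often convenient to merge the two num columns into one with COALESCE; a hedged sketch:

hive> select coalesce(student.num, lib.num) as num, student.name, lib.book from student full outer join lib on student.num = lib.num;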

#A semi join returns columns from the left table only, i.e. it shows the left-table rows that have a match in the joined right table.
hive> select * from student left semi join lib on student.num =lib.num;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0.log
2023-06-10 05:06:04	Starting to launch local task to process map join;	maximum memory = 477626368
2023-06-10 05:06:05	Dump the side-table for tag: 1 with group count: 5 into file: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-05-58_966_5774898529423073407-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable
2023-06-10 05:06:05	Uploaded 1 File to: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-05-58_966_5774898529423073407-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable (370 bytes)
2023-06-10 05:06:05	End of local task; Time Taken: 1.653 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1686377752035_0012, Tracking URL = http://master:8088/proxy/application_1686377752035_0012/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0012
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2023-06-10 05:06:23,378 Stage-3 map = 0%,  reduce = 0%
2023-06-10 05:06:31,065 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.08 sec
MapReduce Total cumulative CPU time: 1 seconds 80 msec
Ended Job = job_1686377752035_0012
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 1.08 sec   HDFS Read: 8455 HDFS Write: 252 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 80 msec
OK
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]
Time taken: 33.165 seconds, Fetched: 5 row(s)
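LEFT SEMI JOIN is Hive's idiomatic way of expressing an IN/EXISTS-style filter, and the right table (lib) may only be referenced in the ON clause, never in the SELECT list. On Hive versions that support IN subqueries (0.13 and later), the query above can equivalently be written as this sketch:

hive> select * from student where student.num in (select num from lib);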

#Multi-table (more than two tables) query, for example: query the books each student has borrowed together with their prices. In most cases, Hive launches one MapReduce job for each pair of join operands.
hive> select * from student join lib on student.num =lib.num join price on lib.book=price.book;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20230610044241_78508f96-4a85-413b-a35c-84298837bce0.log
2023-06-10 05:07:17	Starting to launch local task to process map join;	maximum memory = 477626368
2023-06-10 05:07:19	Dump the side-table for tag: 1 with group count: 11 into file: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-07-13_115_3330847160014567255-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile41--.hashtable
2023-06-10 05:07:19	Uploaded 1 File to: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-07-13_115_3330847160014567255-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile41--.hashtable (635 bytes)
2023-06-10 05:07:19	Dump the side-table for tag: 1 with group count: 5 into file: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-07-13_115_3330847160014567255-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile51--.hashtable
2023-06-10 05:07:19	Uploaded 1 File to: file:/usr/local/src/hive/tmp/0f163d4a-8541-44a1-ae51-595fc9d5b7e4_resources/0f163d4a-8541-44a1-ae51-595fc9d5b7e4/hive_2023-06-10_05-07-13_115_3330847160014567255-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile51--.hashtable (570 bytes)
2023-06-10 05:07:19	End of local task; Time Taken: 1.663 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1686377752035_0013, Tracking URL = http://master:8088/proxy/application_1686377752035_0013/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1686377752035_0013
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
2023-06-10 05:07:35,005 Stage-5 map = 0%,  reduce = 0%
2023-06-10 05:07:45,310 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 1.88 sec
MapReduce Total cumulative CPU time: 1 seconds 880 msec
Ended Job = job_1686377752035_0013
MapReduce Jobs Launched: 
Stage-Stage-5: Map: 1   Cumulative CPU: 1.88 sec   HDFS Read: 10712 HDFS Write: 1016 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 880 msec
OK
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	War and Peace	War and Peace	55
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Chronicles	Chronicles	22
20200102	Michael	grade 1	{"height":170,"weight":61}	["81","77"]	20200102	Hadoop	Hadoop	45
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	Math	Math	40
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	how to use hive?	how to use hive?	40
20200104	Shelley	grade 4	{"height":170,"weight":61}	["63","79"]	20200104	hbase	hbase	66
20200105	Lucy	grade 5	{"height":170,"weight":61}	["96","72"]	20200105	Notre Dame DE Paris	Notre Dame DE Paris	36
20200107	WangWu	grade 4	{"height":170,"weight":61}	["0","71"]	20200107	Romance of The Three Kingdoms	Romance of The Three Kingdoms	22
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Shuihu Quanchuan	Shuihu Quanchuan	202
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Feng Menglong	Feng Menglong	100
20200117	LiSi	grade 2	{"height":170,"weight":61}	["55","70"]	20200117	Waking World Hengyan	Waking World Hengyan	40
Time taken: 34.267 seconds, Fetched: 11 row(s)
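When consecutive joins share the same join key, Hive can usually merge them into a single MapReduce job; here the keys differ (num, then book), which is why one job per join pair is the general rule. The joined result can also be aggregated further, for example to total the price of the books each student borrowed; a sketch that assumes the amount column in the price table is named price:

hive> select student.name, sum(price.price) as total from student join lib on student.num = lib.num join price on lib.book = price.book group by student.name;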
