
Building a Hadoop Big Data Platform


Hadoop Platform Setup

Getting to Know Hadoop

I. What is Hadoop?

Hadoop is a distributed system infrastructure developed by the Apache Foundation; it is a software framework combining a storage system with a computing framework. It mainly solves the problem of storing and computing massive amounts of data and is a cornerstone of big data technology. Hadoop processes data in a reliable, efficient and scalable way, and it lets users develop and run applications that handle massive data sets without having to understand the low-level details of the distributed system.

II. What problems does Hadoop solve?

1. Massive data storage

HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high throughput access to data and suits applications with very large data sets. An HDFS cluster consists of n machines running the DataNode process plus one machine (with another on standby) running the NameNode process. Each DataNode manages a portion of the data, while the NameNode manages the information of the entire HDFS cluster (it stores the metadata).

2. Resource management, scheduling and allocation

Apache Hadoop YARN (Yet Another Resource Negotiator) is the Hadoop resource manager: a general-purpose resource management system and scheduling platform that provides unified resource management and scheduling for upper-layer applications. Its introduction brought major benefits to the cluster in terms of utilization, unified resource management and data sharing.

I. Basic environment preparation

1. Configure the following network settings on the master, slave1 and slave2 hosts (the example shows master's address 192.168.10.10; use 192.168.10.20 and 192.168.10.30 on slave1 and slave2 respectively, matching the /etc/hosts mapping below)

[root@localhost ~]# cd /etc/sysconfig/network-scripts
[root@localhost network-scripts]# ls
ifcfg-ens33  ifdown-isdn      ifdown-tunnel  ifup-isdn    ifup-Team
ifcfg-lo     ifdown-post      ifup           ifup-plip    ifup-TeamPort
ifdown       ifdown-ppp       ifup-aliases   ifup-plusb   ifup-tunnel
ifdown-bnep  ifdown-routes    ifup-bnep      ifup-post    ifup-wireless
ifdown-eth   ifdown-sit       ifup-eth       ifup-ppp     init.ipv6-global
ifdown-ippp  ifdown-Team      ifup-ippp      ifup-routes  network-functions
ifdown-ipv6  ifdown-TeamPort  ifup-ipv6      ifup-sit     network-functions-ipv6
[root@localhost network-scripts]# vi ifcfg-ens33 
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="d34f3ba3-f9a4-4669-8b12-6b8712e5e3f0"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.10.10
NETMASK=255.255.255.0
GATEWAY=192.168.10.2
DNS1=8.8.8.8
DNS2=114.114.114.114

Add the following two lines to /etc/resolv.conf:

vi /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4

After saving, run the following commands

[root@localhost network-scripts]# systemctl restart NetworkManager
[root@localhost network-scripts]# ifdown ens33;ifup ens33
Restart the network: service network restart
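If you prefer not to edit the ifcfg file by hand, the same static address can be applied with nmcli (a sketch, using master's address and the ens33 connection name from above; substitute 192.168.10.20/30 on the slaves):

nmcli con mod ens33 ipv4.method manual \
  ipv4.addresses 192.168.10.10/24 \
  ipv4.gateway 192.168.10.2 \
  ipv4.dns "8.8.8.8 114.114.114.114"
nmcli con up ens33    # re-activate the connection so the settings take effect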

2. Set the hostname (do this on master, slave1 and slave2; the example shows master, so use slave1.example.com and slave2.example.com on the slaves)

[root@localhost ~]# hostnamectl set-hostname master.example.com
[root@localhost ~]# bash
[root@master ~]# hostname
master.example.com

3. Configure the host name mappings (on master, slave1 and slave2)

[root@master ~]# vi /etc/hosts
[root@master ~]# cat /etc/hosts
192.168.10.10 master master.example.com
192.168.10.20 slave1 slave1.example.com
192.168.10.30 slave2 slave2.example.com

After saving, run the following commands to verify connectivity

ping master
ping slave1
ping slave2

Disable the firewall and SELinux (run the following on all nodes)

systemctl disable --now firewalld
vi /etc/selinux/config
SELINUX=disabled

After saving, run:

setenforce 0
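A quick way to confirm the change took effect:

getenforce    # should print Permissive right after setenforce 0, and Disabled after the next reboot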

4. Create the hadoop user (on every node)

[root@master ~]# useradd hadoop
[root@master ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.

5. Edit the sshd configuration file to enable public-key authentication (on every node)

[root@master ~]# vi /etc/ssh/sshd_config
PubkeyAuthentication yes

After saving, run the following

[root@master ~]# systemctl restart sshd

6. Configure passwordless login to the local host (run on every node)

Switch to the hadoop user and generate a key pair

Run the following commands:
su - hadoop
ssh-keygen -t rsa -P ''
ls -l ~/.ssh
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
cd
ssh localhost
exit
================================================================================================
[root@master ~]# su - hadoop
Last login: Tue Aug  1 00:59:22 CST 2023 on pts/0
[hadoop@master ~]$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:iKQ+XKgHmsIvaKcFIWQ/H9s5Ep+Fi4TCvxGF9pfwXuo [email protected]
The key's randomart image is:
+---[RSA 2048]----+
| o  ..           |
|+ .oo.  .        |
|.+.=o+o...       |
|. ++=oB==.       |
|..ooo=+BS        |
|o*..o .o.        |
|*.=o  .          |
|o+oo   E         |
|..+.             |
+----[SHA256]-----+
[hadoop@master ~]$ ls -l ~/.ssh
total 8
-rw-------. 1 hadoop hadoop 1675 Aug  1 01:05 id_rsa
-rw-r--r--. 1 hadoop hadoop  407 Aug  1 01:05 id_rsa.pub
[hadoop@master ~]$ cd ~/.ssh
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@master .ssh]$ chmod 600 authorized_keys
[hadoop@master .ssh]$ cd
[hadoop@master ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:RNa9WLNMVMS6PEYWrlAQrPerMU3UoVdL/C3e9rMgDh8.
ECDSA key fingerprint is MD5:60:4c:35:59:7d:76:45:d0:f8:42:51:1b:6f:f8:a8:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Tue Aug  1 01:05:26 2023
[hadoop@master ~]$ exit
logout
Connection to localhost closed.

Configure passwordless SSH login (root user)

Configuration on the master node
ssh-keygen -t rsa
ssh-copy-id root@slave1
ssh-copy-id root@slave2

Configuration on the slave1 node
ssh-keygen -t rsa
ssh-copy-id root@master
ssh-copy-id root@slave2

Configuration on the slave2 node
ssh-keygen -t rsa
ssh-copy-id root@master
ssh-copy-id root@slave1

Verification
As root, if ssh can log in to each of the other hosts without prompting for a password, the SSH configuration succeeded.
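A minimal check, run from each node in turn (it assumes the host names defined in /etc/hosts; every command should print the remote hostname without asking for a password):

for h in master slave1 slave2; do
  ssh root@$h hostname    # a password prompt here means that node's key was not copied
done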

7. Configure passwordless login from master to slave1 and slave2 (as the hadoop user)

Run the following on the master node

su - hadoop
scp ~/.ssh/id_rsa.pub hadoop@slave1:~/
scp ~/.ssh/id_rsa.pub hadoop@slave2:~/
Run the following on every slave node
su - hadoop
cat id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/authorized_keys
rm -f ~/id_rsa.pub
================================================================================================
Example
[hadoop@slave1 ~]$ cat id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@slave1 ~]$ cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4PgWS5nfgez0QTt27/0KhCEZqtFWoa7zcceDrb+tAKWJDuDrvtzLbumIU29a6Y2ZpjzqwJppnoLSILbra4zNs8x1s32k+xcpUtOGqgdb5UyJUzhuM1qwyB3D8eCZJ4nN8N5GtmiSyqIcz64VLBIanVSZsPFak5xvXZFbdbd7dhKJb64EV4TiExPHmHSMs/0jucp4LgCvNTGalF4WHogpmvyN2ZKNHf4EARutiRSoIV3rxhXeS80p0RSX7Xzik0UhYMUc6VGfnbS4qrbfyEzM9pVRxkZhfnfXaoLnWg8sCj1vXlNS8Z7gT13hIoulw1GZZOsQAVX+DowGlof1T11wB [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCgMLqBYATCZW0zD0tRAVUG8BkIZLq6TzM91YGiT6jNrqkdkF2+qyJymwNIQNodPsqxyEkbr0zMwKTsCXkLA/tbsP1q53Zth6uW0Ls8sT7bqWkVFfXt7MMS7F2cb9Z81pudU0Ze2WvYTHDsqOemapaU6Ux9+bPIU5KTJlEmyfpvSLKSf9zSTAwTZmfmZZOmZdFB0iDH77sM2xXGXUuyTlhXC2wpTtEkwHIMn3kYuVmZvHSm9CN3skbO0x3c8AFyKWcTc13tFgVwGMgaxa+ajyp8Pt1xnub72D0pzOKpEzZmFYCzAiK1luitAZiWbMtRpCmLKkMfFl+dyyvkfWLZrxI7 [email protected]
[hadoop@slave1 ~]$ rm -f ~/id_rsa.pub

Run the following on the master node
su - hadoop
ssh slave1
exit
ssh slave2
exit

[hadoop@master ~]$ ssh slave1
Last login: Tue Aug  1 01:15:25 2023 from localhost
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@master ~]$ ssh slave2
Last login: Tue Aug  1 01:16:10 2023 from localhost
[hadoop@slave2 ~]$ exit
logout
Connection to slave2 closed.

8. Configure passwordless login from slave1 to master and slave2 (run on slave1)

su - hadoop
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave2
ssh master
exit
ssh slave2
exit

[hadoop@slave1 ~]$ ssh-copy-id hadoop@master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'master (192.168.10.10)' can't be established.
ECDSA key fingerprint is SHA256:RNa9WLNMVMS6PEYWrlAQrPerMU3UoVdL/C3e9rMgDh8.
ECDSA key fingerprint is MD5:60:4c:35:59:7d:76:45:d0:f8:42:51:1b:6f:f8:a8:ce.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@master'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@slave1 ~]$ ssh-copy-id hadoop@slave2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (192.168.10.30)' can't be established.
ECDSA key fingerprint is SHA256:QBjEXeaZrT85SImnqv6pCTcenLldfKXWqfZ3RbA8F1g.
ECDSA key fingerprint is MD5:9f:30:fe:d3:da:9a:30:cc:da:2a:28:8e:e5:9c:85:b5.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave2's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@slave2'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@slave1 ~]$ ssh master
Last login: Tue Aug  1 01:06:44 2023 from localhost
[hadoop@master ~]$ exit
logout
Connection to master closed.
[hadoop@slave1 ~]$ ssh slave2
Last login: Tue Aug  1 01:19:01 2023 from master
[hadoop@slave2 ~]$ exit
logout
Connection to slave2 closed.

9. Configure passwordless login from slave2 to master and slave1 (run on slave2)

su - hadoop
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave1
ssh master
exit
ssh slave1
exit

[hadoop@slave2 ~]$ ssh-copy-id hadoop@master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'master (192.168.10.10)' can't be established.
ECDSA key fingerprint is SHA256:RNa9WLNMVMS6PEYWrlAQrPerMU3UoVdL/C3e9rMgDh8.
ECDSA key fingerprint is MD5:60:4c:35:59:7d:76:45:d0:f8:42:51:1b:6f:f8:a8:ce.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@master'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@slave2 ~]$ ssh-copy-id hadoop@slave1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (192.168.10.20)' can't be established.
ECDSA key fingerprint is SHA256:TqRUR8sR0+eS1KPxHT8o5+f63+2ev8QNxitrtrTXUhQ.
ECDSA key fingerprint is MD5:43:0a:1e:e3:e4:a8:df:7e:98:03:8d:e9:0f:16:d7:1c.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password: 


Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@slave2 ~]$ ssh master
Last login: Tue Aug  1 01:16:12 2023 from slave1
[hadoop@master ~]$ exit
logout
Connection to master closed.
[hadoop@slave2 ~]$ ssh slave1
Last login: Tue Aug  1 01:18:52 2023 from master
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.

II. Hadoop fully distributed configuration

1. Install the JDK and Hadoop (on the master node)

[root@master ~]# tar xf jdk-8u152-linux-x64.tar.gz -C /usr/local/src/
[root@master ~]# tar xf hadoop-2.7.1.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# ls
hadoop-2.7.1  jdk1.8.0_152
[root@master src]# mv jdk1.8.0_152 jdk
[root@master src]# mv hadoop-2.7.1 hadoop
[root@master src]# vi /etc/profile.d/hadoop.sh
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

After saving, run the following
[root@master src]# source /etc/profile.d/hadoop.sh
[root@master src]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@master src]# vi /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/src/jdk
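Before moving on to the XML files, it is worth confirming that both the JDK and Hadoop are on the PATH (the versions shown are the ones unpacked above):

java -version      # should report 1.8.0_152
hadoop version     # should report Hadoop 2.7.1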

2. Configure the hdfs-site.xml parameters (on master)

[root@master src]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/usr/local/src/hadoop/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/usr/local/src/hadoop/dfs/data</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
</configuration>

After saving, run the following
[root@master src]# mkdir -p /usr/local/src/hadoop/dfs/{name,data}

3. Configure the core-site.xml parameters (on master)

[root@master src]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://master:9000</value>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>file:/usr/local/src/hadoop/tmp</value>
	</property>
</configuration>
After saving, run the following
[root@master src]# mkdir -p /usr/local/src/hadoop/tmp

4. Configure the mapred-site.xml parameters (on master)

[root@master src]# cd /usr/local/src/hadoop/etc/hadoop
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>master:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>master:19888</value>
	</property>
</configuration>
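The two jobhistory addresses above point at a JobHistory server on master, which start-all.sh does not launch by itself. If you want the history UI on port 19888, it can be started separately once the cluster is up (a sketch, run as the hadoop user on master):

mr-jobhistory-daemon.sh start historyserver    # adds a JobHistoryServer process to the jps output on master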

5. Configure the yarn-site.xml parameters (on master)

[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
<configuration>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>master:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>master:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>master:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>master:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>master:8033</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
</configuration>

6. On master, edit the following two files and save them

[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/masters
192.168.10.10
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/slaves
192.168.10.20
192.168.10.30

After saving, run the following

[root@master hadoop]# chown -R hadoop.hadoop /usr/local/src
[root@master hadoop]# ll /usr/local/src/
total 0
drwxr-xr-x 11 hadoop hadoop 171 Aug  1 17:45 hadoop
drwxr-xr-x  8 hadoop hadoop 255 Sep 14  2017 jdk

7. Sync everything under /usr/local/src/ to all slave nodes

scp -r /usr/local/src/* root@slave1:/usr/local/src/
scp -r /usr/local/src/* root@slave2:/usr/local/src/
[root@master hadoop]# scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
hadoop.sh                                                  100%  151   125.0KB/s   00:00    
[root@master hadoop]# scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/
hadoop.sh                                                  100%  151   103.8KB/s   00:00  

Run the following on every slave node

[root@slave1 ~]# chown -R hadoop.hadoop /usr/local/src
[root@slave1 ~]# ll /usr/local/src/
total 0
drwxr-xr-x. 11 hadoop hadoop 171 Aug  1 17:55 hadoop
drwxr-xr-x.  8 hadoop hadoop 255 Aug  1 17:55 jdk
[root@slave1 ~]# source /etc/profile.d/hadoop.sh
[root@slave1 ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

III. Running the Hadoop cluster

Clear the data on the NameNode: HDFS must be formatted before it is started for the first time; do not format again on later starts, otherwise the DataNode processes will go missing. Also, once HDFS has been run, Hadoop's working directory (set to /usr/local/src/hadoop/tmp in this guide) will contain data; if you ever need to reformat, delete the data under the working directory first, otherwise formatting will run into problems.

Run the following commands to format the NameNode

[root@master ~]# su - hadoop
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ bin/hdfs namenode -format
If the output ends with the lines below, formatting succeeded
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.10.10
************************************************************/

On master, start the NameNode and check the process with jps

[hadoop@master hadoop]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
[hadoop@master hadoop]$ jps
2084 NameNode
2125 Jps

Start the DataNode on the slave nodes

[root@slave1 ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave1.example.com.out
[root@slave1 ~]# jps
9857 DataNode
9925 Jps

[root@slave2 ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave2.example.com.out
[root@slave2 ~]# jps
1925 Jps
1836 DataNode

On master, start the SecondaryNameNode

[hadoop@master hadoop]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
[hadoop@master hadoop]$ jps
2161 SecondaryNameNode
2201 Jps

If both the NameNode and SecondaryNameNode processes are present, HDFS has started successfully

Check where HDFS stores its data

[hadoop@master hadoop]$ ll dfs/
total 0
drwxr-xr-x 2 hadoop hadoop 6 Aug  1 17:43 data
drwxr-xr-x 2 hadoop hadoop 6 Aug  1 18:09 name

As the listing shows, HDFS keeps its data under /usr/local/src/hadoop/dfs for the NameNode and DataNode, and under /usr/local/src/hadoop/tmp/ for the SecondaryNameNode; each daemon has its own directory for its data
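Once the DataNodes are running you can also confirm from the command line that both of them registered with the NameNode (a quick check as the hadoop user on master):

hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'    # should show 2 live datanodes (192.168.10.20 and 192.168.10.30)
hdfs dfs -ls /                                            # listing the empty filesystem root should return without errors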

Check the node status in a browser (this requires an entry in the Windows hosts file)

Add the following entry to the hosts file
# hadoop
192.168.10.10 master

Then open the following URLs in a browser:
http://master:50070 shows NameNode and DataNode information
http://master:50090 shows SecondaryNameNode information

IV. Hive component deployment

Basic environment and installation preparation

The Hive component is installed on top of a Hadoop system, so before installing Hive make sure Hadoop is running normally. This section installs the Hive component on the master node, on top of the fully distributed Hadoop system deployed earlier.

The deployment plan and package paths for the Hive component are as follows:

(1) The fully distributed Hadoop system is already installed in the current environment.

(2) MySQL is installed locally (account root, password Password123$); the packages are under /opt/software/mysql-5.7.18.

(3) MySQL listens on port 3306.

(4) The MySQL JDBC driver is /opt/software/mysql-connector-java-5.1.46.jar, used as the backing store for the Hive metastore.

(5) The Hive package is /opt/software/apache-hive-2.0.0-bin.tar.gz.

Install MySQL

Remove the MariaDB database

List the installed mariadb packages

[root@master ~]# rpm -qa | grep mariadb

mariadb-libs-5.5.68-1.el7.x86_64

Remove the mariadb package

[root@master ~]# rpm -e --nodeps mariadb-libs-5.5.68-1.el7.x86_64
warning: /etc/my.cnf saved as /etc/my.cnf.rpmsave

Create the /opt/software/ directory and upload the required installation packages to it

# Install the database (on master)
yum -y install unzip
cd /opt/software/
unzip mysql-5.7.18.zip
cd mysql-5.7.18
yum -y install *.rpm

vi /etc/my.cnf
# Add the following settings to /etc/my.cnf, below the symbolic-links=0 line

default-storage-engine = innodb
innodb_file_per_table
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8

# After saving, run the following
systemctl enable --now mysqld
systemctl status mysqld
# The status output should show 'running' (in green) and 'enabled', which means MySQL started successfully
ss -antl
# Seeing port 3306 among the listening sockets means MySQL is up

Query the default MySQL password.
The default password generated during installation is stored in /var/log/mysqld.log; search that file for the keyword 'password' to find it.

[root@master mysql-5.7.18]# grep 'password' /var/log/mysqld.log
2023-08-02T14:22:36.151487Z 1 [Note] A temporary password is generated for root@localhost: CefpP!lVu0%o

The default password is randomly generated at installation time, so it differs for every install.

Initialize the MySQL database.
Run mysql_secure_installation to initialize MySQL. During initialization you must set a login password for the database root user; it has to meet the security policy (upper- and lower-case letters, digits and special characters). Here it is set to Password123$.

[root@master mysql-5.7.18]# mysql_secure_installation

The following interactive prompts appear during MySQL initialization:
1) Change the password for root? (Press y|Y for Yes, any other key for No): whether to change the root password; type y and press Enter.
2) Do you wish to continue with the password provided? (Press y|Y for Yes, any other key for No): whether to keep the password you entered; type y and press Enter.
3) Remove anonymous users? (Press y|Y for Yes, any other key for No): whether to remove anonymous users; type y and press Enter.
4) Disallow root login remotely? (Press y|Y for Yes, any other key for No): whether to block remote root logins; type n and press Enter so that root can still log in remotely.
5) Remove test database and access to it? (Press y|Y for Yes, any other key for No): whether to remove the test database; type y and press Enter.
6) Reload privilege tables now? (Press y|Y for Yes, any other key for No): whether to reload the privilege tables; type y and press Enter.

Grant the root user permission to access MySQL both locally and remotely.

[root@master mysql-5.7.18]# mysql -uroot -p
Enter password: 		(enter the new password: Password123$)

mysql> grant all on *.* to root@'localhost' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all on *.* to root@'%' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye

Run the following commands on slave1

yum -y install mariadb
mysql -uroot -p'<new password>' -h'<master IP address>'

If the remote login succeeds, the database deployment is complete

[root@slave1 ~]# mysql -uroot -p'Password123$' -h'192.168.10.10'
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> 

Deploy Hive on master: unpack the installation archive

[root@master software]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src 
[root@master software]# cd /usr/local/src/
[root@master src]# mv apache-hive-2.0.0-bin hive
[root@master src]# chown -R hadoop.hadoop /usr/local/src/

vi /etc/profile.d/hive.sh
export HIVE_HOME=/usr/local/src/hive
export PATH=${HIVE_HOME}/bin:$PATH

After saving, run the following

source /etc/profile.d/hive.sh
echo $PATH
su - hadoop
cd /usr/local/src/hive/conf/
cp hive-default.xml.template hive-site.xml

Locate the corresponding settings (following the official documentation) and modify them as shown below

vi hive-site.xml

1) Set the MySQL database connection.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

2) Set the password of the MySQL root user.
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Password123$</value>
<description>password to use against metastore database</description>
</property>

3) Verify metastore schema version consistency. If the value is already false (the default), no change is needed.
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
 Enforce metastore schema version consistency.
 True: Verify that version information stored in metastore is compatible with one from Hive jars. Also disable automatic schema migration.
 False: Warn if the version information stored in metastore doesn't match with one from Hive jars.
</description>
</property>

4) Configure the database driver.
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

5) Set the database user name javax.jdo.option.ConnectionUserName to root.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>

6) Replace ${system:java.io.tmpdir}/${system:user.name} with the /usr/local/src/hive/tmp directory (and its subdirectories) in the following locations.
The following 4 properties need to be changed:
 <name>hive.querylog.location</name>
 <value>/usr/local/src/hive/tmp</value>
 <description>Location of Hive run time structured log file</description>
 <name>hive.exec.local.scratchdir</name>
 <value>/usr/local/src/hive/tmp</value>
 <name>hive.downloaded.resources.dir</name>
 <value>/usr/local/src/hive/tmp/resources</value>
 <name>hive.server2.logging.operation.log.location</name>
 <value>/usr/local/src/hive/tmp/operation_logs</value>
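The same replacement can be done in one pass with sed instead of editing each property by hand (a sketch; it substitutes the literal ${system:java.io.tmpdir}/${system:user.name} string, and the grep afterwards shows any remaining occurrences that still have to be edited manually):

cd /usr/local/src/hive/conf
sed -i 's#${system:java.io.tmpdir}/${system:user.name}#/usr/local/src/hive/tmp#g' hive-site.xml
grep -n 'system:java.io.tmpdir' hive-site.xml    # anything still listed here needs a manual edit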
 
7) Create the temporary folder tmp inside the Hive installation directory.
[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
At this point the Hive component is installed and configured.

Initialize the Hive metastore
1) Copy the MySQL JDBC driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation:

[hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/

2) Restart Hadoop

[hadoop@master lib]$ stop-all.sh
[hadoop@master lib]$ start-all.sh

After the commands above, master must show the NameNode, SecondaryNameNode and ResourceManager processes and every slave node must show DataNode and NodeManager; then run the following command

3) Initialize the database

[hadoop@master ~]$ schematool -initSchema -dbType mysql
If you see schemaTool completed, the initialization succeeded; you can then connect to MySQL and check that the hive database exists

[root@master ~]# schematool -initSchema -dbType mysql
which: no hbase in (/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
································
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.mysql.sql
Initialization script completed
schemaTool completed

Run the following command and confirm that the hive metastore database was created
[root@master ~]# mysql -uroot -p'Password123$' -e 'show databases;'

mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

4) Start Hive

[hadoop@master ~]$ hive

[root@master ~]# hive
which: no hbase in (/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
································
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

V. ZooKeeper lab steps:

1. Configure time synchronization (run on all nodes)

[root@master ~]# yum -y install chrony
[root@master ~]# vi /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool time1.aliyun.com iburst			# add the Alibaba Cloud time server

After saving, run the following commands

If the service status shows running, chronyd started successfully

[root@master ~]# systemctl enable --now chronyd
[root@master ~]# systemctl status chronyd

2. Deploy ZooKeeper (on master)

Upload the package to /opt/software with Xftp

[root@master ~]# tar xf /opt/software/zookeeper-3.4.8.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv zookeeper-3.4.8 zookeeper
[root@master src]# cd /usr/local/src/zookeeper/
[root@master zookeeper]# mkdir data logs
[root@master zookeeper]# echo '1' > /usr/local/src/zookeeper/data/myid
[root@master zookeeper]# cd /usr/local/src/zookeeper/conf/
[root@master conf]# cp zoo_sample.cfg zoo.cfg
[root@master conf]# vi zoo.cfg
dataDir=/usr/local/src/zookeeper/data
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

After saving, run the following

[root@master conf]# vi /etc/profile.d/zookeeper.sh
export ZOOKEEPER_HOME=/usr/local/src/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH

After saving, run the following

chown -R hadoop.hadoop /usr/local/src/
scp -r /usr/local/src/zookeeper slave1:/usr/local/src/
scp -r /usr/local/src/zookeeper slave2:/usr/local/src/
[root@master conf]# scp /etc/profile.d/zookeeper.sh slave1:/etc/profile.d/
zookeeper.sh                                                      100%   87    79.8KB/s   00:00    
[root@master conf]# scp /etc/profile.d/zookeeper.sh slave2:/etc/profile.d/
zookeeper.sh                                                      100%   87    82.3KB/s   00:00    

Run the following on every slave node

chown -R hadoop.hadoop /usr/local/src/
ll /usr/local/src/

Run the following on slave1

echo '2' > /usr/local/src/zookeeper/data/myid

Run the following on slave2

echo '3' > /usr/local/src/zookeeper/data/myid

3. Start ZooKeeper (run on all nodes)

su - hadoop
jps
zkServer.sh start

After the commands above, every node must show a QuorumPeerMain process for the deployment to be considered successful

zkServer.sh status

Make sure you see 1 leader and 2 followers; only then has the ensemble started successfully
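To check all three nodes from master in one pass, a small loop works (a sketch; it relies on the passwordless hadoop SSH logins configured earlier and the profile scripts copied to every node):

for h in master slave1 slave2; do
  echo "== $h =="
  ssh hadoop@$h 'source /etc/profile.d/hadoop.sh; source /etc/profile.d/zookeeper.sh; zkServer.sh status'
done
# expected: Mode: leader on exactly one node and Mode: follower on the other two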

VI. HBase lab steps:

Deploy HBase (on master)

Upload the package to /opt/software with Xftp

[root@master ~]$ tar xf /opt/software/hbase-1.2.1-bin.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv hbase-1.2.1 hbase
[root@master src]# ls
hadoop  hbase  hive  jdk  zookeeper
[root@master src]# vi /etc/profile.d/hbase.sh
export HBASE_HOME=/usr/local/src/hbase
export PATH=${HBASE_HOME}/bin:$PATH

After saving, run the following commands

If the HBase path appears in the PATH variable, the environment is configured correctly

[root@master src]# source /etc/profile.d/hbase.sh
[root@master src]# echo $PATH
/usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/zookeeper/bin:/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

3. Configure HBase (on master)

[root@master src]# cd /usr/local/src/hbase/conf/
[root@master conf]# vi hbase-env.sh
export JAVA_HOME=/usr/local/src/jdk
export HBASE_MANAGES_ZK=false
export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/

After saving, edit the following file

vi hbase-site.xml
<property>
	<name>hbase.rootdir</name>
	<value>hdfs://master:9000/hbase</value>
</property>
<property>
	<name>hbase.master.info.port</name>
	<value>60010</value>
</property>
<property>
	<name>hbase.zookeeper.property.clientPort</name>
	<value>2181</value>
</property>
<property>
	<name>zookeeper.session.timeout</name>
	<value>10000</value>
</property>
<property>
	<name>hbase.zookeeper.quorum</name>
	<value>master,slave1,slave2</value>
</property>
<property>
	<name>hbase.tmp.dir</name>
	<value>/usr/local/src/hbase/tmp</value>
</property>
<property>
	<name>hbase.cluster.distributed</name>
	<value>true</value>
</property>

After saving, run the following

mkdir -p /usr/local/src/hbase/tmp
vi regionservers
192.168.10.20
192.168.10.30

After saving, run the following

scp -r /usr/local/src/hbase slave1:/usr/local/src/
scp -r /usr/local/src/hbase slave2:/usr/local/src/
[root@master conf]# scp /etc/profile.d/hbase.sh slave1:/etc/profile.d/
hbase.sh                                                          100%   75    38.9KB/s   00:00    
[root@master conf]# scp /etc/profile.d/hbase.sh slave2:/etc/profile.d/
hbase.sh                                                          100%   75    58.5KB/s   00:00   

Run the following on all nodes (including master)

chown -R hadoop.hadoop /usr/local/src
ll /usr/local/src/
su - hadoop

4. Start HBase (on master)

Run the following on all nodes

zkServer.sh start

After this command, check that the QuorumPeerMain process is present

Start the distributed Hadoop cluster on master

start-all.sh

master
[root@master conf]# jps
1538 QuorumPeerMain
2181 ResourceManager
1818 NameNode
2443 Jps
2012 SecondaryNameNode

slave1
[root@slave1 ~]# jps
1911 NodeManager
1639 QuorumPeerMain
1801 DataNode
2042 Jps

slave2
[root@slave2 ~]# jps
1825 DataNode
2066 Jps
1640 QuorumPeerMain
1935 NodeManager

After the command above, make sure master shows the NameNode, SecondaryNameNode and ResourceManager processes and the slave nodes show DataNode and NodeManager

start-hbase.sh

master
[root@master conf]# jps
1538 QuorumPeerMain
2181 ResourceManager
1818 NameNode
2810 Jps
2587 HMaster
2012 SecondaryNameNode

slave1
[root@slave1 ~]# jps
1911 NodeManager
1639 QuorumPeerMain
1801 DataNode
2089 HRegionServer
2300 Jps

slave2
[root@slave2 ~]# jps
2112 HRegionServer
1825 DataNode
2321 Jps
1640 QuorumPeerMain
1935 NodeManager

After this command, make sure master shows the QuorumPeerMain and HMaster processes and the slave nodes show QuorumPeerMain and HRegionServer

On the Windows host:
Copy the hosts file from C:\windows\system32\drivers\etc\ to the desktop, edit it to add the mapping between the master host name and its IP address, put it back, and then open http://master:60010 in a browser to reach the HBase web UI


Start the HBase shell

[root@master ~]# su hadoop
[hadoop@master root]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016

hbase(main):001:0> 

5. HBase shell usage (on master)

su - hadoop
hbase shell
# Create a table named scores with two column families

hbase(main):001:0> create 'scores','grade','course'

# Check the HBase status

hbase(main):002:0> status

# Check the HBase version

hbase(main):003:0> version

# List the tables

hbase(main):004:0> list

# Insert record 1: jie, grade: 146cloud

hbase(main):005:0> put 'scores','jie','grade:','146cloud'

# Insert record 2: jie, course:math, 86

hbase(main):006:0> put 'scores','jie','course:math','86'

# Insert record 3: jie, course:cloud, 92

hbase(main):007:0> put 'scores','jie','course:cloud','92'

# Insert record 4: shi, grade: 133soft

hbase(main):008:0> put 'scores','shi','grade:','133soft'

# Insert record 5: shi, course:math, 87

hbase(main):009:0> put 'scores','shi','course:math','87'

# Insert record 6: shi, course:cloud, 96

hbase(main):010:0>  put 'scores','shi','course:cloud','96'

# Read jie's row

hbase(main):011:0> get 'scores','jie'

# Read jie's grade (class) column family

hbase(main):012:0> get 'scores','jie','grade'

# Scan the whole table

hbase(main):013:0> scan 'scores'

# Scan the table by column family

hbase(main):014:0> scan 'scores',{COLUMNS=>'course'}

# Delete the specified cell

hbase(main):016:0> delete 'scores','shi','grade'

# Add a new column family named age

hbase(main):019:0> alter 'scores',NAME=>'age'

# Describe the table structure

hbase(main):021:0> describe 'scores'

# Delete the column family named age

hbase(main):023:0> alter 'scores',NAME=>'age',METHOD=>'delete'

# Delete the table

hbase(main):025:0> disable 'scores'
hbase(main):026:0> drop 'scores'
hbase(main):027:0> list

# Exit the HBase shell

hbase(main):028:0> quit

Stop HBase

stop-hbase.sh
jps

VII. Sqoop component deployment

Upload the package to /opt/software with Xftp

Deploy Sqoop (run the following on master)

[root@master ~]# tar xf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
[root@master src]# ls
hadoop  hbase  hive  jdk  sqoop  zookeeper

# Create Sqoop's configuration file sqoop-env.sh
[root@master src]# cd /usr/local/src/sqoop/conf/
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh

# Add the following environment variables at the end of the file
[root@master conf]# vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
export HBASE_HOME=/usr/local/src/hbase
export HIVE_HOME=/usr/local/src/hive

# After saving, run the following
[root@master conf]# vi /etc/profile.d/sqoop.sh
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib

# After saving, run the following
[root@master conf]# source /etc/profile.d/sqoop.sh
[root@master conf]# echo $PATH
/usr/local/src/sqoop/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@master conf]# cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/

# Start the Hadoop cluster for Sqoop (run on master)
su - hadoop
start-all.sh
# After the command above, make sure master shows the NameNode, SecondaryNameNode and ResourceManager processes and the slave nodes show DataNode and NodeManager
sqoop list-databases --connect jdbc:mysql://master:3306 --username root -P
# If the command lists the following 5 databases, Sqoop works and can connect to the database
information_schema
hive
mysql
performance_schema
sys

# Configure the Hive connection (on master)
For Sqoop to connect to Hive, the hive-common-2.0.0.jar from the Hive component's /usr/local/src/hive/lib directory must also be copied into the lib directory of the Sqoop installation.
[hadoop@master ~]$ cp /usr/local/src/hive/lib/hive-common-2.0.0.jar  /usr/local/src/sqoop/lib/

[hadoop@master ~]$ mysql -uroot -pPassword123$

mysql> create database sample;
Query OK, 1 row affected (0.00 sec)

mysql> use sample;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into student values('01','zhangsan'),('02','lisi'),('03','wangwu');
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)

mysql> quit;
Bye
# If the three rows above are listed, the table was created successfully in the database

# Create the sample database and the student table in Hive
hive> create database sample;
OK
Time taken: 0.718 seconds
hive> use sample;
OK
Time taken: 0.019 seconds
hive> create table student(number STRING,name STRING);
OK
Time taken: 0.273 seconds
hive> exit;

# Import the student table from MySQL into Hive
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student
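The reverse direction, exporting a Hive/HDFS table back into MySQL, uses sqoop export. A sketch is shown below; it assumes the table data sits under Hive's default warehouse path (/user/hive/warehouse/sample.db/student) and uses the same '|' field separator as the import above:

sqoop export --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --export-dir /user/hive/warehouse/sample.db/student --input-fields-terminated-by '|'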

VIII. Flume component installation and configuration

1. Deploy the Flume component (on master)

# Upload the package to /opt/software with Xftp

[root@master ~]# tar xf /opt/software/apache-flume-1.6.0-bin.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src
[root@master src]# mv apache-flume-1.6.0-bin flume
[root@master src]# chown -R hadoop.hadoop /usr/local/src/
[root@master src]# vi /etc/profile.d/flume.sh
export FLUME_HOME=/usr/local/src/flume
export PATH=${FLUME_HOME}/bin:$PATH

# After saving, run the following
[root@master src]# su - hadoop
Last login: Mon Sep  4 16:05:51 CST 2023 on pts/0
[hadoop@master ~]$ echo $PATH
/usr/local/src/zookeeper/bin:/usr/local/src/sqoop/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/flume/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin
# If the Flume installation path appears in PATH, the environment variables are set up correctly

# Next, start the Hadoop cluster
start-all.sh
# After the command above, make sure master shows the NameNode, SecondaryNameNode and ResourceManager processes and the slave nodes show DataNode and NodeManager

# Then prepare the HDFS target directory
hdfs dfs -rm -r /tmp
hdfs dfs -mkdir -p /tmp/flume
hdfs dfs -ls /
# Use the flume-ng agent command to load the simple-hdfs-flume.conf configuration (a sample is sketched at the end of this section) and start Flume transferring data (a few seconds is enough)
flume-ng agent --conf-file simple-hdfs-flume.conf --name a1
hdfs dfs -ls /tmp/flume
# If files with the flumedata prefix appear under /tmp/flume, the transfer succeeded
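The guide does not list simple-hdfs-flume.conf itself. Below is a minimal sketch of such a file, assuming an agent named a1 (matching --name a1 above) that tails one of the Hadoop logs on master and writes the events into hdfs://master:9000/tmp/flume with a flumedata file prefix; the source command and log file name are placeholders, so point them at whatever data you actually want to collect:

# simple-hdfs-flume.conf: one source, one memory channel, one HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# source: tail a Hadoop log file (placeholder path, adjust to an existing log)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.log
a1.sources.r1.channels = c1

# channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# sink: write the events to HDFS under /tmp/flume with a "flumedata" prefix
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/tmp/flume
a1.sinks.k1.hdfs.filePrefix = flumedata
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollCount = 1000
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.channel = c1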

From: https://www.cnblogs.com/rainlike/p/17677490.html
