Hadoop Platform Setup
Understanding Hadoop
I. What is Hadoop?
Hadoop is a distributed system infrastructure developed by the Apache Foundation: a software framework that combines a storage system with a computing framework. It mainly addresses the storage and processing of massive data sets and is a cornerstone of big data technology. Hadoop processes data in a reliable, efficient, and scalable way; users can develop distributed programs without understanding the low-level details of distribution, and can easily build and run applications that process massive amounts of data on Hadoop.
II. What problems does Hadoop solve?
1. Massive data storage
HDFS is highly fault-tolerant and is designed to run on low-cost hardware. It provides high-throughput access to data, which makes it suitable for applications with very large data sets. An HDFS cluster consists of n machines running DataNode processes plus one machine running the NameNode process (with another one on standby). Each DataNode manages a portion of the data, while the NameNode manages the metadata for the entire HDFS cluster.
2. Resource management, scheduling, and allocation
Apache Hadoop YARN (Yet Another Resource Negotiator) is a newer Hadoop resource manager: a general-purpose resource management system and scheduling platform that provides unified resource management and scheduling for upper-layer applications. Its introduction brought major benefits to clusters in terms of utilization, unified resource management, and data sharing.
I. Basic Environment Preparation
1. Configure the following on all three hosts: master, slave1, and slave2
[root@localhost ~]# cd /etc/sysconfig/network-scripts
[root@localhost network-scripts]# ls
ifcfg-ens33 ifdown-isdn ifdown-tunnel ifup-isdn ifup-Team
ifcfg-lo ifdown-post ifup ifup-plip ifup-TeamPort
ifdown ifdown-ppp ifup-aliases ifup-plusb ifup-tunnel
ifdown-bnep ifdown-routes ifup-bnep ifup-post ifup-wireless
ifdown-eth ifdown-sit ifup-eth ifup-ppp init.ipv6-global
ifdown-ippp ifdown-Team ifup-ippp ifup-routes network-functions
ifdown-ipv6 ifdown-TeamPort ifup-ipv6 ifup-sit network-functions-ipv6
[root@localhost network-scripts]# vi ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="d34f3ba3-f9a4-4669-8b12-6b8712e5e3f0"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.10.10
NETMASK=255.255.255.0
GATEWAY=192.168.10.2
DNS1=8.8.8.8
DNS2=114.114.114.114
Add the following two lines to /etc/resolv.conf:
vi /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
After saving, run the following commands:
[root@localhost network-scripts]# systemctl restart NetworkManager
[root@localhost network-scripts]# ifdown ens33;ifup ens33
Restart the network: service network restart
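A quick sanity check that the static address took effect (the addresses below match the example configuration above; the external ping only works if the VM has Internet access):
ip addr show ens33        # should list 192.168.10.10/24
ping -c 3 192.168.10.2    # the gateway should answer
ping -c 3 8.8.8.8         # confirms outbound connectivity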
2. Set the hostname (run on each of master, slave1, and slave2, using that host's own name; the example below is for master)
[root@localhost ~]# hostnamectl set-hostname master.example.com
[root@localhost ~]# bash
[root@master ~]# hostname
master.example.com
3. Configure host name mappings (configure on master, slave1, and slave2)
[root@master ~]# vi /etc/hosts
[root@master ~]# cat /etc/hosts
192.168.10.10 master master.example.com
192.168.10.20 slave1 slave1.example.com
192.168.10.30 slave2 slave2.example.com
After saving the configuration, run the following commands to verify:
ping master
ping slave1
ping slave2
Disable the firewall and SELinux (run the following on all nodes)
systemctl disable --now firewalld
vi /etc/selinux/config
SELINUX=disabled
After saving the configuration, run:
setenforce 0
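A simple way to double-check that both are really off (assuming the commands above were run on every node):
getenforce                       # prints Permissive now, Disabled after a reboot
systemctl is-active firewalld    # should print inactive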
4. Create the hadoop user (run on every node)
[root@master ~]# useradd hadoop
[root@master ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
5. Edit the SSH server configuration and enable public-key authentication (run on every node)
[root@master ~]# vi /etc/ssh/sshd_config
PubkeyAuthentication yes
After saving the configuration, run:
[root@master ~]# systemctl restart sshd
6. Configure passwordless login to localhost (run on every node)
Switch to the hadoop user and generate a key pair.
Run the following commands:
su - hadoop
ssh-keygen -t rsa -P ''
ls -l ~/.ssh
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
cd
ssh localhost
exit
================================================================================================
[root@master ~]# su - hadoop
Last login: Tue Aug 1 00:59:22 CST 2023 on pts/0
[hadoop@master ~]$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:iKQ+XKgHmsIvaKcFIWQ/H9s5Ep+Fi4TCvxGF9pfwXuo [email protected]
The key's randomart image is:
+---[RSA 2048]----+
| o .. |
|+ .oo. . |
|.+.=o+o... |
|. ++=oB==. |
|..ooo=+BS |
|o*..o .o. |
|*.=o . |
|o+oo E |
|..+. |
+----[SHA256]-----+
[hadoop@master ~]$ ls -l ~/.ssh
total 8
-rw-------. 1 hadoop hadoop 1675 Aug 1 01:05 id_rsa
-rw-r--r--. 1 hadoop hadoop 407 Aug 1 01:05 id_rsa.pub
[hadoop@master ~]$ cd ~/.ssh
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@master .ssh]$ chmod 600 authorized_keys
[hadoop@master .ssh]$ cd
[hadoop@master ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:RNa9WLNMVMS6PEYWrlAQrPerMU3UoVdL/C3e9rMgDh8.
ECDSA key fingerprint is MD5:60:4c:35:59:7d:76:45:d0:f8:42:51:1b:6f:f8:a8:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Tue Aug 1 01:05:26 2023
[hadoop@master ~]$ exit
logout
Connection to localhost closed.
Configure passwordless SSH login (root user)
On the master node:
ssh-keygen -t rsa
ssh-copy-id root@slave1
ssh-copy-id root@slave2
On the slave1 node:
ssh-keygen -t rsa
ssh-copy-id root@master
ssh-copy-id root@slave2
On the slave2 node:
ssh-keygen -t rsa
ssh-copy-id root@master
ssh-copy-id root@slave1
Verification
As root, if you can ssh to each host without being prompted for a password, the SSH configuration is successful.
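One way to check all the root logins at once is a small loop like the one below (run it on each node; BatchMode=yes makes ssh fail instead of prompting, so every line of output should just be a remote hostname):
for h in master slave1 slave2; do
    ssh -o BatchMode=yes root@$h hostname
done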
7. Configure passwordless login from master to slave1 and slave2 (for the hadoop user)
Run the following commands on the master node:
su - hadoop
scp ~/.ssh/id_rsa.pub hadoop@slave1:~/
scp ~/.ssh/id_rsa.pub hadoop@slave2:~/
Run the following commands on every slave node:
su - hadoop
cat id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/authorized_keys
rm -f ~/id_rsa.pub
================================================================================================
Example
[hadoop@slave1 ~]$ cat id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@slave1 ~]$ cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4PgWS5nfgez0QTt27/0KhCEZqtFWoa7zcceDrb+tAKWJDuDrvtzLbumIU29a6Y2ZpjzqwJppnoLSILbra4zNs8x1s32k+xcpUtOGqgdb5UyJUzhuM1qwyB3D8eCZJ4nN8N5GtmiSyqIcz64VLBIanVSZsPFak5xvXZFbdbd7dhKJb64EV4TiExPHmHSMs/0jucp4LgCvNTGalF4WHogpmvyN2ZKNHf4EARutiRSoIV3rxhXeS80p0RSX7Xzik0UhYMUc6VGfnbS4qrbfyEzM9pVRxkZhfnfXaoLnWg8sCj1vXlNS8Z7gT13hIoulw1GZZOsQAVX+DowGlof1T11wB [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCgMLqBYATCZW0zD0tRAVUG8BkIZLq6TzM91YGiT6jNrqkdkF2+qyJymwNIQNodPsqxyEkbr0zMwKTsCXkLA/tbsP1q53Zth6uW0Ls8sT7bqWkVFfXt7MMS7F2cb9Z81pudU0Ze2WvYTHDsqOemapaU6Ux9+bPIU5KTJlEmyfpvSLKSf9zSTAwTZmfmZZOmZdFB0iDH77sM2xXGXUuyTlhXC2wpTtEkwHIMn3kYuVmZvHSm9CN3skbO0x3c8AFyKWcTc13tFgVwGMgaxa+ajyp8Pt1xnub72D0pzOKpEzZmFYCzAiK1luitAZiWbMtRpCmLKkMfFl+dyyvkfWLZrxI7 [email protected]
[hadoop@slave1 ~]$ rm -f ~/id_rsa.pub
Run the following commands on the master node:
su - hadoop
ssh slave1
exit
ssh slave2
exit
[hadoop@master ~]$ ssh slave1
Last login: Tue Aug 1 01:15:25 2023 from localhost
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@master ~]$ ssh slave2
Last login: Tue Aug 1 01:16:10 2023 from localhost
[hadoop@slave2 ~]$ exit
logout
Connection to slave2 closed.
8. Configure passwordless login from slave1 to master and slave2 (run on slave1)
su - hadoop
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave2
ssh master
exit
ssh slave2
exit
[hadoop@slave1 ~]$ ssh-copy-id hadoop@master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'master (192.168.10.10)' can't be established.
ECDSA key fingerprint is SHA256:RNa9WLNMVMS6PEYWrlAQrPerMU3UoVdL/C3e9rMgDh8.
ECDSA key fingerprint is MD5:60:4c:35:59:7d:76:45:d0:f8:42:51:1b:6f:f8:a8:ce.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@master'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@slave1 ~]$ ssh-copy-id hadoop@slave2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (192.168.10.30)' can't be established.
ECDSA key fingerprint is SHA256:QBjEXeaZrT85SImnqv6pCTcenLldfKXWqfZ3RbA8F1g.
ECDSA key fingerprint is MD5:9f:30:fe:d3:da:9a:30:cc:da:2a:28:8e:e5:9c:85:b5.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave2'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@slave1 ~]$ ssh master
Last login: Tue Aug 1 01:06:44 2023 from localhost
[hadoop@master ~]$ exit
logout
Connection to master closed.
[hadoop@slave1 ~]$ ssh slave2
Last login: Tue Aug 1 01:19:01 2023 from master
[hadoop@slave2 ~]$ exit
logout
Connection to slave2 closed.
9. Configure passwordless login from slave2 to master and slave1 (run on slave2)
su - hadoop
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave1
ssh master
exit
ssh slave1
exit
[hadoop@slave2 ~]$ ssh-copy-id hadoop@master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'master (192.168.10.10)' can't be established.
ECDSA key fingerprint is SHA256:RNa9WLNMVMS6PEYWrlAQrPerMU3UoVdL/C3e9rMgDh8.
ECDSA key fingerprint is MD5:60:4c:35:59:7d:76:45:d0:f8:42:51:1b:6f:f8:a8:ce.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@master'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@slave2 ~]$ ssh-copy-id hadoop@slave1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (192.168.10.20)' can't be established.
ECDSA key fingerprint is SHA256:TqRUR8sR0+eS1KPxHT8o5+f63+2ev8QNxitrtrTXUhQ.
ECDSA key fingerprint is MD5:43:0a:1e:e3:e4:a8:df:7e:98:03:8d:e9:0f:16:d7:1c.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@slave2 ~]$ ssh master
Last login: Tue Aug 1 01:16:12 2023 from slave1
[hadoop@master ~]$ exit
logout
Connection to master closed.
[hadoop@slave2 ~]$ ssh slave1
Last login: Tue Aug 1 01:18:52 2023 from master
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
II. Fully Distributed Hadoop Configuration
1. Install Hadoop (run on the master node)
[root@master ~]# tar xf jdk-8u152-linux-x64.tar.gz -C /usr/local/src/
[root@master ~]# tar xf hadoop-2.7.1.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# ls
hadoop-2.7.1 jdk1.8.0_152
[root@master src]# mv jdk1.8.0_152 jdk
[root@master src]# mv hadoop-2.7.1 hadoop
[root@master src]# vi /etc/profile.d/hadoop.sh
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
After saving the configuration, run:
[root@master src]# source /etc/profile.d/hadoop.sh
[root@master src]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@master src]# vi /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/src/jdk
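A quick check that the environment variables point at working binaries (the versions reported should match what this guide installs):
[root@master src]# java -version     # should report java version "1.8.0_152"
[root@master src]# hadoop version    # should report Hadoop 2.7.1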
2. Configure hdfs-site.xml (run on master)
[root@master src]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/src/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/src/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
After saving the configuration, run:
[root@master src]# mkdir -p /usr/local/src/hadoop/dfs/{name,data}
3. Configure core-site.xml (run on master)
[root@master src]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/src/hadoop/tmp</value>
</property>
</configuration>
After saving the configuration, run:
[root@master src]# mkdir -p /usr/local/src/hadoop/tmp
4. Configure mapred-site.xml (run on master)
[root@master src]# cd /usr/local/src/hadoop/etc/hadoop
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
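mapred-site.xml configures the JobHistory addresses (master:10020 and master:19888), but this guide never starts the history server itself. If you want the history web UI to be reachable, it can be started manually once the cluster is running, for example:
mr-jobhistory-daemon.sh start historyserver   # run as the hadoop user on master
jps                                           # should now also show a JobHistoryServer process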
5. Configure yarn-site.xml (run on master)
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
6. Edit the following files on master and save them
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/masters
192.168.10.10
[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/slaves
192.168.10.20
192.168.10.30
After saving, run the following commands:
[root@master hadoop]# chown -R hadoop.hadoop /usr/local/src
[root@master hadoop]# ll /usr/local/src/
total 0
drwxr-xr-x 11 hadoop hadoop 171 Aug 1 17:45 hadoop
drwxr-xr-x 8 hadoop hadoop 255 Sep 14 2017 jdk
7. Synchronize everything under /usr/local/src/ to all slave nodes
scp -r /usr/local/src/* root@slave1:/usr/local/src/
scp -r /usr/local/src/* root@slave2:/usr/local/src/
[root@master hadoop]# scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
hadoop.sh 100% 151 125.0KB/s 00:00
[root@master hadoop]# scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/
hadoop.sh 100% 151 103.8KB/s 00:00
Run the following commands on every slave node:
[root@slave1 ~]# chown -R hadoop.hadoop /usr/local/src
[root@slave1 ~]# ll /usr/local/src/
total 0
drwxr-xr-x. 11 hadoop hadoop 171 Aug 1 17:55 hadoop
drwxr-xr-x. 8 hadoop hadoop 255 Aug 1 17:55 jdk
[root@slave1 ~]# source /etc/profile.d/hadoop.sh
[root@slave1 ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
III. Running the Hadoop Cluster
The first time HDFS is started, the NameNode must be formatted, which wipes its data; do not format again on subsequent startups, or DataNode processes will go missing. Note also that once HDFS has been run, the Hadoop working directory (/usr/local/src/hadoop/tmp in this guide) contains data; if you need to re-format, delete the data under the working directory first, otherwise formatting will run into problems.
Run the following commands to format the NameNode:
[root@master ~]# su - hadoop
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ bin/hdfs namenode -format
Seeing the following output indicates success:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.10.10
************************************************************/
On master, start the NameNode and check the process with jps
[hadoop@master hadoop]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
[hadoop@master hadoop]$ jps
2084 NameNode
2125 Jps
On the slave nodes, start the DataNode
[root@slave1 ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave1.example.com.out
[root@slave1 ~]# jps
9857 DataNode
9925 Jps
[root@slave2 ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave2.example.com.out
[root@slave2 ~]# jps
1925 Jps
1836 DataNode
On master, start the SecondaryNameNode
[hadoop@master hadoop]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
[hadoop@master hadoop]$ jps
2161 SecondaryNameNode
2201 Jps
If both the NameNode and SecondaryNameNode processes are present, HDFS has started successfully.
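Besides jps, you can also confirm that both DataNodes have registered with the NameNode; with slave1 and slave2 up, the report should include "Live datanodes (2)":
[hadoop@master hadoop]$ hdfs dfsadmin -report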
Check where HDFS stores its data
[hadoop@master hadoop]$ ll dfs/
total 0
drwxr-xr-x 2 hadoop hadoop 6 Aug 1 17:43 data
drwxr-xr-x 2 hadoop hadoop 6 Aug 1 18:09 name
As you can see, HDFS keeps its data under /usr/local/src/hadoop/dfs: the NameNode and DataNode each have a directory there, while the SecondaryNameNode stores its data under /usr/local/src/hadoop/tmp/.
View node status in a browser (requires editing the hosts file on Windows)
Add the following entry to the hosts file:
# hadoop
192.168.10.10 master
Then open the following URLs in a browser:
http://master:50070 shows NameNode and DataNode information
http://master:50090 shows SecondaryNameNode information
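As a simple functional test (the directory and file below are only an example), you can create a directory in HDFS, upload a local file, and list it as the hadoop user:
[hadoop@master ~]$ hdfs dfs -mkdir -p /user/hadoop/input
[hadoop@master ~]$ hdfs dfs -put /usr/local/src/hadoop/etc/hadoop/core-site.xml /user/hadoop/input/
[hadoop@master ~]$ hdfs dfs -ls /user/hadoop/input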
IV. Hive Component Deployment
Basic environment and installation preparation
Hive is installed on top of a Hadoop system, so make sure Hadoop is running properly before installing Hive. This section installs the Hive component on the master node of the fully distributed Hadoop system deployed above.
The Hive deployment plan and package paths are as follows:
(1) The fully distributed Hadoop system is already installed.
(2) MySQL is installed locally (account root, password Password123$); the packages are under /opt/software/mysql-5.7.18.
(3) MySQL listens on port 3306.
(4) The MySQL JDBC driver is /opt/software/mysql-connector-java-5.1.47.jar; it is used for the Hive metastore.
(5) The Hive package is /opt/software/apache-hive-2.0.0-bin.tar.gz.
Install MySQL
Remove the MariaDB database
Query the installed mariadb packages
[root@master ~]# rpm -qa | grep mariadb
mariadb-libs-5.5.68-1.el7.x86_64
Remove the mariadb packages
[root@master ~]# rpm -e --nodeps mariadb-libs-5.5.68-1.el7.x86_64
warning: /etc/my.cnf saved as /etc/my.cnf.rpmsave
Create the /opt/software/ directory and upload the required installation packages.
# Install the database (on master)
yum -y install unzip
cd /opt/software/
unzip mysql-5.7.18.zip
cd mysql-5.7.18
yum -y install *.rpm
vi /etc/my.cnf
# Add the following settings to /etc/my.cnf, below the symbolic-links=0 line
default-storage-engine = innodb
innodb_file_per_table
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
# After saving, run the following commands
systemctl enable --now mysqld
systemctl status mysqld
# If the output shows running (in green) and enabled, the service started successfully
ss -antl
# Seeing port 3306 in the output indicates success
Look up the default MySQL password.
The default password generated at installation time is stored in /var/log/mysqld.log; search that file for the keyword password.
[root@master mysql-5.7.18]# grep 'password' /var/log/mysqld.log
2023-08-02T14:22:36.151487Z 1 [Note] A temporary password is generated for root@localhost: CefpP!lVu0%o
The default password is generated randomly at install time, so it differs between installations.
Initialize the MySQL database.
Run mysql_secure_installation to initialize MySQL. During initialization you set the database root password; it must meet the security policy (upper- and lower-case letters, digits, and special characters). This guide uses Password123$.
[root@master mysql-5.7.18]# mysql_secure_installation
The following interactive prompts appear during initialization:
1) Change the password for root? (Press y|Y for Yes, any other key for No) — whether to change the root password; type y and press Enter.
2) Do you wish to continue with the password provided? (Press y|Y for Yes, any other key for No) — whether to keep the password you entered; type y and press Enter.
3) Remove anonymous users? (Press y|Y for Yes, any other key for No) — whether to remove anonymous users; type y and press Enter.
4) Disallow root login remotely? (Press y|Y for Yes, any other key for No) — whether to block remote root login; type n and press Enter so that root can still log in remotely.
5) Remove test database and access to it? (Press y|Y for Yes, any other key for No) — whether to remove the test database; type y and press Enter.
6) Reload privilege tables now? (Press y|Y for Yes, any other key for No) — whether to reload the privilege tables; type y and press Enter.
Grant the root user access to MySQL from both localhost and remote hosts.
[root@master mysql-5.7.18]# mysql -uroot -p
Enter password: (enter the new password: Password123$)
mysql> grant all on *.* to root@'localhost' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all on *.* to root@'%' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye
On slave1, run the following commands:
yum -y install mariadb
mysql -uroot -p'<new password>' -h'<master IP>'
If the remote login succeeds, the database has been deployed successfully.
[root@slave1 ~]# mysql -uroot -p'Password123$' -h'192.168.10.10'
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.18 MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]>
Deploy Hive on master and unpack the installation archive
[root@master software]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src
[root@master software]# cd /usr/local/src/
[root@master src]# mv apache-hive-2.0.0-bin hive
[root@master src]# chown -R hadoop.hadoop /usr/local/src/
vi /etc/profile.d/hive.sh
export HIVE_HOME=/usr/local/src/hive
export PATH=${HIVE_HOME}/bin:$PATH
After saving the configuration, run the following commands:
source /etc/profile.d/hive.sh
echo $PATH
su - hadoop
cd /usr/local/src/hive/conf/
cp hive-default.xml.template hive-site.xml
Locate and modify the corresponding settings according to the official PDF documentation
vi hive-site.xml
1) Set the MySQL database connection.
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
2) Configure the MySQL root password.
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Password123$</value>
<description>password to use against metastore database</description>
</property>
3) Verify metastore schema version consistency. If it is already false (the default), no change is needed.
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in the metastore is compatible with the Hive jars, and disable automatic schema migration.
False: Warn if the version information stored in the metastore doesn't match the Hive jars.
</description>
</property>
4) Configure the database driver.
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
5) Set the database user name javax.jdo.option.ConnectionUserName to root.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
6) Replace ${system:java.io.tmpdir}/${system:user.name} with the /usr/local/src/hive/tmp directory and its subdirectories in the following locations.
Four settings need to be changed:
<name>hive.querylog.location</name>
<value>/usr/local/src/hive/tmp</value>
<description>Location of Hive run time structured log
file</description>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/src/hive/tmp</value>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/src/hive/tmp/resources</value>
<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/src/hive/tmp/operation_logs</value>
7) Create the temporary folder tmp in the Hive installation directory.
[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
At this point the Hive component is installed and configured.
Initialize the Hive metastore
1) Copy the MySQL JDBC driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation;
[hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
2) Restart Hadoop
[hadoop@master lib]$ stop-all.sh
[hadoop@master lib]$ start-all.sh
After the commands above, master should show the NameNode, SecondaryNameNode, and ResourceManager processes, and every slave should show the DataNode and NodeManager processes. Then run the following:
3) Initialize the database
[hadoop@master ~]$ schematool -initSchema -dbType mysql
If you see schemaTool completed, initialization succeeded; you can then connect to MySQL and check that the hive database exists.
[root@master ~]# schematool -initSchema -dbType mysql
which: no hbase in (/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
································
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.mysql.sql
Initialization script completed
schemaTool completed
The following check confirms that the hive database has been created; once step 4) starts Hive and the hive> prompt appears, the deployment is complete.
[root@master ~]# mysql -uroot -p'Password123$' -e 'show databases;'
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
4) Start Hive
[hadoop@master ~]$ hive
[root@master ~]# hive
which: no hbase in (/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
································
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
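Once the hive> prompt appears, a short smoke test could look like the following (the table name test_tb is only an illustration and is dropped again at the end):
hive> show databases;
hive> create table test_tb(id int, name string);
hive> show tables;
hive> drop table test_tb;
hive> exit;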
V. ZooKeeper Installation Steps:
1. Configure time synchronization (run on all nodes)
[root@master ~]# yum -y install chrony
[root@master ~]# vi /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool time1.aliyun.com iburst # add the Aliyun time server
After saving the configuration, run the following commands.
If the status output shows running, chronyd started successfully.
[root@master ~]# systemctl enable --now chronyd
[root@master ~]# systemctl status chronyd
2. Deploy ZooKeeper (on master)
Upload the package to /opt/software with Xftp
[root@master ~]# tar xf /opt/software/zookeeper-3.4.8.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv zookeeper-3.4.8 zookeeper
[root@master src]# cd /usr/local/src/zookeeper/
[root@master zookeeper]# mkdir data logs
[root@master zookeeper]# echo '1' > /usr/local/src/zookeeper/data/myid
[root@master zookeeper]# cd /usr/local/src/zookeeper/conf/
[root@master conf]# cp zoo_sample.cfg zoo.cfg
[root@master conf]# vi zoo.cfg
dataDir=/usr/local/src/zookeeper/data
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
After saving, run the following command:
[root@master conf]# vi /etc/profile.d/zookeeper.sh
export ZOOKEEPER_HOME=/usr/local/src/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
After saving, run the following commands:
chown -R hadoop.hadoop /usr/local/src/
scp -r /usr/local/src/zookeeper slave1:/usr/local/src/
scp -r /usr/local/src/zookeeper slave2:/usr/local/src/
[root@master conf]# scp /etc/profile.d/zookeeper.sh slave1:/etc/profile.d/
zookeeper.sh 100% 87 79.8KB/s 00:00
[root@master conf]# scp /etc/profile.d/zookeeper.sh slave2:/etc/profile.d/
zookeeper.sh 100% 87 82.3KB/s 00:00
Run the following commands on every slave node:
chown -R hadoop.hadoop /usr/local/src/
ll /usr/local/src/
On slave1, run:
echo '2' > /usr/local/src/zookeeper/data/myid
On slave2, run:
echo '3' > /usr/local/src/zookeeper/data/myid
3. Start ZooKeeper (run on all nodes)
su - hadoop
jps
zkServer.sh start
After running the commands above, every node must show the QuorumPeerMain process for the deployment to be considered successful.
zkServer.sh status
Make sure you see one leader and two followers; only then has the cluster started successfully.
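If you want an additional client-side check beyond zkServer.sh status, zkCli.sh ships with ZooKeeper and is already on PATH; master:2181 is one of the servers configured above:
zkCli.sh -server master:2181
ls /          # should at least list [zookeeper]
quit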
VI. HBase Installation Steps:
Deploy HBase (on master)
Upload the package to /opt/software with Xftp
[root@master ~]# tar xf /opt/software/hbase-1.2.1-bin.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv hbase-1.2.1 hbase
[root@master src]# ls
hadoop hbase hive jdk zookeeper
[root@master src]# vi /etc/profile.d/hbase.sh
export HBASE_HOME=/usr/local/src/hbase
export PATH=${HBASE_HOME}/bin:$PATH
After saving the configuration, run the following commands.
If the hbase path appears in PATH, the setup succeeded.
[root@master src]# source /etc/profile.d/hbase.sh
[root@master src]# echo $PATH
/usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/zookeeper/bin:/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hive/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
3. Configure HBase (on master)
[root@master src]# cd /usr/local/src/hbase/conf/
[root@master conf]# vi hbase-env.sh
export JAVA_HOME=/usr/local/src/jdk
export HBASE_MANAGES_ZK=false
export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/
After saving the configuration, run:
vi hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>10000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/src/hbase/tmp</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
After saving the configuration, run:
mkdir -p /usr/local/src/hbase/tmp
vi regionservers
192.168.10.20
192.168.10.30
After saving the configuration, run the following commands:
scp -r /usr/local/src/hbase slave1:/usr/local/src/
scp -r /usr/local/src/hbase slave2:/usr/local/src/
[root@master conf]# scp /etc/profile.d/hbase.sh slave1:/etc/profile.d/
hbase.sh 100% 75 38.9KB/s 00:00
[root@master conf]# scp /etc/profile.d/hbase.sh slave2:/etc/profile.d/
hbase.sh 100% 75 58.5KB/s 00:00
Run the following commands on all nodes (including master):
chown -R hadoop.hadoop /usr/local/src
ll /usr/local/src/
su - hadoop
4. Start HBase (run on master)
Run the following command on all nodes:
zkServer.sh start
After running the command above, the QuorumPeerMain process should be present on every node.
On master, start the distributed Hadoop cluster
start-all.sh
master
[root@master conf]# jps
1538 QuorumPeerMain
2181 ResourceManager
1818 NameNode
2443 Jps
2012 SecondaryNameNode
slave1
[root@slave1 ~]# jps
1911 NodeManager
1639 QuorumPeerMain
1801 DataNode
2042 Jps
slave2
[root@slave2 ~]# jps
1825 DataNode
2066 Jps
1640 QuorumPeerMain
1935 NodeManager
After running the commands above, make sure master shows the NameNode, SecondaryNameNode, and ResourceManager processes, and each slave shows the DataNode and NodeManager processes.
start-hbase.sh
master
[root@master conf]# jps
1538 QuorumPeerMain
2181 ResourceManager
1818 NameNode
2810 Jps
2587 HMaster
2012 SecondaryNameNode
slave1
[root@slave1 ~]# jps
1911 NodeManager
1639 QuorumPeerMain
1801 DataNode
2089 HRegionServer
2300 Jps
slave2
[root@slave2 ~]# jps
2112 HRegionServer
1825 DataNode
2321 Jps
1640 QuorumPeerMain
1935 NodeManager
After running the commands above, make sure master shows the QuorumPeerMain and HMaster processes, and each slave shows the QuorumPeerMain and HRegionServer processes.
On the Windows host:
Copy the hosts file from C:\windows\system32\drivers\etc\ to the desktop, edit it to add the mapping between master's hostname and IP address, copy it back, and then open http://master:60010 in a browser to reach the HBase web UI.
Launch the HBase shell
[root@master ~]# su hadoop
[hadoop@master root]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
hbase(main):001:0>
5. HBase shell usage (run on master)
su - hadoop
hbase shell
# Create a table named scores with two column families
hbase(main):001:0> create 'scores','grade','course'
# Check HBase status
hbase(main):002:0> status
# Check the database version
hbase(main):003:0> version
# List tables
hbase(main):004:0> list
# Insert record 1: jie, grade: 146cloud
hbase(main):005:0> put 'scores','jie','grade:','146cloud'
# Insert record 2: jie, course:math, 86
hbase(main):006:0> put 'scores','jie','course:math','86'
# Insert record 3: jie, course:cloud, 92
hbase(main):007:0> put 'scores','jie','course:cloud','92'
# Insert record 4: shi, grade: 133soft
hbase(main):008:0> put 'scores','shi','grade:','133soft'
# Insert record 5: shi, course:math, 87
hbase(main):009:0> put 'scores','shi','course:math','87'
# Insert record 6: shi, course:cloud, 96
hbase(main):010:0> put 'scores','shi','course:cloud','96'
# Read jie's records
hbase(main):011:0> get 'scores','jie'
# Read jie's grade (the grade column family)
hbase(main):012:0> get 'scores','jie','grade'
# Scan the whole table
hbase(main):013:0> scan 'scores'
# Scan the table by column family (course only)
hbase(main):014:0> scan 'scores',{COLUMNS=>'course'}
# Delete a specific cell
hbase(main):016:0> delete 'scores','shi','grade'
# Add a new column family named age
hbase(main):019:0> alter 'scores',NAME=>'age'
# Describe the table structure
hbase(main):021:0> describe 'scores'
# Delete the column family named age
hbase(main):023:0> alter 'scores',NAME=>'age',METHOD=>'delete'
# Delete the table
hbase(main):025:0> disable 'scores'
hbase(main):026:0> drop 'scores'
hbase(main):027:0> list
# Exit the HBase shell
hbase(main):028:0> quit
Stop HBase
stop-hbase.sh
jps
VII. Sqoop Component Deployment
Upload the package to /opt/software with Xftp
Deploy Sqoop (run the following commands on master)
[root@master ~]# tar xf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
[root@master src]# ls
hadoop hbase hive jdk sqoop zookeeper
# Create the Sqoop configuration file sqoop-env.sh.
[root@master src]# cd /usr/local/src/sqoop/conf/
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
# Append the following environment variables to the end of the file
[root@master conf]# vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
export HBASE_HOME=/usr/local/src/hbase
export HIVE_HOME=/usr/local/src/hive
# After saving the configuration, run the following
[root@master conf]# vi /etc/profile.d/sqoop.sh
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
# After saving the configuration, run the following
[root@master conf]# source /etc/profile.d/sqoop.sh
[root@master conf]# echo $PATH
/usr/local/src/sqoop/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@master conf]# cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
# Start the Hadoop cluster for Sqoop (run on master)
su - hadoop
start-all.sh
# After running the command above, make sure master shows the NameNode, SecondaryNameNode, and ResourceManager processes, and each slave shows the DataNode and NodeManager processes
sqoop list-databases --connect jdbc:mysql://master:3306 --username root -P
# If the command lists the following five databases, Sqoop works and can connect to the database
information_schema
hive
mysql
performance_schema
sys
# Configure the Hive connection (run on master)
To let Sqoop connect to Hive, copy hive-common-2.0.0.jar from the Hive component's /usr/local/src/hive/lib directory into the lib directory of the Sqoop installation.
[hadoop@master ~]$ cp /usr/local/src/hive/lib/hive-common-2.0.0.jar /usr/local/src/sqoop/lib/
[hadoop@master ~]$ mysql -uroot -pPassword123$
mysql> create database sample;
Query OK, 1 row affected (0.00 sec)
mysql> use sample;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into student values('01','zhangsan'),('02','lisi'),('03','wangwu');
Query OK, 3 rows affected (0.05 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from student;
+--------+----------+
| number | name |
+--------+----------+
| 01 | zhangsan |
| 02 | lisi |
| 03 | wangwu |
+--------+----------+
3 rows in set (0.00 sec)
mysql> quit;
Bye
# Seeing the three rows above means the table was created successfully in the database
# Create the sample database and student table in Hive
hive> create database sample;
OK
Time taken: 0.718 seconds
hive> use sample;
OK
Time taken: 0.019 seconds
hive> create table student(number STRING,name STRING);
OK
Time taken: 0.273 seconds
hive> exit;
# Import the data from MySQL into Hive
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student
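After the import finishes, the data can be verified from the Hive side with a non-interactive query (this assumes the import above completed without errors; the three rows inserted into MySQL should be printed):
[hadoop@master ~]$ hive -e 'select * from sample.student;'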
VIII. Flume Component Installation and Configuration
1. Deploy the Flume component (run on master)
# Upload the package to /opt/software with Xftp
[root@master ~]# tar xf /opt/software/apache-flume-1.6.0-bin.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src
[root@master src]# mv apache-flume-1.6.0-bin flume
[root@master src]# chown -R hadoop.hadoop /usr/local/src/
[root@master src]# vi /etc/profile.d/flume.sh
export FLUME_HOME=/usr/local/src/flume
export PATH=${FLUME_HOME}/bin:$PATH
# After saving the configuration, run the following
[root@master src]# su - hadoop
Last login: Mon Sep 4 16:05:51 CST 2023 on pts/0
[hadoop@master ~]$ echo $PATH
/usr/local/src/zookeeper/bin:/usr/local/src/sqoop/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/flume/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin
# If the Flume installation path appears in PATH, the environment variables are set correctly
# Then run the following command
start-all.sh
# After running the command above, make sure master shows the NameNode, SecondaryNameNode, and ResourceManager processes, and each slave shows the DataNode and NodeManager processes
# Then run the following commands
hdfs dfs -rm -r /tmp
hdfs dfs -mkdir -p /tmp/flume
hdfs dfs -ls /
# Use the flume-ng agent command to load the simple-hdfs-flume.conf configuration and start Flume to transfer data (a few seconds of transfer is enough); a sketch of this configuration file is shown below
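The guide assumes simple-hdfs-flume.conf already exists in the current directory but never shows its contents. A minimal sketch of such a file (assuming an agent named a1 with an exec source tailing a Hadoop log and an HDFS sink writing to /tmp/flume; adjust the log file path to one that actually exists on your master) might look like this:
# agent a1: one source, one memory channel, one HDFS sink
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# exec source: tail a log file on master (path is an example)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.log
# HDFS sink: write plain text files under /tmp/flume
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/tmp/flume
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 30
# memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1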
flume-ng agent --conf-file simple-hdfs-flume.conf --name a1
hdfs dfs -ls /tmp/flume
# If FlumeData files appear in the listing, the transfer succeeded