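HDFS cluster setup: pseudo-distributed mode
Reference: the official Hadoop website
Prerequisites: a Java environment and SSH. Hadoop is written in Java, so every node needs a JDK (Java bytecode is portable across platforms, whereas C++ has to be recompiled for each one).
Problem: Hadoop starts the JVM processes on each node over SSH, and a command executed remotely over SSH does not load the environment variables defined in /etc/profile.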
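Hands-on check:
Define an environment variable BIGDATA=hello in /etc/profile on node1 and print it locally on node1; this works.
From node2, log in to node3 over SSH and create a directory there, then check on node3 ==> the directory exists, so remote execution itself is fine.
From node2, run a remote command on node1 that prints the variable ==> nothing is printed, i.e. the profile is not loaded.
To get the variable, source the profile explicitly in the remote command: ssh root@192.168.182.111 'source /etc/profile ; echo $BIGDATA'
Conclusion: every node sees its own environment variables when you work on it locally, but a remote command does not. Hadoop therefore has to be configured by hand: JAVA_HOME must be set not only for the operating system but also in Hadoop's own configuration.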
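The effect can be reproduced on a single machine without SSH. This is a minimal sketch, assuming bash is available: the temporary file is only a stand-in for /etc/profile, and the child `bash -c` plays the role of the non-interactive shell that `ssh host command` starts.

```shell
# Stand-in for /etc/profile (a hypothetical temp file, not the real one).
profile=$(mktemp)
echo 'export BIGDATA=hello' > "$profile"

# A plain non-interactive shell does not read the profile, so the variable is empty.
without=$(unset BIGDATA; bash -c 'echo "BIGDATA=[$BIGDATA]"')

# Sourcing the file inside the command makes the variable visible.
with=$(unset BIGDATA; bash -c "source '$profile'; echo \"BIGDATA=[\$BIGDATA]\"")

echo "$without"
echo "$with"
rm -f "$profile"
```

The second form is the same trick as `ssh root@... 'source /etc/profile ; ...'`: the profile is loaded because the command asks for it, not because the shell does it on its own.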
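The three modes recommended by the official site:
- Local (Standalone) Mode
- Pseudo-Distributed Mode => all roles run on the same node
- Fully-Distributed Mode => the roles are spread across different nodes; this is what companies normally use
A multi-node setup means operating on other nodes from the current one, so it is best to set up passwordless SSH between the nodes beforehand.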
Setup outline:
- Infrastructure
- Deployment and configuration
- Initialization and startup
- Command-line usage
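Infrastructure
Operating system, environment, network, required software
- Set the IP address and hostname
- Disable the firewall & SELinux
- Set up the hosts mapping
- Time synchronization
- Install the JDK
- Set up passwordless SSH
Set the IP address and hostname
Set the IP address, gateway, etc.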
[root@localhost usr]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="c8477fcc-505d-44b4-ae50-fc60d0b43f0d"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.182.111
GATEWAY=192.168.182.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
DNS2=8.8.8.8
Set the hostname in the network file
[root@localhost usr]# vim /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=node01
Edit the hosts file
[root@localhost ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.182.111 node01
192.168.182.112 node02
192.168.182.113 node03
192.168.182.114 node04
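The same entries have to exist on every node, so it helps to add them in a way that can be repeated safely. A minimal sketch follows; `add_host` is an illustrative helper, not a system command, and the demo writes to a scratch file rather than /etc/hosts.

```shell
# add_host FILE IP NAME: append "IP NAME" unless NAME is already mapped in FILE.
add_host() {
    file=$1; ip=$2; name=$3
    # -w matches NAME as a whole word, so node01 does not match node010.
    if grep -qw -- "$name" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Try it on a scratch file first; point it at /etc/hosts once it looks right.
hosts_file=$(mktemp)
add_host "$hosts_file" 192.168.182.111 node01
add_host "$hosts_file" 192.168.182.112 node02
add_host "$hosts_file" 192.168.182.111 node01   # second call is a no-op
cat "$hosts_file"
```

Afterwards `ping -c 1 node02` from node01 is a quick way to confirm that the name resolves.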
Disable the firewall & SELinux
Pitfall:
[root@localhost ~]# service iptables stop
Redirecting to /bin/systemctl stop iptables.service
Failed to stop iptables.service: Unit iptables.service not loaded.
[root@localhost ~]# chkconfig iptables off
error reading information on service iptables: No such file or directory
Solution:
yum install -y iptables-services
In fact, from CentOS 7 onward the default firewall is firewalld, which is built on top of iptables, so stopping firewalld is what is needed:
systemctl stop firewalld
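`systemctl stop` only lasts until the next reboot; `systemctl disable` keeps the service from starting again. A small sketch that shows both steps follows. The `run` helper and the `DRY_RUN` variable are illustrative, not part of systemd; they let the commands be reviewed before they are executed as root.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

firewall_off() {
    run systemctl stop firewalld      # stop the running service now
    run systemctl disable firewalld   # keep it from starting at the next boot
}

firewall_off
# To apply for real (as root): DRY_RUN=0 firewall_off
```

Turning the firewall off is acceptable on an isolated practice cluster; on a shared network, opening only the required ports would be the safer choice.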
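Disable SELinux
SELinux is a kernel security mechanism (mandatory access control): a process may only perform the operations that the loaded policy allows, and a violation is either blocked or just logged, depending on the mode. Not every distribution ships it. This is really operations territory, so we will not go deeper here; for details see Appendix 1, Introduction to SELinux.
SELinux has three run states: disabled, permissive and enforcing
- Disabled: SELinux is off and no new resources are labeled. If it is re-enabled later, all resources have to be relabeled, which can be slow.
- Permissive: a policy violation is not actually denied; a log entry is recorded instead.
- Enforcing: the default mode and the normal state of SELinux; operations that violate the policy are actually denied.
Check the current run state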
~]# getenforce
Enforcing
Temporarily switch the run state to Permissive
# temporary, lasts until the next reboot
~]# setenforce 0
~]# getenforce
Permissive
Temporarily switch the run state back to Enforcing
~]# setenforce 1
~]# getenforce
Enforcing
sestatus shows the complete status information
~]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 30
Disable permanently (takes effect after a reboot)
[root@localhost usr]# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
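The value has to be exactly one of `enforcing`, `permissive` or `disabled`, so a typo in this file is easy to make and awkward to debug. The sketch below validates the mode before rewriting the line; `set_selinux_mode` is an illustrative helper, and the demo runs on a scratch file standing in for /etc/selinux/config.

```shell
# set_selinux_mode FILE MODE: rewrite the active SELINUX= line in FILE.
set_selinux_mode() {
    file=$1; mode=$2
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    tmp=$(mktemp)
    # Only the uncommented SELINUX= line is touched; SELINUXTYPE= and comments stay as they are.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file.
cfg=$(mktemp)
printf '%s\n' '#SELINUX=enforcing' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$cfg"
set_selinux_mode "$cfg" disabled
cat "$cfg"
```

Writing back with `cat "$tmp" > "$file"` rather than `mv` keeps the original file's ownership and permissions.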
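Time synchronization
The machines in a cluster exchange heartbeats, so their clocks must be synchronized; otherwise heartbeat checks are prone to problems.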
[root@localhost usr]# yum install -y ntp
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* epel: mirrors.bfsu.edu.cn
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
Resolving Dependencies
........
ntpdate.x86_64 0:4.2.6p5-29.el7.centos.2
Complete!
Synchronize against the Alibaba Cloud NTP server
[root@localhost usr]# vim /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server ntp1.aliyun.com
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
Start ntpd and enable it at boot
[root@localhost ~]# service ntpd start
Redirecting to /bin/systemctl start ntpd.service
[root@localhost ~]# chkconfig ntpd on
Note: Forwarding request to 'systemctl enable ntpd.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
Install the JDK
Skipped here; it was installed long ago.
Remember to set the environment variables.
Set up passwordless SSH
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
If logging in to the local machine still asks for a password, passwordless login is clearly not set up yet
[root@localhost ~]# ssh localhost
root@localhost's password:
Last login: Sun Feb 26 23:18:48 2023 from ::1
Go to the .ssh directory
[root@localhost ~]# ll -a
total 40
dr-xr-x---. 5 root root 215 Feb 26 23:18 .
dr-xr-xr-x. 17 root root 224 Nov 24 16:57 ..
drwx------. 2 root root 25 Feb 26 23:18 .ssh
Generate the key pair
[root@localhost .ssh]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:nDSSn3kmPOzwK5iCXmH/Xdh9F6oveIkwXCC1WPDuiHs root@localhost.localdomain
The key's randomart image is:
+---[RSA 2048]----+
| .oo |
| .+o. |
| .+o+ |
| .B * |
| o o.S o . |
| . + oB =o . . .|
| . o = .=.oo.o ..|
|. o +Eo .+.+. . .|
|.. o. o....o. |
+----[SHA256]-----+
[root@localhost .ssh]# ll
total 12
-rw-------. 1 root root 1679 Feb 26 23:25 id_rsa
-rw-r--r--. 1 root root 408 Feb 26 23:25 id_rsa.pub
-rw-r--r--. 1 root root 171 Feb 26 23:18 known_hosts
[root@localhost .ssh]# cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMwB+mHmD89rIP0dIInwHMaToH0ZFWvksw5yfx5uatiRsNzVwk/BihDJYgr7NKZJVgKkCL5TCuTkKFQsL9cf/ypazOOnjTyuFEWm5XOLJbYAVA1T4cOtysSqK9GVC9HeFqk+bz5AGSR4QA3N5UzfQpXBfw5sl1b73qKBmyWkv0LcXRMexSeYYnof9rntOXVyWg7uFR2FTF4Lih+RWnWaMY/alGqjvvQq9lk+cqrvHytn+KNtIDko2PfK9W3K48rHYq27reAxa3YWKAn0qt2/bN2D5OzcbqpOntElvUEcq8uHyUFNTSdkcnawA0zz1IBQH86zms0mCaTKvKY9ZbCnRT root@localhost.localdomain
[root@localhost .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost .ssh]# ssh localhost
Last login: Sun Feb 26 23:19:00 2023 from ::1
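If A wants to log in to B remotely:
A generates a public/private key pair locally and appends its public key to the ~/.ssh/authorized_keys file on B.
For remote nodes this means the public key has to be distributed.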
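For distributing the key, `ssh-copy-id` appends a public key to the remote `authorized_keys` and sets the permissions. The sketch below only prints the commands so they can be reviewed first; the node list and the `plan_key_distribution` name are assumptions for illustration.

```shell
# Hypothetical node list; adjust to the hosts defined in /etc/hosts.
NODES="node02 node03 node04"

# Print one ssh-copy-id command per node so the plan can be reviewed before running it.
plan_key_distribution() {
    for node in $NODES; do
        echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@$node"
    done
}

plan_key_distribution
# To execute the plan: plan_key_distribution | sh
# Each command asks for that node's root password once;
# afterwards `ssh root@node02` should log in without a prompt.
```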
Deployment and configuration
Pseudo-distributed (single node):
- Deployment path
- Configuration files
  - hadoop-env.sh
  - core-site.xml
  - hdfs-site.xml
  - slaves
Hadoop configuration
[root@localhost opt]# whereis hadoop
hadoop: /usr/local/lib/hadoop
[root@localhost opt]# cd /usr/local/lib/hadoop
[root@localhost hadoop]# ll
total 330156
drwxr-xr-x. 9 courage courage 149 Sep 11 2019 hadoop-3.1.3
-rw-rw-r--. 1 courage courage 338075860 Nov 29 04:54 hadoop-3.1.3.tar.gz
[root@localhost hadoop]# cd hadoop-3.1.3/
[root@localhost hadoop-3.1.3]# ll
total 176
drwxr-xr-x. 2 courage courage 183 Sep 11 2019 bin # commands for Hadoop's own functionality
drwxr-xr-x. 3 courage courage 20 Sep 11 2019 etc # configuration
drwxr-xr-x. 2 courage courage 106 Sep 11 2019 include
drwxr-xr-x. 3 courage courage 20 Sep 11 2019 lib # libraries
drwxr-xr-x. 4 courage courage 288 Sep 11 2019 libexec
-rw-rw-r--. 1 courage courage 147145 Sep 4 2019 LICENSE.txt
-rw-rw-r--. 1 courage courage 21867 Sep 4 2019 NOTICE.txt
-rw-rw-r--. 1 courage courage 1366 Sep 4 2019 README.txt
drwxr-xr-x. 3 courage courage 4096 Sep 11 2019 sbin # server-side service scripts
drwxr-xr-x. 4 courage courage 31 Sep 11 2019 share # bundled packages
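So that the hadoop commands can be run from any directory, add Hadoop to the environment variables in /etc/profile (in vim, press o to open a new line).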
[root@localhost bin]# vim /etc/profile
unset i
unset -f pathmunge
export JAVA_HOME=/usr/local/lib/java/jdk1.8.0_212 # JDK installation directory
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export ZK_HOME=/usr/local/lib/zookeeper/apache-zookeeper-3.5.7-bin
export HADOOP_HOME=/usr/local/lib/hadoop/hadoop-3.1.3 # hadoop 1
export HADOOP_BIN=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin # hadoop 2
export ZK_PATH=${ZK_HOME}/bin
export Rust_Home=/root/.cargo
export Rust_PATH=${Rust_Home}/bin
export PATH=$PATH:${JAVA_PATH}:${ZK_PATH}:${Rust_PATH}:${HADOOP_BIN} # hadoop 3
[root@localhost bin]# . /etc/profile
[root@localhost bin]# hdfs
hdfs hdfs.cmd
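Tab completion finding `hdfs` already suggests the PATH change worked. For a scripted check, a sketch like the following tests whether the Hadoop directories are on PATH; `path_contains` is an illustrative helper, and the fallback path is simply the install location used in this post.

```shell
# path_contains DIR: succeed if DIR is one of the entries in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# HADOOP_HOME falls back to the path used in this post if it is not set.
hadoop_home=${HADOOP_HOME:-/usr/local/lib/hadoop/hadoop-3.1.3}
for dir in "$hadoop_home/bin" "$hadoop_home/sbin"; do
    if path_contains "$dir"; then
        echo "on PATH: $dir"
    else
        echo "missing: $dir"
    fi
done
```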
Hadoop configuration
[root@localhost hadoop]# vim hadoop-3.1.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/lib/java/jdk1.8.0_212
# Define the NameNode address, port and related settings
[root@localhost hadoop]# vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
</configuration>
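On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually picked up. For a quick offline look at the file itself, a sketch along these lines may help. `get_property` is an illustrative helper, not a real XML parser; it assumes that `<name>` and `<value>` each sit on their own line, as in the file shown here.

```shell
# get_property FILE NAME: print the <value> that follows <name>NAME</name>.
get_property() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }
    ' "$1"
}

# Demo on a scratch copy of the core-site.xml content.
site=$(mktemp)
cat > "$site" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
</configuration>
EOF
get_property "$site" fs.defaultFS
```

The host part of the value (node01 here) has to resolve through /etc/hosts on every machine that talks to the NameNode.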
Set the replication factor
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The slaves file lists the hosts that run a DataNode; in 3.x the file was renamed to workers
[root@localhost hadoop]# vim workers
node01
SecondaryNameNode configuration
[root@localhost hadoop]# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/var/bigdata/hadoop/local/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/var/bigdata/hadoop/local/dfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/var/bigdata/hadoop/local/dfs/secondary</value>
</property>
</configuration>
Initialization and startup
[root@localhost current]# bin/hdfs namenode -format # format (initialize) HDFS
[root@localhost current]# ll
total 16
-rw-r--r--. 1 root root 391 Feb 28 05:37 fsimage_0000000000000000000
-rw-r--r--. 1 root root 62 Feb 28 05:37 fsimage_0000000000000000000.md5
-rw-r--r--. 1 root root 2 Feb 28 05:37 seen_txid
-rw-r--r--. 1 root root 213 Feb 28 05:37 VERSION
[root@localhost current]# pwd
/var/bigdata/hadoop/local/dfs/name/current
[root@localhost current]# cat VERSION
#Tue Feb 28 05:37:29 PST 2023
namespaceID=1019611069
clusterID=CID-a1b5b123-6b5b-48c4-a876-f27a34349d0b
cTime=1677591449697
storageType=NAME_NODE
blockpoolID=BP-278644475-127.0.0.1-1677591449697
layoutVersion=-64
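Formatting only creates the NameNode metadata; the daemons still have to be started with `start-dfs.sh` from the sbin directory. When Hadoop 3.x is run as root, as in this walkthrough, the start scripts abort unless the `HDFS_*_USER` variables are defined. The sketch below appends them idempotently; `allow_root_daemons` is an illustrative helper, the demo uses a scratch file, and running the daemons as root is only reasonable on a practice setup.

```shell
# allow_root_daemons FILE: append the *_USER variables to FILE unless already present.
allow_root_daemons() {
    for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER; do
        grep -q "^export $var=" "$1" 2>/dev/null || echo "export $var=root" >> "$1"
    done
}

# Demo on a scratch file; the real target would be $HADOOP_HOME/etc/hadoop/hadoop-env.sh.
env_file=$(mktemp)
allow_root_daemons "$env_file"
allow_root_daemons "$env_file"   # idempotent: nothing is added the second time
cat "$env_file"

# Then, on the node itself:
#   start-dfs.sh   # starts NameNode, DataNode and SecondaryNameNode
#   jps            # should list those three Java processes
```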
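Map the HDFS hostnames on Windows (edit the hosts file in this directory)
C:\windows\system32\drivers\etc
Open the web UI (in Hadoop 3.x the NameNode UI listens on port 9870 by default, e.g. http://node01:9870)
Create a file filled with text for the steps that follow
[root@localhost hadoop]# for i in `seq 1000000`;do echo "hello courage $i" >> data.txt;done
Upload the new file to HDFS, specifying the block size at the same time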
hdfs dfs -D dfs.blocksize=1048576 -put data.txt
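With a block size of 1048576 bytes (1 MiB), the number of blocks is the file size divided by the block size, rounded up. A small sketch of that arithmetic follows; `block_count` is an illustrative helper and 17000000 is only an example size.

```shell
# block_count SIZE_BYTES BLOCK_BYTES: number of blocks needed, rounding up.
block_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

block_count 17000000 1048576     # example size in bytes
block_count 1048576 1048576      # exactly one full block
block_count 1048577 1048576      # one byte over spills into a second block

# For a real local file: block_count "$(wc -c < data.txt)" 1048576
# On the cluster, hdfs fsck <path> -files -blocks lists the actual blocks.
# -put without a destination writes to /user/<current user>, which must exist first:
#   hdfs dfs -mkdir -p /user/root
```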
[root@localhost subdir0]# pwd
/var/bigdata/hadoop/local/dfs/data/current/BP-278644475-127.0.0.1-1677591449697/current/finalized/subdir0/subdir0
[root@localhost subdir0]# vim blk_1073741828
As you can see, HDFS does not care about the meaning of the file content; it simply cuts the file by bytes.
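The same byte-oriented cutting can be seen locally with `split -b`, which serves here as a small-scale analogy and is not how HDFS is implemented. In this sketch the 20-byte chunk size is an arbitrary choice that forces a cut in the middle of a line.

```shell
workdir=$(mktemp -d)

# Five 16-byte lines ("hello courage N" plus the newline).
for i in 1 2 3 4 5; do
    echo "hello courage $i"
done > "$workdir/data.txt"

# Cut every 20 bytes, ignoring line boundaries, much like a tiny block size.
split -b 20 "$workdir/data.txt" "$workdir/blk_"

end_of_first=$(tail -n 1 "$workdir/blk_aa")     # last (partial) line of chunk 1
start_of_second=$(head -n 1 "$workdir/blk_ab")  # first line of chunk 2

echo "chunk 1 ends with:   $end_of_first"
echo "chunk 2 starts with: $start_of_second"
rm -rf "$workdir"
```

The word "hello" ends up split across the two chunks, which is the same thing that shows up at the end of a block file such as blk_1073741828.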
From: https://www.cnblogs.com/Courage129/p/17528364.html