My Big Data Development, Chapter 1: Installing a Single-Node Hadoop Pseudo-Cluster

Date: 2023-01-27 11:31:52
Tags: bin, hadoop, HADOOP, hacl, cluster, usr, HOME, node


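Deploying a Hadoop cluster on a single machine is a simple way to get a development environment. The OS is a CentOS 7 virtual machine (hostname=hacl-node1). This chapter covers installing the development (D) or runtime (R) environment for the following software: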

0 Host configuration

vi /etc/hostname

hacl-node1

vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# add ipv4 and names

192.168.56.111    hacl-node1  zk1  zk2  zk3

vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=hacl-node1

Passwordless ssh login to the local machine must work:

ssh localhost

If you are prompted for a password, run:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 0600 ~/.ssh/authorized_keys

If, after the steps above, ssh localhost reports: ssh_exchange_identification: read: Connection reset by peer

then add the following line to /etc/hosts.allow:

sshd:127.0.0.1:allow

Restart the service: service sshd restart

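If, after the steps above, ssh localhost keeps prompting for a password and rejects even the correct one, check /etc/ssh/sshd_config: append the following line at the end of the file, then run service sshd reload: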

AllowUsers   root
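
The key-setup command above appends the public key with >> every time it is run, so repeating it leaves duplicate lines in authorized_keys. A minimal sketch of an idempotent variant follows; add_authorized_key is a hypothetical helper name (not part of OpenSSH), and the paths in the usage comment are the ones used above.

```shell
# add_authorized_key is a hypothetical helper, not an OpenSSH command.
# It appends the key in $1 to the file $2 only if that exact line is missing.
add_authorized_key() {
    pub_file=$1
    auth_file=$2
    key=$(cat "$pub_file")
    [ -n "$key" ] || return 1          # refuse to work with an empty key file
    touch "$auth_file"
    # -x: match whole lines only, -F: treat the key as a fixed string
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    chmod 0600 "$auth_file"
}

# Usage (assumes the key pair was already generated with ssh-keygen):
#   add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Running the helper twice leaves a single copy of the key in the file.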

Create the program directories (HADOOP_HOME=/usr/local/apache/hadoop-current):

mkdir -p /usr/local/{java,scala,apache}

Create the Hadoop data directories (HADOOP_DATA_PREFIX=/hacl/hadoop):

mkdir -p /hacl/hadoop/{tmp,dfs/{nn,dn,jn},logs}

1 Unpack and install the software

 

# tar -zxf jdk-8u271-linux-x64.tar.gz -C /usr/local/java/

# tar -zxf scala-2.12.12.tgz -C /usr/local/scala/

# tar -zxf hadoop-3.3.0.tar.gz -C /usr/local/apache/

# tar -zxf apache-zookeeper-3.6.2-bin.tar.gz -C /usr/local/apache/

...

# cp amm-2.12-2.2.0 /usr/local/scala/

# cd /usr/local/java/ && ln -s jdk1.8.0_271 current

# cd /usr/local/scala/ && ln -s scala-2.12.12 current

# cd /usr/local/scala/ && ln -s amm-2.12-2.2.0 amm && chmod +x amm

# cd /usr/local/bin && ln -s /usr/local/scala/amm amm

# cd /usr/local/apache/ && ln -s hadoop-3.3.0 hadoop-current

# cd /usr/local/apache/ && ln -s apache-zookeeper-3.6.2-bin zookeeper-current

...
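
The "current" symlinks above are what let the environment variables stay fixed while the installed version changes. As a sketch of how such a link might later be repointed, the helper below uses ln -sfn; switch_current is a hypothetical name, and the second version directory in the usage comment is only an illustration, not something installed in this chapter.

```shell
# switch_current is a hypothetical helper for repointing a "current" symlink.
#   $1 = install directory, e.g. /usr/local/apache
#   $2 = versioned directory inside it, e.g. hadoop-3.3.0
#   $3 = link name, e.g. hadoop-current
switch_current() {
    install_dir=$1
    target=$2
    link_name=$3
    if [ ! -d "$install_dir/$target" ]; then
        echo "no such directory: $install_dir/$target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file instead of
    #     creating the new link inside the directory it points to
    (cd "$install_dir" && ln -sfn "$target" "$link_name")
}

# Usage:
#   switch_current /usr/local/apache hadoop-3.3.0 hadoop-current
```

Without -n, rerunning ln -sf against an existing hadoop-current link would create a stray link inside the old version directory rather than replacing the link itself.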

2 Configure environment variables

The complete contents of /etc/profile.d/hacl-env.sh:

###################################
# hadoop cluster env
# 2020-01-05
###################################
# java8
export JAVA_HOME=/usr/local/java/current
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

# scala
export SCALA_HOME=/usr/local/scala/current
export PATH=$PATH:$SCALA_HOME/bin

#######################
# apache.org softwares
#######################
export APACHE_ROOT=/usr/local/apache

# maven
export M2_HOME=$APACHE_ROOT/maven-current
export M2_CONF_DIR=$M2_HOME/conf
export PATH=$PATH:$M2_HOME/bin

# zookeeper
export ZK_HOME=$APACHE_ROOT/zookeeper-current
export ZK_CONF_DIR=$ZK_HOME/conf
export PATH=$PATH:$ZK_HOME/bin

# kafka
export KAFKA_HOME=$APACHE_ROOT/kafka-current
export KAFKA_CONF_DIR=$KAFKA_HOME/config
export PATH=$PATH:$KAFKA_HOME/bin

# hadoop
export HADOOP_HOME=$APACHE_ROOT/hadoop-current
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_DATA_PREFIX=/hacl/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=`hadoop classpath`

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

# hbase with phoenix
export HBASE_HOME=$APACHE_ROOT/hbase-current
export HBASE_CLI_HOME=$APACHE_ROOT/hbase-client-current
export PHOENIX_HOME=$APACHE_ROOT/phoenix-current
export HBASE_CONF_DIR=$HBASE_HOME/conf
export HBASE_CLI_CONF_DIR=$HBASE_CLI_HOME/conf
export PATH=$PATH:$HBASE_HOME/bin:$HBASE_CLI_HOME/bin:$PHOENIX_HOME/bin

# flink
export FLINK_HOME=$APACHE_ROOT/flink-current
export FLINK_CONF_DIR=$FLINK_HOME/conf
export PATH=$PATH:$FLINK_HOME/bin

Apply it:

source /etc/profile.d/hacl-env.sh
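
The environment file exports *_HOME variables for several packages (maven, kafka, hbase, flink) whose symlinks may not exist yet at this point. A small sketch for spotting which ones are unset or dangling after sourcing the file; report_missing_homes is a hypothetical helper name.

```shell
# report_missing_homes is a hypothetical helper. Given variable NAMES,
# it prints one line for each variable that is unset or whose value
# is not an existing directory (symlinks are followed).
report_missing_homes() {
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable called $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
        fi
    done
}

# Usage:
#   report_missing_homes JAVA_HOME SCALA_HOME HADOOP_HOME ZK_HOME FLINK_HOME
```

No output means every listed variable resolves to a real directory.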

3 Hadoop single-node cluster configuration

  • $HADOOP_CONF_DIR/hadoop-env.sh

Change the log directory as follows:

export HADOOP_LOG_DIR=${HADOOP_DATA_PREFIX}/logs

  • $HADOOP_CONF_DIR/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hacl/hadoop/tmp</value>
</property>
  • $HADOOP_CONF_DIR/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hacl/hadoop/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hacl/hadoop/dfs/dn</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hacl/hadoop/dfs/jn</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
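
The property fragments in this section are not complete files on their own: in each *-site.xml they have to sit inside the <configuration> root element. As an illustration, the sketch below writes a whole core-site.xml with the two properties listed above; write_core_site is a hypothetical helper name, and it overwrites any existing file in the target directory, so back that up first.

```shell
# write_core_site is a hypothetical helper. It writes a complete
# core-site.xml into the directory given as $1, OVERWRITING any
# existing file of that name.
write_core_site() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hacl/hadoop/tmp</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_core_site "$HADOOP_CONF_DIR"
```

The hdfs-site.xml, mapred-site.xml and yarn-site.xml fragments follow the same pattern.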

4 Starting the Hadoop single-node cluster

Format the NameNode before the first start:

hdfs namenode -format
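
Formatting is only meant for the first start; formatting an existing NameNode directory again discards the HDFS metadata. A minimal guard, assuming the usual layout in which a formatted name directory contains current/VERSION; both function names are hypothetical helpers.

```shell
# Hypothetical helpers; they assume a formatted NameNode directory
# contains the file current/VERSION.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Format only when the directory in $1 has not been formatted yet.
format_namenode_once() {
    name_dir=$1
    if namenode_is_formatted "$name_dir"; then
        echo "NameNode directory $name_dir is already formatted, skipping"
    else
        hdfs namenode -format
    fi
}

# Usage (dfs.namenode.name.dir from hdfs-site.xml):
#   format_namenode_once /hacl/hadoop/dfs/nn
```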

Start:

start-dfs.sh
jps
4101 NameNode
4266 DataNode
4475 SecondaryNameNode

View in a browser:

http://hacl-node1:9870

If the page does not load, turn off the firewall and try again:

systemctl stop firewalld.service

Stop:

stop-dfs.sh

5 Running YARN

Run MapReduce on YARN. The configuration is as follows:

  • $HADOOP_CONF_DIR/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
  • $HADOOP_CONF_DIR/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

a. start YARN; b. and c. inspect YARN (command line and web); d. stop YARN:

a. # start-yarn.sh

b. # yarn top

c. # http://hacl-node1:8088

d. # stop-yarn.sh

How do you find out which port a service uses?

First list the service processes with jps, for example:

[root@hacl-node1 hadoop]# jps
12274 NodeManager
12131 ResourceManager
14020 DataNode
14551 Jps
14217 SecondaryNameNode

Check which port the SecondaryNameNode process is listening on:

[root@hacl-node1 hadoop]# ss -tnlp |grep 14217
LISTEN     0      128          *:9868                     *:*                   users:(("java",pid=14217,fd=297))

This shows that the SecondaryNameNode is listening on port 9868. View it in a browser:

http://hacl-node1:9868
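
The two manual steps above (find the PID with jps, then grep the ss output) can be chained. The sketch below does this with two small filters that read from standard input; pid_for_service and ports_for_pid are hypothetical helper names, and the parsing assumes the jps and ss -tnlp output formats shown above.

```shell
# Hypothetical helpers that parse the jps and "ss -tnlp" formats shown above.

# Read jps output on stdin; print the PID of the process named $1.
pid_for_service() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Read "ss -tnlp" output on stdin; print the listening ports of PID $1.
# Field 4 is the local address ("*:9868"); the port follows the last colon.
ports_for_pid() {
    awk -v pid="$1" 'index($0, "pid=" pid ",") { n = split($4, a, ":"); print a[n] }'
}

# Usage:
#   pid=$(jps | pid_for_service SecondaryNameNode)
#   ss -tnlp | ports_for_pid "$pid"
```

A process that listens on several ports gets one line of output per port.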

6 Testing Flink on YARN

  • Start Flink

No configuration is needed; just run the following commands:

Terminal 1:

# yarn-session.sh

...

JobManager Web Interface: http://hacl-node1:44818

Terminal 2:

# jps
11206 FlinkYarnSessionCli
11567 YarnSessionClusterEntrypoint

# yarn top

                  APPLICATIONID USER             TYPE      QUEUE PRIOR   #CONT  #RCONT  VCORES RVCORES     MEM    RMEM  VCORESECS    MEMSECS %PROGR       TIME NAME
 application_1609828681311_0002 root       apache flink    default     0       1       0       1       0      2G      0G        556       1113 100.00   00:00:09 Flink session cluster

Both the JobManager (YarnSessionClusterEntrypoint) and the client (FlinkYarnSessionCli) are now running. To stop the yarn-session, kill its application by ID:

# yarn application -kill application_1609828681311_0002
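
yarn application -kill takes the application ID as its argument, and the ID changes with every session. As a rough sketch, the IDs can be pulled out of YARN's own listing instead of being copied by hand; extract_app_ids is a hypothetical helper name, and the pattern assumes IDs of the form application_<digits>_<digits> as in the yarn top output above.

```shell
# extract_app_ids is a hypothetical helper. It reads text on stdin
# (for example the output of "yarn application -list") and prints every
# application ID of the form application_<digits>_<digits>, one per line.
extract_app_ids() {
    grep -o 'application_[0-9][0-9]*_[0-9][0-9]*'
}

# Usage:
#   yarn application -list | extract_app_ids
#   yarn application -kill application_1609828681311_0002
```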

See Chapter 2 for full coverage of Flink.

References

http://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html


 

From: https://blog.51cto.com/mapaware/6024041
