Tags: opt, log, deployment, zookeeper, kafka, cluster, 3.1, Kafka, 2.13

Kafka Cluster Deployment

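The previous chapter covered a single-node Kafka deployment, so here we go straight to the cluster deployment.
Notes:
- More nodes is not always better. It is best not to exceed 7 nodes, because the more nodes there are, the longer it takes to replicate messages between them and the lower the throughput of the whole group.
- Use an odd number of nodes. The cluster stays usable only while more than half of the nodes are alive, so an even number of nodes tolerates no more failures than the odd number just below it.
This deployment uses 3 servers.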
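
The odd-number advice follows from majority voting, which is how a ZooKeeper ensemble decides whether it can keep serving. The arithmetic can be sketched as follows; the function names are illustrative and not part of any Kafka or ZooKeeper tooling.

```shell
#!/bin/sh
# Majority quorum: an ensemble of n voting nodes needs floor(n/2)+1 nodes alive.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that may fail while a majority is still reachable.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Compare odd and even ensemble sizes side by side.
for n in 3 4 5 6 7; do
    echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 3 nodes one failure is tolerated, and adding a fourth node does not raise that number, which is why the 3-server layout used here is a reasonable minimum.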
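- The steps are:
1. Install the JDK on all servers and configure the IP address to hostname mapping (on servers that are not newly provisioned, the mapping can usually be skipped).
2. Install and configure Kafka on server 1.
3. Copy the whole extracted and configured Kafka directory from server 1 to the other servers.
4. Adjust the configuration on the other servers.
5. Start Kafka on every node and test the cluster.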

1.1 Server resources

node	OS	IP	JDK	ZooKeeper	Kafka
node1	centos7	192.168.101.201	jdk1.8.0_333	Zk1	Broker0
node2	centos7	192.168.100.202	jdk1.8.0_333	Zk2	Broker1
node3	centos7	192.168.101.203	jdk1.8.0_333	Zk3	Broker2

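1.1.1 Install the JDK (all machines)

Kafka requires a JDK. Install it following the steps in "Kafka installation and usage".

1.1.2 Configure the IP to hostname mapping (all servers) (optional)

On servers that are not newly provisioned, this step can usually be skipped.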

# On every node (node1, node2 and node3): each node must be able to resolve all three names
vim /etc/hosts
192.168.101.201 node1
192.168.100.202 node2
192.168.101.203 node3
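
If the mapping is used, every node that refers to its peers by name needs to resolve all three hostnames. As a rough sketch, the entries could be added idempotently with a small helper. `add_host_entry` and the `HOSTS_FILE` variable are hypothetical names; the default points at a scratch file so the sketch can be tried without touching /etc/hosts.

```shell
#!/bin/sh
# Append "ip name" to a hosts file only if the name is not already mapped.
# HOSTS_FILE defaults to a scratch file; set it to /etc/hosts (as root) for real use.
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts.demo}"

add_host_entry() {
    ip="$1"
    name="$2"
    touch "$HOSTS_FILE"
    # -w matches the host name as a whole word, so "node1" does not match "node10".
    if ! grep -qw "$name" "$HOSTS_FILE"; then
        printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
    fi
}

add_host_entry 192.168.101.201 node1
add_host_entry 192.168.100.202 node2
add_host_entry 192.168.101.203 node3
cat "$HOSTS_FILE"
```

Running it a second time leaves the file unchanged, so it is safe to repeat on each node.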

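1.1.3 Configure the hostname (all machines) (optional)

On servers that are not newly provisioned, this step can usually be skipped.

There are two methods.
Method 1 (the CentOS 6 style file; on CentOS 7 the hostname is read from /etc/hostname, so prefer Method 2):
vim /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=<hostname>

Method 2:
hostnamectl set-hostname <hostname>


Note: Method 1 needs a reboot to take effect (init 6 or reboot). A hostname set with hostnamectl applies immediately, but shells that are already open show it only after logging in again.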

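1.2 Install and configure Kafka on node1

1.2.1 Install Kafka

Download Kafka following the steps in "Kafka installation and usage". The version used here is kafka_2.13-3.1.1.
Put the downloaded binary package under /opt/. After extraction the directory is /opt/kafka_2.13-3.1.1.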

tar -zxvf kafka_2.13-3.1.1.tgz -C /opt/

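1.2.2 Edit the configuration files

Configure these according to your own requirements. The values here are for reference only.

# cd into the directory below; zookeeper.properties and server.properties both live here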
cd /opt/kafka_2.13-3.1.1/config/

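1.2.2.1 Edit zookeeper.properties

# dataDir is the directory where ZooKeeper stores its persistent data (snapshots and the myid file)
dataDir=/opt/var/kafka_2.13-3.1.1/zookeeper/data
# Directory for the ZooKeeper transaction logs
dataLogDir=/opt/var/log/kafka_2.13-3.1.1/zookeeper-logs
clientPort=2181
maxClientCnxns=100
# Basic time unit in milliseconds. It is the heartbeat interval between ZooKeeper servers and between clients and servers: one heartbeat is sent every tickTime.
tickTime=2000
# Initialization limit, counted in ticks: how long a Follower may take to connect to the Leader and finish its initial sync. If the Leader has heard nothing back after this many ticks, the connection is considered failed. The total is 10*2000 ms = 20 seconds.
initLimit=10
# Maximum heartbeat delay, counted in ticks: a request and its reply between the Leader and a Follower may take at most this many ticks. The total is 5*2000 ms = 10 seconds.
syncLimit=5

# Whether or not the hostname mapping from step 1.1.2 was done, the ip:2888:3888 form always works. Only with the mapping in place can it be shortened to, for example, server.1=node1:2888:3888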
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
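
Because initLimit and syncLimit are counted in ticks, the effective time limits depend on tickTime. The sketch below computes them from a properties file, assuming tickTime=2000 ms, the value that the 10*2000 and 5*2000 arithmetic in the comments is based on. `zk_limits` and the scratch file path are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Compute the effective ZooKeeper time limits from a properties file.
# Reads tickTime (milliseconds), initLimit and syncLimit (both counted in ticks).
zk_limits() {
    file="$1"
    tick=$(sed -n 's/^tickTime=//p' "$file")
    init=$(sed -n 's/^initLimit=//p' "$file")
    sync=$(sed -n 's/^syncLimit=//p' "$file")
    echo "init_ms=$(( tick * init )) sync_ms=$(( tick * sync ))"
}

# Demo input written to a scratch file (assumed values, see the note above).
demo=/tmp/zookeeper.demo.properties
printf 'tickTime=2000\ninitLimit=10\nsyncLimit=5\n' > "$demo"
zk_limits "$demo"
```

Pointing `zk_limits` at the real config/zookeeper.properties is a quick way to confirm that the limits come out as 20 and 10 seconds rather than a few hundred milliseconds.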

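1.2.2.2 Configure the ZooKeeper id

# The data directory must already exist (see 1.2.2.4 for the mkdir commands)
cd /opt/var/kafka_2.13-3.1.1/zookeeper/data

# Create the myid file; its only content is this node's id, for example 1 on node1
vim myid


# myid corresponds to the server number in zookeeper.properties.
# For example, with the configuration below, myid is 1 on node1, 2 on node2 and 3 on node3: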
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
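
The mapping between the server.N lines and myid can also be derived mechanically. The following is a minimal sketch that assumes the server lines use hostnames, as above. `myid_for_host` is a hypothetical helper, not something shipped with ZooKeeper or Kafka.

```shell
#!/bin/sh
# Print the N of the "server.N=<host>:..." line that names the given host.
myid_for_host() {
    conf="$1"
    host="$2"
    sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$conf"
}

# Demo with a scratch copy of the three server lines.
demo=/tmp/zk-servers.demo.properties
printf 'server.1=node1:2888:3888\nserver.2=node2:2888:3888\nserver.3=node3:2888:3888\n' > "$demo"
myid_for_host "$demo" node2
```

On a real node the output could be redirected into the myid file, for example `myid_for_host /opt/kafka_2.13-3.1.1/config/zookeeper.properties "$(hostname)" > /opt/var/kafka_2.13-3.1.1/zookeeper/data/myid`, provided the hostname matches one of the server lines.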

1.2.2.3 Edit server.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# This configuration file is intended for use in ZK-based mode, where Apache ZooKeeper is required.
# See kafka.server.KafkaConfig for additional details and defaults
#

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
# Unique identifier of this machine within the cluster; every broker must use a different value, much like ZooKeeper's myid
broker.id=0

############################# Socket Server Settings #############################

# The address the socket server listens on. If not configured, the host name will be equal to the value of
# java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
# Address and port to listen on; set this to the internal (LAN) IP
#listeners=PLAINTEXT://:9092
listeners=PLAINTEXT://192.168.101.201:9092

# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
# To allow access from an external network, set the external IP here
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL


# The number of threads that the server uses for receiving requests from the network and sending responses to the network
# Number of threads the broker uses to receive requests from the network and send responses
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
# Number of threads the broker uses to process requests, including disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
# Size of the socket send buffer (SO_SNDBUF): outgoing data is buffered here before it goes onto the network, which improves performance
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
# Size of the socket receive buffer (SO_RCVBUF) for data arriving from the network
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
# Maximum size in bytes of a single request (a produce or fetch request, for example) that the broker accepts; it must not exceed the Java heap size
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
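# Directory where messages are stored. It may be a comma-separated list of directories, and num.io.threads above should be larger than the number of directories. With several directories, a newly created partition is placed in the directory that currently holds the fewest partitions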
log.dirs=/opt/var/log/kafka_2.13-3.1.1/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
# Default number of partitions per topic: 1
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
# Number of threads per data directory used for log recovery at startup and flushing at shutdown
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
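# Replication factor of the internal topics __consumer_offsets and __transaction_state. The values below are set to 1; for production use a value greater than 1, such as 3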
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
# Default maximum retention time for messages: 168 hours (7 days)
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# Kafka appends messages to segment files. When a segment file exceeds this size, Kafka rolls over to a new file. The default is 1 GB
#log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
# Every 300000 ms (5 minutes), check the log directories against the retention settings above (log.retention.hours=168) and delete any expired segments
log.retention.check.interval.ms=300000
############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
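# Whether or not the hostname mapping from step 1.1.2 was done, the ip:port form always works. Only with the mapping in place can this be shortened to zookeeper.connect=node1:2181,node2:2181,node3:2181
# Keep comments on their own lines: a properties file treats everything after "=" as the value, including a trailing "# ..."
zookeeper.connect=192.168.101.201:2181,192.168.100.202:2181,192.168.101.203:2181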

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=18000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

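1.2.2.4 Create the directories manually

Note: the directories named in the configuration files above must be created by hand once the configuration is done.

# Kafka log directory
mkdir -p /opt/var/log/kafka_2.13-3.1.1/kafka-logs

# ZooKeeper data directory
mkdir -p /opt/var/kafka_2.13-3.1.1/zookeeper/data

# ZooKeeper log directory
mkdir -p /opt/var/log/kafka_2.13-3.1.1/zookeeper-logs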

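1.2.3 Create the kafka user

A dedicated kafka user is not strictly required; any ordinary non-root account can install and run Kafka. Just make sure that the Kafka installation directory and log directory, and the ZooKeeper data and log directories, all belong to the same user and group.
Kafka on the other servers should preferably use the same user and group as node1.

sudo groupadd kafka
sudo useradd -g kafka kafka
chown -R kafka:kafka /opt/kafka_2.13-3.1.1/  # ownership of the Kafka installation directory
chown -R kafka:kafka /opt/var/log/kafka_2.13-3.1.1/  # ownership of the Kafka and ZooKeeper log directories
chown -R kafka:kafka /opt/var/kafka_2.13-3.1.1  # ownership of the ZooKeeper data directory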

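1.4 Configure the other servers

Copy the Kafka installation directory from node1 to node2 and node3:

# Replace <user> with your login account on the target server; the IP can be used instead of the hostname
scp -r /opt/kafka_2.13-3.1.1/ <user>@node2:/opt/
scp -r /opt/kafka_2.13-3.1.1/ <user>@node3:/opt/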

On node2 and node3, the following need to be changed:

the ZooKeeper myid file, and broker.id, listeners and advertised.listeners (if set) in the Kafka configuration file
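
The per-node edits to server.properties can be scripted. The sketch below rewrites broker.id and listeners in place; `set_broker_identity` is a hypothetical helper, port 9092 and the PLAINTEXT listener are taken from the configuration above, and advertised.listeners is left alone because it is commented out there.

```shell
#!/bin/sh
# Rewrite broker.id and listeners in a server.properties file.
# Usage: set_broker_identity <file> <broker.id> <listener-ip>
set_broker_identity() {
    file="$1"
    id="$2"
    ip="$3"
    tmp="${file}.tmp"
    # Only uncommented lines match, so "#listeners=..." examples stay untouched.
    sed -e "s|^broker\.id=.*|broker.id=${id}|" \
        -e "s|^listeners=.*|listeners=PLAINTEXT://${ip}:9092|" \
        "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demo on a scratch file holding node1's values, rewritten for node2.
demo=/tmp/server.demo.properties
printf 'broker.id=0\nlisteners=PLAINTEXT://192.168.101.201:9092\n' > "$demo"
set_broker_identity "$demo" 1 192.168.100.202
cat "$demo"
```

On node3 the call would be `set_broker_identity /opt/kafka_2.13-3.1.1/config/server.properties 2 192.168.101.203`, following the Broker2 row of the server table.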

Create the user and directories, and grant ownership

1. Create the directories
mkdir -p /opt/var/log/kafka_2.13-3.1.1/kafka-logs

mkdir -p /opt/var/kafka_2.13-3.1.1/zookeeper/data

mkdir -p /opt/var/log/kafka_2.13-3.1.1/zookeeper-logs

2. Create the user and give it ownership of the directories
sudo groupadd kafka
sudo useradd -g kafka kafka
chown -R kafka:kafka /opt/kafka_2.13-3.1.1/  # ownership of the Kafka installation directory
chown -R kafka:kafka /opt/var/log/kafka_2.13-3.1.1/  # ownership of the Kafka and ZooKeeper log directories
chown -R kafka:kafka /opt/var/kafka_2.13-3.1.1  # ownership of the ZooKeeper data directory

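1.5 Configure environment variables (all machines) (optional)

The environment variables only make later Kafka commands shorter to type. Without them, use the full path.

vim /etc/profile

#KAFKA_HOME
export KAFKA_HOME=/opt/kafka_2.13-3.1.1
export PATH=$PATH:$KAFKA_HOME/bin

source /etc/profile  # reload the environment variables so they take effect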

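1.6 Start ZooKeeper and Kafka (all machines)

The bin directory of the Kafka installation, /opt/kafka_2.13-3.1.1/bin, contains the scripts that start and stop ZooKeeper and Kafka.

1.6.1 Start ZooKeeper

Start it on each node in turn (start in the foreground first to confirm that startup succeeds, then switch to a background start).
Note: the first node started in the foreground logs warnings that it cannot connect to the other two. This is expected, because they are not running yet.

# Foreground start
cd /opt/kafka_2.13-3.1.1
bin/zookeeper-server-start.sh config/zookeeper.properties  # with the environment variables configured, the /opt/kafka_2.13-3.1.1/bin prefix can be omitted, but the full path always works

# Background start, two options
cd /opt/kafka_2.13-3.1.1

bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
# or
nohup bin/zookeeper-server-start.sh config/zookeeper.properties &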

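1.6.2 Start Kafka

Start it on each machine in turn (start in the foreground first to confirm that startup succeeds, then switch to a background start).

# Foreground start
cd /opt/kafka_2.13-3.1.1
bin/kafka-server-start.sh config/server.properties  # with the environment variables configured, the /opt/kafka_2.13-3.1.1/bin prefix can be omitted, but the full path always works

# Background start
cd /opt/kafka_2.13-3.1.1
bin/kafka-server-start.sh -daemon config/server.properties
# or
nohup bin/kafka-server-start.sh config/server.properties &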

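1.7 Testing

1.7.1 Quick check

To verify that Kafka started successfully, run jps on every server:

jps  # jps ships with the JDK and lists the PIDs of running Java processes; handy for a quick look at Java processes on Linux/Unix

# Output similar to the following; a Kafka entry means the broker started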
11027 QuorumPeerMain
12263 Kafka
12347 Jps
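
When checking several servers, the jps output can be verified by a small script instead of by eye. This is only a sketch: `check_cluster_procs` is a hypothetical helper that reads jps output on standard input and looks for the two process names shown above.

```shell
#!/bin/sh
# Read `jps` output on stdin and verify that both expected processes are listed.
check_cluster_procs() {
    out=$(cat)
    for proc in QuorumPeerMain Kafka; do
        # The process name is the second column of each jps line.
        if ! printf '%s\n' "$out" | awk -v p="$proc" '$2 == p { found = 1 } END { exit !found }'; then
            echo "missing: $proc"
            return 1
        fi
    done
    echo "ok"
}

# Demo with the sample output shown above.
printf '11027 QuorumPeerMain\n12263 Kafka\n12347 Jps\n' | check_cluster_procs
```

On a real node the call would be `jps | check_cluster_procs`; a non-zero exit status means ZooKeeper or Kafka is not running there.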

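1.7.2 Produce/consume test

1. Create a topic.
2. Open a terminal and start a consumer.
3. On any server, open another terminal, start a producer and type a message.
4. If the message appears in the consumer terminal, Kafka is working correctly.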
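
The steps above can be sketched with the command-line tools in Kafka's bin directory. The topic name `cluster-test`, the bootstrap address and the partition and replica counts are assumptions chosen for this 3-node example, and the `run` wrapper with its `DRY_RUN` switch is a hypothetical convenience for previewing the commands.

```shell
#!/bin/sh
# Assumed values for illustration; adjust to your own cluster.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka_2.13-3.1.1}"
BOOTSTRAP="${BOOTSTRAP:-192.168.101.201:9092}"
TOPIC="${TOPIC:-cluster-test}"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: create a topic replicated across the three brokers.
create_topic() {
    run "$KAFKA_HOME/bin/kafka-topics.sh" --create --topic "$TOPIC" \
        --partitions 3 --replication-factor 3 --bootstrap-server "$BOOTSTRAP"
}

# Step 2: consumer that prints every message in the topic (blocks until Ctrl+C).
start_consumer() {
    run "$KAFKA_HOME/bin/kafka-console-consumer.sh" --topic "$TOPIC" \
        --from-beginning --bootstrap-server "$BOOTSTRAP"
}

# Step 3: producer that sends each line typed on stdin as one message.
start_producer() {
    run "$KAFKA_HOME/bin/kafka-console-producer.sh" --topic "$TOPIC" \
        --bootstrap-server "$BOOTSTRAP"
}
```

After sourcing the file, call `create_topic` once, `start_consumer` in one terminal and `start_producer` in another. Setting `DRY_RUN=1` first prints the underlying commands without contacting the cluster.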

1.8 Extra (systemctl configuration)

This step is optional. Its main purpose is to start Kafka automatically at boot.

1.8.1 Create kafka.service

vim /etc/systemd/system/kafka.service


[Unit]
Description=kafka
After=network.target

[Service]
Type=simple
LimitNOFILE=65535
LimitNPROC=65535
# Use your own JDK path
Environment=JAVA_HOME=/usr/local/jdk1.8.0_333
# The user that owns the Kafka installation, and that user's group
User=kafka
Group=kafka
ExecStart=/opt/kafka_2.13-3.1.1/bin/kafka-server-start.sh /opt/kafka_2.13-3.1.1/config/server.properties
ExecStop=/opt/kafka_2.13-3.1.1/bin/kafka-server-stop.sh
Restart=always

[Install]
WantedBy=multi-user.target

1.8.2 Enable start at boot

systemctl daemon-reload  # make systemd pick up the new unit file
systemctl enable kafka
systemctl start kafka

From: https://www.cnblogs.com/Mcoming/p/18087677
