
6. ClickHouse Series: Configuring a Sharded Cluster

Date: 2022-10-21 21:48:53
Tags: xml, server, zoo1, 2181, cluster, ClickHouse, sharding, config, clickhouse

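A replica set keeps a full copy of the data on every replica, which makes the data highly available. A sharded cluster, whether in ES or in ClickHouse, exists to solve a different problem: scaling the data horizontally. In practice, a replica set alone is usually enough for ClickHouse.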

1. Write the clickhouse-shard.yml file

The full code has been uploaded to Gitee and can be cloned and used directly.

# Sharded cluster deployment example (2 shards, 2 replicas per shard)
version: '3'
services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - ckNet

  zoo2:
    image: zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - ckNet
  zoo3:
    image: zookeeper
    restart: always
    hostname: zoo3
    ports:
      - 2183:2181
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - ckNet
  ck1_1:
    image: clickhouse/clickhouse-server
    container_name: ck1_1
    ulimits:
      nofile:
        soft: "262144"
        hard: "262144"
    volumes:
      - ./1_1/config.d:/etc/clickhouse-server/config.d
      - ./config.xml:/etc/clickhouse-server/config.xml
    ports:
      - 18123:8123 # HTTP interface
      - 19000:9000 # native client
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    networks:
      - ckNet
  ck1_2:
    image: clickhouse/clickhouse-server
    container_name: ck1_2
    ulimits:
      nofile:
        soft: "262144"
        hard: "262144"
    volumes:
      - ./1_2/config.d:/etc/clickhouse-server/config.d
      - ./config.xml:/etc/clickhouse-server/config.xml
    ports:
      - 18124:8123 # HTTP interface
      - 19001:9000 # native client
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    networks:
      - ckNet
  ck2_1:
    image: clickhouse/clickhouse-server
    container_name: ck2_1
    ulimits:
      nofile:
        soft: "262144"
        hard: "262144"
    volumes:
      - ./2_1/config.d:/etc/clickhouse-server/config.d
      - ./config.xml:/etc/clickhouse-server/config.xml
    ports:
      - 18125:8123 # HTTP interface
      - 19002:9000 # native client
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    networks:
      - ckNet
  ck2_2:
    image: clickhouse/clickhouse-server
    container_name: ck2_2
    ulimits:
      nofile:
        soft: "262144"
        hard: "262144"
    volumes:
      - ./2_2/config.d:/etc/clickhouse-server/config.d
      - ./config.xml:/etc/clickhouse-server/config.xml
    ports:
      - 18126:8123 # HTTP interface
      - 19003:9000 # native client
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    networks:
      - ckNet


networks:
  ckNet:
    driver: bridge

In config.xml, change the name of the included file to metrika_shard:

 <include_from>/etc/clickhouse-server/config.d/metrika_shard.xml</include_from>

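2. Shard and replica configuration

We configure 2 shards with 2 replicas each. The per-node directories, as mounted in the compose file, are 1_1, 1_2, 2_1, and 2_2, each containing a config.d subdirectory.

The metrika_shard.xml in each directory looks like this: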

<?xml version="1.0" encoding="utf-8" ?>
<yandex>
    <remote_servers>
        <shenjian_cluster>
            <!-- Shard 1 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- First replica -->
                    <host>ck1_1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <!-- Second replica; in production, replicas should run on different machines -->
                    <host>ck1_2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck2_1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ck2_2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </shenjian_cluster>
    </remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>zoo1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>zoo2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>zoo3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- Shard and replica identifiers for this node -->
    <macros>
        <shard>01</shard>
        <replica>rep-01-01</replica>
    </macros>
</yandex>

For the other three nodes, only the shard and replica identifiers in the macros block need to change:

<macros>
        <shard>01</shard>
        <replica>rep-01-02</replica>
</macros>
<macros>
        <shard>02</shard>
        <replica>rep-02-01</replica>
</macros>
<macros>
        <shard>02</shard>
        <replica>rep-02-02</replica>
</macros>
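
Because the four metrika_shard.xml files differ only in their macros block, the values can be derived from the directory name instead of being typed by hand. The sketch below is one possible way to do that; it assumes the directory names 1_1, 1_2, 2_1, 2_2 from the compose file and the rep-SS-RR naming used above, and the helper name macros_for is hypothetical.

```python
def macros_for(node_dir: str) -> str:
    """Build the <macros> block for a node directory such as "1_2".

    Assumes the "<shard>_<replica>" directory naming from the compose file
    and the "rep-SS-RR" replica naming used in this article.
    """
    shard, replica = (int(part) for part in node_dir.split("_"))
    return (
        "<macros>\n"
        f"        <shard>{shard:02d}</shard>\n"
        f"        <replica>rep-{shard:02d}-{replica:02d}</replica>\n"
        "</macros>"
    )


# Print the block that belongs in each node's metrika_shard.xml.
for node_dir in ["1_1", "1_2", "2_1", "2_2"]:
    print(f"<!-- {node_dir}/config.d/metrika_shard.xml -->")
    print(macros_for(node_dir))
```

The shard value must be identical for both replicas of a shard, while the replica value must be unique per node.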

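That is all the configuration. The cluster can now be started with docker-compose -f clickhouse-shard.yml up -d

3. Create the table

Replace shenjian_cluster with the name of your own cluster, that is, the child tag of remote_servers in metrika_shard.xml. The {shard} and {replica} placeholders need no changes: ClickHouse fills them in from each node's macros configuration and creates the table on every node of the cluster.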

CREATE TABLE house ON CLUSTER shenjian_cluster
(
    id String,
    city String,
    region String,
    name String,
    price Float32,
    publish_date DateTime
) ENGINE = ReplicatedMergeTree('/clickhouse/table/{shard}/house', '{replica}')
PARTITION BY toYYYYMMDD(publish_date)
PRIMARY KEY (id)
ORDER BY (id, city, region, name)

After the statement runs, the table exists on all nodes [127.0.0.1:18123 127.0.0.1:18124 127.0.0.1:18125 127.0.0.1:18126], as expected.
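
To make the macro substitution concrete, the following sketch expands the two ReplicatedMergeTree arguments for each node. It only imitates the substitution with plain string replacement and is not ClickHouse's own code. The node-to-macros mapping is an assumption: ck1_1 matches the file shown above, and the other three are inferred from the directory names.

```python
# Assumed macros per node (ck1_1 is from the article, the rest are inferred).
MACROS = {
    "ck1_1": {"shard": "01", "replica": "rep-01-01"},
    "ck1_2": {"shard": "01", "replica": "rep-01-02"},
    "ck2_1": {"shard": "02", "replica": "rep-02-01"},
    "ck2_2": {"shard": "02", "replica": "rep-02-02"},
}


def expand(template: str, macros: dict) -> str:
    """Replace every {name} placeholder with that node's macro value."""
    for name, value in macros.items():
        template = template.replace("{" + name + "}", value)
    return template


for node, macros in MACROS.items():
    zk_path = expand("/clickhouse/table/{shard}/house", macros)
    replica_name = expand("{replica}", macros)
    print(node, zk_path, replica_name)
```

Replicas of the same shard end up with the same ZooKeeper path and different replica names, which is how they find each other and replicate the same data; the two shards use different paths and therefore hold independent data.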

4. Create the Distributed table

CREATE TABLE distribute_house ON CLUSTER shenjian_cluster
(
    id String,
    city String,
    region String,
    name String,
    price Float32,
    publish_date DateTime
) ENGINE=Distributed(shenjian_cluster, default, house, hiveHash(publish_date))
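  • shenjian_cluster: cluster name
  • default: database name
  • house: name of the local table on each shard
  • hiveHash(publish_date): sharding key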
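
The sharding key decides which shard receives each inserted row: the value of the expression is divided by the total weight of all shards, and the remainder selects the shard. The sketch below illustrates that rule under two assumptions: both shards keep the default weight of 1 (no weight is set in metrika_shard.xml), and zlib.crc32 is used as a stand-in because this example does not reimplement hiveHash. The function names are hypothetical.

```python
import zlib


def pick_shard(key_hash: int, weights: list) -> int:
    """Return the 0-based index of the shard that receives a row.

    The remainder of key_hash divided by the total weight is located in
    consecutive half-open intervals, one per shard, each as wide as that
    shard's weight.
    """
    remainder = key_hash % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise ValueError("weights must be positive integers")


def stand_in_hash(value: str) -> int:
    # NOT hiveHash: crc32 is only a deterministic stand-in for illustration.
    return zlib.crc32(value.encode("utf-8"))


WEIGHTS = [1, 1]  # two shards, assumed default weight of 1 each
for publish_date in ["2022-08-01", "2021-08-01", "2020-08-01"]:
    shard_number = pick_shard(stand_in_hash(publish_date), WEIGHTS) + 1
    print(publish_date, "-> shard", shard_number)
```

Rows with the same sharding-key value always land on the same shard, so a key with few distinct values can leave the shards unevenly filled.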

5. Insert data to verify the sharded cluster

INSERT INTO distribute_house(city, name, price, publish_date) VALUES ('上海', '场中小区', 59680, '2022-08-01'),
                                            ('上海', '汤臣一品', 259680, '2022-08-01'),
                                            ('临沂', '滨河名邸', 10000, '2021-08-01'),
                                            ('临沂', '中泰广场', 15000, '2020-08-01');
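
One way to check where the rows ended up is to count the rows of the local house table on every node through the HTTP ports published in the compose file. The sketch below assumes the default user can connect without a password and that the tables live in the default database; the helper names are hypothetical, and nothing is sent until local_row_counts is called.

```python
import urllib.parse
import urllib.request

# Host ports mapped to each container's HTTP interface in the compose file.
HTTP_PORTS = {"ck1_1": 18123, "ck1_2": 18124, "ck2_1": 18125, "ck2_2": 18126}


def query_url(port: int, sql: str, host: str = "127.0.0.1") -> str:
    """Build a ClickHouse HTTP-interface URL carrying the SQL in ?query=."""
    return f"http://{host}:{port}/?" + urllib.parse.urlencode({"query": sql})


def local_row_counts(table: str = "house") -> dict:
    """Ask every node for its row count (requires the running cluster)."""
    counts = {}
    for node, port in HTTP_PORTS.items():
        url = query_url(port, f"SELECT count() FROM {table}")
        with urllib.request.urlopen(url, timeout=5) as response:
            counts[node] = int(response.read().decode().strip())
    return counts
```

With the cluster running, print(local_row_counts()) shows the per-node counts of the local table. Once the inserts have been forwarded and replicated, the two replicas of a shard are expected to report the same count and the two shards together should account for the four inserted rows, while local_row_counts("distribute_house") counts through the Distributed table and should report all four rows from any node.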


OK, the sharded cluster works. Give yourself a round of applause!


From: https://www.cnblogs.com/shenjian-online/p/16814840.html
