ZooKeeper Installation and Configuration
How to configure ZooKeeper to work best with ClickHouse
Prepare and Start ZooKeeper
Preparation
Before beginning, determine whether ZooKeeper will run in standalone or replicated mode.
- Standalone mode: One ZooKeeper server to service the entire ClickHouse cluster. Best for evaluation, development, and testing.
  - Should never be used for production environments.
- Replicated mode: Multiple ZooKeeper servers in a group called an ensemble. Replicated mode is recommended for production systems.
  - A minimum of 3 ZooKeeper servers is required.
  - 3 servers is the optimal setup: with proper tuning it functions well even on heavily loaded systems.
  - 5 servers is less likely to lose quorum entirely, but quorum acquisition also takes longer.
  - Additional servers can be added, but the ensemble should always contain an odd number of servers.
Precautions
The following practices should be avoided:
- Never deploy even numbers of ZooKeeper servers in an ensemble.
- Do not install ZooKeeper on ClickHouse nodes.
- Do not share ZooKeeper with other applications like Kafka.
- Place the ZooKeeper dataDir and dataLogDir on fast storage that will not be used for anything else.
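In zoo.cfg these locations are controlled by the dataDir and dataLogDir settings; the paths below are only an illustration of dedicating separate fast disks to each:
# /etc/zookeeper/conf/zoo.cfg
# snapshots (and, if dataLogDir is unset, transaction logs as well)
dataDir=/ssd1/zookeeper/data
# transaction logs, ideally on their own dedicated device
dataLogDir=/ssd2/zookeeper/logs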
Applications to Install
Install the following applications on your servers:
- zookeeper (3.4.9 or later)
- netcat
Configure ZooKeeper
- /etc/zookeeper/conf/myid
The myid file consists of a single line containing only the text of that machine's id, so the myid file on server 1 would contain the text "1" and nothing else. The id must be unique within the ensemble and should have a value between 1 and 255. (See the example after this list for one way to create the file.)
- /etc/zookeeper/conf/zoo.cfg
Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with a series of lines of the form server.id=host:port:port.
# specify all zookeeper servers
# The first port is used by followers to connect to the leader
# The second one is used for leader election
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
These lines must be the same on every ZooKeeper node.
- /etc/zookeeper/conf/zoo.cfg
This setting MUST be added on every ZooKeeper node:
# The time interval in hours for which the purge task has to be triggered.
# Set to a positive integer (1 and above) to enable the auto purging. Defaults to 0.
autopurge.purgeInterval=1
autopurge.snapRetainCount=5
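As referenced above, the myid file can be created with a single command; the path assumes the Debian-style /etc/zookeeper/conf layout used throughout this guide, and the id must be different on each node:
# on server 1; use "2" on server 2, "3" on server 3, and so on
echo "1" | sudo tee /etc/zookeeper/conf/myid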
Install ZooKeeper
Depending on your environment, follow the Apache ZooKeeper Getting Started Guide or the ZooKeeper Administrator's Guide.
Start ZooKeeper
Depending on your installation, start ZooKeeper with the following command:
sudo -u zookeeper /usr/share/zookeeper/bin/zkServer.sh start
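If ZooKeeper was installed from a distribution package that provides a service unit, it may instead be managed through systemd; the service name zookeeper here is an assumption and may differ on your system:
sudo systemctl start zookeeper
sudo systemctl enable zookeeper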
Verify ZooKeeper is Running
Use the following commands to verify ZooKeeper is available:
echo ruok | nc localhost 2181
echo mntr | nc localhost 2181
echo stat | nc localhost 2181
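A healthy server answers ruok with imok, and the mntr output includes a zk_server_state line showing whether the node is currently a leader, follower, or standalone. Note that on ZooKeeper 3.5 and later these four-letter-word commands must be whitelisted in zoo.cfg before they will respond:
# /etc/zookeeper/conf/zoo.cfg
4lw.commands.whitelist=ruok,mntr,stat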
Check the following files and directories to verify ZooKeeper is running and making updates:
- Logs:
/var/log/zookeeper/zookeeper.log
- Snapshots:
/var/lib/zookeeper/version-2/
Connect to ZooKeeper
From the localhost, connect to ZooKeeper with the following command to verify access (replace the IP address with your ZooKeeper server):
bin/zkCli.sh -server 127.0.0.1:2181
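Once connected, a couple of read-only commands are enough to confirm that the client session works; the /zookeeper znode always exists, so it is a safe target even on a fresh ensemble:
ls /
ls /zookeeper
quit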
Tune ZooKeeper
The following optional settings can be used depending on your requirements.
Improve Node Communication Reliability
The following settings can be used to improve node communication reliability:
/etc/zookeeper/conf/zoo.cfg
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
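Both limits are expressed in ticks, so their real-world duration depends on tickTime. A minimal sketch, assuming the commonly used default tickTime of 2000 ms:
# /etc/zookeeper/conf/zoo.cfg
tickTime=2000   # one tick = 2000 ms
initLimit=10    # followers get 10 x 2000 ms = 20 s to sync with the leader at startup
syncLimit=5     # a follower may fall at most 5 x 2000 ms = 10 s behind the leader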
Reduce Snapshots
The following settings will create fewer snapshots, which may reduce system requirements.
/etc/zookeeper/conf/zoo.cfg
# To avoid seeks ZooKeeper allocates space in the transaction log file in blocks of preAllocSize kilobytes.
# The default block size is 64M. One reason for changing the size of the blocks is to reduce the block size
# if snapshots are taken more often. (Also, see snapCount).
preAllocSize=65536
# ZooKeeper logs transactions to a transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default snapCount is 10,000.
snapCount=10000
Documentation
- ZooKeeper Getting Started Guide
- ClickHouse Zookeeper Recommendations
- Running ZooKeeper in Production
Configuring ClickHouse to use ZooKeeper
Once ZooKeeper has been installed and configured, ClickHouse can be modified to use ZooKeeper. After the following steps are completed, a restart of ClickHouse will be required.
To configure ClickHouse to use ZooKeeper, follow the steps shown below. The recommended settings are described on ClickHouse.tech under zookeeper server settings.
- Create a configuration file with the list of ZooKeeper nodes. Best practice is to put the file in /etc/clickhouse-server/config.d/zookeeper.xml.
<yandex>
    <zookeeper>
        <node>
            <host>example1</host>
            <port>2181</port>
        </node>
        <node>
            <host>example2</host>
            <port>2181</port>
        </node>
        <session_timeout_ms>30000</session_timeout_ms>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <!-- Optional. Chroot suffix. Should exist. -->
        <root>/path/to/zookeeper/node</root>
        <!-- Optional. ZooKeeper digest ACL string. -->
        <identity>user:password</identity>
    </zookeeper>
</yandex>
- Check the distributed_ddl parameter in config.xml. This parameter can be defined in another configuration file, and the path can be changed to any value that you like. If you have several ClickHouse clusters using the same ZooKeeper, the distributed_ddl path should be unique for every ClickHouse cluster setup.
<!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster. -->
<!-- Works only if ZooKeeper is enabled. Comment it out if such functionality isn't required. -->
<distributed_ddl>
    <!-- Path in ZooKeeper to queue with DDL queries -->
    <path>/clickhouse/task_queue/ddl</path>
    <!-- Settings from this profile will be used to execute DDL queries -->
    <!-- <profile>default</profile> -->
</distributed_ddl>
- Check /etc/clickhouse-server/preprocessed/config.xml. You should see your changes there.
- Restart ClickHouse. Check the ClickHouse connection to ZooKeeper as detailed in ZooKeeper Monitoring.
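After the restart, one quick check from the ClickHouse side is to query the system.zookeeper table, which proxies reads from the ensemble; an exception here usually points to a connectivity or configuration problem:
SELECT name, value
FROM system.zookeeper
WHERE path = '/';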
Converting Tables to Replicated Tables
Creating a replicated table
Replicated tables use a replicated table engine, for example ReplicatedMergeTree. The following example shows how to create a simple replicated table.
This example assumes that you have defined appropriate macro values for cluster, shard, and replica in macros.xml to enable cluster replication using ZooKeeper. For details, consult the ClickHouse.tech Data Replication guide.
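For reference, a macros.xml along these lines would satisfy the example below; the file location under config.d and the values c1, 0, and replica1 are illustrative placeholders for your own cluster, shard, and replica names:
<!-- /etc/clickhouse-server/config.d/macros.xml (illustrative) -->
<yandex>
    <macros>
        <cluster>c1</cluster>
        <shard>0</shard>
        <replica>replica1</replica>
    </macros>
</yandex>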
CREATE TABLE test ON CLUSTER '{cluster}'
(
timestamp DateTime,
contractid UInt32,
userid UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/default/test', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (contractid, toDate(timestamp), userid)
SAMPLE BY userid;
The ON CLUSTER clause ensures the table will be created on the nodes of {cluster} (a macro value). This example automatically creates a ZooKeeper path for each replicated table that looks like the following:
/clickhouse/tables/{cluster}/{shard}/default/test
becomes:
/clickhouse/tables/c1/0/default/test
You can see ZooKeeper replication data for this node with the following query (updating the path based on your environment):
SELECT *
FROM system.zookeeper
WHERE path = '/clickhouse/tables/c1/0/default/test'
Removing a replicated table
To remove a replicated table, use DROP TABLE as shown in the following example. The ON CLUSTER clause ensures the table will be deleted on all nodes. Omit it to delete the table on only a single node.
DROP TABLE test ON CLUSTER '{cluster}';
As the table is deleted on each node, that node is removed from replication and the information for its replica is cleaned up. When no more replicas exist, all ZooKeeper data for the table will be cleared.
Cleaning up ZooKeeper data for replicated tables
- IMPORTANT NOTE: Cleaning up ZooKeeper data manually can corrupt replication if you make a mistake. Raise a support ticket and ask for help if you have any doubt concerning the procedure.
Newer ClickHouse versions support SYSTEM DROP REPLICA, which is an easier command.
For example:
SYSTEM DROP REPLICA 'replica_name' FROM ZKPATH '/path/to/table/in/zk';
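Besides the raw ZooKeeper path form shown above, the statement can also be addressed to a table or a whole database; these forms exist in recent ClickHouse releases:
-- drop the replica's metadata for a single table
SYSTEM DROP REPLICA 'replica_name' FROM TABLE database.table;
-- drop the replica's metadata for every replicated table in a database
SYSTEM DROP REPLICA 'replica_name' FROM DATABASE database;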
ZooKeeper data for the table might not be cleared fully if there is an error when deleting the table, if the table becomes corrupted, or if the replica is lost. In this case you can clean up the ZooKeeper data manually using the ZooKeeper rmr command. Here is the procedure:
- Log in to the ZooKeeper server.
- Run the zkCli.sh command to connect to the server.
- Locate the path to be deleted, e.g.:
ls /clickhouse/tables/c1/0/default/test
- Remove the path recursively, e.g.,
rmr /clickhouse/tables/c1/0/default/test
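Note that in ZooKeeper 3.5 and later the rmr command in zkCli.sh has been replaced by deleteall; on those versions the equivalent command is:
deleteall /clickhouse/tables/c1/0/default/test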